EURASIP Journal on Applied Signal Processing

Distributed Coding of Highly Correlated Image Sequences with Motion-Compensated Temporal Wavelets

Abstract: This paper discusses robust coding of visual content for a distributed multimedia system. We investigate how scene analysis at the decoder can improve the coding efficiency. The distributed system encodes two correlated video signals independently and reconstructs them jointly at a central decoder. The video signals are captured from a dynamic scene, and each signal is temporally decorrelated by a motion-compensated Haar wavelet. The two cameras operate independently; however, the central decoder is able to exploit the coded information from all cameras to achieve the best reconstruction of the correlated video signals. The coding system utilizes nested lattice codes for the transform coefficients and exploits side information at the decoder. In addition, a disparity analysis of the first image pair of the sequences is performed at the decoder. The efficiency of the decoder is improved by disparity compensation of one video signal at the decoder. When compared to decoding without side information, decoding of the quantized transform coefficients with side information reduces the bit-rate. Additional bit-rate savings can be obtained by combining disparity compensation at the decoder with decoding of transform coefficients with side information. Further, we address the theoretical problem of distributed coding of video signals in the presence of correlated video side information. The correlated video signals are disparity compensated and originate from cameras that monitor the same scene from different view-points. We utilize a motion-compensated spatiotemporal transform to decorrelate the video signals. One camera signal provides the disparity-compensated side information to improve the coding of the second camera signal. We derive the optimal motion-compensated spatiotemporal transform for video coding with video side information at high rates. For our video signal model, we show that the motion-compensated Haar wavelet is an optimal transform at high rates. Given the correlation of the video side information, we also investigate the theoretical bit-rate reduction for the distributed coding scheme. Interestingly, the efficiency of multi-view side information depends on the level of temporal decorrelation: for a given correlation-SNR of the side information, the bit-rate savings due to side information decrease with improved temporal decorrelation.


I. INTRODUCTION
Robust coding of visual content is not just a necessity for multimedia systems with heterogeneous networks and diverse user capabilities. It is also the key for video systems that utilize distributed compression. Just consider the problem of distributed coding of multi-view image sequences. In such a scenario, a dynamic scene is captured by several spatially distributed video cameras and reconstructed at a single central decoder. Ideally, each encoder associated with a camera operates independently and transmits its content robustly to the central decoder. But as each encoder has a priori no specific information about its potential contribution to the reconstruction of the dynamic scene at the central decoder, a highly flexible representation of the visual content is required. In this work, we use a motion-compensated lifted wavelet transform to generate highly scalable bitstreams that can be processed in a coordinated fashion by the central decoder. Moreover, the central decoder receives images of the scene from different view-points and is able to perform an analysis of the scene. This analysis helps the central receiver to decode the incoming robust bitstreams more reliably. That is, the decoder is capable of content-aware decoding, which improves the coding efficiency of the distributed multimedia system that we discuss in the following.

The authors are with the Signal Processing Institute, Swiss Federal Institute of Technology Lausanne, CH-1015 Lausanne, Switzerland (e-mail: mflierl@ieee.org).
Our distributed system shall capture a dynamic scene with spatially distributed video cameras and reconstruct it at a single central decoder. Scene information that is acquired by more than one camera can be coded efficiently if the correlation among camera signals is exploited. In one possible compression scenario, encoders of the sensor signals are connected and compress the camera signals jointly. In an alternative compression scenario, each encoder operates independently but relies on a joint decoding unit that receives all coded camera signals. This is also known as distributed source coding. A special case of this scenario is source coding with side information. Wyner and Ziv [1] showed that for certain cases the encoder does not need the side information to which the decoder has access to achieve the rate distortion bound. Practical coding schemes for our application may utilize a combination of both scenarios and may permit a limited communication between the encoders. But both scenarios have in common that they achieve the same rate distortion bound for certain cases.
Each camera of our system [2] is associated with an encoder utilizing a motion-compensated temporal wavelet transform [3]-[5]. With that, we are able to exploit the temporal correlation of each image sequence. In addition, this wavelet transform provides a scalable representation that permits the desired robust coding of video signals. Inter-view correlation between the camera signals cannot be exploited, as signals from neighboring cameras are not directly available at each encoder. This constraint will be handled by distributed source coding principles. Therefore, the subband coefficients of the wavelet transform are represented by syndromes that are suitable for distributed source coding. A constructive practical framework for the problem of compressing correlated distributed sources using syndromes is presented in [6]-[8]. To increase the robustness of the syndrome representation, we additionally use nested lattice codes [9]. Syndrome-based distributed source coding is a principle, and several techniques can be employed. For example, [8] investigates memoryless and trellis-based coset construction. For binary sources, turbo codes [10] or low-density parity-check (LDPC) codes [11] increase coding efficiency. Improvements are also possible for non-binary sources [12]-[14].
A transform-based approach to distributed source coding for multimedia systems seems promising. The work in [15]-[18] discusses a framework for the distributed compression of vector sources: First, a suitable distributed Karhunen-Loeve transform is applied and, second, each component is handled by standard distributed compression techniques. That is, each encoder applies a suitable local transform to its input and encodes the resulting components separately in a Wyner-Ziv fashion, i.e., treating the compressed descriptions of all other encoders as side information available to the decoder. Similar to that framework, Wyner-Ziv quantization and transform coding of noisy sources at high rates is also investigated in [19], [20]. An application of this framework is the transform-based Wyner-Ziv codec for video frames [21]. In the present article, we capture the efficiency of video coding with video side information based on a high-rate approximation. For motion-compensated spatiotemporal transform coding of video with video side information, we derive the optimal transform at high rates, the conditional Karhunen-Loeve transform [22], [23]. For our video signal model, we can show that the motion-compensated Haar wavelet is an optimal transform at high rates.
The coding of multiple views of a dynamic scene is just one part of the problem. The other part addresses which view-point shall be captured by a camera. Therefore, the underlying problem of our application is sampling and coding of the plenoptic function. The plenoptic function was introduced by Adelson and Bergen [24]. It corresponds to the function representing the intensity and chromaticity of the light observed from every position and direction in 3-D space, at every time. The structure of the plenoptic function determines the correlation in the visual information retrieved from the cameras. This correlation can be estimated using geometrical information such as the position of the cameras and some bounds on the location of the objects [25], [26].
In the present work, two cameras observe the dynamic scene from different view-points. Knowing the relative camera position, we are able to compensate the disparity of the reference view-point given the current view-point. With that, we increase the correlation of the intensity values between the disparity-compensated reference view-point and the current view-point: the higher this correlation, the lower the transmission bit-rate for a given distortion. As the relative camera positions are not known a priori at the decoder, the first image pair of the two view-points is analyzed and disparity values are estimated. Using these disparity estimates, the decoder can exploit the robust representation of the Wyner-Ziv video encoder more efficiently.
As the present article discusses distributed source coding of highly correlated image sequences, we mention related works of applied research on distributed image coding. For example, [27] enhances analog image transmission systems using digital side information, [28] discusses Wyner-Ziv coding of inter-pictures in video sequences, and [29] investigates distributed compression of light field images. In [30], an uplink-friendly multimedia coding paradigm (PRISM) is proposed. The paradigm is based on distributed source coding principles and renders multimedia systems more robust to transmission losses. Also taking advantage of this paradigm, [31] proposes Wyner-Ziv coding of motion pictures.
The article is organized as follows: Section II outlines our distributed coding scheme for two view-points of a dynamic scene. We discuss the utilized motion-compensated temporal transform, the coset-encoding of transform coefficients with nested lattice codes, decoding with side information, and enhancing the side information by disparity compensation. Section III provides experimental rate distortion results for decoding of video signals with side information. Moreover, it discusses the relation between the level of temporal decorrelation and the efficiency of decoding with side information. Section IV studies the efficiency of video coding with video side information. Based on a model for transform-coded video signals, we address the rate distortion problem with video side information and determine the conditional Karhunen-Loeve transform to obtain performance bounds. The theoretical study verifies the trade-off between the level of temporal decorrelation and the efficiency of decoding with side information.

II. DISTRIBUTED CODING SCHEME
We start with an outline of our distributed coding scheme for two view-points of a dynamic scene. We utilize an asymmetric coding scheme, that is, the first view-point signal is coded with conventional source coding principles, i.e., side information cannot improve decoding of the first view-point, and the second view-point signal is coded with distributed source coding principles, i.e., side information improves decoding of the second view-point. The first view-point signal is used as video side information to improve decoding of the second view-point signal.

A. Motion-Compensated Temporal Transform
Each encoder in Fig. 1 exploits the correlation between successive pictures by employing a motion-compensated temporal transform for groups of K pictures (GOP). We perform a dyadic decomposition with a motion-compensated Haar wavelet as depicted in Fig. 2. The temporal transform provides K output pictures that are decomposed by a spatial 8 × 8 DCT. The motion information that is required for the motion-compensated wavelet transform is estimated in each decomposition level depending on the results of the lower level. The correlation of motion information between two image sequences is not exploited yet, i.e., coded motion vectors are not part of the side information. Fig. 2 shows the Haar wavelet with motion-compensated lifting steps. The even frames of the video sequence $s_{2k}$ are used to predict the odd frames $s_{2k+1}$ with the estimated motion vector $d_{2k,2k+1}$. The prediction step is followed by an update step which uses the negative motion vector as an approximation. We use a block size of 16 × 16 and half-pel accurate motion compensation with bilinear interpolation in the prediction step and select the motion vectors such that they minimize a Lagrangian cost function based on the squared error in the high-band $h_k$. Additional scaling factors in the low- and high-band are necessary to normalize the transform.
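The lifting structure described above can be sketched in a few lines. The following is a minimal illustration rather than the codec's implementation: it assumes a single global integer-pel motion vector per frame pair, realized as a circular shift, whereas the scheme described here uses 16 × 16 blocks with half-pel accuracy. The function names and the normalization factors √2 and 1/√2 are assumptions chosen so that the zero-motion case reduces to the orthonormal Haar transform.

```python
import numpy as np

def shift(frame, d):
    """Circularly shift a frame by the integer motion vector d = (dy, dx)."""
    return np.roll(frame, shift=d, axis=(0, 1))

def haar_lift_forward(s_even, s_odd, d):
    """One motion-compensated Haar lifting level for a pair of frames."""
    neg_d = (-d[0], -d[1])
    h = s_odd - shift(s_even, d)         # prediction step: high-band residual
    l = s_even + 0.5 * shift(h, neg_d)   # update step with the negative vector
    return np.sqrt(2.0) * l, h / np.sqrt(2.0)  # normalize low- and high-band

def haar_lift_inverse(l, h, d):
    """Invert the lifting steps in reverse order."""
    neg_d = (-d[0], -d[1])
    h = h * np.sqrt(2.0)
    l = l / np.sqrt(2.0)
    s_even = l - 0.5 * shift(h, neg_d)   # undo the update step
    s_odd = h + shift(s_even, d)         # undo the prediction step
    return s_even, s_odd
```

Because each lifting step is undone exactly by its counterpart, the pair of frames is recovered up to floating-point rounding for any choice of motion vector; accurate motion only affects how much energy remains in the high-band.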
Encoder 1 in Fig. 1 encodes the side information for Decoder 2 and does not employ distributed source coding principles yet. A scalar quantizer is used to represent the DCT coefficients of all temporal bands. The quantized coefficients are simply run-level encoded. On the other hand, Encoder 2 is designed for distributed source coding and uses nested lattice codes to represent the DCT coefficients of all temporal bands.

B. Nested Lattice Codes for Transform Coefficients
The 8 × 8 DCT coefficients of Encoder 2 are represented by a 1-dimensional nested lattice code [9]. Further, we construct cosets in a memoryless fashion [8]. Consider the 64 transform coefficients $c_i$ of the 8 × 8 DCT at Encoder 2. The correlation between the i-th transform coefficient $c_i$ at Encoder 2 and the i-th transform coefficient of the side information $z_i$ depends strongly on the coefficient index i. In general, the correlation between corresponding DC coefficients (i = 0) is very high, whereas the correlation between corresponding high-frequency coefficients decreases rapidly. To counter the problem of varying correlation, we adapt the transmission rate $R_{TX}$ to each transform coefficient. For weakly correlated coefficients, a higher transmission rate has to be chosen.
Adapting the transmission rate to the actual correlation is accomplished with nested lattice codes [9]. The idea of nested lattices is, roughly, to generate diluted versions of the original coset code. As we use uniform scalar quantization, we consider the 1-dimensional lattice. Fig. 4 depicts the fine code $C_0$ in the Euclidean space with minimum distance Q. $C_1$, $C_2$, and $C_3$ are nested codes with the $\nu$-th coset $C_{\mu,\nu}$ of $C_\mu$ relative to $C_0$. The nested codes are coarser and the union of their cosets gives the fine code $C_0$, i.e., $\bigcup_\nu C_{1,\nu} = C_0$.
The binary representation of a quantized transform coefficient determines its coset representation in the nested lattice. If the transmission rate for a coefficient is $R_{TX} = \mu$, then the $\mu$ least significant bits of the binary representation determine the $\nu$-th coset $C_{\mu,\nu}$. For highly correlated coefficients, the number of required cosets and, hence, the transmission rate is small. To achieve efficient entropy coding of the binary representation of all 64 transform coefficients, we define bit-planes. Each bit-plane is run-length encoded and transmitted to Decoder 2 upon request.
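The mapping from least significant bits to cosets can be illustrated with a short sketch. It assumes non-negative integer quantization indices (magnitudes in units of the step size Q) and a decoder that simply picks the coset member closest to the side information; the function names and the side information value 3.4 used below are hypothetical.

```python
def coset_index(q, mu):
    """Coset index nu: the mu least significant bits of the index q."""
    return q & ((1 << mu) - 1)

def decode_with_side_info(nu, mu, z):
    """Return the member of coset C_{mu,nu} closest to the side information z.

    In units of Q, the coset is {nu + m * 2**mu : m integer}, so its
    members are spaced 2**mu apart.
    """
    step = 1 << mu
    m = round((z - nu) / step)
    return nu + m * step

# Index 4 sent at a rate of 1 bit: coset 0 = {0, 2, 4, 6, ...}
nu = coset_index(4, 1)
estimate = decode_with_side_info(nu, 1, 3.4)   # closest even index to 3.4
```

With one transmitted bit the decoder recovers index 4 from the side information 3.4, whereas at rate 0 it would settle on the nearest fine-lattice point, index 3.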

C. Decoding with Side Information
At Encoder 2, the quantized transform coefficients are represented with 10 bit-planes, where 9 are used for encoding the absolute value and one is used for the sign. Encoder 2 is able to provide the full bit-planes, independent of any side information at Decoder 2. Encoder 2 is also able to receive a bit-plane mask to weight the current bit-plane. The masked bit-plane is run-length encoded and transmitted to Decoder 2.
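As a rough illustration of this representation, the sketch below splits integer coefficients into 9 magnitude planes (least significant first) and one sign plane, and run-length encodes a plane as (value, run) pairs. The exact run-length format of the codec is not specified here, so the pair format and the function names are assumptions.

```python
def bit_planes(coeffs, n_mag=9):
    """Split integer coefficients into n_mag magnitude planes and a sign plane.

    planes[b][i] is bit b (b = 0 is the least significant bit) of |coeffs[i]|.
    """
    planes = [[(abs(c) >> b) & 1 for c in coeffs] for b in range(n_mag)]
    sign = [1 if c < 0 else 0 for c in coeffs]
    return planes, sign

def run_length(bits):
    """Encode a list of bits as (value, run length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([b, 1])   # start a new run
    return [tuple(r) for r in runs]
```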
Given the side information at Decoder 2, masked bit-planes are requested from Encoder 2. For that, Decoder 2 sets the bit-plane mask to indicate the bits that are required from Encoder 2. Depending on the received bit-plane mask, Encoder 2 transmits the weighted bit-plane utilizing run-length encoding. Decoder 2 attempts to decode the already received bit-planes with the given side information. In case of a decoding error, Decoder 2 generates a new bit-plane mask and requests a further weighted bit-plane.
Decoder 2 has the following options for each mask bit: If a bit in the bit-plane is not needed, the mask value is 0. The mask value is 1 if the bit is required for error-free decoding. If the information at the decoder is not sufficient for this decision, the mask is set to 2 and the encoded transform coefficient that is used as side information is transmitted to Encoder 2. With this side information $z_i$ for the i-th transform coefficient $c_i$, Encoder 2 is able to determine its best transmission rate $\mu = R_{TX}[i]$ and coset $C_{\mu,\nu}$. This information is incorporated into the current bit-plane and transmitted to Decoder 2: Bits that are not needed for error-free decoding are marked with 0. Further, 1 indicates that the bit is needed and its value is 0, and 2 indicates that the bit is needed with value 1.
Decoder 2 aims to estimate the i-th transform coefficient $\hat{c}_i$ based on the current transmission rate $\mu = R_{TX}[i]$, the partially received coset $C_{\mu,\nu}$, and the side information $z_i$. With an increasing number of received bit-planes, i.e., increasing transmission rate $R_{TX}[i]$, this estimate gets more accurate and remains constant for rates beyond the critical transmission rate $R^*_{TX}[i]$. Therefore, a simple decoding algorithm is as follows: An additional bit is required if the estimated coefficient changes its value when the transmission rate increases by 1. An unchanged value for an estimated coefficient is just a necessary condition for having achieved the critical transmission rate. This condition is not sufficient for error-free decoding and, in this case, Encoder 2 has to determine the critical transmission rate to resolve any ambiguity.
Note that Decoder 2 receives the coded information in bit-plane units, starting with the plane of least significant bits. With each new bit-plane, Decoder 2 utilizes a coarser lattice where the number of cosets as well as the minimum Euclidean distance increases exponentially.
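The difference between the decoder's stopping rule and the critical transmission rate can be made concrete with a small sketch. It assumes non-negative integer indices, nearest-coset-member decoding, and hypothetical function names; `critical_rate` plays the role of Encoder 2, which knows both the true index and the side information, while `decoder_stopping_rate` simulates a decoder that stops as soon as its estimate no longer changes.

```python
def decode(q, mu, z):
    """Estimate from the mu least significant bits of q and side information z."""
    step = 1 << mu
    nu = q & (step - 1)                       # only these bits are transmitted
    return nu + round((z - nu) / step) * step

def critical_rate(q, z, max_bits=9):
    """Smallest rate such that decoding is correct at this and all higher rates.

    A result of max_bits + 1 means the side information is too far from q
    for any rate up to max_bits.
    """
    mu_star = 0
    for mu in range(max_bits + 1):
        if decode(q, mu, z) != q:
            mu_star = mu + 1
    return mu_star

def decoder_stopping_rate(q, z, max_bits=9):
    """Rate at which the estimate first stays unchanged after one more bit."""
    prev = decode(q, 0, z)
    for mu in range(1, max_bits + 1):
        cur = decode(q, mu, z)
        if cur == prev:
            return mu - 1
        prev = cur
    return max_bits
```

For index 4 with side information 3.4 both rates coincide at 1 bit. For index 8 with the same side information, the estimate is 4 at rates 1 and 2, so the unchanged-value test is passed at rate 1 although the critical rate is 4 bits; this is the ambiguity that Encoder 2 has to resolve.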

D. Disparity-Compensated Side Information
To improve the efficiency of Decoder 2, the side information from Decoder 1 is disparity compensated in the image domain. If the camera positions are unknown, the coding system estimates the disparity information from sample frames. During this calibration process, the side information for Decoder 2 is less correlated and Encoder 2 has to transmit at a higher bit-rate. Our system utilizes block-based estimates of the disparity values which are constant for all corresponding image pairs in the stereoscopic sequence. We estimate the disparity from the first pair of images in the sequences. The right image is subdivided horizontally into 4 segments and vertically into 6 segments. For each of the 24 blocks in the right image, we estimate half-pel accurate disparity vectors. Intensity values for half-pel positions are obtained by bilinear interpolation. Assuming that the camera positions are unaltered in time, the disparity information is used in the image domain to improve the side information in the transform domain.
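A block-based disparity search of this kind might be sketched as follows. This is a simplified illustration under several assumptions: integer-pel accuracy instead of half-pel with bilinear interpolation, a full search minimizing the sum of squared differences, a reading of the subdivision as 4 block columns by 6 block rows, and a hypothetical function name and search range.

```python
import numpy as np

def estimate_block_disparity(left, right, n_cols=4, n_rows=6, search=4):
    """Return an (n_rows, n_cols, 2) array of integer disparity vectors (dy, dx).

    For each block of the right image, find the displaced block of the
    left image with the smallest sum of squared differences.
    """
    height, width = right.shape
    bh, bw = height // n_rows, width // n_cols
    disp = np.zeros((n_rows, n_cols, 2), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            y0, x0 = r * bh, c * bw
            block = right[y0:y0 + bh, x0:x0 + bw]
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    ys, xs = y0 + dy, x0 + dx
                    if ys < 0 or xs < 0 or ys + bh > height or xs + bw > width:
                        continue  # candidate block would leave the image
                    cand = left[ys:ys + bh, xs:xs + bw]
                    err = float(np.sum((block - cand) ** 2))
                    if best is None or err < best:
                        best = err
                        disp[r, c] = (dy, dx)
    return disp
```

The resulting vectors would then be applied to every decoded left-view picture to form the disparity-compensated side information, under the stated assumption that the camera positions do not change over time.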

III. EXPERIMENTAL RESULTS
For the experiments, we select the stereoscopic MPEG-4 sequences Funfair and Tunnel in QCIF resolution. We divide each view with 224 frames at 30 fps into groups of K = 32 pictures. The GOPs of the left view are encoded with Encoder 1 at high quality by setting the quantization parameter QP = 2, where $Q = 2\,QP$. This coded version of the left view is used for disparity compensation. The compensated frames provide the side information for Decoder 2 to decode the right view.
Figs. 5 and 7 show the luminance PSNR over the total bit-rate of the distributed codec Encoder 2 for the sequences Funfair 2 and Tunnel 2, respectively. The sequences are the right views of the stereoscopic sequences. The rate distortion points are obtained by varying the quantization parameter for the nested lattice in Encoder 2. When compared to decoding without side information, decoding with coefficient side information reduces the bit-rate of Funfair 2 by up to 5% and that of Tunnel 2 by up to 8%. Decoding with disparity-compensated side information reduces the bit-rate of Funfair 2 by up to 8%. The block-based disparity compensation has limited accuracy and is not beneficial for Tunnel 2. But utilizing more accurate geometrical information about the scene will improve the side information for Decoder 2 and, hence, will further reduce the bit-rate of Encoder 2.

We observe that strong temporal filtering results in lower bit-rate savings due to side information when compared to the bit-rate savings due to side information for weaker temporal filtering. Obviously, there is a trade-off between the level of temporal decorrelation and the efficiency of multi-view side information. This trade-off is also found in the following theoretical investigation on the efficiency of video coding with side information.

Fig. 6. Bit-rate difference vs. luminance PSNR at Decoder 2 for the sequence Funfair 2 (right view). The rate difference is the bit-rate for decoding with side information minus the bit-rate for decoding without side information and reflects the bit-rate savings due to decoding with side information. Smaller bit-rate savings are observed for strong temporal decorrelation (K = 32) when compared to the bit-rate savings for weak temporal decorrelation (K = 8).

Fig. 8. Bit-rate difference vs. luminance PSNR at Decoder 2 for the sequence Tunnel 2 (right view). The rate difference is the bit-rate for decoding with side information minus the bit-rate for decoding without side information and reflects the bit-rate savings due to decoding with side information. Smaller bit-rate savings are observed for strong temporal decorrelation (K = 32) when compared to the bit-rate savings for weak temporal decorrelation (K = 8).

IV. EFFICIENCY OF VIDEO CODING WITH SIDE INFORMATION
In this section, we outline a signal model to study video coding with side information in more detail. We derive performance bounds and compare to coding without side information.

A. Model for Transform-Coded Video Signals
We build upon a model for motion-compensated subband coding of video that is outlined in [5], [32]. Let the video pictures $s_k = \{s_k[x, y], (x, y) \in \Pi\}$ be scalar random fields over a two-dimensional orthogonal grid $\Pi$ with horizontal and vertical spacing of 1.
As depicted in Fig. 9, we assume that the pictures $s_k$ are shifted versions of the model picture v and degraded by independent additive white Gaussian noise $n_k$ [5].
Fig. 9. Signal model for a group of K pictures.
$\Delta_k$ is the displacement error in the k-th picture, statistically independent from the model picture v and the noise $n_k$ but correlated to other displacement errors. We assume a 2-D normal distribution with variance $\sigma_\Delta^2$ and zero mean where the x- and y-components are statistically independent.
From [5], we adopt the matrix of the power spectral densities of the pictures $s_k$ and normalize it with respect to the power spectral density of the model picture v. We write it also with the identity matrix $I$ and the matrix $\mathbf{1}\mathbf{1}^T$ with all entries equal to 1.
$\alpha = \alpha(\omega)$ is the normalized power spectral density of the noise $\Phi_{n_k n_k}(\omega)$ with respect to the model picture v.
is the characteristic function of the continuous 2-D Gaussian displacement error.

B. Rate Distortion with Video Side Information
Now, we consider the video coding scheme in Fig. 1 at high rates such that the reconstructed side information approaches the original side information, $\hat{w}_k \to w_k$. With that, we have a Wyner-Ziv scheme (Fig. 10) and the rate distortion function $R^*$ of Encoder 2 is bounded by the conditional rate distortion function [1].
In the following, we assume very accurate disparity compensation and consider only illumination changes. We model the side information as a noisy version of the video signal to be encoded, i.e., $w_k = s_k + u_k$, and assume that the noise $u_k$ is also Gaussian with variance $\sigma_u^2$ and independent of $s_k$. In this case, the matrix of the power spectral densities of the side information pictures is simply $\Phi_{ww}(\omega) = \Phi_{ss}(\omega) + \Phi_{uu}(\omega)$, where $\Phi_{uu}(\omega)$ is the matrix of the power spectral densities of the side information noise. $\gamma = \gamma(\omega)$ is the normalized power spectral density of the side information noise $\Phi_{u_k u_k}(\omega)$ with respect to the model picture v.
With these assumptions, the rate distortion function $R^*$ of Encoder 2 is equal to the conditional rate distortion function [1]. Now, it is sufficient to use the conditional Karhunen-Loeve transform to code video signals with side information and achieve the conditional rate distortion function.

C. Conditional Karhunen-Loeve Transform
In the case of motion-compensated transform coding of video with side information, the conditional Karhunen-Loeve transform is required to obtain the performance bounds. We determine the well-known conditional power spectral density matrix $\Phi_{s|w}(\omega)$ of the video signal $s_k$ given the video side information $w_k$.
With the model in Section IV-A and the assumptions in Section IV-B, we obtain the normalized conditional spectral density matrix. For our signal model, the conditional Karhunen-Loeve transform is as follows: The first eigenvector just adds all components and scales with $1/\sqrt{K}$. For the remaining eigenvectors, any orthonormal basis can be used that is orthogonal to the first eigenvector. The Haar wavelet that we use for our coding scheme meets these requirements. Finally, K eigendensities are needed to determine the performance bounds.
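The statement about the eigenvectors can be checked numerically at a single frequency. The sketch below rests on assumptions that are not spelled out in the text above: the normalized matrix of the signal pictures is taken to have the form $(1 + \alpha - P) I + P \mathbf{1}\mathbf{1}^T$, where P is a scalar standing in for the characteristic function of the displacement error at that frequency, and the side information noise is taken to be uncorrelated across pictures with normalized density $\gamma$. The function names and numerical values are hypothetical.

```python
import numpy as np

def haar_matrix(K):
    """Orthonormal dyadic Haar matrix for K = 2**n (rows are basis vectors)."""
    H = np.array([[1.0]])
    while H.shape[0] < K:
        n = H.shape[0]
        top = np.kron(H, [1.0, 1.0])              # coarser basis, stretched
        bottom = np.kron(np.eye(n), [1.0, -1.0])  # finest-scale differences
        H = np.vstack([top, bottom]) / np.sqrt(2.0)
    return H

def conditional_spectrum_in_haar_basis(K, alpha, P, gamma):
    """Return the diagonal of H Phi_{s|w} H^T and its largest off-diagonal entry."""
    I = np.eye(K)
    Phi_ss = (1 + alpha - P) * I + P * np.ones((K, K))   # assumed model form
    Phi_ww = Phi_ss + gamma * I                          # w_k = s_k + u_k
    Phi_cond = Phi_ss - Phi_ss @ np.linalg.solve(Phi_ww, Phi_ss)
    H = haar_matrix(K)
    D = H @ Phi_cond @ H.T
    off_diag = np.max(np.abs(D - np.diag(np.diag(D))))
    return np.diag(D), off_diag
```

Under these assumptions the Haar basis diagonalizes the conditional matrix, and each conditional eigendensity equals $\lambda\gamma/(\lambda + \gamma)$, where $\lambda$ is the corresponding eigendensity without side information: $1 + \alpha + (K-1)P$ for the first eigenvector and $1 + \alpha - P$ for the remaining ones.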

D. Coding Gain due to Side Information
With the conditional eigendensities, we are able to determine the coding gain due to side information. We normalize the conditional eigendensities $\Lambda^*_k(\omega)$ with respect to the eigendensities $\Lambda_k(\omega)$ that we obtain for coding without side information. The resulting rate difference is used to measure the improved compression efficiency for each picture k in the presence of side information. It represents the maximum bit-rate reduction (in bit/sample) possible by optimum encoding of the eigensignal with side information, compared to optimum encoding of the eigensignal without side information, for Gaussian wide-sense stationary signals and the same mean square reconstruction error. The overall rate difference $\Delta R^*$ is the average over all K eigensignals [32], [33].
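To give a feeling for how the GOP size enters, the sketch below evaluates the high-rate rate difference at a single frequency only; the bound discussed here integrates the logarithm of the eigendensity ratio over all spatial frequencies, which this toy calculation does not attempt. It assumes eigendensities without side information of $1 + \alpha + (K-1)P$ (once) and $1 + \alpha - P$ (K-1 times), conditional eigendensities $\lambda\gamma/(\lambda + \gamma)$, and hypothetical parameter values.

```python
import math

def rate_difference(K, alpha, P, gamma):
    """Average rate difference in bit/sample at one frequency (negative = savings)."""
    lams = [1 + alpha + (K - 1) * P] + [1 + alpha - P] * (K - 1)
    # Ratio of conditional to unconditional eigendensity is gamma / (lam + gamma);
    # at high rates, each eigensignal saves half the log2 of that ratio.
    per_signal = [0.5 * math.log2(gamma / (lam + gamma)) for lam in lams]
    return sum(per_signal) / K

# Hypothetical values: residual noise alpha, displacement term P, side info noise gamma
dr_small_gop = rate_difference(8, 0.001, 0.9, 0.01)
dr_large_gop = rate_difference(32, 0.001, 0.9, 0.01)
```

With these toy values the rate difference is more negative for K = 8 than for K = 32, in line with the qualitative trend described in this section; the numbers themselves are not meant to reproduce the reported curves.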
We observe for a given correlation-SNR of the side information that larger bit-rate savings are achievable if the GOP size K is smaller. The experimental results in Figs. 6 and [...] $\sigma_n^2]/\sigma_u^2)$ of 20 dB. Again, the variance of the model picture v is normalized to $\sigma_v^2 = 1$. We observe that for K = 32, half-pel accurate motion compensation ($\beta = -1$), and a c-SNR of 20 dB, the rate difference is limited to -0.3 bit/sample. Also, the bit-rate savings due to side information increase for less accurate motion compensation. That is, there is a trade-off between the gain due to accurate motion compensation and the gain due to side information. Practically speaking, less accurate motion compensation reduces the coding efficiency of the encoder and, with that, its computational complexity, but improved side information may compensate to reach a similar overall efficiency.
V. CONCLUSIONS
This paper discusses robust coding of visual content for a distributed multimedia system. The distributed system compresses two correlated video signals. The coding scheme is based on motion-compensated temporal wavelets and transform coding of temporal subbands. The scalar transform coefficients are represented by a nested lattice code. For this representation, we define bit-planes and encode these with run-length coding. As the correlation of the transform coefficients is not stationary, we decode with feed-back and adapt the coarseness of the code to the actual correlation. Also, we investigate how scene analysis at the decoder can improve the coding efficiency of the distributed system. We estimate the disparity between the two views and perform disparity compensation. With disparity-compensated side information, we observe up to 8% bit-rate savings over decoding without side information.
Finally, we investigate theoretically motion-compensated spatiotemporal transforms. We derive the optimal motion-compensated spatiotemporal transform for video coding with video side information at high rates. For our video signal model, we show that the motion-compensated Haar wavelet is an optimal transform at high rates. Given the correlation of the video side information, we also investigate the theoretical bit-rate reduction for the distributed coding scheme. We observe a trade-off in coding efficiency between the level of temporal decorrelation and the efficiency of multi-view side information. A similar trade-off is found between the level of accurate motion compensation and the efficiency of multi-view side information.

Fig. 1 depicts the distributed coding scheme for two view-points of a dynamic scene. The dynamic scene is represented by the image sequences $s_k[x, y]$ and $w_k[x, y]$. The coding scheme comprises Encoder 1 and Encoder 2, which operate independently, as well as Decoder 2, which is dependent on Decoder 1. The side information for Decoder 2 can be improved by considering the spatial camera positions and performing disparity compensation. As the video signals are not stationary, Decoder 2 is decoding with feed-back.

Fig. 1. Distributed coding scheme for two view-points of a dynamic scene with disparity compensation.

Fig. 3 explains the coset-coding principle. Assume that Encoder 2 transmits at a rate $R_{TX}$ of 1 bit per transform coefficient and utilizes two cosets $C_{1,0} = \{o_0, o_2, o_4, o_6\}$ and $C_{1,1} = \{o_1, o_3, o_5, o_7\}$ for encoding. Now, the transform coefficient $o_4$ shall be encoded and the encoder sends one bit to signal coset $C_{1,0}$. With the help of the side information coefficient z, the decoder is able to decode $o_4$ correctly. If Encoder 2 does not send any bit, the decoder will decode $o_3$ and we observe a decoding error.

Fig. 4. Nested lattices. The 1-dimensional fine code $C_0$ is embedded into the Euclidean space with minimum distance Q. $C_1$, $C_2$, and $C_3$ are nested codes with the $\nu$-th coset $C_{\mu,\nu}$ of $C_\mu$ relative to $C_0$.

Figs. 6 and 8 show the bit-rate difference between decoding with side information and decoding without side information over the luminance PSNR at Decoder 2 for the sequences Funfair 2 (right view) and Tunnel 2 (right view), respectively. The bit-rate savings due to side information are depicted for weak temporal filtering with K = 8 pictures per GOP and strong temporal filtering with K = 32 pictures per GOP. Note that both the coded signal (right view) and the side information (left view) are encoded with the same GOP length K.

Fig. 5. Luminance PSNR vs. total bit-rate at Decoder 2 for the sequence Funfair 2 (right view). Compared is decoding with disparity-compensated side information, decoding with coefficient side information only, and decoding without side information. For all cases, groups of K = 32 pictures are used.

Fig. 7. Luminance PSNR vs. total bit-rate at Decoder 2 for the sequence Tunnel 2 (right view). Compared is decoding with disparity-compensated side information, decoding with coefficient side information only, and decoding without side information. For all cases, groups of K = 32 pictures are used.

Fig. 10. Coding of K pictures $s_k$ at rate $R^*$ with side information of K pictures $w_k$ at the decoder.

Fig. 11. Rate difference to motion-compensated transform coding without side information vs. correlation-SNR for groups of K pictures. The displacement inaccuracy $\beta$ is -1 (half-pel accuracy) and the residual noise is -30 dB.
Fig. 3. Coset-coding of transform coefficients where Encoder 2 transmits at a rate $R_{TX}$ of 1 bit per transform coefficient.