Distributed transform coding via source-splitting

Transform coding (TC) is one of the best known practical methods for quantizing high-dimensional vectors. In this article, a practical approach to distributed TC of jointly Gaussian vectors is presented. This approach, referred to as source-split distributed transform coding (SP-DTC), can be used to easily implement two-terminal transform codes for any given rate-pair. The main idea is to apply source-splitting using orthogonal transforms, so that only Wyner-Ziv (WZ) quantizers are required for compression of transform coefficients. This approach, however, requires optimizing the bit allocation among dependent sets of WZ quantizers. To solve this problem, a low-complexity tree-search algorithm based on analytical models for transform coefficient quantization is developed. A rate-distortion (RD) analysis of SP-DTCs for jointly Gaussian sources is presented, which indicates that these codes can significantly outperform the practical alternative of independent TC of each source, whenever there is a strong correlation between the sources. For practical implementation of SP-DTCs, the idea of using conditional entropy constrained (CEC) quantizers followed by Slepian-Wolf coding is explored. Experimental results obtained with SP-DTC designs based on both CEC scalar quantizers and CEC trellis-coded quantizers demonstrate that actual implementations of SP-DTCs can achieve RD performance close to the analytically predicted limits.


Introduction
Many new applications such as multi-camera imaging systems rely on networks of distributed wireless sensors to acquire signals in the form of high-dimensional vectors [1]. In such situations, an encoder in each sensor quantizes a vector of observation variables (without exchanging any information with other sensors) and transmits its output to a central processor which jointly decodes all the sources. The strong statistical dependencies among the signals observed by different sensors can be exploited in the decoder to reduce the transmission bit-rate of each sensor. This problem, in general, is referred to as distributed (or multiterminal) vector quantization (VQ). The design of a distributed VQ for a large number of source variables is a difficult task. A practically simpler, yet very effective approach to quantizing a large number of correlated variables by using a bank of single variable quantizers is transform coding (TC) [2][3][4]. Clearly, TC can be used for distributed VQ when separately observed vectors have both inter-vector and intra-vector statistical dependencies, a situation typical in applications such as camera networks. Most of the previous work [5][6][7] studies Wyner-Ziv (WZ) transform coding (WZ-TC), which is a special case of more general multiterminal transform coding (MT-TC) [8]. In WZ-TC, a single source is quantized given that the decoder has access to side information about the source.
Information-theoretic studies of distributed transform coding (DTC) can be found in [5,8]. In [8], the optimal linear transform for Gaussian WZ-TC under the mean square-error (MSE) criterion is shown to be the conditional Karhunen-Loève transform (CKLT), which is a natural extension of the result in [2]. This result is based on the assumption that each transform coefficient is compressed by a rate-distortion (RD) optimal WZ quantizer and hence describes the optimal performance theoretically attainable (OPTA) in Gaussian WZ-TC. However, the optimal solution to the more general MT-TC problem remains unknown, even for the Gaussian case. In [8], an iterative descent algorithm for determining the OPTA of the Gaussian MT-TC problem is given. It is shown that, while this algorithm [referred to as the distributed KLT (DKLT)] always converges to a solution, the final solution is not necessarily the global optimum. In any case, the practical implementation of distributed quantizers implied by the DKLT remains an open problem. In [5], WZ-TC based on high-rate scalar quantization and ideal Slepian-Wolf (SW) coding [9] is studied. In particular, it is shown that, for jointly Gaussian vectors, CKLT followed by uniform scalar quantization is asymptotically optimal, a natural extension of the result in [10] for entropy-coded quantization at high rate. More importantly, the bit allocations and quantizer step sizes found in [5] can be used for practical design of WZ transform codes as long as high-rate approximations hold. However, we note that, even when scalar quantizers are used, achieving good performance with this approach still requires the use of a subsequent block-based SW coding method (e.g., Turbo codes or LDPC codes). Other previous studies on WZ-TC can be found in [6,7]. However, they rely on WZ scalar quantization of transform coefficients. Such methods are therefore most suitable for applications requiring low coding delay, as their performance is strictly inferior to block-based quantization.
In contrast to WZ-TC, we consider in this article the practical design of two-terminal transform codes for jointly Gaussian vectors in which arbitrary transmission rates can be assigned to each terminal. Our approach is based on the idea of source-splitting [11,12] to convert the two-terminal TC problem into two WZ-TC problems. Since transform codes quantize linear projections, we perform source splitting in terms of optimal linear approximations, i.e., a linear approximation of one source is provided as decoder side-information for the other source. The proposed source-split DTC (SP-DTC) approach only requires the design of two sets of WZ quantizers sequentially, and avoids having to iteratively optimize two sets of WZ quantizers to each other as in [8]. However, this approach requires the solution of a bit allocation problem involving dependent WZ quantizers. To solve this problem for Gaussian sources, we propose an efficient tree-search algorithm, which can be used to find a good SP-DTC under different models for quantization of transform coefficients. When used with the RD-optimal WZ quantization model [8], this algorithm can potentially locate the optimal SP-DTC for Gaussian sources. In practice, with constraints imposed on tree-search complexity, the algorithm yields a near-optimal solution. We refer to the optimal solution to the Gaussian problem as the source-split DKLT (SP-DKLT). Using this algorithm, we numerically compute the rate-region achievable with a SP-DKLT code for two examples of jointly Gaussian vector sources. This study shows that, when there is sufficient intersource correlation, optimal SP-DKLT codes can achieve substantially better performance than independent transform codes for the two sources. However, we find that the rates achievable with SP-DKLT codes are strictly inside the optimal achievable rate-region predicted by the DKLT algorithm of [8]. In order to approach the performance predicted by the optimal SP-DKLT in practice, block WZ quantization of transform coefficients is required. For implementation of block WZ quantizers, we consider the use of trellis-coded quantization (TCQ) followed by SW coding. This two-stage approach is known to achieve the RD function of Gaussian WZ coding [13]. In order to practically implement this approach, we introduce the idea of designing conditional entropy constrained TCQ (CEC-TCQ) based on analytically found bit-allocations. We present experimental results to demonstrate that practical implementations of SP-DTCs for Gaussian sources can closely approach the performance limits indicated by the optimal SP-DKLT. On the other hand, when the SW-coded high-rate scalar quantization model [5] is assumed for encoding transform coefficients, the tree-search algorithm proposed in this article can also be used to find asymptotically good SP-DTC codes for scalar quantization based implementations. These codes can be readily implemented using CEC scalar quantizers (CEC-SQ), as demonstrated by experimental studies presented in this article. In our experimental study, we also investigate the design of good SP-DTCs based on the widely used discrete cosine transform (DCT).
This article is organized as follows. Section 2 presents a review of WZ-TC of Gaussian vectors and motivates the particular approach introduced in this article. Section 3 presents the idea of SP-DTC and develops the tree-search algorithm for finding the optimal transforms and the bit-allocation for SP-DKLT codes. Section 4 computes the achievable rate region of SP-DKLT codes for two example Gaussian source models, and presents experimental results obtained by designing SP-DTCs based on both KLT and DCT. Finally, some concluding remarks are given in Section 5.
Notation: As usual, bold letters denote vectors and matrices, upper case denotes random variables, and lower case denotes realizations. $\Sigma_X$ denotes the auto-covariance matrix of the vector X. $\Sigma_{XY}$ and $\Sigma_{X|Y}$ denote, respectively, the joint covariance matrix of (X, Y) and the conditional covariance matrix of X given Y. The eigenvalues $\lambda_1, \ldots, \lambda_M$ of an M × M covariance matrix are always indexed such that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M$, and the corresponding KLT matrix has the structure $T = (u_1^T, \ldots, u_M^T)$, where $u_m$ is the eigenvector associated with $\lambda_m$.

WZ-TC of Gaussian vectors
Consider encoding of a Gaussian vector $X \in \mathbb{R}^{M_1}$ using B bits per vector, given that the decoder has access to a jointly Gaussian vector $Y \in \mathbb{R}^{M_2}$. Assume that both vectors have zero mean, and let the auto-covariance matrix of X be $\Sigma_X = E\{XX^T\}$. In WZ-TC, a linear transform is first applied to X and each component of the transform coefficient vector $U = T^T X$ is separately compressed by a WZ quantizer, considering Y as decoder side-information, where T is an $M_1 \times M_1$ unitary matrix. Let $\hat{U}$ be the quantized value of U. The decoder then estimates the source vector based on $\hat{U}$ and Y. We wish to find the optimal transform and the allocation of B bits among the $M_1$ transform coefficients, which minimize the quantization MSE $E\|X - \hat{X}\|^2$, where $\hat{X} = E\{X \mid \hat{U}, Y\}$ is the optimal estimate (at the decoder) of the source vector. The solution of this problem requires an analytical model for coefficient quantization. To this end, [8] considers the RD optimal WZ quantization (RD-WZQ) model, the solution based on which is appropriate for practical block quantization techniques such as TCQ. On the other hand, [5] considers the SW-coded high-rate scalar quantization (SWC-HRSQ) model.
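Let the eigenvalues of the conditional covariance matrix $\Sigma_{X|Y}$ be $\lambda_1 \ge \cdots \ge \lambda_{M_1}$, let $\lambda = (\lambda_1, \ldots, \lambda_{M_1})^T$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_{M_1})$, and let θ denote the coefficient vector produced by the CKLT of X given Y [8]. It is easy to verify that $E\{\theta\theta^T \mid Y\} = \Lambda$, i.e., the components of θ are conditionally uncorrelated, given Y. For convenience, define
$$d^*(\lambda, B, N) = \Big(\prod_{m=1}^{N} \lambda_m\Big)^{1/N} 2^{-2B/N},$$
where $N \le M_1$ is a positive integer.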

RD-WZQ model
When the RD-WZQ model is used, the quantization MSE for $U_m$ at a rate of $r_m$ bits is given by [14]
$$d_m(r_m) = \lambda_m 2^{-2 r_m},$$
where $\lambda_m = \mathrm{var}(U_m \mid Y)$ and var(·|·) denotes the conditional variance. The optimal solution to the WZ-TC problem under the RD-WZQ model is given by the following theorem.
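Theorem 1 Given jointly Gaussian X and Y as defined above, and a total bit budget of B bits, if each transform coefficient $U_m$, $m = 1, \ldots, M_1$, where $U = T^T X$, is quantized by an RD optimal WZ quantizer which uses Y as decoder side-information, then the transform T which minimizes $E\|X - \hat{X}\|^2$ is the CKLT of X given Y, and the number of bits allocated to quantizing $U_m$ is
$$r_m = \begin{cases} \frac{1}{2}\log_2 \dfrac{\lambda_m}{d^*(\lambda, B, N)}, & m = 1, \ldots, N \\ 0, & m = N+1, \ldots, M_1 \end{cases} \qquad (2)$$
where $N \le M_1$ is the largest integer for which $\lambda_m \ge d^*(\lambda, B, N)$, $m = 1, \ldots, N$. The quantization MSE of the mth coefficient is
$$d_m = \begin{cases} d^*(\lambda, B, N), & m = 1, \ldots, N \\ \lambda_m, & m = N+1, \ldots, M_1 \end{cases}$$
and the overall MSE is
$$N\, d^*(\lambda, B, N) + \sum_{m=N+1}^{M_1} \lambda_m.$$
Proof 1 Directly follows from [8, Section III-B].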
Note that RD-WZQ model implies infinite-dimensional VQ of each coefficient and hence the above MSE is the OPTA in the Gaussian WZ-TC problem.
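The allocation in Theorem 1 can be illustrated numerically. The sketch below assumes that the threshold d*(λ, B, N) has the standard reverse water-filling form, i.e., the geometric mean of the N largest conditional eigenvalues scaled by 2^(-2B/N). The function names `d_star` and `wz_bit_allocation` are hypothetical and are not taken from the article.

```python
import math


def d_star(eigs, B, N):
    """Assumed threshold: geometric mean of the N largest eigenvalues times 2^(-2B/N)."""
    log_gm = sum(math.log2(l) for l in eigs[:N]) / N
    return 2.0 ** (log_gm - 2.0 * B / N)


def wz_bit_allocation(eigs, B):
    """Allocate B bits among coefficients whose conditional variances are `eigs`.

    `eigs` must be positive and sorted in decreasing order.
    Returns (N, rates, total_mse), where N is the number of active coefficients.
    """
    M = len(eigs)
    N = 1
    # Largest N whose smallest active eigenvalue is not below the threshold.
    for n in range(M, 0, -1):
        if eigs[n - 1] >= d_star(eigs, B, n):
            N = n
            break
    d = d_star(eigs, B, N)
    # Active coefficients get 0.5*log2(lambda_m / d*); the rest get zero bits.
    rates = [0.5 * math.log2(l / d) if m < N else 0.0 for m, l in enumerate(eigs)]
    # Active coefficients contribute d* each; inactive ones contribute their variance.
    total_mse = N * d + sum(eigs[N:])
    return N, rates, total_mse
```

For the illustrative eigenvalues (1, 0.25, 0.01) and B = 2 bits, this sketch activates two coefficients with 1.5 and 0.5 bits, and the third coefficient is left unquantized.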

SWC-HRSQ model
The asymptotically (in rate) optimal solution to the WZ-TC problem under SWC-HRSQ model is given by the following theorem.
Theorem 2 Let X, Y, and B be as in Theorem 1. If each transform coefficient $U_m$, $m = 1, \ldots, M_1$, where $U = T^T X$, is quantized by a high-rate scalar quantizer and the quantizer output is encoded by a SW code which uses Y as decoder side-information, then the transform T which asymptotically minimizes $E\|X - \hat{X}\|^2$ is the CKLT of X given Y and the bit allocation is given by (2). Furthermore, the asymptotically optimal quantizer for $U_m$, $m = 1, \ldots, N$, is a uniform quantizer with step-size $\Delta = \sqrt{(2\pi e)\, d^*(\lambda, B, N)}$. The resulting quantization MSE of each of these coefficients is $\Delta^2/12 = (\pi e/6)\, d^*(\lambda, B, N)$.
Proof 2 See Section "Proof of Theorem 2" in Appendix 1.
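As a small numerical sketch of Theorem 2, the step size and the per-coefficient MSE can be computed from the threshold d*. The sketch assumes the usual high-rate approximation that a uniform quantizer with step Δ has MSE Δ²/12; the function name `hrsq_step_and_mse` is hypothetical.

```python
import math


def hrsq_step_and_mse(d_star_value):
    """Step size and high-rate MSE of the uniform quantizer for an active coefficient.

    Assumes step = sqrt(2*pi*e*d*) and the high-rate approximation MSE = step^2 / 12.
    """
    step = math.sqrt(2.0 * math.pi * math.e * d_star_value)
    mse = step ** 2 / 12.0
    return step, mse
```

Under these assumptions the MSE equals (πe/6)·d*, i.e., about 1.53 dB above the value d* obtained with the RD-WZQ model at the same bit allocation.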

Sufficiency of scalar side-information
The WZ quantizers with vector-valued decoder side-information as considered in Theorem 1 are difficult to design in practice. However, the following theorem establishes that when the CKLT is used and the RD-WZQ model applies for quantization of coefficients, a linear transformation of the side-information vector can be used to convert the vector side-information problem into an equivalent scalar side-information problem. Furthermore, [5, Section 6] shows that this result applies in an asymptotic sense to the SWC-HRSQ model as well.
Theorem 3 Let the mean-zero vectors $X \in \mathbb{R}^{M_1}$ and $Y \in \mathbb{R}^{M_2}$ be jointly Gaussian, and let T be the CKLT of X given Y. Suppose that transform coefficients $U_m$, $m = 1, \ldots, M_1$, where $U = T^T X$, are each compressed by an RD optimal WZ quantizer relative to decoder side-information Y. Then, the minimum MSE (MMSE) estimate $\tilde{u}_m(y) = E\{U_m \mid y\}$ of $U_m$ given Y = y is a sufficient statistic for decoder side-information for quantizing $U_m$.
Proof 3 See Section "Proof of Theorem 3" in Appendix 1.
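For zero-mean jointly Gaussian vectors, the scalar side-information of Theorem 3 has a simple closed form. As a sketch, write $t_m$ for the m-th column of T and $C_{XY} = E\{XY^T\}$ for the cross-covariance of X and Y; both symbols are introduced here for illustration only, and $\Sigma_Y$ is assumed to be invertible.

```latex
\tilde{u}_m(y) = E\{U_m \mid Y = y\}
              = t_m^T \, E\{X \mid Y = y\}
              = t_m^T \, C_{XY} \, \Sigma_Y^{-1} \, y
```

Under these assumptions, the quantizer for $U_m$ needs only the scalar $\tilde{u}_m(y)$ at the decoder, rather than the full vector y.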
Wyner-Ziv transform coding is a special case of the more general MT-TC, where two or more terminals apply TC to their respective inputs and transmit the quantized outputs to a single decoder which exploits the intersource correlation to jointly reconstruct all the sources. In this case, the problem is to optimally allocate a given bit budget among all the terminals such that the total MSE is minimized. However, a closed-form solution to this problem appears difficult, due to the inter-dependence of the encoders in different terminals. An iterative descent algorithm is given in [8] for solving the Gaussian MT-TC problem. Given a total bit-budget, the bit rate of the system is incremented by a small amount in each iteration, and the optimal WZ-TC for each terminal is determined by fixing the encoders of all other terminals and considering their outputs as decoder side-information. The solution that gives the MMSE is accepted and the iterations are repeated until the total bit-budget is exhausted. While this algorithm, referred to as the DKLT algorithm, is guaranteed to converge to at least a locally optimal solution, there is no tractable way to implement the quantizers implied by the final solution, since it is not practical to optimize a set of near-optimal WZ quantizers in each iteration of this algorithm. Note also that DKLT requires joint decoding of two vector sources.

Source-splitting based distributed TC
In general, designing a multi-terminal VQ is more difficult than designing a WZ-VQ, due to the mutual dependence among the encoders. However, one could use WZ-VQs to realize a multi-terminal VQ by using source-splitting [12]. It is known that in the quadratic-Gaussian case, source-splitting can be used to realize any rate-pair in the achievable rate-region by only using ideal WZ-VQs which correspond to the corner-points of the achievable rate-region [12, Section V-C], [15]. While the same optimality properties cannot be claimed for source-splitting by linear transforms, this observation still motivates a similar approach to practically realizing DTCs which can operate at arbitrary rates, by using only WZ quantizers.
A block diagram of the SP-DTC system is shown in Figure 1. Let the total number of bits available for encoding two jointly Gaussian vectors $X_1 \in \mathbb{R}^{M_1}$ and $X_2 \in \mathbb{R}^{M_2}$ be B bits. Terminal 1 performs source splitting by providing an approximation $Y_1 \in \mathbb{R}^{N_1'}$ of $X_1$ at the rate $B_1'$ (< B) bits/vector as decoder side-information for WZ coding of terminal 2, where $N_1' \le M_1$. In a TC framework, the goal is to provide the best (in the MMSE sense) linear approximation of $X_1$ as the decoder side-information. Therefore, $Y_1$ is the $B_1'$-bit approximation of a linear projection of $X_1$ obtained with a transform $T_1'$. Then, given the quantized linear projections of both $X_1$ and $X_2$ available at the decoder, terminal 1 quantizes a second linear projection of $X_1$, obtained with a transform $T_1''$ and retaining $N_1'' \le M_1$ coefficients, at the rate $B_1''$ bits/vector. In the receiver, each source vector is reconstructed by a WZ decoder, which forms the MMSE optimal reconstructions $\hat{X}_1$ and $\hat{X}_2$. The total transmission rate for source $X_1$ is thus $B_1 = B_1' + B_1''$ bits/vector. The rates used by terminals 1 and 2 in bits/sample are given by $R_1 = B_1/M_1$ and $R_2 = B_2/M_2$.

Let the transform coefficient vectors of the three transform codes be $U_1'$, $U_1''$, and $U_2$, let the bit-rates allocated to quantizing these transform coefficients be $r_1'$, $r_1''$, and $r_2$, respectively, and define $r = (r_1'^T, r_1''^T, r_2^T)^T$. Given a total of B bits for encoding both $X_1$ and $X_2$, the design of a SP-DTC involves determining the transforms $T_1'$, $T_1''$, and $T_2$, and a bit allocation among the transform coefficients, such that the total MSE is minimized. By writing the quantization MSEs of $U''_{1,i}$ and $U_{2,j}$ as $d_{1,i}(r''_{1,i}, r_1', r_2)$ and $d_{2,j}(r_{2,j}, r_1')$, respectively, $i = 1, \ldots, M_1$, $j = 1, \ldots, M_2$, the total MSE can be expressed as
$$D(r) = \sum_{i=1}^{M_1} d_{1,i}(r''_{1,i}, r_1', r_2) + \sum_{j=1}^{M_2} d_{2,j}(r_{2,j}, r_1').$$
The bit allocation problem can now be stated as follows: Given a total bit-budget of B bits,
$$\min_{r} D(r) \qquad (8)$$
subject to $\sum_{k=1}^{2M_1+M_2} r_k \le B$ and $r_k \ge 0$, where $r = (r_1, \ldots, r_{2M_1+M_2})^T$.

The explicit solution of this minimization problem is unfortunately intractable due to the inter-dependence of the three transform codes involved. However, an explicit solution can be found for a variant of this problem obtained by fixing $B_1'$, $B_1''$ and $B_2$, so that the number of bits allocated to each transform code is fixed and it is only required to optimize the bit allocation among the quantizers within each transform code. For simplicity, we refer to this problem as the constrained bit-allocation problem. In the following, an explicit solution to this problem is derived. Based on the result, we then present a tree-search algorithm to solve the unconstrained problem (8). Under both RD-WZQ and SWC-HRSQ models, the optimal transforms for Gaussian sources are CKLTs. Therefore, we refer to the solution to problem (8) as the SP-DKLT.

Solution to the constrained bit-allocation problem

RD optimal quantization
Let $B_1'$, $B_1''$, and $B_2$ be fixed in Figure 1 and let the coefficient quantization be represented by the RD-WZQ model (for components of $U_1'$, this reduces to the non-distributed RD-optimal quantization). From Theorem 1, it follows that the MMSE optimal transform $T_1'$ (in the sense of providing the best linear approximation as decoder side information for terminal 2) is the KLT of $X_1$. Let the eigenvalues of $\Sigma_{X_1}$ be $\lambda_1 = (\lambda_{1,1}, \ldots, \lambda_{1,M_1})$. Then, using (2), the optimal bit allocation for $U'_{1,m}$ can be given by
$$r'_{1,m} = \begin{cases} \frac{1}{2}\log_2 \dfrac{\lambda_{1,m}}{d^*(\lambda_1, B_1', N_1')}, & m = 1, \ldots, N_1' \\ 0, & m = N_1'+1, \ldots, M_1 \end{cases} \qquad (9)$$
for some $N_1' \le M_1$.

Figure 1 The proposed source-split transform coding (SP-DTC) system for two-terminal distributed quantization of two correlated vectors.

In the RD theoretic sense, the quantized value (up to a scaling factor) of the mean-zero Gaussian variable $U'_{1,m}$ can be represented as $U'_{1,m}$ plus a mean-zero Gaussian quantization noise term $Z_{1,m}$ (10); see [16, Section 10.3.2]. Therefore, $Y_1$ can be written as a linear projection of $X_1$ plus a quantization noise vector $Z_1$ (11). According to (11), $Y_1$ and $X_1$ are jointly Gaussian. Furthermore, $X_2$ and $Y_1$ are jointly Gaussian with the conditional covariance matrix $\Sigma_{X_2|Y_1}$.

Next consider TC of $X_2$ given $Y_1$ as decoder side-information. From Theorem 1, it follows that $E\|X_2 - \hat{X}_2\|^2$ is minimized by choosing $T_2$ as the CKLT of $X_2$ given $Y_1$ and by applying RD optimal WZ quantization to each element of $U_2 = T_2^T X_2$ given decoder side information $Y_1$, based on a bit-allocation specified by the eigenvalues of $\Sigma_{X_2|Y_1}$. The optimal bit allocation for $U_{2,m}$ (14) is obtained by applying (2) to these eigenvalues with a budget of $B_2$ bits, for some $N_2 \le M_2$, and the resulting MSE follows from Theorem 1. As before, the quantized value of $U_2$, up to a scaling factor, can be represented by [8, Theorem 3]
$$Y_2 = \tilde{K}_2^T X_2 + Z_2, \qquad (16)$$
where $\tilde{K}_2$ denotes the $M_2 \times N_2$ matrix consisting of the first $N_2$ columns of $K_2$ and the quantization noise $Z_2 = (Z_{2,1}, \ldots, Z_{2,N_2})^T$ is a mean-zero Gaussian vector with independent components, independent of $X_2$. The covariance matrix of $Z_2$ is $\Sigma_{Z_2} = \mathrm{diag}(EZ_{2,1}^2, \ldots, EZ_{2,N_2}^2)$, where the variances $EZ_{2,m}^2$, $m = 1, \ldots, N_2$, are given by (17). Therefore, $X_1$ and $V = (Y_1^T, Y_2^T)^T$ are jointly Gaussian.

Finally, consider quantizing $X_1$, given V as the decoder side-information. As before, $E\|X_1 - \hat{X}_1\|^2$ is minimized by choosing $T_1''$ as the CKLT of $X_1$ given V, and by RD optimal WZ quantization of each element of $U_1'' = T_1''^T X_1$ given V, based on a bit-allocation specified by the eigenvalues of $\Sigma_{X_1|V}$. The bit rate allocated to quantizing $U''_{1,m}$ (20) is again obtained by applying (2), with a budget of $B_1''$ bits, for some $N_1'' \le M_1$. Given a rate tuple $(B_1', B_1'', B_2)$, the MMSE achievable with a SP-DKLT code for two jointly Gaussian vectors (22) is the sum of the resulting MSEs for $X_1$ and $X_2$.

High-resolution scalar quantization and SW coding
Due to Theorem 2, the expressions for bit-allocations given by (9), (14) and (20) apply to the SWC-HRSQ model as well. However, the resulting MSE and hence the quantization noise variances are different from those of the RD-WZQ model. More specifically, since the decoder side-information in a SP-DTC depends on the quantization noise of the other terminals, the optimal transforms $T_2$ and $T_1''$ and the associated bit allocations obtained with the SWC-HRSQ model are different from those obtained with the RD-WZQ model. In order to make the problem tractable, we assume that the quantization noise of the SWC-HRSQ model also follows (11) and (16) (for a discussion on the validity of this assumption, see [17]). This assumption essentially allows us to compute the conditional covariance matrices $\Sigma_{X_2|Y_1}$ and $\Sigma_{X_1|V}$ as in the previous case, and then apply (9), (14) and (20) to find the optimal transforms and the bit-allocations. However, due to (27), the quantization noise variance $EZ_{1,m}^2$ in (10) takes a different value in this case, and a similar expression exists for the quantization noise variance in (17).

A tree-search solution to the unconstrained bit-allocation problem
We note that the optimal solution to the unconstrained problem defined in (8) corresponds to the MMSE solution of the constrained problem over the set of rate-tuples $\mathcal{S} = \{(B_1', B_1'', B_2) : B_1' \in (0, B),\ B_1'' \in (0, B),\ B_2 \in (0, B),\ B_1' + B_1'' + B_2 \le B\}$. This set is shown in Figure 2. One approach to locating the MMSE solution is to search over an appropriately discretized grid of points inside $\mathcal{S}$. As we will see, even though an exhaustive search on a fine grid can be prohibitively complex, a much simpler constrained tree-search algorithm exists which can be used to locate the required solution with a very high probability.
The proposed algorithm is a generalization of a class of bit-allocation algorithms in which a small fraction ΔB of the total bit-budget B is allocated to the "most deserving" quantizer among a set of quantizers in an incremental fashion, until the entire bit-budget is exhausted [4, Section 8.4]. Unfortunately, this type of greedy search cannot guarantee that the final solution is overall optimal, and it can yield poor results in our problem, where the bits must be allocated among three sets of dependent quantizers. On the other hand, if the increment ΔB is chosen small enough, a near-optimal solution can be found by resorting to a tree-search.

Figure 2 The solution space $\mathcal{S}$ for the unconstrained bit-allocation problem. The tree-search algorithm uses a search-grid of regularly spaced points (i.e., a cubic lattice) with a separation of ΔB inside $\mathcal{S}$.
Even though a full tree-search is intractable, a simple algorithm referred to as the (M, L)-algorithm [18] exists for detecting the minimum cost path in the tree with a high probability. We use this insight to formulate a tree-search algorithm for solving the unconstrained bit allocation problem, in which a set of constrained bit allocation problems are solved in each iteration.
In order to describe the proposed tree-search algorithm in detail, let ΔB be the incremental amount of bits to be allocated in each step of the search, where 0 < ΔB ≪ B. The algorithm is initialized by setting $(B_1', B_1'', B_2) = (0, 0, 0)$, i.e., the origin in Figure 2. Now if we are to allocate ΔB bits to only one of the three transform codes $T_1'$, $T_1''$ or $T_2$, then there are three possible choices for the rate-tuple $(B_1', B_1'', B_2)$, namely (ΔB, 0, 0), (0, ΔB, 0), and (0, 0, ΔB). For each of these choices, we can explicitly solve the constrained bit allocation problem as described in the previous section and find the MMSE solution. Each of these candidate solutions can be viewed as a node in a tree as shown in Figure 3. The root node of the tree corresponds to a SP-DTC of rate 0, and a node in the first level of nodes obtained in the first iteration of the algorithm corresponds to a SP-DTC of rate ΔB bits per source pair $(X_1, X_2)$. In the second iteration of the algorithm, we allocate ΔB more bits to each of the three candidate SP-DTCs (but to one of the SP-DTCs at a time) in the first level of nodes. Note that, for each SP-DTC we can allocate ΔB bits in three different ways, i.e., ΔB bits can be added to either $B_1'$, $B_1''$, or $B_2$. This requires the solution of three constrained bit allocation problems for each of the 3 nodes in the first level. As a result, the tree will be extended to a second level of $3^2$ nodes, in which each node corresponds to a SP-DTC of 2ΔB bits per source-pair, as shown in Figure 3. We can repeat this procedure, allocating ΔB bits to each terminal node of the tree in a given iteration, until all B bits are exhausted. After the final iteration, the tree would consist of L = ⌈B/ΔB⌉ levels with $3^L$ nodes in the last level (terminal nodes). Each terminal node corresponds to a candidate SP-DTC of rate B, and rate-tuples of these SP-DTCs lie on the plane $B_1' + B_1'' + B_2 = B$ in Figure 2.
The MMSE terminal node of the tree is the optimal solution to the unconstrained bit allocation problem, provided that the latter solution is on the search-grid. If ΔB is chosen small enough, then we can ensure that the optimal solution is nearly on the search-grid. Suppose that, in each iteration, the algorithm saves the MSE of the solution to the constrained bit-allocation problem associated with each node. In theory, the optimal solution can be found by an exhaustive tree-search, using the MSE of a node [given by (22)] as the path-cost. In order to practically implement the tree-search, we use the (M, L)-algorithm, in which the parameter M can be chosen to reduce the complexity at the expense of decreased accuracy (i.e., the probability of detecting the lowest cost path in the tree). In the (M, L)-algorithm [18, p. 216] for a tree of depth L, one only retains the M best (lowest MSE) nodes in each iteration. When M = 1 we have a completely greedy search. On the other hand, when $M_n = 3^n$ in the nth iteration, we have a full-tree search which has a complexity that grows exponentially with the iteration number. When M ($1 \le M \le 3^L$) is a prescribed constant, the complexity of each iteration is linear in M, independent of the iteration number. In obtaining the simulation results presented in Section 4, M = 27 and L = 135 (ΔB = 0.2) were found to be sufficient to obtain near optimal results. For example, it was observed that even for M = 81 and L = 405, nearly the same result was obtained.

Figure 3 Bit allocation tree with the values of the rate-tuple $(B_1', B_1'', B_2)$ after two iterations of the tree-search algorithm.
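The search described above can be sketched as a beam search over rate-tuples. This is a minimal illustration, not the article's implementation: the callable `cost` stands in for the MSE of the constrained bit-allocation solution at a given rate-tuple and must be supplied by the user, the name `ml_tree_search` is hypothetical, and the sketch merges nodes that share the same rate-tuple on the assumption that the node cost depends only on that tuple.

```python
import math


def ml_tree_search(cost, total_bits, delta_b, M):
    """(M, L)-style search for a rate-tuple with three components summing to total_bits.

    cost: callable mapping a rate-tuple (three floats) to an MSE (user supplied).
    Nodes are stored as integer multiples of delta_b to avoid float drift.
    """
    # Number of levels L = ceil(B / delta_b); the small offset guards against round-off.
    levels = math.ceil(total_bits / delta_b - 1e-9)
    beam = [(0, 0, 0)]  # root node: zero rate for all three transform codes
    for _ in range(levels):
        children = set()
        for node in beam:
            # Extend each surviving node by adding delta_b to one component at a time.
            for k in range(3):
                child = list(node)
                child[k] += 1
                children.add(tuple(child))
        # Keep only the M lowest-cost children.
        ranked = sorted(children, key=lambda n: cost(tuple(delta_b * c for c in n)))
        beam = ranked[:M]
    return tuple(delta_b * c for c in beam[0])
```

With M = 1 the sketch reduces to the greedy search mentioned in the text, and larger M trades computation for a higher chance of retaining the lowest-cost path. As a toy check, a separable cost such as `lambda b: 4 * 2 ** (-2 * b[0]) + 2 ** (-2 * b[1]) + 0.25 * 2 ** (-2 * b[2])` with `total_bits=3`, `delta_b=0.5` and `M=3` returns the rate-tuple (2.0, 1.0, 0.0).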

Numerical results and discussion
Source model A: Let the components of $X_1$ be $M_1$ consecutive samples of a first-order Gauss-Markov process with unit variance and correlation coefficient |ρ| < 1, i.e., $X_{1,m} = \rho X_{1,m-1} + Z_m$, $m = 2, \ldots, M_1$, where $Z_m$, $m = 1, \ldots, M_1$, are mean-zero iid Gaussian variables such that $EZ_m^2 = 1 - \rho^2$. The auto-covariance matrix $\Sigma_{X_1}$ is a Toeplitz matrix with the first row $(1, \rho, \rho^2, \ldots, \rho^{M_1-1})$. Now define the components of $X_2$ to be noisy observations of the components of $X_1$, i.e., $X_{2,m} = gX_{1,m} + W_m$, where |g| < 1 and $W_m$ is a mean-zero, iid Gaussian variable. Furthermore, with $W_m$ independent of $X_1$, the cross-covariance matrix of $X_1$ and $X_2$ is $g\Sigma_{X_1}$. Note that $X_1$ and $X_2$ are not statistically similar and the components of $X_1$ are more correlated than those of $X_2$.
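The covariance structure of source model A can be sketched as follows. The sketch assumes that the observation noise $W_m$ is independent of $X_1$, and it takes the noise variance as a parameter (`noise_var`) because its value is not recoverable from the text above; the function name `model_a_covariances` is hypothetical.

```python
def model_a_covariances(M, rho, g, noise_var):
    """Covariance matrices of source model A as nested lists.

    Returns (sigma_x1, sigma_x2, cross), where cross[i][j] = E{X1_i X2_j}.
    Assumes the observation noise is independent of X1 with variance noise_var.
    """
    # First-order Gauss-Markov process: Toeplitz matrix with entries rho^|i-j|.
    sigma_x1 = [[rho ** abs(i - j) for j in range(M)] for i in range(M)]
    # X2 = g*X1 + W, so its covariance is g^2 * Sigma_X1 plus white noise.
    sigma_x2 = [
        [g * g * sigma_x1[i][j] + (noise_var if i == j else 0.0) for j in range(M)]
        for i in range(M)
    ]
    # Cross-covariance between X1 and X2 is g * Sigma_X1.
    cross = [[g * sigma_x1[i][j] for j in range(M)] for i in range(M)]
    return sigma_x1, sigma_x2, cross
```

With ρ = 0.9 and g = 0.9, the largest element of the cross-covariance matrix produced by this sketch is 0.9, which matches the value quoted for these parameters in the numerical results.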
Source model B: Consider a spatial Gaussian random field in which the correlation function decays with distance d according to the squared exponential model [19]. We define the random vectors $X_1$ and $X_2$ to be observations picked up by a pair of sensor arrays placed in this random field. In this case, the (i, j)th element of the auto-covariance matrix $\Sigma_{X_1}$ is given by $\exp\{-(a d_{ij})^2\}$, where a is a constant and $d_{ij}$ is the distance between $X_{1,i}$ and $X_{1,j}$. The auto-covariance matrix of $X_2$ has a similar form. For simplicity, assume that the sensors in each array are placed on an M × M square grid of unit spacing (i.e., $M_1 = M_2 = M^2$), the two arrays are on parallel planes separated by a distance r, and the two grids are aligned so that the distance between $X_{1,i}$ and $X_{2,i}$ is r for all i. With this setup, the distance between $X_{1,i}$ and $X_{2,j}$ is $\sqrt{d_{ij}^2 + r^2}$, and the cross-covariance matrix is given by $\theta\Sigma_{X_1}$, where $\theta = \exp\{-(ar)^2\}$. This sensor structure ensures that $X_1$ and $X_2$ are statistically similar. However, the cross-covariance matrix can be chosen independently (by choosing the array separation r) of $\Sigma_{X_1}$ and $\Sigma_{X_2}$.
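The covariance matrices of source model B can be generated with a few lines of code. The sketch below assumes the squared exponential form exp{-(a d)^2} for the correlation at distance d and two aligned m × m grids of unit spacing separated by r; the function name `model_b_covariances` and the row-by-row sensor ordering are illustrative choices.

```python
import math


def model_b_covariances(m, a, r):
    """Auto- and cross-covariance matrices for two aligned m x m sensor grids.

    Returns (sigma, cross, theta): sigma is the common auto-covariance matrix,
    cross is the cross-covariance matrix, and theta = exp(-(a*r)^2).
    """
    # Sensor positions on a unit-spaced square grid, listed row by row.
    pts = [(x, y) for x in range(m) for y in range(m)]
    theta = math.exp(-((a * r) ** 2))
    # Squared exponential correlation: exp(-(a*d)^2) with d the in-plane distance.
    sigma = [
        [math.exp(-(a ** 2) * ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)) for q in pts]
        for p in pts
    ]
    # Sensors on different grids are farther apart by r, which scales sigma by theta.
    cross = [[theta * s for s in row] for row in sigma]
    return sigma, cross, theta
```

For a = 0.32, the largest off-diagonal element of `sigma` is exp(-0.1024), approximately 0.90, consistent with the value quoted for this parameter in the numerical results.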

RD performance
We compute the rate-pairs $(R_1, R_2)$ achievable with a SP-DKLT code for a given total MSE D, by fixing $R_1$ (or $R_2$) and then searching for the minimum $R_2$ (or $R_1$) required to achieve the MSE D [given by (6)]. The rate-pairs achievable for D = 0.01 with SP-DKLT coding of vectors from source model A with ρ = 0.9, g = 0.9, are plotted in Figure 4. These values of ρ and g result in a source cross-covariance matrix with the largest element 0.9. In Figure 4, the curve "SP-DKLT ($X_1$ split)" corresponds to a system in which the input to terminal 1 (which applies source splitting) is $X_1$, as shown in Figure 1. The curve "SP-DKLT ($X_2$ split)" corresponds to a system in which the input to terminal 1 is $X_2$. Note that the two curves are not symmetric in rates, but they coincide if $R_1$ and $R_2$ are interchanged in one of the curves. This is because $X_1$ and $X_2$ have different auto-covariance matrices, and hence interchanging the rates is equivalent to interchanging the terminals. Importantly, this result indicates that even when the two sources are not statistically identical, which source is chosen for splitting does not affect the SP-DKLT performance. Table 1 lists the best bit allocations found by the tree-search algorithm for SP-DKLT codes shown in Figure 4. Note that, for the same $(B_1, B_2)$, the rate-split between $B_1'$ and $B_1''$ when $X_1$ is applied to terminal 1 is not identical to that when $X_2$ is applied to terminal 1.
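The search for the minimum rate at a fixed distortion can be sketched as a bisection. This is only an illustration under stated assumptions: `mse_at` is a hypothetical user-supplied function returning the best achievable MSE at a rate-pair (for example, from the tree-search), the MSE is assumed to be non-increasing in $R_2$, and the search interval and tolerance are arbitrary choices.

```python
def min_rate_for_distortion(mse_at, r1, target_d, r2_max=16.0, tol=1e-4):
    """Smallest R2 (within tol) such that mse_at(r1, R2) <= target_d, for a fixed R1.

    Returns None if the target is not met even at r2_max.
    Assumes mse_at is non-increasing in its second argument.
    """
    if mse_at(r1, r2_max) > target_d:
        return None
    lo, hi = 0.0, r2_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mse_at(r1, mid) <= target_d:
            hi = mid  # target met: try a lower rate
        else:
            lo = mid  # target missed: more rate is needed
    return hi
```

Sweeping `r1` over a range of values and recording the returned rate traces out one boundary curve of the kind plotted in the rate-region figures.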
Figure 4 also shows the rate region achievable if each source is independently compressed using the KLT (i.e., only intra-vector correlation is utilized), labeled IKLT, and the OPTA lower bound for distributed TC predicted by the iterative DKLT algorithm [8]. The performance of both distributed and non-distributed TC of source model A degrades as ρ decreases, since both auto- and cross-covariance matrices of $X_1$ and $X_2$ are functions of ρ. Next consider source model B, for which the lowest achievable rate-pairs corresponding to D = 0.005 are plotted in Figure 5. The source parameter a = 0.32 results in auto-covariance matrices whose largest off-diagonal element is 0.9. Also recall that θ is the largest element in the source cross-covariance matrix. Note that changing θ only affects the cross-covariance matrix, and hence has no effect on the best achievable rates for independent coding of the two sources. On the other hand, as θ increases, the rates achievable with distributed coding do improve. Since in source model B the two sources are statistically similar, the curves in Figure 5 are symmetric in rates and the optimal bit allocation does not depend on which source is chosen for splitting.
The RD performance in Figures 4 and 5 indicates that SP-DKLT codes can significantly outperform IKLT codes at all rates when there is sufficient correlation between the two distributed sources. The performance of SP-DKLT coding necessarily approaches the OPTA (DKLT) bound when either R 1 or R 2 is sufficiently high. That is, the terminal with the higher bit rate can independently transform code its input with negligible distortion, and the other terminal can then apply WZ-TC at the minimum rate achievable with "almost unquantized" decoder side-information. However, for both source models the rate-region achievable with source-splitting is strictly inside that of DKLT. In other words, there are some rate-pairs inside the DKLT rate-region for a given MSE D which cannot be achieved by an SP-DKLT code. A closely related issue is that, for a range of values of (R 1, R 2), the sum-rate R 1 + R 2 of SP-DKLT codes remains constant at its minimum value. For example, it can be seen from Figure 4 and Table 1 that the sum-rate is about 4.125 bits when the rate of X 1 is in the range 1.375 to 2.5 bits/sample. From Table 1, it can be seen that when the sum-rate is greater than its minimum value, the optimal SP-DKLT code approaches a WZ transform code, i.e., no source-splitting occurs. This situation, which also exists in Figure 5, suggests that optimal SP-DKLT codes at the minimum sum-rate for which a given D can be achieved are equivalent to time-sharing [12] of two "corner points". Figure 6 illustrates this situation for optimal SP-DKLT codes at D = 0.005 for source model B (θ = 0.9 in Figure 5). It should however be noted that, unlike source-splitting, time-sharing between the two terminals requires synchronization of their encoders [11].
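As a small illustration of the time-sharing interpretation, the following Python sketch computes the rate-pairs obtained when one code is used for a fraction t of the time and another code for the remainder. The function name `time_share` and the two corner points are hypothetical; the corner-point rates are placeholders chosen only so that each has a sum-rate of 2.5 bits/sample, and are not values read from Figure 6.

```python
def time_share(c1, c2, t):
    """Rate-pair obtained by using code c1 a fraction t of the time and c2 otherwise."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    r1 = t * c1[0] + (1.0 - t) * c2[0]
    r2 = t * c1[1] + (1.0 - t) * c2[1]
    return r1, r2


# Hypothetical corner points (R1, R2) in bits/sample, both with sum-rate 2.5.
C1 = (0.75, 1.75)
C2 = (1.75, 0.75)

if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        r1, r2 = time_share(C1, C2, t)
        print(f"t={t:.2f}: R1={r1:.3f}, R2={r2:.3f}, sum={r1 + r2:.3f}")
```

Every rate-pair produced this way lies on the straight line joining the two corner points, so the sum-rate stays constant whenever both corner points have the same sum-rate.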

Design examples
In this section, we focus on the practical design of SP-DKLT codes for a given pair of rates (R 1, R 2) based on both scalar and block quantization. The RD-WZQ model used in Section 3.1.1 implies infinite block-length WZ-VQ of each coefficient. A practically realizable approach to block WZ quantization is SWC-TCQ [20]. Experimental results obtained with LDPC codes of block length up to $10^6$ bits and TCQs with up to 8,192 states are presented in [20] for quadratic Gaussian WZ quantization, which indicate that performance very close to the theoretical limit can be achieved with SWC-TCQ. Motivated by these results, we aim to implement SP-DTCs which can approach the theoretical performance predicted in Section 3.1.1 by using TCQ and SW coding for encoding transform coefficients. However, the SWC-TCQ design procedure followed in [20] is to first design a TCQ whose MSE satisfies a constraint (by choosing a sufficiently high rate) and then to estimate the output conditional entropy of the resulting TCQ (which is the target rate of the SW code). This is sufficient for verifying the achievable rate-pairs for a given MSE, which is the goal of [20]. Our problem is different in that the rate of the SW code is specified by the solution to the bit-allocation problem, and our goal is to design a TCQ which minimizes the MSE subject to a constraint on the output conditional entropy. This requires an alternative formulation of the design procedure, which we refer to as CEC-TCQ. In previous work on non-distributed quantization, entropy constrained TCQ (EC-TCQ) has been investigated in [21][22][23][24]. CEC-TCQ is a modification of EC-TCQ in [21,22] to accommodate block SW-coding of the TCQ output relative to a decoder side-information sequence. Our formulation of CEC-TCQ follows the superset-entropy formulation of EC-TCQ in [22].

Figure 4 Comparison of rate-regions achievable with different TC approaches in quantizing 16-dimensional vectors (M 1 = M 2 = 16) of source model A (r = 0.9, g = 0.9). "SP-DKLT (X 2 split)" refers to the case when X 2 is used as the input to terminal 1 (and hence transmitted at rate R 1), which applies source-splitting.
Suppose that a sequence of source samples $\{U_n \in \mathbb{R}\}$ has to be quantized, given that the sequence $\{Y_n \in \mathbb{R}\}$ is available at the decoder as side-information, where $n = 1, 2, \ldots$ denotes discrete time. Similar to an ordinary TCQ [25], a CEC-TCQ uses a scalar codebook of size $2^{R_{TCQ}+1}$ to quantize the input sequence $U_1, U_2, \ldots$ into an $R_{TCQ}$ bits/sample output sequence $\hat{U}_1, \hat{U}_2, \ldots$. However, the CEC-TCQ output satisfies the additional property that the conditional entropy $H(\hat{U}_n|Y_n) = E\{-\log_2 P(\hat{U}_n|Y_n)\} \le R$ for some given $R$. It follows from [9] that if the CEC-TCQ output is SW-coded relative to the decoder side-information sequence $\{Y_n\}$, then $LR$ bits are sufficient to (almost) losslessly transmit a sequence of $L$ source samples as $L \to \infty$. The optimal CEC-TCQ minimizes the MSE $E\{(U_n - \hat{U}_n)^2\}$ subject to the constraint $E\{-\log_2 P(\hat{U}_n|Y_n)\} \le R$, or equivalently, minimizes the Lagrangian

$$E\{(U_n - \hat{U}_n)^2\} + b\,E\{-\log_2 P(\hat{U}_n|Y_n)\},$$

where $b > 0$ is the Lagrange multiplier. This implies that, given a specific sequence of input samples $u_1, u_2, \ldots$, the CEC-TCQ encoder should use the Viterbi algorithm based on the path-cost function

$$\sum_n \left[ (u_n - \hat{u}_n)^2 + b\,E\{-\log_2 P(\hat{u}_n|Y)\} \right]. \qquad (25)$$
Table 1 Bit allocations found by the tree-search algorithm for 16-dimensional SP-DKLT coding of source model A (r = 0.9, g = 0.9)

In a rate-$R_{TCQ}$ TCQ, each codeword $c_k$, $k = 1, \ldots, 2^{R_{TCQ}+1}$, in the codebook is labeled with an $R_{TCQ}$-bit binary string [25]. Let $b_i$, $i = 1, \ldots, 2^{R_{TCQ}}$, be these binary labels. Then, to compute (25), the cost $b\,E\{-\log_2 P(b_i|Y)\}$ must also be stored for each binary label, where $Y$ is the random variable representing the decoder side-information. For a fixed value of $b$, we can use a slight modification of the algorithm in [21] for optimizing the CEC-TCQ codebook, by replacing the codeword entropies $E\{-\log_2 P(b_i)\}$ with the conditional entropies $E\{-\log_2 P(b_i|Y)\}$, and by using a training sequence of $(U_n, Y_n)$ pairs. In order to approximate the expectations by sample averages computed from training data, the side-information variable $Y$ is discretized to a variable $\hat{Y}$ taking values in a finite set $\{\eta_1, \eta_2, \ldots\}$ whose size is a large enough positive integer. Then $E\{-\log_2 P(B = b_i|Y)\}$ can be approximated by a sample average over these discrete values, where $B \in \{b_1, \ldots, b_{2^{R_{TCQ}}}\}$ is the binary-labeled output of the TCQ. Given a TCQ codebook, the required probabilities can be estimated from the training sequence. To complete the design, it is necessary to search for the value of $b$ for which $E\{-\log_2 P(\hat{U}_n|Y_n)\} \approx R$ by repeating the codebook optimization for an appropriately chosen sequence of $b$ values.
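The estimation of the output conditional entropy from training data can be sketched as follows. This is only an illustrative Python sketch and not the procedure of [21]: the function name `conditional_entropy`, the choice of 64 discretization cells, and the use of equal-occupancy cells for the side-information are all assumptions made here.

```python
import numpy as np


def conditional_entropy(labels, side_info, num_bins=64):
    """Estimate H(B | Y_hat) in bits from training pairs (labels, side_info)."""
    labels = np.asarray(labels).ravel()
    side_info = np.asarray(side_info, dtype=float).ravel()
    # Discretize Y into num_bins cells of roughly equal occupancy (an assumption).
    edges = np.quantile(side_info, np.linspace(0.0, 1.0, num_bins + 1)[1:-1])
    y_hat = np.searchsorted(edges, side_info)
    # Joint histogram of (label index, discretized side-information).
    _, b_idx = np.unique(labels, return_inverse=True)
    b_idx = b_idx.ravel()
    joint = np.zeros((b_idx.max() + 1, num_bins))
    np.add.at(joint, (b_idx, y_hat), 1.0)
    joint /= joint.sum()
    p_y = joint.sum(axis=0, keepdims=True)
    # H(B|Y) = -sum over (b, y) of P(b, y) * log2 P(b | y), skipping empty cells.
    cond = joint / np.where(p_y > 0, p_y, 1.0)
    mask = joint > 0
    return float(-(joint[mask] * np.log2(cond[mask])).sum())
```

In a design loop, this estimate would be recomputed after each codebook optimization and compared with the target rate R while the multiplier b is adjusted.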
For block WZ-code designs, the transforms and the bit allocations are found by using the RD-WZQ model (Section 3.1.1), and WZ quantizers are implemented using CEC-TCQ followed by binary SW coding. More specifically, the rate found by the bit allocation algorithm for each transform coefficient is used as the conditional entropy constraint in the design of a CEC-TCQ for that coefficient. As described in Section 2.3, the CEC-TCQ designs are based on scalar side-information obtained by a linear transform of the vector side-information at the decoder (see Theorem 3). All CEC-TCQ designs are based on the 8-state trellis used in JPEG2000 [26, Figure 3.16]. For trellis encoding and decoding, a sequence length of 256 source samples has been used. For designing and testing quantizers, sample sequences of length $5 \times 10^5$ have been used. Since the main focus of this paper is the design of the transforms and the quantizers, we assume ideal SW coding of the binary output of each CEC-TCQ, so that our results do not depend on any particular SW coding method. In a practical implementation (e.g., [20]), near-optimal performance can be obtained by employing a sufficiently long SW code (note that the sequence length for SW coding can be chosen arbitrarily larger than the sequence length used for TCQ encoding). This type of coding is well suited for applications such as distributed image compression, where the coding is inherently block-based.
We also consider SP-DKLT code designs based on scalar quantization. In this case, the transforms and bit allocations are found by using the SWC-HRSQ model (Section 3.1.2). While it is possible to use the step-size predicted by the SWC-HRSQ model to design uniform quantizers, we found that such quantizers in reality do not satisfy the required entropy constraint at lower rates. We instead use conditional entropy constrained scalar quantizers (CEC-SQ), designed by modifying the algorithm in [27] to accommodate a conditional entropy constraint similar to the CEC-TCQ approach above.
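One ingredient common to entropy-constrained scalar designs is a Lagrangian encoding rule, in which each sample is mapped to the codeword minimizing squared error plus a weighted rate cost. The Python sketch below shows only this rule, as a scalar analogue of the path cost used for CEC-TCQ; it is not the algorithm of [27]. The function name `lagrangian_encode`, the three-level codebook and the per-label costs are hypothetical.

```python
import numpy as np


def lagrangian_encode(u, codebook, label_cost, b):
    """Return, for each sample, the index k minimizing (u - c_k)^2 + b * cost_k."""
    u = np.asarray(u, dtype=float).reshape(-1, 1)
    codebook = np.asarray(codebook, dtype=float).reshape(1, -1)
    label_cost = np.asarray(label_cost, dtype=float).reshape(1, -1)
    cost = (u - codebook) ** 2 + b * label_cost
    return np.argmin(cost, axis=1)


# Hypothetical codebook and per-label costs (in bits), for illustration only.
codebook = [-1.0, 0.0, 1.0]
label_cost = [3.0, 1.0, 3.0]

if __name__ == "__main__":
    samples = [0.6, -0.2, 1.4]
    print(lagrangian_encode(samples, codebook, label_cost, b=0.0))
    print(lagrangian_encode(samples, codebook, label_cost, b=0.2))
```

With b = 0 the rule reduces to nearest-neighbor encoding; increasing b shifts decisions toward labels with a lower cost, which lowers the output entropy at the expense of a higher MSE.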
The reconstruction signal-to-noise ratio (RSNR) [with MSE as given by (6)] of SP-DKLT code designs for source model B is shown in the rows labeled Design in Table 2, where SP-DKLT/CECSQ and SP-DKLT/CECTCQ refer to scalar quantization and TCQ based designs, respectively. The rows labeled Analytical show the performance predicted by the SWC-HRSQ and RD-WZQ models upon which the transforms and bit-allocations are based (note however that the performance predicted by the SWC-HRSQ model is not necessarily an upper bound for CEC-SQ designs, which are not constrained to be uniform quantizers). We compare the performance of our SP-DKLT code designs with IKLT codes for which the bit-allocations are obtained by using either the entropy coded high-rate quantization model (for scalar quantizer design) [4, Section 9.9] or the RD-optimal quantization model (for block quantizer design) [16, Section 10.3.3] for Gaussian variables. The IKLT codes with scalar quantization have been implemented by using entropy constrained scalar quantizers (EC-SQ), while those with block quantizers have been implemented by using EC-TCQ [23], where we assume ideal entropy coding of the quantizer outputs. In Table 2, IKLT/ECSQ and IKLT/ECTCQ respectively refer to these designs, and the Analytical values refer to the SNR of the quantization model assumed for determining the optimal transforms and the bit-allocation.

From a practical viewpoint, the use of the DCT instead of the KLT is interesting [26]. We therefore consider the design of SP-DTCs based on the DCT, referred to as source-split distributed DCT (SP-DDCT) codes. Since the DCT is a fixed transform, we only need to optimize the bit allocations. To do this, we assume that the DCT is approximately a decorrelating transform for Gaussian vectors [4]. Then, the bit-allocations given by (9), (14), and (20) are still valid provided that the eigenvalues in these expressions are replaced by the variances of the corresponding DCT coefficients. The rest of the design procedure is the same as that with SP-DKLT. The RSNR of the DCT based designs is presented in Table 3 (again, the performance predicted by the SWC-HRSQ model is not an upper bound for the corresponding practical codes). The results show that, for this particular source model, even the scalar quantization based SP-DDCT codes outperform TCQ-based IDCT codes.
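The substitution of DCT coefficient variances for eigenvalues can be illustrated with the following Python sketch, which computes the variances as the diagonal of the transformed covariance matrix. The helper names and the first-order Markov covariance with correlation 0.9 are assumptions used purely for illustration; this covariance is not claimed to be source model B.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n (rows are basis vectors)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    t = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    t[0, :] /= np.sqrt(2.0)
    return t


def dct_coefficient_variances(cov):
    """Variances of the DCT coefficients of a zero-mean vector with covariance cov."""
    t = dct_matrix(cov.shape[0])
    return np.diag(t @ cov @ t.T).copy()


# Hypothetical first-order Markov covariance, used only for illustration.
rho, n = 0.9, 16
idx = np.arange(n)
cov = rho ** np.abs(idx[:, None] - idx[None, :])

variances = dct_coefficient_variances(cov)
eigenvalues = np.sort(np.linalg.eigvalsh(cov))[::-1]

if __name__ == "__main__":
    print("DCT variances:  ", np.round(variances, 4))
    print("KLT eigenvalues:", np.round(eigenvalues, 4))
```

Because the DCT is orthonormal, the variances sum to the same total as the eigenvalues, while their product can only be larger or equal, which is why a DCT-based allocation cannot beat the KLT under the same model.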

Concluding remarks
Rate-distortion analysis and experimental results demonstrate that SP-DTC is a promising practical approach to implementing distributed VQ of high-dimensional correlated vectors. The comparisons shown in Table 2, as well as similar comparisons for source model A and the source model in [8, Example 6], indicate that these codes can substantially outperform independent transform codes when there is sufficient inter-vector correlation. This approach has also been demonstrated to be effective for DCT-based systems. Therefore, the proposed approach can potentially be used in applications such as stereo image compression when inter-camera communication is impractical. Our RD analysis however indicates that the achievable rate-region of SP-DKLT codes for jointly Gaussian sources is strictly inside that predicted by the DKLT of [8]. An interesting avenue of future work is to find implementable distributed transform codes which can achieve the rate-pairs below the "time-sharing" line in Figure 6. Another issue is the extension of the proposed approach to more than two vector sources. In principle, source-splitting can be easily applied to more than two sources. However, with more than two vector sources, the complexity of the bit-allocation will be significantly higher.

Proof of Theorem 2
The optimality of CKLT is proved in [5]. To prove the optimality of the bit allocation, consider high-rate scalar quantization of the transform coefficient $U_m$ and ideal SW coding of the quantizer output $\hat{U}_m$ at the rate $r_m = H(\hat{U}_m|Y)$ bits/sample, $m = 1, \ldots, M_1$, where $H(\cdot|\cdot)$ denotes the conditional entropy [16]. In this case, the asymptotically optimal scalar quantizer for each coefficient is known to be uniform [5].
Since (27) and (1) are the same within a constant factor of $\pi e/6$ (which is identical for all transform coefficients), it is easy to verify that the optimal bit-allocation solution under the SWC-HRSQ model is also given by (2). However, the MSE of the mth coefficient in this case is

Proof of Theorem 3
For jointly Gaussian and mean-zero X and Y, there exists a matrix A and a mean-zero Gaussian vector W 1 independent of Y such that X = AY + W 1 , where

Figure 5 Comparison of rate-regions achievable with different TC approaches in quantizing 16-dimensional vectors (M 1 = M 2 = 16) of source model B for a = 0.32 and different values of θ.

Figure 6 The optimal SP-DKLT codes at D = 0.005 for source model B with a = 0.32 and θ = 0.9. The minimum sum-rate is 2.5 bits/sample, which can also be achieved by time sharing of codes C 1 and C 2.

For high-rate uniform quantization, $H(\hat{U}_m|Y) \approx h(U_m|Y) - \log_2(\Delta_m)$, where $\Delta_m$ is the quantizer step-size and $h(\cdot|\cdot)$ is the conditional differential entropy [16, Section 8.3]. Since the conditional variance of the transform coefficient $U_m$ given the side-information $Y$ is $E\{U_m^2|Y\} = \lambda_m$, and $(U_m, Y)$ are jointly Gaussian, it follows that $h(U_m|Y) = (1/2)\log_2(2\pi e\lambda_m)$ [16], and hence $\Delta_m = \sqrt{2\pi e\lambda_m}\,2^{-r_m}$. Therefore, the MSE of high-rate uniform quantization followed by ideal SW coding of $U_m$ is

$$d_m = \frac{\Delta_m^2}{12} = \frac{\pi e}{6}\,\lambda_m 2^{-2r_m}. \qquad (27)$$
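The high-rate relations above can be checked numerically with a short simulation. The Python sketch below is an illustration under stated assumptions: a zero-mean Gaussian with variance λ stands in for the coefficient conditioned on one value of the side-information, the function names are hypothetical, and the empirical entropy of the quantizer indices is used as a proxy for the conditional entropy.

```python
import numpy as np


def simulate_uniform_quantizer(lam, rate, num_samples=200_000, seed=0):
    """Quantize N(0, lam) samples with step sqrt(2*pi*e*lam) * 2**(-rate)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, np.sqrt(lam), size=num_samples)
    step = np.sqrt(2.0 * np.pi * np.e * lam) * 2.0 ** (-rate)
    index = np.round(u / step)
    u_hat = index * step
    mse = float(np.mean((u - u_hat) ** 2))
    # Empirical entropy (bits) of the quantizer output indices.
    _, counts = np.unique(index, return_counts=True)
    p = counts / counts.sum()
    entropy = float(-(p * np.log2(p)).sum())
    return mse, entropy


def predicted_mse(lam, rate):
    """High-rate prediction (pi * e / 6) * lam * 2**(-2 * rate)."""
    return (np.pi * np.e / 6.0) * lam * 2.0 ** (-2.0 * rate)


if __name__ == "__main__":
    mse, entropy = simulate_uniform_quantizer(lam=1.0, rate=4.0)
    print(f"empirical MSE = {mse:.6f}, predicted = {predicted_mse(1.0, 4.0):.6f}")
    print(f"empirical entropy = {entropy:.3f} bits, target rate = 4.0 bits")
```

At lower rates the measured entropy and MSE drift away from these predictions, which is consistent with the observation that uniform quantizers designed from the high-rate model may not meet the entropy constraint at lower rates.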

Table 2
RSNR (in dB) of KLT-based transform code designs for quantizing 16-dimensional vectors (M 1

Table 3
SNR (in dB) of DCT-based transform code designs for quantizing 16-dimensional vectors (M 1 = M 2 = 16) of source model B (a = 0.32, θ = 0.9) at R 1 = R 2 = R bits/sample. Analytical values refer to the SNR of the quantization model assumed for determining the bit-allocation.