Combined Source-Channel Coding of Images under Power and Bandwidth Constraints

This paper proposes a framework for combined source-channel coding for a power and bandwidth constrained noisy channel. The framework is applied to progressive image transmission using constant envelope M-ary phase-shift keying (M-PSK) signaling over an additive white Gaussian noise (AWGN) channel. First, the framework is developed for uncoded M-PSK signaling (with M = 2^k). Then, it is extended to include coded M-PSK modulation using trellis coded modulation (TCM). An adaptive TCM system is also presented. Simulation results show that, depending on the constellation size, coded M-PSK signaling performs 3.1 to 5.2 dB better than uncoded M-PSK signaling. Finally, the performance of our combined source-channel coding scheme is investigated from the channel capacity point of view. Our framework is further extended to include powerful channel codes like turbo and low-density parity-check (LDPC) codes. With these powerful codes, our proposed scheme performs about one dB away from the capacity-achieving SNR value of the QPSK channel.


INTRODUCTION
Shannon's separation principle [1] states that source coding and channel coding can be optimized individually and then operated in a cascaded system without sacrificing optimality. Therefore, traditionally, channel coders are designed independently of the actual source, while source coders are designed without considering the channel. The resulting coders are then cascaded. However, Shannon's separation principle is valid only under asymptotic conditions such as infinite block length and a memoryless channel. Thus, under practical delay and storage constraints, independent designs of source and channel coders are not optimal. This motivates a joint optimal design [2] of the source and channel coders. However, joint optimization is quite complex in practical systems. Not only does the traditional theoretical approach require infinite complexity, but a completely coupled design also seems practically infeasible.
This paper presents a low-complexity technique, which increases the performance of cascaded systems by introducing some amount of coupling between the source coder and the channel coder. Specifically, source- and channel-rate allocations are studied for embedded source coders and a power and bandwidth constrained noisy channel.
The average energy transmitted per source symbol is considered to be an important design parameter when using a power-constrained (e.g., AWGN) channel. Since the transmission rate is the number of bits transmitted per source symbol, if the signal constellation is known, the average energy transmitted per source symbol can be formulated to optimize the end-to-end quantization error of the system. The transmitted bits include source bits and redundant bits. It is therefore important to effectively allocate these bits between the source coder and the channel coder. This allocation is characterized by the choice of a channel code rate. By introducing a bandwidth constraint, this degree of freedom becomes the choice of signal constellation in conjunction with both the channel code rate (resulting in coded modulation) and the source code rate. Thus, there is a tradeoff between modulation, source coding, and channel coding. These components will be examined by jointly optimizing the transmission rate and the channel code rate for a certain class of source and channel codes. Our goal is to minimize the average distortion of a source transmitted over a bandwidth and power constrained noisy channel.
Sherwood and Zeger [3] used a combined source-channel scheme based on Said and Pearlman's set partitioning in hierarchical trees (SPIHT) image-coding algorithm [4].
They utilized cyclic redundancy check (CRC) codes [5] and rate-compatible punctured convolutional (RCPC) channel codes for image transmission over binary symmetric channels (BSCs). Since then, a large body of work (see [6] and references therein) has addressed joint source-channel coding (JSCC) for scalable multimedia transmission over both BSCs and packet-erasure channels. Fossorier et al. [7] generalized the scheme of [3] from BSCs to analog binary channels by choosing the average energy per transmitted bit in conjunction with both the source rate and the channel code rate under a power constraint. While the additional degree of freedom makes it possible to achieve higher overall peak signal-to-noise ratio (PSNR) values, it also results in either bandwidth reduction or expansion (with respect to the underlying reference system), the latter being highly undesirable.
The embedded property of the SPIHT coded image bitstream has been exploited to provide unequal error protection (UEP) by the use of different channel codes, with codes of higher rates allocated to the tail of the bitstream. However, it has been shown in [8,9] that optimal UEP (with much higher complexity and longer delay) only offers a small performance gain over optimal equal error protection (EEP) for BSCs. This motivates us to study an efficient transmission scheme obtained with constellation expansion, that is, coded modulation, in the spirit of EEP, that does not lead to bandwidth expansion as in [7].
Forward error correction is a practical technique for increasing the transmission efficiency of virtually all digital communication channels. Ungerboeck [10] showed that with TCM, it is possible to achieve an asymptotic coding gain of as much as 5.8 dB in average energy per symbol (E_s/N_0) within precisely the same signal spectral bandwidth, by doubling the signal constellation set from M = 2^(k-1) to M = 2^k using a method called set partitioning. The main idea is to maximize Euclidean distance rather than dealing with Hamming distance. The set partitioning strategy maximizes the intrasubset Euclidean distance. It has led to extensive research [11] on finding practical codes and their performance bounds. Viterbi et al. [12] introduced bandwidth-efficient pragmatic codes, which generate trellis codes for higher M-PSK constellations by using an industry standard rate-1/2 trellis code, at the loss of some performance compared to Ungerboeck codes. Wolf and Zehavi [13] extended pragmatic codes to a wide range of high-rate punctured trellis codes for both PSK and QAM modulations.
This paper proposes a combined source-channel coding framework based on embedded image coders such as SPIHT and JPEG2000. The SNR is chosen in conjunction with the source code rate and the channel code rate under a power constraint. In the meantime, TCM is used in conjunction with a bandwidth constraint. An adaptive TCM system capable of operating at variable rates and modulation formats is designed using punctured TCM codes [14]. Theoretical performance bounds are computed analytically for TCM coding, and simulations are performed to match the theoretical analysis of TCM coders for our combined source-channel coding system. In addition, simulation results using turbo [15] and LDPC codes [16] are also presented in this study; the turbo (and LDPC) based source-channel coding system has a gap of 1.2 (and 0.98) dB from the capacity-achieving SNR (SNR gap) value of the QPSK channel.
This paper is organized as follows. In Section 2, we present our combined source-channel coding framework using the SPIHT image coder under power and bandwidth constraints. The SPIHT image coder is reviewed, and both uncoded and coded signaling formats are considered. In Section 3, the proposed framework is applied to M-PSK signaling. Theoretical and simulation results for both uncoded and coded cases are presented, followed by the design of an adaptive TCM system. The input constrained capacity for AWGN channels is considered in Section 4. Results from applying both turbo and LDPC codes are also presented. Section 5 concludes the paper.

The SPIHT image coder
The SPIHT coder by Said and Pearlman [4] is a celebrated wavelet-based embedded image coder. It employs octave-band filter banks for subband/wavelet decomposition of the input image and takes advantage of the fact that the variance of the coefficients decreases from the lowest to the highest bands in the subband pyramid. This SPIHT coding algorithm is an improvement of Shapiro's embedded zerotree wavelet (EZW) coding algorithm [17]. The difference between SPIHT and EZW is that the SPIHT algorithm provides better performance. Both coders outperform JPEG while producing an embedded bitstream, which means that the decoder can stop at any point of the bitstream and still produce a decoded image of commensurate quality. EZW and SPIHT have led to the development of the new JPEG2000 image compression standard. Since both SPIHT and JPEG2000 produce embedded bitstreams, our proposed framework is applicable to both of them. However, we only use the SPIHT image coder in this paper.

The proposed framework
Consider a JSCC system employing the SPIHT image coder emitting bits at rate r_s, measured in bits per pixel (bpp), where the total number of pixels in the input image(s) is assumed to be L. The quality of the decoded image is measured by the mean-squared error (MSE) D as a function of r_s. Figure 1 depicts this operational distortion-rate function D(r_s) of the SPIHT coder for the 512 × 512 Lena image. With the embedded SPIHT coder, decoding stops if a single error occurs (see footnote 2). Thus the average distortion after transmitting an N-bit SPIHT bitstream across a channel characterized by its bit-error probability P_b can be calculated as given in (1).
If R constellation signals per source sample (see footnote 3) are transmitted over the channel using an average energy of E_s per transmitted signal, then for a given target power level P_0 (in maximum permitted energy per source sample), power constrained transmission means R E_s ≤ P_0. On the other hand, the bandwidth constraint R_0 implies a duration per constellation signal (or channel use) of at least 1/R_0 second; then R = R_0 implies E_s = P_0/R_0 if both the maximum available power and available bandwidth are used.
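The averaging described above can be sketched numerically. The snippet below is only an illustration under stated assumptions: bit errors are taken to be independent with probability P_b, the decoder keeps exactly the bits received before the first error, and the names `average_distortion` and `toy_D` are hypothetical. The toy distortion-rate curve is not measured SPIHT data, and the code is not claimed to reproduce the paper's equation (1) term by term.

```python
import math

def average_distortion(D, n_bits, n_pixels, p_b):
    """Expected MSE when decoding stops at the first bit error.

    D        -- callable mapping a rate in bpp to an MSE (hypothetical model)
    n_bits   -- length N of the embedded bitstream
    n_pixels -- number of pixels L
    p_b      -- bit-error probability, assumed independent across bits
    """
    total = 0.0
    survive = 1.0  # probability that the first i bits are all correct
    for i in range(n_bits):
        # First error hits bit i+1: only the first i bits are usable.
        total += D(i / n_pixels) * survive * p_b
        survive *= 1.0 - p_b
    # No error at all: the full N bits are decoded.
    total += D(n_bits / n_pixels) * survive
    return total

def toy_D(rate_bpp):
    # Toy exponential distortion-rate curve, NOT measured SPIHT data.
    return 2000.0 * math.exp(-4.0 * rate_bpp)

if __name__ == "__main__":
    for p_b in (0.0, 1e-3, 1e-2):
        print(p_b, average_distortion(toy_D, 100, 400, p_b))
```

With P_b = 0 the result collapses to D(N/L), and it grows toward D(0) as P_b increases, which is the behaviour the framework trades off against the source rate.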
Let b_0 be the total number of transmitted symbols for the source image (with L pixels); by the definition of R, we have

b_0 = R L. (2)

In all systems considered in this work, R_0 is fixed. Equation (2) means b_0 is a constant in all systems.
If a channel code with rate r_c is used for error correction, the maximum number of bits per source sample available for source coding is r_s = R_0 r_c k, with M = 2^k being the number of modulation levels. Thus, when the maximum available bandwidth is utilized, that is, R = R_0, we also have the relation given in (3). It is assumed that each constellation {S_i} used for transmission over an AWGN channel with zero mean and variance N_0/2 is associated with a capacity C_i(E_s/N_0). Shannon's channel coding theorem states that if r_c k < C_i(E_s/N_0), then r_s bits per source sample can be transmitted with an arbitrarily small probability of error, and Shannon's separation principle implies that the distortion level D(r_s), corresponding to rate r_s, can be achieved.
Since D(r_s) is assumed to be a nonincreasing function of r_s, this simply suggests the selection of the signal constellation that achieves the highest capacity under the power and bandwidth constraints (assuming infinite block lengths).

Footnote 2: Throughout this paper, we assume that channel errors (if any) can be detected perfectly (e.g., by CRC codes, which are widely used for error detection because of the simplicity of their implementation and the low complexity of both the encoder and the decoder); see, for example, the CRC-RCPC code used in [3].
Footnote 3: For transporting images, a source sample corresponds to an image pixel. We use them interchangeably in this paper.
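As a small worked example of the rate bookkeeping r_s = R_0 r_c k, the sketch below uses the 512 × 512 image size and the 65,536-symbol budget quoted later in the paper; the function name `source_rate_bpp` is an assumption made for illustration.

```python
from fractions import Fraction

def source_rate_bpp(r0, r_c, k):
    """Source rate r_s = R_0 * r_c * k in bits per pixel."""
    return r0 * r_c * k

n_pixels = 512 * 512            # L
b0 = 65_536                     # transmitted symbols
r0 = Fraction(b0, n_pixels)     # R_0 = b_0 / L = 1/4

configs = [
    ("QPSK, rate-1/2", Fraction(1, 2), 2),
    ("8-PSK, rate-2/3", Fraction(2, 3), 3),
    ("16-PSK, rate-3/4", Fraction(3, 4), 4),
]

if __name__ == "__main__":
    for name, r_c, k in configs:
        print(name, float(source_rate_bpp(r0, r_c, k)), "bpp")
```

Exact fractions are used so that the three coded configurations come out at 0.25, 0.5, and 0.75 bpp without rounding noise.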

Application to an arbitrary modulation format for an AWGN channel
We consider the following practical problem based on the embedded SPIHT image coder: for a given AWGN channel with zero mean, variance N_0/2, and constraints on both the average power and bandwidth, what is the minimum achievable average MSE of transmitted images, using arbitrary modulation signaling (AMS) for both coded and uncoded systems?

Uncoded AMS signaling
The SPIHT image coder is used in conjunction with uncoded 2^k-AMS signaling, that is, r_c = 1. The corresponding average bit-error probability is computed and given as P_b(k). For an image (with L pixels) compressed at a rate of r_s bpp, r_c k b_0 = k b_0 = L r_s source bits are transmitted over the AWGN channel with b_0 symbols. Due to the embedded nature of the SPIHT coded image bitstream, the average MSE can be expressed as in (4), where D(r_s) represents the distortion of the image decoded at rate r_s bpp (see Figure 1). From (2) and (3), the source code rate can be rewritten as r_s = r_c k b_0/L = k b_0/L, which varies only with k under uncoded signaling. Equation (4) then becomes (5).

EURASIP Journal on Advances in Signal Processing
Since D(k b_0/L) decreases while P_b(k) increases as k increases, for a given value of E_s/N_0 the optimum choice of k is the one that minimizes the average MSE, as expressed in (6). Intuitively, this choice is justified by the fact that as the channel condition improves (i.e., E_s/N_0 increases), a larger constellation size (i.e., larger value of k) can be chosen to achieve a higher throughput (source) rate r_s with lower MSE. However, a lower average MSE can be obtained if channel coding is combined with the modulation, resulting in coded modulation. The following section illustrates how to do this.

Coded AMS signaling
Assume a rate-r_c channel code (with r_c < 1) is used to transmit images compressed at a rate of r_s bpp with 2^k-AMS signaling, so that r_c k b_0 = L r_s. If the corresponding bit-error probability is approximated as P_b(k), then the average MSE becomes the expression in (7). We optimize (7) over r_c and k for fixed b_0 and L to obtain (8). In terms of the PSNR in dB, it becomes (9), where D_opt(E_s/N_0) is chosen as (6) or (8). Depending on the channel condition, we optimize both the channel code rate and modulation format for minimum distortion (or maximum PSNR).
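For orientation, the optimization and the PSNR conversion described above can be written compactly as follows. This is a sketch in assumed notation, not a transcription of the paper's numbered equations: the symbol \bar{D}(r_c, k) for the average MSE and the peak value of 255 (8-bit grayscale images) are assumptions.

```latex
% Sketch only: notation assumed for illustration.
\begin{align*}
D_{\mathrm{opt}}\!\left(E_s/N_0\right)
  &= \min_{r_c,\,k}\ \bar{D}(r_c, k)
  \quad \text{subject to } r_c\, k\, b_0 = L\, r_s, \\
\mathrm{PSNR}
  &= 10 \log_{10} \frac{255^2}{D_{\mathrm{opt}}\!\left(E_s/N_0\right)}\ \text{dB}.
\end{align*}
```

The uncoded case corresponds to fixing r_c = 1 and minimizing over k alone.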

Phase shift keying (PSK)
PSK is a constant-envelope modulation scheme in which the source information is contained in the phase of the transmitted carrier. For a given value of E_s/N_0, the bit-error probability P_b(k) of M-PSK signaling over an AWGN channel using [...] at very low E_s/N_0 (since P_b(k) [...] 1 for all k). However, since r_s is higher for larger k, the system performance plateaus sooner at lower PSNR with smaller k than with larger k. The best system performance corresponds to the envelope of the different PSNR versus E_s/N_0 curves. The uncoded system performs poorly at low E_s/N_0. To improve this performance, coded modulation techniques like TCM should be used.
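To give a feel for how P_b(k) grows with the constellation size at a fixed E_s/N_0, the sketch below uses standard textbook expressions: the exact result for BPSK and Gray-mapped QPSK, and the high-SNR nearest-neighbour approximation for larger M. These are assumptions chosen for illustration and may differ from the expression used in the paper; the function names are hypothetical.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mpsk_bit_error_prob(k, es_n0_db):
    """Approximate P_b(k) for Gray-mapped 2^k-PSK on an AWGN channel."""
    es_n0 = 10.0 ** (es_n0_db / 10.0)
    if k == 1:
        # BPSK: exact.
        return q_func(math.sqrt(2.0 * es_n0))
    if k == 2:
        # Gray-mapped QPSK: exact per-bit error rate (E_s = 2 E_b).
        return q_func(math.sqrt(es_n0))
    m = 2 ** k
    # Nearest-neighbour symbol-error approximation, assuming one bit error
    # per symbol error; only meaningful at moderate to high E_s/N_0.
    p_sym = 2.0 * q_func(math.sqrt(2.0 * es_n0) * math.sin(math.pi / m))
    return min(0.5, p_sym / k)

if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(2 ** k, "-PSK:", mpsk_bit_error_prob(k, 10.0))
```

At 10 dB the bit-error probability rises by orders of magnitude from BPSK to 16-PSK, which is why larger constellations only pay off once the channel is good enough.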

Trellis-coded modulation (TCM)
TCM codes [10] introduce the redundancy required for error control without increasing the signal bandwidth by expanding the signal constellation size. Symbol mapping then becomes part of the TCM code design, and it is done in a special way called set partitioning. Ungerboeck [10] showed that it is possible to achieve an asymptotic coding gain of as much as 5.8 dB in E_s/N_0 without any bandwidth expansion. The probability of symbol error for transmission over noisy channels is a function of the minimum Euclidean distance d_free between pairs of distinct signal sequences. If b_dfree is the total number of information bit errors associated with the erroneous paths at distance d_free from the transmitted one, averaged over all possible transmitted paths, the probability of bit error [19] is given by (11) at sufficiently high E_s/N_0. Figure 3 depicts the performance of three coded systems that use 4-state rate-1/2 TCM (with QPSK), 8-state rate-2/3 TCM (with 8-PSK), and 8-state rate-3/4 TCM (with 16-PSK), respectively, again for transmitting the 512 × 512 Lena image using 65,536 symbols (or R_0 = 0.25). The corresponding source coding rate r_s = R_0 r_c k is 0.25, 0.5, and 0.75 bpp, respectively. The theoretical curves can be expected to match the simulations only at sufficiently high E_s/N_0. This is because the BER in (11) is approximated using only the error paths at distance d_free.
The performance of the uncoded systems of Figure 2 is also included for comparison purposes. It is seen that, at the same r_s, a TCM coded system performs better than an uncoded system at low E_s/N_0.
We note that a similar approach has been presented in [20] for robust video coding. However, in [20], binary channel coding with Gray-mapped QPSK signaling is considered in conjunction with an enhancement, which allows one to select two rotated versions of the QPSK constellation, resulting in nonuniform 8-PSK signaling. Contrary to our proposed scheme, channel coding in [20] is realized independently of the modulation so that independent parallel binary channels are considered at the receiver.

Adaptive TCM system
The performance of the TCM system depicted in Figure 3 still saturates quickly, and in some regions of E_s/N_0 values the uncoded system performs better. Moreover, each configuration requires a separate code. Hence, for practical use with variable channel conditions, the JSCC-TCM system presented above is not suitable. We thus devise a single encoder-decoder TCM system based on punctured codes [14]. It is assumed that the transmitter is able to perform adaptive modulation, which can be achieved, for example, with the help of channel side information.
Figure 4 presents the performance of this adaptive TCM system. It employs a single 64-state rate-1/2 TCM code in [12] as its base code, which has reasonable decoding complexity. By varying the puncturing rate (which leads to different values of r_c) and k (or the constellation size M), a number of system configurations are generated and their performance presented. The best performance of this adaptive TCM system is the envelope of all PSNR versus E_s/N_0 curves. Table 1 summarizes the best choices of r_c and constellation size M = 2^k with PSK (and the associated r_s = R_0 r_c k) corresponding to different E_s/N_0 ranges. It is seen from Figure 4 that our QPSK 64-state rate-1/2 TCM coded system performs 5.2 dB better than uncoded BPSK signaling, and that our 8-PSK 64-state rate-2/3 TCM coded system and 16-PSK 64-state rate-3/4 TCM coded system perform 3.1 dB better than uncoded QPSK and 8-PSK signaling, respectively.
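The envelope operation that yields a switching table such as Table 1 can be sketched as below. Everything numeric here is a placeholder: the step-shaped `toy_curve` and its ceilings and knees are hypothetical and do not reproduce the curves of Figure 4; only the selection logic (pick the configuration with the highest PSNR at each E_s/N_0) is the point.

```python
def switching_table(curves, es_n0_grid_db):
    """At each E_s/N_0 value, pick the configuration with the highest PSNR.

    curves -- dict mapping a configuration label to a callable
              es_n0_db -> PSNR in dB (hypothetical inputs)
    Returns a list of (es_n0_db, best_label, best_psnr_db) tuples.
    """
    table = []
    for snr_db in es_n0_grid_db:
        best = max(curves, key=lambda name: curves[name](snr_db))
        table.append((snr_db, best, curves[best](snr_db)))
    return table

def toy_curve(ceiling_db, knee_db, floor_db=15.0):
    """Hypothetical step curve: floor below the knee, ceiling above it."""
    def psnr(snr_db):
        return ceiling_db if snr_db >= knee_db else floor_db
    return psnr

# Placeholder curves, NOT the paper's measurements.
curves = {
    "QPSK, r_c=1/2 (r_s=0.25 bpp)": toy_curve(30.0, 5.0),
    "8-PSK, r_c=2/3 (r_s=0.50 bpp)": toy_curve(35.0, 10.0),
    "16-PSK, r_c=3/4 (r_s=0.75 bpp)": toy_curve(40.0, 15.0),
}

if __name__ == "__main__":
    for row in switching_table(curves, [6.0, 12.0, 18.0]):
        print(row)
```

In a real system the callables would be replaced by measured or analytically bounded PSNR curves for each punctured-code and constellation pair.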
So far, the performance of our TCM-based JSCC scheme has been studied in terms of E_s/N_0. In the next section, the performance is studied from a channel capacity perspective using powerful channel codes.

Channel capacity
Let C_M denote the capacity of a discrete-input continuous-output memoryless (e.g., AWGN) channel with an M-ary input constellation. If b_0 symbols are transmitted over this channel, then the minimum achievable distortion is given by D(b_0 C_M/L), where D(·) is the operational distortion-rate function (see Figure 1) of the SPIHT image coder.
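A constellation-constrained capacity of this kind is usually evaluated numerically. The sketch below is a Monte Carlo estimate for equiprobable 2^k-PSK on a complex AWGN channel, using the standard mutual-information expression for discrete inputs; the function name, sample count, and seed are assumptions, and the estimate carries sampling noise.

```python
import cmath
import math
import random

def psk_constrained_capacity(k, es_n0_db, n_samples=4000, seed=0):
    """Monte Carlo estimate (bits/symbol) of equiprobable 2^k-PSK capacity."""
    rng = random.Random(seed)
    m = 2 ** k
    n0 = 10.0 ** (-es_n0_db / 10.0)   # E_s is normalised to 1
    sigma = math.sqrt(n0 / 2.0)       # noise std per real dimension
    points = [cmath.exp(2j * math.pi * i / m) for i in range(m)]
    acc = 0.0
    for _ in range(n_samples):
        w = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        # PSK is symmetric, so conditioning on the first point suffices.
        y = points[0] + w
        s = sum(math.exp(-(abs(y - p) ** 2 - abs(w) ** 2) / n0)
                for p in points)
        acc += math.log2(s)
    return k - acc / n_samples

if __name__ == "__main__":
    for snr_db in (-5.0, 0.2, 5.0, 10.0):
        print(snr_db, psk_constrained_capacity(2, snr_db))
```

For QPSK the estimate should come out close to 1 bit per symbol near E_s/N_0 = 0.2 dB, consistent with the capacity-achieving SNR quoted for the rate-1/2 QPSK system, and it saturates at k bits per symbol at high SNR.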
In Figure 5, the performance of the JSCC framework, employing the adaptive TCM system (see Section 3.3) and uncoded M-PSK modulation, is compared with the minimum achievable distortion. We observe that there still remain large SNR gaps at the low SNR range. The performance can be improved by employing capacity-approaching random codes like turbo [15] and LDPC codes [16] for low E_s/N_0 values (although theoretical expressions are no longer feasible).

Turbo-coded JSCC system
A turbo encoder consists of two binary rate-1/2 recursive systematic convolutional (RSC) encoders separated by an interleaver. Unfortunately, the presence of an interleaver complicates the structure of a turbo code trellis, and a decoder based on maximum-likelihood estimation cannot be used. Thus a suboptimal iterative decoder based on the a posteriori probability (APP) binary BCJR [21] algorithm is used. Given the channel output sequence, the BCJR decoder estimates the bit probability.
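In the case of turbo coded modulation, there are a couple of techniques that can be used. A turbo system can be designed specifically for the corresponding modulation scheme [22,23]. For example, a symbol interleaver is used in [23], and the binary BCJR algorithm is replaced by a symbol-based one at the decoder side. The technique in [24] uses a direct extension of binary turbo codes. The output of the binary turbo encoder is Gray mapped to some constellation symbols. The received symbols are demodulated and the log-likelihood ratio (LLR) of each bit in the symbol is computed. This soft information is then passed to the decoder. This scheme is simple and easily extendable. We designed turbo codes of rate 1/2 with 16-state QPSK and rates 1/3 and 2/3 with 16-state 8-PSK using 67,056 symbols, and the reported performance of turbo codes is calculated based on considering the first 65,536 symbols only. It is seen from Figure 6 that the rate-1/2 turbo code is 1.4 - 0.2 = 1.2 dB away from the capacity for QPSK, and turbo codes with coded modulation can achieve an additional gain of 3.6 dB over their TCM code counterpart. Figure 7 indicates that the rate-1/3 and rate-2/3 turbo codes are 1.9 - 0.13 = 1.77 and 7 - 5.6345 = 1.3655 dB away from capacity for 8-PSK, respectively. The performance of our turbo coded system degrades at low SNR because of increased noise power.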
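The per-bit LLR computation mentioned above can be illustrated for Gray-mapped QPSK. This is a minimal sketch assuming equiprobable symbols, unit symbol energy, and a complex AWGN channel with total noise variance N_0; the bit-to-point labeling in `QPSK` and the function name `bit_llrs` are assumptions, and the exact (not max-log) LLR is computed.

```python
import math

# Gray-mapped QPSK (assumed labeling): neighbouring points differ in one bit.
QPSK = {
    (0, 0): complex(1, 1) / math.sqrt(2),
    (0, 1): complex(-1, 1) / math.sqrt(2),
    (1, 1): complex(-1, -1) / math.sqrt(2),
    (1, 0): complex(1, -1) / math.sqrt(2),
}

def bit_llrs(y, constellation, n0):
    """Exact per-bit LLRs log P(b=0|y)/P(b=1|y) for equiprobable symbols.

    For very small n0 the sums can underflow; a log-domain version would be
    needed in practice.
    """
    n_bits = len(next(iter(constellation)))
    llrs = []
    for pos in range(n_bits):
        num = den = 0.0
        for bits, point in constellation.items():
            metric = math.exp(-abs(y - point) ** 2 / n0)
            if bits[pos] == 0:
                num += metric
            else:
                den += metric
        llrs.append(math.log(num / den))
    return llrs

if __name__ == "__main__":
    received = QPSK[(0, 0)] + complex(0.1, -0.2)
    print(bit_llrs(received, QPSK, n0=0.5))
```

Positive LLRs favour bit value 0; these soft values are what a binary turbo or LDPC decoder would consume in place of hard decisions.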
The above turbo codes are on average 1.4 dB away from near-Shannon-limit error-correction performance. This gap can be further reduced by increasing the frame size, but at the cost of increased computation and latency, and/or by using other types of turbo codes designed specifically for coded modulation. An alternative is to use low-complexity LDPC codes.

LDPC-coded JSCC system
An LDPC code is completely specified by its parity check matrix. Extensive research (e.g., [25]) has been conducted on the design of LDPC codes. When designed carefully, irregular LDPC codes can perform very closely to the capacity of typical channels.
In our experiments, we set the maximum number of iterations to be 60 (between the demodulator and the LDPC decoder) and 25 (for the LDPC decoder). Because there is always a probability of decoding error, we run the same image transmission 5,000 times at the operating E_s/N_0 and make sure that correct image decoding is guaranteed at least 996 out of every 1,000 runs before reporting the averaged PSNR results. This makes sure that the effect on the PSNR performance due to the probability of error is negligible at the operating E_s/N_0. Figure 8 indicates that the average decrease in image quality due to LDPC decoding errors is 33.9935 - 33.9923 = 0.0012 dB in PSNR (because all four errors in every 1,000 runs in our experiments occur towards the end of the source bitstream). It is also seen that our JSCC system with LDPC codes (operating at E_s/N_0 = 1.18 dB) is 0.98 dB away from the capacity, and it performs 0.22 dB and 3.82 dB better than the turbo system and TCM system, respectively, for the QPSK system.
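The SNR gaps quoted for the rate-1/2 QPSK systems follow from simple arithmetic on the operating points. In the sketch below, the capacity-achieving E_s/N_0 of 0.2 dB and the turbo operating point of 1.4 dB are taken from the turbo results reported earlier, and the TCM operating point is inferred from the stated 3.82 dB advantage of LDPC over TCM rather than read from a figure; the variable names are hypothetical.

```python
capacity_snr_db = 0.2   # capacity-achieving E_s/N_0 for rate-1/2 QPSK

operating_snr_db = {"LDPC": 1.18, "turbo": 1.4}
# Inferred, not read from a figure: LDPC is 3.82 dB better than TCM.
operating_snr_db["TCM"] = operating_snr_db["LDPC"] + 3.82

gaps_db = {name: round(snr - capacity_snr_db, 2)
           for name, snr in operating_snr_db.items()}

if __name__ == "__main__":
    for name, gap in sorted(gaps_db.items(), key=lambda item: item[1]):
        print(f"{name}: {gap} dB from capacity")
```

The resulting gaps (0.98 dB for LDPC, 1.2 dB for turbo) match the values stated in the text, and the inferred TCM point is consistent with the 3.6 dB gain of the turbo code over its TCM counterpart.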
The overall performance achieved by our scheme with rate-1/2 code using various coding schemes (e.g., TCM, turbo and LDPC codes) for QPSK modulation is summarized in Table 2. Similar results can be achieved by using turbo and LDPC codes with various rates and M-PSK modulations.

CONCLUSIONS
In this paper, a general framework for determining the optimal source-channel coding tradeoff for a power and bandwidth constrained channel has been presented. It addresses a potential shortcoming of [7] with respect to bandwidth expansion. It also offers an additional degree of freedom with respect to the EEP/UEP approaches of [3,8,9], as well as a means of improvement. This framework has been applied to progressive image transmission with constant envelope M-PSK TCM signaling over the AWGN channel. An adaptive M-PSK TCM system employing a single encoder-decoder pair is also presented. Our combined source-channel coding approach is close to optimal when used in conjunction with strong random coding techniques. Extensions to other signaling constellations or channel models follow in a straightforward manner. A particularly well-suited example for PAM signaling over a fading channel is the JSCC scheme proposed in [28], in which several PAM constellations can be chosen adaptively.

Figure 1: Operational distortion-rate function D(r_s) of the SPIHT coder for the 512 × 512 Lena image.

Figure 3: PSNR versus E_s/N_0 performance of using a TCM system for transmitting the SPIHT compressed 512 × 512 Lena image using 65,536 symbols. The source coding rate for QPSK 4-state rate-1/2 TCM, 8-PSK 8-state rate-2/3 TCM, and 16-PSK 8-state rate-3/4 TCM is 0.25, 0.5, and 0.75 bpp, respectively. Both the theoretical curves based on (11) and the respective simulation results are provided. The performance of the uncoded systems of Figure 2 is also included for comparison purposes.

Figure 4: PSNR versus E_s/N_0 performance of our adaptive TCM system for transmitting the SPIHT compressed 512 × 512 Lena image using 65,536 symbols. Numbers next to the performance ceilings are the source coding rates r_s = R_0 r_c k, with R_0 = 0.25 and M = 2^k being the constellation size.

Figure 5: The best PSNR versus capacity-achieving E_s/N_0 performance of using our JSCC system for transmitting the SPIHT compressed 512 × 512 Lena image using 65,536 symbols.

Table 1: The best choice of channel code rate and signal constellation (and their associated source coding rate) corresponding to different E_s/N_0 ranges based on Figure 4 for our adaptive TCM system when transmitting the SPIHT compressed 512 × 512 Lena image using 65,536 symbols.

Table 2: Gains achieved with channel coding techniques (using rate-1/2 code and QPSK signaling) when transmitting the SPIHT compressed 512 × 512 Lena image using 65,536 symbols. The source coding rate is r_s = 0.25 bpp.