
# A wireless video multicasting scheme based on multi-scale compressed sensing

Anhong Wang^{1} (email author), Qingdian Wu^{1}, Xiaoli Ma^{2}, and Bing Zeng^{3,4}

**2015**:74

https://doi.org/10.1186/s13634-015-0258-2

© Wang et al. 2015

**Received:** 28 November 2014. **Accepted:** 28 July 2015. **Published:** 14 August 2015.

## Abstract

Video multicast is becoming more and more popular in wireless multimedia applications, in which one major challenge is to offer heterogeneous users graceful degradation against varying packet loss ratios and channel noise. In this paper, we propose a multi-scale compressed sensing-based wireless video multicast scheme, abbreviated as MCS-cast. The encoder of MCS-cast decomposes each video frame through a discrete wavelet transform (DWT) and explores an optimized compressed sensing (CS) rate to sample/measure each DWT level. The CS measurements are then packed in such a way that all packets are made as equally important as possible, while each packet includes different percentages of measurements from different DWT levels. Finally, the packets are transmitted via an analog-like modulator that maps the measurements into a very dense constellation. We demonstrate that, because each packet devotes larger percentages to the more important DWT levels, packet loss has a much reduced influence on the reconstruction quality. Experimental results show that our MCS-cast preserves the property of graceful degradation for heterogeneous users and can outperform the state-of-the-art SoftCast by up to 3 dB in PSNR at high packet loss ratios (over the same noisy channel).

## Keywords

- Multi-scale
- Compressed sensing
- Video multicast
- Discrete wavelet transform

## 1 Introduction

Multicasting of video signals has recently become a popular application in wireless networks, e.g., mobile TV, media sharing, live broadcasting of sports events, and lecturing. Because of channel heterogeneity among multiple users (e.g., the channel bandwidth they are connected with and the channel errors they suffer), one big challenge imposed on video multicast is to simultaneously guarantee the best possible video quality for each user according to its individual channel characteristics.

In conventional wireless video multicast, a video bitstream coded at a specific bit rate is transmitted over the wireless network. For example, the digital video broadcasting (DVB) scheme [1] transmits a bitstream compressed by a traditional video coding standard over a wireless channel. However, this fixed bit rate usually incurs “unfairness” among users in a multicast group: if the rate accords with a low-quality receiver, users with better channel characteristics still obtain only a low-quality video; alternatively, if the rate is selected for a high-quality user, the low-quality users cannot decode the bitstream at all. Conventional wireless video transmission schemes also have another disadvantage: they are not resilient to channel errors or noise. This leads to a sharp decrease of reconstructed video quality (the so-called cliff effect) when the channel is corrupted by noise (which is very common in practice), and the reconstructed video quality will not improve even when the signal-to-noise ratio (SNR) becomes larger [2]. The emergence of the layered digital scheme alleviates the cliff effect through the combination of layered video coding and layered video transmission. Typically, scalable video coding (SVC) [3] is utilized as the layered video coding technique, and hierarchical modulation (h-mod) [4] is adopted as the layered transmission scheme. However, the layered digital scheme does not solve the cliff effect completely, due to the limited number of layers, which instead generates a staircase effect.

Recently, SoftCast [2] has been proposed as a new framework to deal with the aforementioned problems, particularly the cliff effect. SoftCast sends real numbers instead of a digital bitstream by using a very dense constellation, and it is thus called an analog transmission. Specifically, discarding entropy coding and channel coding, SoftCast consists of three steps: block-wise discrete cosine transform (DCT), power allocation, and whitening. First, the DCT removes the spatial correlation within each video frame; second, power allocation generates a compact and resilient representation of the DCT coefficients; and finally, whitening generates the equal-importance packets that are transmitted directly over a wireless channel. In SoftCast, the reconstruction quality at each user depends only on its channel characteristics (including packet loss rate and channel SNR), so that receivers with a good channel condition obtain better video quality, while users with a bad channel condition can still watch a lower-quality video. Experimental results showed that SoftCast is more robust to channel noise and achieves a smooth degradation of quality.

Several approaches have been proposed in the past few years to improve the performance of SoftCast, such as D-cast [5] and Wave-cast [6], which exploit the inter-frame correlation of video signals to remove temporal redundancy. Furthermore, Hybrid-Cast [7] utilizes a hybrid transmission that combines digital transmission of the important information (such as motion vectors and scale factors) with SoftCast’s analog transmission of the remaining information (including the DCT coefficients). However, since the packets in Hybrid-Cast are not equally important, the performance may not be as good as expected when the important packets are lost. Meanwhile, several other works have appeared in the area of soft video transmission. For instance, the bandwidth expansion problem of soft video coding is solved by layered coset coding [8]; a gradient-based framework is proposed in [9] for wireless soft video broadcast; compressive sensing (CS) is integrated into multiple-input multiple-output (MIMO) transmission to make the reconstructed image/video quality commensurate with the channel SNR and the MIMO dimension [10, 11]; the multipath case of SoftCast is investigated in [12], in which high-energy DCT coefficients are assigned to “good” subcarriers; and a real soft video broadcast system has been implemented, with several PHY-layer issues solved, in [13].

The CS technique seems very suitable for wireless transmission (with random packet loss) thanks to its random measurement. A simple CS-based wireless video multicast framework has been presented in [14], which exploits random measurement to generate equal-importance packets, consequently eliminating the cliff effect and obtaining a graceful degradation. As for the CS technique itself, a few algorithms have been widely applied to image and video coding, such as structurally random matrices (SRMs) and block-based compressed sensing (BCS) [15, 16]. BCS is specially tailored to maintain a low computational burden; however, it is not efficient in compression because it sacrifices the global random measurement. MS-BCS-SPL [17] (multi-scale BCS with smoothed projected Landweber) then applies BCS to the different levels of a wavelet-decomposed image and consequently improves the performance of BCS greatly, while retaining the low computational burden.

In this paper, we extend the state-of-the-art MS-BCS-SPL to the case of multiple users and propose a multi-scale CS-based wireless video multicast scheme: MCS-cast. Specifically, each video frame is decomposed by a multi-scale discrete wavelet transform (DWT). Then, optimally determined CS rates are allocated to the different DWT levels, aiming to sample the more important DWT levels with higher measurement rates. All measurements are packed in such a way that all packets are equally important, while each packet includes different percentages of measurements from different DWT levels. Finally, the packets are transmitted through a physical technique like SoftCast’s, mapping the measurements into a very dense constellation. We will demonstrate that (1) the cliff effect is avoided naturally thanks to the equal-importance packing strategy and (2) packet loss has a (much) reduced influence on the reconstruction quality because each packet devotes larger percentages to the more important levels. In addition, a linear least square estimator (LLSE) is incorporated to suppress channel noise, leading to further improved reconstruction quality.

## 2 Related works

Our work is strongly related to the theory of CS. Therefore, we first review it briefly and then discuss two extended versions of it: BCS-SPL and MS-BCS-SPL.

### 2.1 Conventional CS

Assume that a signal \(x\) of length \(N\) can be represented via a known basis \(\Psi \in R^{N\times N}\) (the inverse transform), i.e.,

$$x = \Psi\,\Theta, \qquad (1)$$

where \(\Theta \in R^{N}\) is the coefficient vector with \(K\) significant values in the transform domain, \(K \ll N\). Then, we project \(x\) onto an \(M\)-dimensional space using a measurement matrix \(\Phi \in R^{M\times N}\):

$$y = \Phi\, x = \Phi\,\Psi\,\Theta, \qquad (2)$$

where \(y\) is called the measurement vector, the CS sampling rate is \(M/N\), and \(M = O(K\log(N/K))\). Since \(M \ll N\), recovering \(x \in R^{N}\) from \(y \in R^{M}\) is an ill-posed problem. Nevertheless, provided that \(x\) is sparse enough, the CS theory can solve it through an \(l_0\)-constrained optimization problem:

$$\widehat{\Theta} = \arg\min_{\Theta}\ \|\Theta\|_{0} \quad \text{s.t.} \quad y = \Phi\,\Psi\,\Theta. \qquad (3)$$

In practice, such an \(l_0\)-constrained optimization is computationally infeasible. Therefore, CS turns to solving an \(l_1\)-based convex relaxation, whose recovery can be implemented directly by linear programming. Once \(\widehat{\Theta}\) is obtained, we can recover \(x\) as \(\widehat{x}=\Psi \widehat{\Theta}\).

### 2.2 BCS-SPL

In BCS, an image with \(N\) pixels, requiring \(M\) measurements in total, is first divided into blocks of size \(B\times B\). Then, each block is CS-sampled using the same matrix \(\Phi_B\):

$$y_j = \Phi_B\, x_j = \Phi_B\,\Psi_B\,\Theta_j, \qquad (4)$$

where \(\Theta_j\) and \(y_j\) are, respectively, the transform coefficients and the measurement vector of the \(j\)-th block \(x_j\); \(y_j\) is of size \(M_B\times 1\), with \(M_B=\lfloor \frac{M}{N}\cdot B^{2}\rfloor\), \(\Psi_B\in R^{B^{2}\times B^{2}}\), and \(\Phi_B\in R^{M_B\times B^{2}}\). Here, \(\Phi_B\) is chosen to be orthonormal, i.e., \(\Phi_B\,\Phi_B^{T}=I\). The equivalent measurement matrix \(\Phi\) for the entire image is then block-wise diagonal:

$$\Phi = \begin{bmatrix}\Phi_B & & \\ & \ddots & \\ & & \Phi_B\end{bmatrix}. \qquad (5)$$

Since BCS-SPL is based on block-wise image acquisition, only the small measurement matrix \(\Phi_B\) needs to be stored, which greatly saves storage space and improves the reconstruction speed. In addition, BCS-SPL adds a Wiener filter to the SPL algorithm, thus offering a much smoother reconstruction. Afterwards, state-of-the-art block-based CS reconstructions for images and videos, e.g., [18] and [19], were designed and have shown improvement over BCS-SPL.

### 2.3 MS-BCS-SPL

In MS-BCS-SPL, the measurement matrix \(\Phi\) is divided into two parts, a multi-scale wavelet transform matrix \(\Psi'\) and a multi-scale block-wise measurement matrix \(\Phi'\), such that \(\Phi = \Phi'\Psi'\), and Eq. (2) now becomes

$$y = \Phi\, x = \Phi'\,\Psi'\, x. \qquad (6)$$

Here, \(\Psi'\) generates an \(L\)-level wavelet decomposition. Then, the measurement matrix \(\Phi'\), composed of \(L\) different block-wise sampling operators, is applied to these decompositions, with one operator for each level. Regarding \(\Theta\) as the expression of \(x\) in the transform domain, i.e., \(\Theta = \Psi' x\), the sub-band \(s\) at level \(l\) of \(\Theta\) can be divided into \(B_l \times B_l\) blocks, and each level is measured through an appropriately sized \(\Phi_l\). Assume that \(y_{l,s,i}\) is the measurement of block \(i\) of sub-band \(s\) at level \(l\) of \(\Theta\), with \(s \in \{H, V, D\}\) (standing for horizontal, vertical, and diagonal, respectively) and \(0 \le l \le L-1\); then,

$$y_{l,s,i} = \Phi_l\, x_{l,s,i}, \qquad (7)$$

where \(x_{l,s,i}\) denotes block \(i\) of sub-band \(s\) at level \(l\).

Notice that we apply the same measurement matrix \(\Phi_l\) to all three sub-bands (H, V, and D) at the same level \(l\).
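A minimal sketch of this per-level block measurement follows; the sub-band sizes are toy values, and the matrices \(\Phi_l\) are random stand-ins shaped like the 32×16 and 96×64 examples used later in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical per-level (block size, measurement matrix) pairs.
levels = {1: (4, rng.standard_normal((32, 16))),   # Phi_1: 32x16
          2: (8, rng.standard_normal((96, 64)))}   # Phi_2: 96x64

def measure_subband(sub, B, Phi_l):
    """Split one sub-band into BxB blocks and measure each with Phi_l."""
    h, w = sub.shape
    return [Phi_l @ sub[i:i + B, j:j + B].reshape(-1)
            for i in range(0, h, B) for j in range(0, w, B)]

measurements = {}
for l, (B, Phi_l) in levels.items():
    for s in "HVD":                        # same Phi_l for all three sub-bands
        sub = rng.standard_normal((16, 16))  # stand-in wavelet sub-band
        measurements[(l, s)] = measure_subband(sub, B, Phi_l)
```

Each level thus produces its own measurement-vector length (32 per block at level 1, 96 at level 2 here), while the three sub-bands of a level share one operator, mirroring the text.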

The measurement rate \(r_l\) is adjusted for each level \(l\) in MS-BCS. Specifically, the rate of the baseband level\(_0\) (the LL sub-band) is set to be full, i.e., \(r_0 = 1\) (because of its highest importance), and the rate for level \(l\) (\(l \neq 0\)) is selected as \(r_l = w_l\, r'\), so that the overall rate becomes

$$r = \frac{r_0}{4^{L-1}} + \sum_{l=1}^{L-1}\frac{3}{4^{L-l}}\, w_l\, r'. \qquad (8)$$

If a target rate \(r\) and the weights \(w_l\) on each level are given, \(r'\) can be determined from Eq. (8), leading to a set of level-wise CS rates \(r_l\).
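As a numeric sketch of this allocation (with an \(L=4\) decomposition, \(r_0=1\), and hypothetical weights \(w_l\) of our own choosing), the scale factor \(r'\) follows directly from the rate budget of Eq. (8):

```python
# Solve for r' given a target overall rate r, assuming a 4-level DWT with
# the baseband (level 0) sampled at r_0 = 1 and r_l = w_l * r' for l >= 1.
L = 4
r_target = 1.0
w = {1: 2.0, 2: 1.5, 3: 1.0}                 # hypothetical weights w_l

base_share = 1.0 / 4 ** (L - 1)              # coefficient share of level 0
level_share = {l: 3.0 / 4 ** (L - l) for l in range(1, L)}  # H+V+D share

# r = base_share * r_0 + sum_l level_share[l] * w_l * r'
r_prime = (r_target - base_share) / sum(level_share[l] * w[l] for l in w)
r_l = {0: 1.0, **{l: w[l] * r_prime for l in w}}
overall = base_share * r_l[0] + sum(level_share[l] * r_l[l] for l in w)
# r_prime == 0.875, so r_1 = 1.75, r_2 = 1.3125, r_3 = 0.875; overall == 1.0
```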

The reconstruction is formulated as an \(l_2\)-distance penalty between the received measurements and the re-projected estimate, minimized with the iterative smoothed projected Landweber (SPL) procedure. Starting from the initial approximation \(\Theta^{(0)}=\Phi'^{\,T} y\) to the wavelet coefficients \(\Theta\), each iteration performs a Landweber step,

$$\widehat{\Theta}^{(i)} = \Theta^{(i)} + \Phi'^{\,T}\left(y-\Phi'\,\Theta^{(i)}\right),$$

followed by hard thresholding,

$$\Theta^{(i+1)}_{j} = \begin{cases}\widehat{\Theta}^{(i)}_{j}, & \left|\widehat{\Theta}^{(i)}_{j}\right| \geq r\,\mathcal{J}^{(i)},\\ 0, & \text{otherwise},\end{cases}$$

where \(r\) is a scaling factor, and \(\mathcal{J}^{(i)}\) is the threshold that is used at the \(i\)-th iteration.
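The Landweber-plus-thresholding loop can be sketched as below. This is a simplified single-scale variant: we keep the \(K\) largest entries instead of using the paper's scaled threshold, and the sizes and orthonormal-row \(\Phi\) are our own assumptions:

```python
import numpy as np

def spl_like(y, Phi, K, n_iter=100):
    """Simplified SPL loop: a Landweber step toward y = Phi @ Theta,
    followed by hard thresholding (here: keep the K largest entries)."""
    Theta = Phi.T @ y                          # initial approximation
    for _ in range(n_iter):
        Theta = Theta + Phi.T @ (y - Phi @ Theta)    # Landweber step
        Theta[np.argsort(np.abs(Theta))[:-K]] = 0.0  # hard threshold
    return Theta

rng = np.random.default_rng(3)
N, M, K = 64, 48, 3
Theta_true = np.zeros(N)
Theta_true[[3, 17, 40]] = [5.0, -4.0, 3.0]            # sparse ground truth
Phi = np.linalg.qr(rng.standard_normal((N, M)))[0].T  # orthonormal rows
y = Phi @ Theta_true
Theta_hat = spl_like(y, Phi, K)
```

Because the Landweber step is a contraction on the (correct) support when \(\Phi\) has orthonormal rows, the iterates converge to the sparse ground truth in this noiseless toy setting.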

## 3 The proposed scheme

In our work, we extend MS-BCS to wireless video multicast with the aim of getting some gains over the SoftCast scheme, while preserving the property of accommodating heterogeneous users and graceful degradation of quality. To this end, we should meet the following requirements as much as possible: Firstly, a video source should be divided into packets with an equal importance. With this requirement, the video quality decoded at each user becomes independent of which packets are received; rather, it depends only on how many packets are received (according to its channel condition). Secondly, more important coefficients should occupy a higher percentage in each packet so that the reconstructed quality would still be acceptable even with the low-quality users who only receive a small number of packets. Thirdly, the overall system should consider the channel noise, and the reconstruction quality should smoothly change proportionally to the packet loss ratio and channel signal-to-noise ratio (CSNR).

In MCS-cast, each video frame is first DWT-decomposed and then CS-measured with a multi-scale measurement matrix \(\Phi'\), the same as that in Eq. (6). The measurements are packed and transmitted through raw orthogonal frequency division multiplexing (OFDM) with an analog-like modulator to the decoders, so as to provide a wireless multicasting service. At the decoder side, the received packets at each user (the number of packets varies among different users) are first de-noised by the linear least square estimation (LLSE) algorithm, and then each user reconstructs the frame using the de-noised measurements.

### 3.1 Encoder side

*Preprocessing*. In our scheme, preprocessing is performed on each frame before encoding in order to keep its energy low. In the original MS-BCS-SPL scheme, the mean value (denoted as *E*_{0}) of all pixels is first subtracted from the image; at the decoder side, the mean value is added back after recovery. However, in the environment of wireless video multicast, the mean value would probably be lost during transmission over a noisy and lossy channel, leading to a very inaccurate reconstruction. To solve this problem, we propose to set *E*_{0} to 128 (for 8-bit video frames) at the preprocessing step. Clearly, this constant (mean) value can be compensated at the decoder side regardless of how noisy and lossy the involved channel is, because it does not need to be transmitted at all.

*DWT*. In our scheme, each frame is sampled and recovered in the wavelet domain, following what is done in MS-BCS-SPL [17], where a dual-tree DWT (DDWT) [22] with bivariate shrinkage [23] is applied in the DDWT domain to enforce sparsity, as described in [16]. To this end, we decompose each frame (after preprocessing) into *L* DWT levels (as shown in Fig. 2, where *L*=4). After the DWT decomposition, the DWT coefficients clearly have successively decreasing importance at higher decomposition levels.

*Measuring with different CS rates*. Since the coefficients in different DWT levels have different importance for the reconstruction quality, we apply a different CS (measurement) rate at each DWT level, depending on its importance: the more important the level, the larger the measurement rate allocated to it. To this end, we first divide the sub-bands of each DWT level into blocks, with different sizes at different levels: the block size becomes increasingly larger from level_{1} to level_{3}. Then, we measure the sub-bands at different DWT levels with different measurement matrices corresponding to the block size in each level. For example, if four DWT levels are used (as in Fig. 2) and the block sizes 4×4, 8×8, and 16×16 are adopted, respectively, for level_{1} ∼ level_{3}, we may select the corresponding measurement matrices to be of sizes 32×16, 96×64, and 176×256, respectively, i.e., *Φ*_{1}∈*R*^{32×16} at level_{1}, *Φ*_{2}∈*R*^{96×64} at level_{2}, and *Φ*_{3}∈*R*^{176×256} at level_{3}. This arrangement corresponds to measurement rates of 200, 150, and 68.75 %, respectively. The over-measured CS data seem completely redundant at this moment (the corresponding measurement rate at level_{0} will be even larger; see discussions in the next sub-section); nevertheless, they will turn out to be necessary when packing these CS data into packets.
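The per-level rates quoted above follow directly from the block and matrix sizes:

```python
# Level-wise CS rate implied by block size B_l and an m_l x B_l^2
# measurement matrix: rate_l = m_l / B_l^2.
sizes = {1: (4, 32), 2: (8, 96), 3: (16, 176)}   # level: (B_l, m_l)
rates = {l: m / B ** 2 for l, (B, m) in sizes.items()}
# rates == {1: 2.0, 2: 1.5, 3: 0.6875}, i.e., 200 %, 150 %, and 68.75 %
```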

*Packing with optimized CS rate allocation*. Packing of the CS measurements (in the DWT domain) is performed in our work in such a way that all packets have an equal importance. As a result, the reconstruction quality at each user in the multicast group depends only on how many packets are received, regardless of what packets are received. On the other hand, since the measurements from different DWT levels have different importance toward the reconstruction quality, we select different percentages for them in each packet.

Assume that (i) the width and height of each frame are \(W\) and \(H\), respectively; (ii) the total number of packets is \(k\); and (iii) the overall rate for all measurements over each frame is set at full (100 %). In our scheme, we first place the whole baseband level\(_0\) into each packet (as shown in Fig. 3, where \(L=4\)), because it has the highest importance, and losing it would lead to a sharply degraded reconstruction quality. Clearly, this repetition produces an over-sampling (by \(k\) times); nevertheless, it guarantees a minimum quality even when a user receives only one packet. On the other hand, the remaining levels need to use some well-determined CS rates so as to achieve the overall rate of 100 %.

Assume that the CS rate for level\(_l\) is \(r_l\), with \(l=0,\dots,L-1\) and \(r_l \geq r_{l+1}\). To guarantee the full overall rate, we have

$$\frac{k}{4^{L-1}} + \sum_{l=1}^{L-1}\frac{3}{4^{L-l}}\, r_l = 1. \qquad (13)$$

Notice that Eq. (13) may not always be met exactly. In this case, we choose \(r_l\) properly to make Eq. (13) hold as closely as possible.

Once all \(r_l\)’s are obtained (with \(r_0\) always set to 1), each frame within a group of pictures (GOP) is CS-sampled at the determined CS rates in the corresponding DWT levels. We perform the packing process on all frames within each GOP to generate a total of \(k\) packets: the baseband level\(_0\) is put into every packet, whereas the measurements for level\(_1\) (with total number \(3\frac{W\times H}{4^{L-1}}r_1\)) are evenly distributed among the \(k\) packets, i.e., \(3\frac{W\times H}{4^{L-1}}r_1/k\) CS data are put into each packet, and similarly for the other levels. Notice that, because the measurements at level\(_l\) can be over-complete, we need to select the CS data for each packet carefully so that they are as independent as possible with respect to each other.
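As a quick check of this packing budget, using the configuration from Section 4 (\(L=4\), \(k=8\) packets, CIF frames, and rates 200/150/68.75 %), the per-packet share of level-1 measurements and the overall rate work out as:

```python
# Packing budget for L = 4, k = 8 and the example rates from Sect. 4.
L, k, W, H = 4, 8, 352, 288
r = {1: 2.0, 2: 1.5, 3: 0.6875}

level1_total = 3 * (W * H // 4 ** (L - 1)) * r[1]  # all level-1 measurements
per_packet = level1_total / k                       # evenly split over packets

# Repeated baseband (k copies) plus the weighted remaining levels:
overall = k / 4 ** (L - 1) + sum(3 / 4 ** (L - l) * r[l] for l in r)
# per_packet == 1188.0 and overall == 32.5/32, slightly above 100 %
```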

### 3.2 Raw OFDM channel

Similar to SoftCast, every two adjacent data, i.e., the \(k\)-th and \((k+1)\)-th data in the \(s\)-th packet, are directly mapped as the I and Q components of a transmitted symbol. Finally, the PHY layer directly transmits all symbols over OFDM channels, in which we will consider different strengths of channel noise in our experimental results.
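The analog-like mapping of a packet’s real-valued CS data onto I/Q symbol pairs might look like the following sketch (the packet contents are illustrative):

```python
import numpy as np

def to_symbols(packet):
    """Map consecutive pairs of real-valued CS data in a packet to the
    I and Q components of complex channel symbols (the analog-like dense
    constellation used by SoftCast-style transmission)."""
    data = np.asarray(packet, dtype=float)
    if data.size % 2:                     # pad an odd-length packet
        data = np.append(data, 0.0)
    return data[0::2] + 1j * data[1::2]   # I = even entries, Q = odd entries

syms = to_symbols([1.0, -2.0, 0.5, 3.0])
# syms == [1 - 2j, 0.5 + 3j]
```

Because the constellation is effectively continuous, channel noise perturbs the recovered measurements gradually instead of causing a decoding cliff.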

### 3.3 Decoder side

*LLSE*. When packets are transmitted over a noisy channel, channel noise is added to packets and thus may incur a certain deviation from the original data, which may badly influence the reconstruction quality. Here, we propose to apply LLSE [25] to the received data before the MS-BCS-SPL reconstruction.

Assume that the received measurement is \(\widehat{y} = y + n\), where \(n\) is the additive white Gaussian noise. Then, LLSE estimates the original signal as

$$y_{\text{LLSE}} = \Lambda_y\left(\Lambda_y + \Sigma\right)^{-1}\widehat{y}, \qquad (15)$$

where \(y_{\text{LLSE}}\) refers to the LLSE estimate of the measurement \(y\), \(\Lambda_y\) is the covariance matrix of \(y\) (which will be transmitted as metadata), and \(\Sigma\) is the covariance matrix of the channel noise \(n\). With a high channel SNR (CSNR), \(\Sigma\) becomes negligible relative to \(\Lambda_y\), and we obtain the approximation

$$y_{\text{LLSE}} \approx \widehat{y}.$$

This means that the LLSE step becomes void, which is reasonable because the measurements are then almost completely trustworthy.
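With diagonal covariances (per-component variances), the LLSE of Eq. (15) reduces to a per-component Wiener shrinkage; a sketch with made-up variances, confirming that at high CSNR the estimate is essentially the received data itself:

```python
import numpy as np

def llse(y_hat, var_y, var_n):
    """Per-component LLSE (Wiener) estimate of y from y_hat = y + n,
    given signal variances var_y (from the metadata) and noise variance
    var_n: y_llse = var_y / (var_y + var_n) * y_hat."""
    return var_y / (var_y + var_n) * y_hat

rng = np.random.default_rng(4)
var_y = np.array([100.0, 25.0, 4.0])        # signal power per component
y = np.sqrt(var_y) * rng.standard_normal(3)
y_hat = y + np.sqrt(1e-6) * rng.standard_normal(3)  # very high CSNR
denoised = llse(y_hat, var_y, 1e-6)
# shrinkage factor var_y/(var_y + var_n) ~ 1, so denoised ~ y_hat
```

At low CSNR the same factor shrinks the noisy components toward zero, trading bias for a smaller mean-square error.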

When some packets are lost, LLSE operates only on the received subset of measurements. Denote the received measurements as \(\widehat{y}_{*} = y_{*} + n_{*}\), with \(n_{*}\) as the corresponding noise vector; we still have

$$y_{*,\text{LLSE}} = \Lambda_{y_{*}}\left(\Lambda_{y_{*}} + \Sigma_{*}\right)^{-1}\widehat{y}_{*},$$

where \(\Lambda_{y_{*}}\) and \(\Sigma_{*}\) are the covariance matrices restricted to the received components.

*Different measurement matrices.* Since different users in the same multicast group are connected with different bandwidths, they receive different numbers of packets. Consequently, after LLSE, each user needs to use its own measurement matrix for each level, according to the packets it has received. Suppose that the encoder uses a random matrix \(\Phi\) with \(N_l\) rows (which can be regenerated exactly at the decoder side) to produce the measurements for level \(l\), and one user receives only \(M_l\) out of the \(N_l\) measurements. Then, the corresponding measurement matrix used at the decoder side for reconstruction is obtained by keeping the received rows,

$$\Phi_{*} = \Phi(i,:),$$

where \(i\) is the row index of \(\Phi\) and can be obtained from the packet index.
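This row selection at the decoder can be sketched as follows (the matrix size, seed, and received indices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N_l, B2 = 32, 16                      # encoder matrix for level l: N_l x B_l^2
Phi = rng.standard_normal((N_l, B2))  # regenerated identically at the decoder

received = [0, 2, 3, 7, 9, 30]        # row indices recovered from the
Phi_star = Phi[received, :]           # packet indices of received packets
# Phi_star is the M_l x B_l^2 matrix used for this user's reconstruction
```

Each user thus builds a different (smaller) effective measurement matrix from the same shared random seed, with no extra side information beyond the packet indices.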

*MS-BCS-SPL reconstruction*. MS-BCS-SPL provides a multi-scale reconstruction by deploying block-based CS sampling within the wavelet domain, applying the Landweber step to each block in each sub-band at each decomposition level independently. Hence, the reconstruction \(x_{l,s,j}\) for block \(j\) of sub-band \(s\) at level \(l\) can be expressed as

$$x_{l,s,j} = \text{SPL}\left(y_{l,s,j},\, \Phi_l\right),$$

where \(\Phi_l\) represents the block sampling operator of level \(l\).

### 3.4 Transmission of metadata

As shown in Eq. (15), the decoder requires the covariance matrix of the measurement *y*. To this end, we transmit the standard deviations as metadata, so that the covariance matrices can be calculated at the decoder from the received standard deviations. In our MCS-cast, there are several standard deviations for each frame. For a video sequence of 352×288 at 30 Hz with a four-level wavelet decomposition, we place the measurements into several packets (data matrices) of size 64×1584 (for the convenience of the subsequent 64-point IFFT). Then, for each frame, there are 1 standard deviation from the vectorized level_{0} component of size 44×36 (1584 coefficients) with repeated full measurement, 6 (3×200 *%*) standard deviations from the three vectorized level_{1} components of size 44×36 with measurement rate *r*_{1}=200 *%*, 18 (3×4×150 *%*) standard deviations from the three vectorized level_{2} components of size 88×72 with measurement rate *r*_{2}=150 *%*, and 33 (3×16×68.75 *%*) standard deviations from the three vectorized level_{3} components of size 176×144 with measurement rate *r*_{3}=68.75 *%*. In total, there are 58 standard deviations, for 58 vectors of size 1584, that need to be transmitted as metadata for each frame of size 352×288.

The standard deviations are transmitted through a traditional communication scheme consisting of entropy coding, channel coding, and modulation. They are quantized by a 32-bit scalar quantizer and compressed by entropy coding, then further coded using a rate-1/2 convolutional code (with generator polynomials {133, 171}) and a BPSK constellation. This forms the metadata packet. Hence, the percentage of this metadata is about 58×32×2/(352×288)=3.66 *%*. To account for this extra percentage, we simply cut the measurement rate of level_{3} so as to keep the final equivalent rate of each frame full (100 %).
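The standard-deviation count and metadata overhead above can be reproduced directly:

```python
# 352x288 frame, 4-level DWT, metadata vectors of length 1584 (= 44*36):
# 1 std for the repeated baseband, plus 3 * 4^(l-1) * r_l stds per level l.
n_std = 1 + int(3 * 1 * 2.0) + int(3 * 4 * 1.5) + int(3 * 16 * 0.6875)

# 32-bit quantization, doubled by the rate-1/2 convolutional code,
# expressed relative to the number of pixels per frame:
overhead = n_std * 32 * 2 / (352 * 288)
# n_std == 58 and overhead ~ 3.66 %
```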

## 4 Simulation results

Assume that the packet loss ratio is \(p\), so that the number of packets received by the decoder is \(M = (1-p)k\), where \(k\) is the total number of packets. The channel signal-to-noise ratio (CSNR) is defined as

$$\text{CSNR} = 10\log_{10}\frac{P_{\text{signal}}}{P_{\text{noise}}}\ \mathrm{(dB)},$$

where \(P_{\text{signal}}\) and \(P_{\text{noise}}\) denote the average power of the transmitted signal and of the channel noise, respectively.

In our experiments, the CIF (352×288 at 30 Hz) video sequences *Football*, *Foreman*, *Coastguard*, *Hall*, and *Container* are used, and the full measurement rate is assumed. After the DWT decomposition into four levels (*L*=4), the three levels other than the baseband of each frame undergo block-based projection as in MS-BCS. Following the example discussed in the last section, the block sizes are selected as 4×4, 8×8, and 16×16, respectively, for level_{1} ∼ level_{3}. Then, we apply random measurement matrices *Φ*_{1}∈*R*^{32×16} to level_{1}, *Φ*_{2}∈*R*^{96×64} to level_{2}, and *Φ*_{3}∈*R*^{176×256} to level_{3}, corresponding to the CS rates *r*_{1}=200 *%*, *r*_{2}=150 *%*, and *r*_{3}=68.75 *%*, respectively. The total number of packets in the simulation is set to *k*=8. It can be verified that the overall rate is \(\frac {8}{4^{3}}+\sum _{l=1}^{3}\frac {3}{4^{4-l}}r_{l}=\frac {32.5}{32}\), which exceeds 100 % by a small amount. This is a very minor issue, and we can fix it easily by, for instance, cutting the measurement rate at level_{3} to 66.67 % (about 170 CS data, instead of the original 176). Such a full rate (over the whole frame) is chosen because it is also used in the SoftCast scheme, making the comparison fair. SoftCast and our scheme have the same transmit power and use the same wireless bandwidth of 1.5 MHz: the pixel rate of a CIF video signal is about 3 M pixels per second and, since we transmit complex symbols, this requires a channel bandwidth of about 1.5 MHz. Sixty-four subcarriers are used for OFDM, and the prefix duration is 16. Each packet consists of a data matrix of size 64×1584. After a 64-point IFFT, each row of the data matrix is assigned randomly to an OFDM subcarrier; hence, the number of OFDM symbols in each subcarrier is 1584/2=792. The packets are separated by their packet indices, with the index inserted before each packet. We did not consider the pilot subcarriers or the active carriers. These experimental conditions are also applied to the SoftCast scheme.

Two groups of experiments are conducted. The first one is to show the performance of LLSE in our framework. The second compares our scheme with SoftCast under the same CSNR and packet loss ratio.

Note that different packet loss ratios may mimic different user bandwidths: “\(p=0\)” corresponds to the full bandwidth of 1.5 MHz, “\(p=10\,\%\)” mimics a bandwidth of 1.5 MHz × 90 % = 1.35 MHz, “\(p=20\,\%\)” mimics 1.5 MHz × 80 % = 1.2 MHz, and so on. Our scheme performs well at large loss ratios because it considers the different importance of the decomposition levels and takes more measurements from the more important levels. More specifically, MCS-cast preserves the baseband as long as one packet is received (which is nearly always true in practice), whereas the other relatively more important information (e.g., at level_{1} and level_{2}) is also likely to be received, thanks to the 200 and 150 % CS rates used in our scheme. On the other hand, in the case of no packet loss, repeating the baseband in all packets brings redundancy to our scheme, resulting in a lower efficiency than the reference SoftCast.

We also compare the visual quality of frame #70 of the five test sequences at a CSNR of 25 dB, from which we can clearly observe that the visual results of our MCS-cast scheme are better than those of SoftCast when the packet loss ratio increases above a certain level.

One remark is necessary before we conclude the paper: although we did not consider a real multicast scenario that needs to define exactly how many users are included in the multicast group, it has been mimicked closely by allowing different packet loss ratios and channel noise levels, because each ratio/noise combination truly represents a specific user. As these combinations can be many, our MCS-cast becomes fully scalable in serving an arbitrary number of users in a multicast group or even multiple multicast groups, where each individual user receives a number of noise-corrupted packets (depending on its channel conditions) and then runs its own reconstruction independently.

## 5 Conclusions

In this paper, we proposed a multi-scale compressed sensing-based wireless video multicast scheme: MCS-cast. The reconstruction quality of MCS-cast depends only on the number of packets received by each user. We further proposed a novel packing strategy such that all packets are equally important while each packet includes different percentages of measurements from different wavelet decomposition levels. Owing to the equal importance of the packets and the direct transmission (without entropy coding and channel coding), MCS-cast does not suffer from the cliff effect, and the reconstruction quality degrades only gracefully under channel noise and/or packet loss. Meanwhile, the larger CS rates used at the more important DWT levels guarantee that these important coefficients are still likely to be received at a user’s side even under a large packet loss ratio, so that the reconstruction quality remains quite acceptable. These advantages have been clearly demonstrated in our experiments. As future work, we will focus on utilizing the correlation among adjacent frames in our MCS-cast scheme to make further improvements.

## Declarations

### Acknowledgements

This work has been supported in part by the National Natural Science Foundation of China (No. 61272262 and No. 61210006), Shanxi Provincial Foundation for Leaders of Disciplines in Science (20111022), Shanxi Province Talent Introduction and Development Fund (2011), Shanxi Provincial Natural Science Foundation (2012011014-3), the program of “One Hundred Talented People” of Shanxi Province, Research Project Supported by Shanxi Scholarship Council of China (2014-056), and Program for New Century Excellent Talent in Universities (NCET-12-1037).

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

1. U Reimers, Digital video broadcasting. IEEE Commun. Mag. 36, 104–110 (1998).
2. SoftCast: one video to serve all wireless receivers. http://dspace.mit.edu/handle/1721.1/44585.
3. D Wu, YT Hou, YQ Zhang, Scalable video coding and transport over broadband wireless networks. Proc. IEEE 89, 1–20 (2001).
4. S Wang, BK Yi, Optimizing enhanced hierarchical modulations, in Proceedings of IEEE Globecom, New Orleans, 1–4 December 2008, pp. 1–5.
5. X Fan, F Wu, D Zhao, OC Au, Distributed wireless visual communication with power distortion optimization. IEEE Trans. Circuits Syst. Video Technol. 23, 1040–1053 (2013).
6. X Fan, R Xiong, D Zhao, F Wu, Wavecast: wavelet based wireless video broadcast using lossy transmission, in Proceedings of IEEE Visual Communication and Image Processing, San Diego, 27–30 November 2012, pp. 1–6.
7. L Yu, H Li, W Li, Wireless scalable video coding using a hybrid digital-analog scheme. IEEE Trans. Circuits Syst. Video Technol. 23, 331–345 (2013).
8. X Fan, R Xiong, D Zhao, F Wu, Layered soft video broadcast for heterogeneous receivers. IEEE Trans. Circuits Syst. Video Technol. (2015). http://dx.doi.org/10.1109/TCSVT.2015.2402831 (in press).
9. R Xiong, H Liu, S Ma, X Fan, F Wu, W Gao, G-cast: gradient based image SoftCast for perception-friendly wireless visual communication, in Data Compression Conference (DCC), Snowbird, 26–28 March 2014, pp. 133–142.
10. XL Liu, C Luo, W Hu, F Wu, Compressive broadcast in MIMO systems with receive antenna heterogeneity, in INFOCOM, Orlando, 25–30 March 2012, pp. 3011–3015.
11. XL Liu, W Hu, C Luo, F Wu, Compressive image broadcasting in MIMO systems with receiver antenna heterogeneity. Signal Process. Image Commun. 29, 361–374 (2014).
12. H Cui, C Luo, CW Chen, F Wu, Robust uncoded video transmission over wireless fast fading channel, in Proceedings of IEEE INFOCOM, Toronto, 1–2 May 2014, pp. 73–81.
13. S Jakubczak, D Katabi, SoftCast: one-size-fits-all wireless video. ACM SIGCOMM Comput. Commun. Rev. 41, 449–450 (2011).
14. MB Schenkel, C Luo, F Wu, P Frossard, Compressed sensing based video multicast, in Proceedings of Visual Communication and Image Processing, Huangshan, 11–14 August 2010, pp. 1–9.
15. L Gan, Block compressed sensing of natural images, in Proceedings of the International Conference on Digital Signal Processing, Cardiff, 1–4 July 2007, pp. 403–406.
16. S Mun, JE Fowler, Block compressed sensing of images using directional transforms, in Proceedings of the IEEE International Conference on Image Processing, Cairo, 24–26 November 2009, pp. 3021–3024.
17. JE Fowler, S Mun, EW Tramel, Multiscale block compressed sensing with smoothed projected Landweber reconstruction, in Proceedings of the European Signal Processing Conference, Barcelona, 29–31 August 2011, pp. 564–568.
18. C Chen, EW Tramel, JE Fowler, Compressed-sensing recovery of images and video using multihypothesis predictions, in Proceedings of the 45th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, 8–11 November 2011, pp. 1193–1198.
19. EW Tramel, JE Fowler, Video compressed sensing with multihypothesis, in Proceedings of the IEEE Data Compression Conference, Snowbird, 29–31 March 2011, pp. 193–202.
20. WQ Yang, DM Spink, TA York, H McCann, An image-reconstruction algorithm based on Landweber’s iteration method for electrical-capacitance tomography. Meas. Sci. Technol. 10, 1065–1069 (1999).
21. T Blumensath, ME Davies, Iterative thresholding for sparse approximations. J. Fourier Anal. Appl. 14, 629–654 (2008).
22. N Kingsbury, Complex wavelets for shift invariant analysis and filtering of signals. Appl. Comput. Harmonic Anal. 10, 234–253 (2001).
23. L Sendur, IW Selesnick, Bivariate shrinkage functions for wavelet-based denoising exploiting interscale dependency. IEEE Trans. Signal Process. 50, 2744–2756 (2002).
24. LJ Cimini, Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing. IEEE Trans. Commun. 33, 665–675 (1985).
25. CL Lawson, RJ Hanson, Solving Least Squares Problems (SIAM, 1974).