Bit-depth scalable video coding with new inter-layer prediction

Abstract

The rapid advances in the capture and display of high-dynamic range (HDR) image/video content make it imperative to develop efficient compression techniques to deal with the huge amounts of HDR data. Because HDR devices are not yet widespread, compatibility must be considered when rendering HDR content on conventional display devices. To this end, in this study, we propose three H.264/AVC-based bit-depth scalable video-coding schemes, called the LH scheme (low bit-depth to high bit-depth), the HL scheme (high bit-depth to low bit-depth), and the combined LH-HL scheme, respectively. The schemes efficiently exploit the high correlation between the high and the low bit-depth layers on the macroblock (MB) level. Experimental results demonstrate that the HL scheme outperforms the other two schemes in some scenarios. Moreover, it achieves up to 7 dB improvement over the simulcast approach when the high and low bit-depth representations are 12 bits and 8 bits, respectively.

1. Introduction

The need to transmit digital video/audio content over wired/wireless channels has increased with the continuing development of multimedia processing techniques and the wide deployment of Internet services. In a heterogeneous network, users try to access the same multimedia resource through different communication links; consequently, in a compressed bitstream, scalability has to be ensured to provide adaptability to various channel characteristics.

To make transmission over heterogeneous networks more flexible, the concept of scalable video coding (SVC) was proposed in [1–3]. Currently, SVC has become an extension of the H.264/AVC [4] video-coding standard so that full spatial, temporal, and quality scalability can be realized. Thus, any reasonable extraction from a scalable bitstream will yield a sequence with degraded characteristics, such as smaller spatial resolution, lower frame rate, or reduced visual quality.

Figure 1 shows the coding architecture of the SVC standard with two-layer spatial and quality scalabilities. A low-resolution input video can be generated from a high-resolution video by spatial downsampling and encoded by the H.264/AVC standard to form the base layer. Then, a quality-refined version of the low-resolution video can be obtained by combining the base layer with the enhancement layer. The enhancement layer can be realized by coarse grain scalability (CGS) or medium grain scalability (MGS). Similar to the H.264/AVC encoding procedure, for every MB of the current frame, only the residual related to its prediction will be encoded in SVC.

Figure 1 The SVC coding architecture with two spatial layers [3].

The H.264/AVC standard supports two kinds of prediction: (1) intra-prediction, which removes spatial redundancy within a frame; and (2) inter-prediction, which eliminates temporal redundancy among frames. With regard to spatial scalability in SVC, in addition to intra/inter-predictions, the redundancy between the lower and the higher spatial layers can be exploited and removed by different types of inter-layer prediction, e.g., inter-layer intra-prediction, inter-layer motion prediction, and inter-layer residual prediction. Hence, the coding efficiency of SVC will be better than that under simulcast conditions, where each layer is encoded independently, since inter-layer prediction between the base and the enhancement layers may yield a better rate-distortion (R-D) performance for some MBs.

Acquiring high-dynamic range (HDR) images has become easier with the development of new capture techniques. As a result, HDR images receive considerable attention in many practical applications [5, 6]. For example, in High-Definition Multimedia Interface 1.3, the supported bit-depth has been extended from 8 to 16 bits per channel, so that viewers perceive the displayed content as more realistic. In 2003, the joint video team (JVT) called for proposals to enhance the bit-depth scope of H.264/AVC video coding [7]. The supported bit-depth in H.264/AVC is now up to 14 bits per color channel. However, the bandwidth required to transmit the encoded high bit-depth image/video content is much larger. In addition, conventional display devices cannot present the HDR video format, and so it is necessary to design algorithms that can resolve such problems. In addition to the three supported scalabilities, it is possible to extend the technical feasibility of the SVC standard to provide the bit-depth scalability. The embedded scalable bitstream can be truncated according to the bit-depth requirements of the specific application. In contrast, a high-quality, high bit-depth and high-resolution output is achievable by decoding the complete bitstream for high-definition television (HDTV) applications.

To cope with the increased size of high bit-depth image/video data compared to those of conventional LDR applications, it is necessary to develop appropriate compression techniques. Some approaches for HDR image compression that concentrate on backward compatibility with conventional image standards can be found in [8, 9]. Moreover, to address the scalability issue, a number of bit-depth scalable video-coding algorithms have been proposed in recent years, and many bit-depth-related proposals have been submitted to JVT meetings [10–14]. Similar to spatial scalability, the concept of inter-layer prediction is applied in bit-depth scalability to exploit the high correlation between bit-depth layers. For example, an inter-layer prediction scheme realized as an inverse tone-mapping technique was proposed in [10]. The scheme predicts a high bit-depth pixel from the corresponding low bit-depth pixel through scaling plus offset, where the scale and offset values are estimated from spatial neighboring blocks. Segall [15] introduced a bit-depth scalable video-coding algorithm that is applied on the macroblock (MB) level. In this scheme, the base layer is also generated by tone mapping of the high bit-depth input and then encoded by H.264/AVC. For high bit-depth input, in addition to inter/intra-prediction, inter-layer prediction is exploited to remove redundancy between bit-depth layers where a prediction from the low bit-depth layer is generated using a gain parameter and an offset parameter. Moreover, the high and the low bit-depth layers use the same motion information estimated in the low bit-depth layer. In [11, 16], Winken et al. proposed a coding method that first converts a high bit-depth video sequence into a low bit-depth format, which is then encoded by H.264/AVC as the base layer. Next, the reconstructed base layer is processed inversely as a prediction mechanism to predict the high bit-depth layer. 
The difference between the original high bit-depth layer and the predicted layer is treated as an enhancement layer, and no inter/intra-prediction is performed for the high bit-depth layer. In [17, 18], the authors proposed an implementation that considers spatial and bit-depth scalabilities simultaneously. To improve the coding efficiency, Wu et al. [17] recommended that inverse tone mapping should be performed before spatial upsampling. Moreover, the residual of the low bit-depth layer should be upsampled and utilized to predict the residual of the high bit-depth layer [18]. This approach removes more redundancy than the methods in [15, 16]. In [19], an MPEG-based HDR video-coding scheme was proposed. First, the low dynamic range (LDR) frames, which are tone-mapped versions of the HDR frames, are encoded by MPEG and, after appropriate processing, serve as references for the HDR frames. The residuals associated with the original HDR frames are filtered to eliminate invisible noise before quantization and entropy encoding. Finally, the encoded residual is stored in the auxiliary portion of the MPEG bitstream.

Most bit-depth scalable coding schemes use low bit-depth information to predict high bit-depth information. In this article, in addition to inter-layer prediction from the low bit-depth layer, we also consider inter-layer prediction in the reverse direction, i.e., from the high bit-depth layer to the low bit-depth layer [20]. The rationale for our approach is that the information contained in the high bit-depth layer should be more accurate than that in the low bit-depth layer. Thus, better coding efficiency can be expected when reverse prediction is adopted. Our previous study [20] can be seen as a preliminary and partial result of this study. This article gives a more detailed description of the proposed schemes, as well as a more complete and rigorous analysis of their performance.

The remainder of this article is organized as follows. Section 2 reviews the construction of HDR images and their properties, as well as several tone- and inverse tone-mapping methods. In Section 3, we introduce the proposed LH scheme, which is similar to most current methods. We also describe the proposed HL scheme and the combined LH-HL scheme in detail. Section 4 details the experimental results. Then, in Section 5, we summarize our conclusions.

2. HDR images and tone-mapping technology

HDR technologies for the capture and display of images/video content have grown rapidly in recent years. As a result, HDR imaging has become increasingly important in many applications, especially in the entertainment field, e.g., HDTV, digital cinema, mixed reality rendering, image/video editing, and remote sensing. In this section, we introduce the concept of HDR image technology and some tone/inverse tone-mapping techniques.

2.1. HDR images

In the real world, the dynamic range of light perceived by humans can be 14 orders of magnitude [21]. Even within the same scene, the ratio of the brightest intensity to the darkest intensity perceived by humans is about five orders of magnitude. However, the dynamic range supported by contemporary cameras and display devices is much lower, which explains why the visual quality of images containing natural scenes is not always satisfactory.

There are two kinds of HDR images: images rendered by computer graphics and images of real scenes. In this article, we focus on the latter type, which can be captured directly. Sensors for capturing such HDR images have been developed in recent years, and associated products are now available on the market. HDR images can also be constructed with conventional cameras from several LDR images taken with varied exposure times [22], as shown in Figure 2. A number of formats can be used to store HDR images, e.g., Radiance RGBE [23], LogLuv TIFF [24], and OpenEXR [25]. Currently, conventional display and printing devices do not support HDR formats, so it is difficult to render such images on these devices. Tone-mapping techniques have been developed to address the problem. We discuss several of those techniques in this article.

Figure 2 The generation of HDR images from multiple LDR images [22].

2.2. Tone mapping

Bit truncation is the most intuitive way to transform HDR images into LDR images, but it often results in serious quality degradation. Thus, the key issue addressed by tone-mapping techniques is how to generate LDR images with smooth color transitions in consecutive areas while maintaining the details of the original HDR images as much as possible. Tone-mapping techniques can be categorized into four different types, namely, global operations, local operations, frequency domain operations, and gradient domain operations [21]. Global methods produce LDR images according to some predefined tables or functions based on the HDR images' features, but the methods also generate artifacts. The most significant artifacts result from distortion of the detail of the brightest or the darkest area. Although such artifacts can be resolved by using a local operator, local methods are less popular than global methods due to their high complexity. In contrast, frequency domain operations emphasize compression of the low-frequency content in an image, while gradient domain techniques try to attenuate the pixel intensity of areas with a high spatial gradient. Next, we introduce the tone-mapping algorithm used in our proposed bit-depth scalable coding schemes.

2.2.1. Review of the tone-mapping algorithm presented in [26]

The zone system [27] allows a photographer to use scene measurements to create more realistic photos. We adopt this concept in the tone-mapping technique employed in the proposed bit-depth scalable coding schemes. Usually, photographers use the zone system to map a real scene with a high dynamic range into print zones. In the first step, it is necessary to determine the key of the scene, which indicates whether the scene is bright, normal, or dark. For example, a room that is painted white would have a high key, while a dim room would have a low key. The key can be estimated by calculating the log-average luminance [28] as follows:

$$\bar{L}_{HDR} = \exp\left(\frac{1}{M}\sum_{x,y}\log\bigl(\delta + L_{HDR}(x,y)\bigr)\right) \qquad (1)$$

where $L_{HDR}(x, y)$ is the HDR luminance at position (x, y); δ is a small value to avoid singularity in the log computation; and M is the total number of pixels in the image. Then, a scaled luminance value $L_s(x, y)$ can be computed as follows:

$$L_s(x,y) = \frac{c}{\bar{L}_{HDR}}\,L_{HDR}(x,y) \qquad (2)$$

where c is a constant value determined by the user. For scenes with a normal key, c is usually set at 0.18 because the log-average luminance $\bar{L}_{HDR}$ is then mapped to the middle-gray area of the print zone, which corresponds to 18% reflectance of the print.

After that, a normalized LDR image can be obtained by

$$L_{LDR}(x,y) = \frac{L_s(x,y)}{1 + L_s(x,y)}\left(1 + \frac{L_s(x,y)}{L_{White}^{2}}\right) \qquad (3)$$

where $L_{White}$ represents the smallest luminance mapped to pure white, and the value of $L_{LDR}(x, y)$ is between 0 and 1. The first component on the right-hand side of (3) compresses areas of high luminance: areas with low luminance are scaled almost linearly, while areas of high luminance are compressed more strongly. The second component on the right-hand side of the equation provides linear scaling after considering the normalized maximum intensity of the HDR image. For further details, readers may refer to [26]. Then, the final LDR image can be generated by mapping $L_{LDR}(x, y)$ into the corresponding value within the LDR. For example, the final LDR image can be easily obtained by

(4)

where NL denotes the bit-depth of the LDR image.
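The tone-mapping steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments: the function name `tone_map`, the default choice of the white point (the largest scaled luminance in the frame), and the final rounding to integer codes are assumptions made here for the example.

```python
import math

def tone_map(hdr_luma, c=0.18, l_white=None, n_bits=8, delta=1e-6):
    """Map a 2-D list of HDR luminance values to n_bits-bit LDR codes."""
    pixels = [v for row in hdr_luma for v in row]
    # Log-average luminance (the "key" of the scene); delta avoids log(0).
    log_avg = math.exp(sum(math.log(delta + v) for v in pixels) / len(pixels))
    scale = c / log_avg
    if l_white is None:
        # Assumption: use the largest scaled luminance as the white point,
        # so that the brightest pixel maps to pure white.
        l_white = max(max(pixels) * scale, delta)
    max_code = (1 << n_bits) - 1
    ldr = []
    for row in hdr_luma:
        out_row = []
        for v in row:
            ls = scale * v                      # scaled luminance
            # Compression term times the white-point scaling term.
            l_ldr = (ls / (1.0 + ls)) * (1.0 + ls / (l_white ** 2))
            # Assumed quantization: scale to the code range and round.
            code = int(round(l_ldr * max_code))
            out_row.append(min(max(code, 0), max_code))
        ldr.append(out_row)
    return ldr

frame = [[1.0, 10.0], [100.0, 4000.0]]
ldr_frame = tone_map(frame)
```

Because the mapping is monotonic in the scaled luminance, brighter HDR pixels never receive a smaller LDR code than darker ones, and with the assumed default white point the brightest pixel receives the largest code.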

2.3. Inverse tone mapping

In general, HDR images cannot be recovered completely after inverse tone mapping of tone-mapped LDR images. This is because inverse tone mapping is not an exact inverse of tone mapping in the mathematical sense. Consequently, the goal of inverse tone mapping is to minimize the distortion of the reconstructed HDR images after the inverse-mapping process. In [11, 16], the authors propose three simple and intuitive methods for inverse tone mapping, namely, linear scaling, linear interpolation, and look-up table mapping. The look-up table is compiled by minimizing the difference between the original HDR images and the images obtained after tone mapping followed by inverse tone mapping. In addition, some inverse tone-mapping techniques based on scaling and offset are described in [10, 15]. Specifically, HDR images are predicted by adding a suitable offset to scaled LDR images. In [29], an invertible tone/inverse tone-mapping pair is proposed. The associated tone-mapping algorithm is based on the μ-Law encoding algorithm [30], and its mathematical inverse form can be derived. However, because of the quantization error generated in the encoding process, it is impossible to reconstruct HDR images perfectly. In this study, we adopt the look-up table-mapping process proposed in [11, 16] for inverse tone mapping.
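To make the look-up table idea concrete, the following sketch builds a table that maps each LDR code to one HDR value. It assumes that "minimizing the difference" means minimizing the squared error, in which case the best entry for each LDR code is the mean of the HDR values that were tone mapped to it. The function names and the nearest-neighbour filling of unused codes are hypothetical choices for illustration, not details taken from [11, 16].

```python
def build_inverse_lut(ldr_codes, hdr_values, n_ldr_bits=8):
    """Build a table lut[ldr_code] -> HDR value from co-located pixel pairs."""
    size = 1 << n_ldr_bits
    sums = [0.0] * size
    counts = [0] * size
    for code, hdr in zip(ldr_codes, hdr_values):
        sums[code] += hdr
        counts[code] += 1
    known = [code for code in range(size) if counts[code]]
    if not known:
        raise ValueError("no training pixels supplied")
    lut = []
    for code in range(size):
        # Assumption: codes never seen in training copy the nearest seen code.
        nearest = min(known, key=lambda k: abs(k - code))
        # The mean minimizes the squared error for this LDR code.
        lut.append(int(round(sums[nearest] / counts[nearest])))
    return lut

def inverse_tone_map(ldr_codes, lut):
    """Predict HDR values from LDR codes by table look-up."""
    return [lut[code] for code in ldr_codes]

# Toy example with a 2-bit LDR layer.
lut = build_inverse_lut([0, 0, 1, 3], [0, 8, 20, 60], n_ldr_bits=2)
predicted = inverse_tone_map([0, 3], lut)
```

Since several HDR values share one LDR code, the look-up can only return a representative value, which is why the reconstruction is not exact.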

3. Proposed methods

3.1. The LH scheme

To ensure that the generated bitstream is embedded and compliant with the H.264/AVC standard, most bit-depth scalable coding schemes employ inter-layer prediction, which uses the low bit-depth layer to predict the high bit-depth layer [15–18]. The proposed LH (low bit-depth to high bit-depth) scheme adopts this idea with several modifications. We explain how it differs from other methods later in the article.

The coding structure of the proposed LH scheme is shown in Figure 3. The low bit-depth input is obtained after tone mapping of the original high bit-depth input and then encoded by H.264/AVC, as shown in the left-hand side of Figure 3. In this way, the generated bit-depth scalable bitstream allows for backward compatibility with H.264/AVC.

Figure 3 The coding architecture of the proposed LH scheme.

The right-hand side of Figure 3 shows the coding procedures for the high bit-depth layer. Like the low bit-depth layer, the encoding process is implemented on the MB level, but there are two differences. First, in addition to intra/inter-predictions, the high bit-depth MB obtains another prediction from the corresponding low bit-depth MB by inverse tone mapping of the reconstructed low bit-depth MB. This prediction, which we call intra-prediction from low bit-depth (IPLB), can be regarded as a type of inter-layer prediction and treated as an additional intra-prediction mode with a block size of 16 × 16, which is similar to the inter-layer intra-prediction performed in the spatial scalability of the SVC standard. Thus, two kinds of intra-prediction are available in the proposed LH scheme: one exploits the spatial redundancy within a frame, while the other removes the redundancy between different bit-depth layers.

Second, to improve the coding efficiency of inter-coding, the residual of the low bit-depth MB is inversely tone mapped and utilized to predict the residual of the high bit-depth MB. This process, called residual prediction, can be regarded as another kind of inter-layer prediction and can be realized in two ways. The high bit-depth MB can perform motion estimation and motion compensation before subtracting the predicted residual derived from the low bit-depth layer, or it can subtract the predicted residual before motion estimation and motion compensation, which is similar to the inter-layer residual prediction realized in the spatial scalability of the SVC standard. The residual prediction operation can be expressed mathematically as follows:

(5)

where F_HBD denotes the high bit-depth layer MB, and the residual term denotes the reconstructed residual of the low bit-depth layer MB. MEMC stands for the operation of motion estimation followed by motion compensation, while ITM_R stands for inverse tone mapping of the residual. Both residual prediction methods try to reduce the redundancy between the residuals of the low and the high bit-depth layers. In contrast to the IPLB mode, where the inverse tone mapping is based on a look-up table, the inverse tone-mapping method used for the residual is based on linear scaling and is expressed as follows:

(6)

where LBD_residual denotes the residual of the low bit-depth MB; HBD_input and LBD_input stand for the intensities of high bit-depth pixel and of low bit-depth pixel, respectively.

We select between IPLB prediction and residual prediction based on the results of R-D optimization. Note that there are four kinds of prediction in the proposed LH scheme: intra-prediction, inter-prediction, IPLB prediction, and residual prediction, the last of which can be used in two ways. Moreover, residual prediction cooperates with inter-prediction if doing so yields better coding efficiency, while IPLB competes with the other types of prediction. If inter-layer prediction (i.e., IPLB or residual prediction) is not used, then the high bit-depth layer is encoded by H.264/AVC. In this case, the coding performance of the scalable coding scheme is the same as that achieved by simulcast. Next, we summarize the features of the proposed LH scheme that distinguish it from several current approaches.

  1. IPLB: Similar to most bit-depth SVC schemes [15–18], the high bit-depth MB can be predicted from the corresponding low bit-depth MB by inverse tone mapping. However, in [16], intra/inter-prediction is not realized in the high bit-depth layer in conjunction with inter-layer prediction.

  2. Residual prediction: Residual prediction can be applied in two ways, as indicated in Figure 3. The high bit-depth MB can perform motion estimation after subtracting the predicted residual derived from the low bit-depth layer, or it can subtract the predicted residual after motion compensation. Residual prediction is not used in the schemes proposed in [15, 16]. The residual prediction operation described in [17, 18] is performed only after motion compensation in the high bit-depth layer.

  3. Motion information: In the proposed LH scheme, both the low and the high bit-depth layers have their own motion information, including the MB mode and motion vector (MV). This is contrary to the approach in [15], where the high bit-depth MB directly uses the motion information obtained in the corresponding low bit-depth MB.
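The competition among prediction modes for a high bit-depth MB can be sketched as follows. The text only states that the choice is made by R-D optimization; the Lagrangian cost J = D + λR used here is the usual form of that optimization in H.264/AVC encoders and is an assumption for this example, as are the mode labels and the distortion and rate figures, which are invented purely for illustration.

```python
def select_mode(candidates, lam):
    """Return (mode, cost) with the smallest cost J = D + lam * R.

    candidates maps a mode name to (distortion, rate_in_bits).
    """
    costs = {mode: d + lam * r for mode, (d, r) in candidates.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]

# Hypothetical measurements for one high bit-depth MB in the LH scheme.
# Residual prediction appears only in combination with inter-prediction,
# while IPLB is a separate mode that competes with intra and inter.
candidates = {
    "intra": (900.0, 120),
    "inter": (400.0, 90),
    "inter+residual_before_ME": (380.0, 70),
    "inter+residual_after_MC": (390.0, 60),
    "IPLB": (700.0, 20),
}

low_lambda_choice = select_mode(candidates, lam=5.0)
high_lambda_choice = select_mode(candidates, lam=20.0)
```

With these made-up numbers, a small λ favours inter-coding with residual prediction, while a large λ (low bit rates) favours the cheap IPLB mode.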

3.1.1. Bitstream structure in the LH scheme

In the LH scheme, the bitstream is embedded; hence, a reasonable truncation of the bitstream always ensures successful reconstruction of low bit-depth images. Figure 4 shows a possible arrangement of the LH scheme's bitstream structure where the GOP (group of pictures) size is 2. For the sake of simplicity, the P-frames in Figures 4, 6, and 7 contain no intra-MBs, although intra-MBs are allowed in P-frames depending on the R-D performance. LBD_I represents the low bit-depth I-frame information, while LBD_Motion_Info and LBD_P denote, respectively, the motion information and all the associated data for the low bit-depth P-frame. The bitstream generated by the LH scheme is backward compatible with H.264/AVC and can be extended to include higher bit-depth information as an enhancement layer. For example, to reconstruct the high bit-depth frames, we can use the following components: HBD_I, HBD_Motion_Info, and HBD_P, which represent, respectively, the information needed to reconstruct the high bit-depth I-frame, the related motion information of the P-frame, and the residual needed to reconstruct the P-frame. If the enhancement layer is not available at the decoder, then a rough high bit-depth video sequence may be generated by look-up table mapping. On the other hand, a quality-refined high bit-depth video can be reconstructed if the enhancement layer is available.

Figure 4 A possible bitstream structure in the proposed LH scheme.

3.2. The HL scheme

In this section, we propose a new scheme, called the HL scheme, which processes the high bit-depth layer first and then provides the low bit-depth layer with useful information after suitable processing. The scheme achieves a better R-D performance in some scenarios, for example, if a display device supports the high bit-depth format and the user wants to view only the high bit-depth video content, or if the user requests both bit-depth versions simultaneously. The HL scheme tries to achieve a good coding performance in such applications. However, if the user only has a display device with low bit-depth, then a truncated bitstream would still guarantee successful reconstruction of a low bit-depth video.

First, we consider I-frame encoding in the proposed HL scheme. The high bit-depth I-frame is H.264/AVC encoded directly. It is not necessary to encode and transmit the corresponding low bit-depth layer, which can be created by tone mapping of the reconstructed high bit-depth I-frame at the decoder. Thus, the bitstream does not reserve a specific space for the low bit-depth I-frame.

For the P-frame, the low bit-depth layer input is obtained by tone mapping of the original high bit-depth input. Note that, in the HL scheme, the high bit-depth layer is processed before the corresponding low bit-depth layer. Every MB in the high bit-depth layer is intra-coded or inter-coded, depending on the optimization of the R-D cost. If the high bit-depth MB is designated as intra-mode, then the remaining coding procedure is exactly the same as that in H.264/AVC. The associated low bit-depth MB can be obtained at the decoder after tone mapping of the reconstructed high bit-depth MB using the procedures adopted for I-frames. On the other hand, if the high bit-depth MB is designated as inter-mode, then the subsequent coding procedures are different from those in H.264/AVC inter-coding. Figure 5 illustrates the encoding architecture for the inter-MB in the HL scheme. The encoding process can be summarized by three steps:

Figure 5 The coding architecture for inter-MBs in the proposed HL scheme.

Step 1: After performing motion estimation (ME) and deciding the mode for the high bit-depth MB, the derived motion information, which contains the MV and MB modes of the high bit-depth MB, is transferred to the low bit-depth layer and utilized by the corresponding low bit-depth MB.

Step 2: After performing motion compensation (MC), the residual of the high bit-depth MB is tone mapped, followed by discrete cosine transform (DCT), quantization, and entropy encoding. Then, it becomes part of the embedded bitstream of the corresponding low bit-depth MB. As a result, the decoder can reconstruct the low bit-depth MB directly using the motion information of the high bit-depth MB to perform motion compensation, followed by a summation with the decoded residual.

The tone mapping used for the residual differs from that used for textures. The tone-mapping method adopted for residual data is based on linear scaling and is expressed as follows:

(7)
(8)

where TM_R and ITM denote the tone mapping for residual data and inverse tone mapping for textures, respectively. LBD_MC stands for the low bit-depth pixel intensity after performing motion compensation using the MV derived in the high bit-depth layer MB.

Step 3: The reconstructed residual of the low bit-depth MB is converted back to the high bit-depth layer by inverse tone mapping, similar to that performed in the LH scheme. Then, only the difference between the residual of the high bit-depth MB and the residual predicted from the low bit-depth MB is encoded, which yields a better R-D performance.
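The residual flow of Steps 2 and 3 can be illustrated with a toy example. This sketch deliberately simplifies the codec: a fixed scaling by 2^4 (12 bits to 8 bits) stands in for the residual tone mapping and inverse tone mapping, and a uniform quantizer stands in for the DCT, quantization, and entropy coding chain. It is not a transcription of equations (7) and (8), and all names and numbers are hypothetical.

```python
def quantize(values, step):
    """Uniform quantizer used here as a stand-in for DCT plus quantization."""
    return [step * round(v / step) for v in values]

def hl_inter_residuals(hbd_block, hbd_prediction, shift=4, step=2):
    """Return (reconstructed LBD residual, HBD enhancement residual)."""
    # Residual of the high bit-depth MB after motion compensation.
    hbd_res = [cur - pred for cur, pred in zip(hbd_block, hbd_prediction)]
    # Step 2: tone map the residual (assumed fixed linear scaling), then
    # quantize it; this becomes the low bit-depth residual in the base layer.
    lbd_res = [r / (1 << shift) for r in hbd_res]
    lbd_res_rec = quantize(lbd_res, step)
    # Step 3: map the reconstructed low bit-depth residual back and encode
    # only what it fails to predict.
    predicted = [r * (1 << shift) for r in lbd_res_rec]
    enh_res = [r - p for r, p in zip(hbd_res, predicted)]
    return lbd_res_rec, enh_res

hbd_block = [2100, 2050, 1990, 2000]
hbd_prediction = [2000, 2000, 2000, 2000]
lbd_res_rec, enh_res = hl_inter_residuals(hbd_block, hbd_prediction)
```

In this example the enhancement residual has a much smaller total magnitude than the original high bit-depth residual, which is the effect residual prediction aims for. In the actual scheme the enhancement residual would itself be transformed, quantized, and entropy coded.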

From the description above, the features of the HL scheme can be summarized as follows:

  1. The low bit-depth I-frame is not transmitted and can be generated at the decoder by tone mapping of the reconstructed high bit-depth layer I-frame.

  2. Two kinds of inter-layer prediction are employed for inter-coding in the HL scheme:

     a. The first kind of inter-layer prediction is from the high bit-depth layer to the low bit-depth layer, where the motion information derived in the high bit-depth layer is shared by the low bit-depth layer. Moreover, the residual of the high bit-depth layer is tone mapped to be the residual of the low bit-depth layer.

     b. The second kind of inter-layer prediction is from the low bit-depth layer to the high bit-depth layer, where the quantized residual of the low bit-depth layer can be used for predicting the residual of the high bit-depth layer. It is called residual prediction in the HL scheme.

3.2.1. Bitstream structure in the HL scheme

The bitstream in the HL scheme is different from that in the LH scheme, as shown in Figure 6, where the GOP size is 2. The base layer consists of three components. It starts with the information about the high bit-depth I-frame, denoted as HBD_I, followed by information about the P-frame for both the high bit-depth and low bit-depth layers. The low bit-depth MB and the corresponding high bit-depth MB are reconstructed using the same MV and MB modes, denoted as HBD_Motion_Info. The residual of the high bit-depth layer is tone mapped to the low bit-depth layer and, after transformation, quantization, and entropy encoding, forms LBD_P. HBD_P denotes the residual data used for reconstructing the high bit-depth layer. The entire encoded HL bitstream is smaller than the bitstream in the LH scheme because low bit-depth intra-coded MBs are absent and because both bit-depth layers share motion information for inter-coded MBs.

Figure 6 A possible bitstream structure in the proposed HL scheme.

Note that, although motion estimation is only performed in the high bit-depth layer, the low bit-depth layer in the HL scheme uses this motion information, as well as the residual of the high bit-depth layer, for reconstruction. The motion information is put into the base layer bitstream instead of the enhancement layer bitstream. Moreover, the residual data in the base layer comes from the tone mapping of the residual of the high bit-depth layer. After transformation, quantization, and entropy coding, this residual is also put into the base layer bitstream. Thus, owing to the embedded bitstream structure, there is no drift issue in the HL scheme.

3.3. Combined LH-HL scheme

As mentioned earlier, for I-frames, the bitstream of the HL scheme contains only high bit-depth information. Intuitively, this results in bandwidth inefficiency if the receiver uses a low bit-depth display device, especially when a small GOP size is adopted and the data in the I-frames dominate the bitstream. To improve the coding efficiency in such situations, we combine the HL scheme with the LH scheme to form a hybrid LH-HL scheme in which intra-MBs and inter-MBs are encoded by the LH scheme and the HL scheme, respectively. That is, the intra-mode encoding path of the LH scheme and the inter-mode encoding path of the HL scheme are combined. For every high bit-depth MB, the R-D cost of intra-coding by the LH scheme is compared with the R-D cost of inter-coding by the HL scheme. If the intra-coding cost is smaller, the MB is encoded in intra-mode by the LH scheme; otherwise, it is encoded in inter-mode by the HL scheme. The combined LH-HL scheme thus aims to improve the coding performance of the HL scheme when I-frame data dominate.

3.3.1. Bitstream structure in the LH-HL scheme

Figure 7 shows a possible bitstream structure of the combined LH-HL scheme, where the GOP size is 2. For each GOP in the base layer, three components provide the information used for reconstructing the low bit-depth layer, i.e., LBD_I for low bit-depth I-frames, HBD_Motion_Info and LBD_P for the low bit-depth P-frame. Besides, HBD_I and HBD_P are used to ensure the reconstruction of the high bit-depth I- and P-frames, respectively.

Figure 7 A possible bitstream structure in the proposed LH-HL scheme.

Note that the LH-HL scheme is H.264/AVC compatible. First, intra-MB coding in the LH-HL scheme is exactly the same as that in the LH scheme. For inter-MBs in P-frames, the MV obtained in the high bit-depth layer MB is used by the low bit-depth layer directly and put into the base layer bitstream. Moreover, the residual data in the base layer comes from the tone mapping of the residual of the high bit-depth layer. After transformation, quantization, and entropy coding, this residual is also put into the base layer bitstream. In this way, the generated bit-depth scalable bitstream of the LH-HL scheme allows backward compatibility with H.264/AVC, and there is no drift issue involved.
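The three bitstream structures described in Sections 3.1.1, 3.2.1, and 3.3.1 can be summarized as a small data structure. The component names are those used in the text; the grouping into base and enhancement layers follows the descriptions above, while the ordering inside each layer and the helper function are assumptions made for this sketch.

```python
# Components per GOP for each scheme, grouped by layer.
BITSTREAM_LAYOUT = {
    "LH": {
        "base": ["LBD_I", "LBD_Motion_Info", "LBD_P"],
        "enhancement": ["HBD_I", "HBD_Motion_Info", "HBD_P"],
    },
    "HL": {
        "base": ["HBD_I", "HBD_Motion_Info", "LBD_P"],
        "enhancement": ["HBD_P"],
    },
    "LH-HL": {
        "base": ["LBD_I", "HBD_Motion_Info", "LBD_P"],
        "enhancement": ["HBD_I", "HBD_P"],
    },
}

def components_needed(scheme, target):
    """List the components a decoder needs for a 'low' or 'high' bit-depth target."""
    layout = BITSTREAM_LAYOUT[scheme]
    if target == "low":
        # Truncating the enhancement layer still allows low bit-depth decoding.
        return list(layout["base"])
    if target == "high":
        return layout["base"] + layout["enhancement"]
    raise ValueError("target must be 'low' or 'high'")

hl_low = components_needed("HL", "low")
lh_high = components_needed("LH", "high")
```

Listing the components side by side makes the trade-off visible: the HL scheme carries the fewest components overall, but a low bit-depth receiver must still download HBD_I, which is the inefficiency the LH-HL scheme addresses by placing LBD_I in the base layer.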

3.4. Comparison of three proposed schemes

In Table 1, we compare the coding strategies of the three proposed schemes for the low bit-depth layer and the high bit-depth layer, denoted as LBD and HBD, respectively. Here, intra-coding and inter-coding operations are the same as those defined in H.264/AVC; that is, intra-coding and inter-coding include intra-prediction and inter-prediction, respectively, followed by DCT, quantization, and entropy coding. Note that, for the high bit-depth layer, residual prediction in the LH scheme can be used either before or after motion estimation. On the other hand, in the HL scheme, residual prediction can only be used after motion estimation and motion compensation. Moreover, HBD-based inter-coding requires that the residual of the high bit-depth MB is tone mapped, followed by DCT, quantization, and entropy coding before it can become part of the embedded bitstream of the low bit-depth MB; and no motion estimation is executed in the low bit-depth layer. Then, the reconstruction of the low bit-depth layer is realized by using the MV of the high bit-depth layer to find the referenced block in the previously reconstructed low bit-depth frame, in conjunction with the decoded residual.

Table 1 Comparison of the coding strategies of the proposed schemes

Table 2 summarizes the inter-coding complexity of the three proposed schemes. Compared to [15], the high bit-depth MB in the LH scheme incurs higher computational complexity because of multi-loop MC whenever IPLB mode is chosen. In the HL and the LH-HL schemes, the low bit-depth layer needs no motion estimation because a shared MV is provided by the high bit-depth layer. Moreover, there is no multi-loop MC issue in the high bit-depth layer.

Table 2 Comparison of the inter-coding complexity of the proposed schemes

4. Experimental results

We extend the H.264/AVC baseline profile to implement the proposed bit-depth scalable video-coding schemes. The reference software used is JM 9.3, which supports 12-bit video input. To evaluate the performance of the proposed algorithms, two 12-bit (high bit-depth) test sequences, "Sunrise" (960 × 540) and "Library" (900 × 540), provided in [31] are used in the simulation. Both sequences have low camera motion, and the color format is 4:2:0. In our systems, the low bit-depth input is 8 bits for each color channel, and the high bit-depth input is 12 bits. The frame rate of both sequences is 30 Hz, and the 8-bit representations are obtained by tone mapping the original 12-bit sequences. We employ the tone-mapping method in [26] and use look-up table mapping [11, 16] to realize the inverse tone mapping. Note that the tone-mapping and inverse tone-mapping techniques used in this article are the same for all the schemes, so that differences in these techniques do not influence the comparison of coding efficiency. Both the high and low bit-depth layers use the same quantization parameter (QP) settings, so no extra QP scaling is needed to encode the high bit-depth layer. Moreover, GOPs containing 1, 4, 8, and 16 pictures are used to differentiate the coding efficiency of I-frames and P-frames in the proposed coding schemes.
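To make the look-up table idea concrete, the sketch below shows one simple way such a table could be built and applied. The construction (averaging the co-located 12-bit samples for each 8-bit value, with a bit-shift fallback for unseen values) is an assumption made for illustration and is not necessarily the construction used in [11, 16]; the names `build_inverse_lut` and `inverse_tone_map` are hypothetical.

```python
def build_inverse_lut(lbd_pixels, hbd_pixels, lbd_bits=8, hbd_bits=12):
    """Build a table mapping each low bit-depth value to a high bit-depth value.

    Each entry is the rounded mean of the high bit-depth samples that are
    co-located with that low bit-depth value (illustrative assumption).
    """
    size = 1 << lbd_bits
    sums = [0] * size
    counts = [0] * size
    for lo, hi in zip(lbd_pixels, hbd_pixels):
        sums[lo] += hi
        counts[lo] += 1
    shift = hbd_bits - lbd_bits
    lut = []
    for value in range(size):
        if counts[value]:
            # Integer mean with round-half-up.
            lut.append((sums[value] + counts[value] // 2) // counts[value])
        else:
            # Value never observed: fall back to a plain bit shift.
            lut.append(value << shift)
    return lut


def inverse_tone_map(lbd_pixels, lut):
    """Predict high bit-depth samples from low bit-depth samples via the table."""
    return [lut[p] for p in lbd_pixels]


lut = build_inverse_lut([0, 0, 1, 255], [0, 10, 20, 4095])
predicted = inverse_tone_map([0, 1, 2, 255], lut)
```

Because the table is indexed directly by the 8-bit sample value, applying it costs one lookup per sample regardless of how the forward tone-mapping curve was derived.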

4.1. Intra-coding performance (GOP = 1)

The R-D performance of the proposed algorithm when the GOP size is 1 is shown in Figures 8 and 9.

Figure 8 Performance comparison for "12-bit Sunrise" (GOP = 1).

Figure 9 Performance comparison for "12-bit Library" (GOP = 1).

The PSNR is calculated as follows:

$$\mathrm{PSNR} = 10 \log_{10} \frac{(2^{N} - 1)^{2}}{\mathrm{MSE}} \tag{9}$$

where N is the bit-depth, and MSE denotes the mean squared error between the reconstructed and the original images. The performance of 12-bit single-layer coding and of simulcast coding is also compared. When the GOP size is 1, the HL scheme is equivalent to single-layer coding, and the combined LH-HL scheme is the same as both the LH scheme and the approach in [15].
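As a small worked example of a bit-depth-aware PSNR, the sketch below assumes the conventional peak value of 2^N − 1 for an N-bit signal. The function name `psnr` and the flat-list input format are illustrative choices.

```python
import math


def psnr(original, reconstructed, bit_depth):
    """PSNR in dB between two equally sized sample lists of an N-bit signal."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    peak = (1 << bit_depth) - 1  # 255 for 8 bits, 4095 for 12 bits
    return 10.0 * math.log10(peak * peak / mse)


psnr_8 = psnr([0, 0, 0, 0], [1, 1, 1, 1], bit_depth=8)
psnr_12 = psnr([0, 0, 0, 0], [1, 1, 1, 1], bit_depth=12)
```

For the same MSE, the 12-bit PSNR is about 24.1 dB higher than the 8-bit PSNR, because the peak value grows from 255 to 4095. PSNR values of the two bit-depth layers are therefore not directly comparable.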

Figures 8 and 9 show that the HL and the LH schemes achieve better coding efficiency than the simulcast scheme. Specifically, the HL scheme achieves up to 7 dB improvement over the simulcast scheme in the high bit-rate scenario. Table 3 summarizes the percentages of IPLB mode employed in I-frames for the LH scheme. The table shows that the percentage of IPLB mode increases as the QP value decreases. This indicates that high bit-depth intra-MBs are more likely to be predicted from their low bit-depth versions, rather than by conventional intra-prediction, when the corresponding low bit-depth MB is reconstructed well. As a result, the generated bitrate can be reduced.

Table 3 Percentages of IPLB mode employed in I-frames in the LH scheme

4.2. Coding performance when GOP = 4, 8, and 16

Next, we consider the coding performance of the proposed schemes when the GOP size is 4, 8, and 16. Figures 10 and 11 compare the performance of the schemes for the sequences "Sunrise" and "Library," respectively. The results demonstrate that the three proposed schemes outperform the simulcast scheme. It is also clear that the HL scheme outperforms the LH scheme, the combined LH-HL scheme, and the approach proposed in [15] by approximately 2 dB. Tables 4 and 5 detail the statistical distributions of the inter-layer modes chosen for MBs in the high bit-depth layer in the LH scheme and the HL scheme, respectively. Note that, for the HL scheme, only inter-frames are considered in the statistics in Table 5 because no low bit-depth I-frame is coded. For the LH scheme, the statistics in Table 4 include both I-frames and P-frames. In the LH scheme, the high bit-depth MB can be predicted from the associated low bit-depth MB in two ways: (1) by IPLB prediction, where the high bit-depth MB texture is predicted by inverse tone mapping of the reconstructed low bit-depth MB, or (2) by residual prediction, where the residual of the high bit-depth MB is predicted from the residual of the low bit-depth MB. Residual prediction is clearly adopted more often in the HL scheme than in the LH scheme. After analyzing the coding architecture of the three schemes, as well as the statistics in Tables 4 and 5, we observe that two factors are responsible for the superior performance of the HL scheme. First, the HL scheme does not need to transmit the low bit-depth intra-MB, and a single set of motion information is shared by both layers. Second, residual prediction from the high bit-depth layer to the low bit-depth layer is efficient and reliable.

Figure 10 Performance comparison for "12-bit Sunrise": (a) GOP = 4, (b) GOP = 8, and (c) GOP = 16.

Figure 11 Performance comparison for "12-bit Library": (a) GOP = 4, (b) GOP = 8, and (c) GOP = 16.

Table 4 Percentages of inter-layer prediction employed by high bit-depth layer MBs in the LH scheme
Table 5 Percentages of inter-layer prediction employed by high bit-depth layer MBs in the HL scheme

As mentioned in Section 3, the proposed residual prediction operation in the LH scheme can be applied in two ways. Table 6 summarizes the statistical distribution of the predictions derived by the two methods. In the table, residual prediction_1 means that the residual from the low bit-depth layer is used to predict the residual of the high bit-depth layer after motion estimation and compensation. Residual prediction_2 means that the high bit-depth layer MB performs motion estimation and compensation after subtracting the residual predicted by the low bit-depth layer from the original texture. As indicated in Table 6, residual prediction_1 is more likely to be used in the high bit-depth layer. Furthermore, it seems that residual prediction_2 can be removed to reduce the coding complexity in the high bit-depth layer without significant performance loss.

Table 6 Percentages of residual prediction used for high bit-depth inter-MBs in the LH scheme

4.3. Coding performance of modified LH schemes

4.3.1. Modified LH scheme with shared MV

In contrast to the approach in [15], where the motion information of the low bit-depth layer is shared by MBs of both bit-depth layers, the low bit-depth and the high bit-depth layers in the LH scheme each have their own motion information. If the high bit-depth layer directly uses the motion information provided by the low bit-depth layer, the header data are reduced because no motion information needs to be embedded for that layer. However, the residual data may increase because the shared MV is less accurate. To verify the gain brought by separate motion information, Table 7 lists the rate-distortion performance in terms of Bjontegaard delta bitrate (BDBR) and Bjontegaard delta PSNR (BDPSNR) [32] for the modified LH scheme, in which the motion information of the low bit-depth layer is shared by the high bit-depth layer, with respect to the original LH scheme. Moreover, the comparison between the method in [15] and the LH scheme is also expressed in terms of the Bjontegaard metrics, as shown in Table 8.
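For readers unfamiliar with the Bjontegaard metric, the following sketch computes BDPSNR from four rate/PSNR points per curve. It fits a cubic through each curve in the log-rate domain and averages the difference over the overlapping rate range, which is the usual formulation of the method in [32]. The function names are hypothetical, exactly four points with distinct rates are assumed per curve, and the integral is evaluated with Simpson's rule, which is exact for cubics.

```python
import math


def _cubic_through_points(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_over_interval(xs, ys, lo, hi):
    """Mean of the interpolating cubic on [lo, hi] (Simpson's rule, exact here)."""
    mid = 0.5 * (lo + hi)
    integral = (hi - lo) / 6.0 * (
        _cubic_through_points(xs, ys, lo)
        + 4.0 * _cubic_through_points(xs, ys, mid)
        + _cubic_through_points(xs, ys, hi)
    )
    return integral / (hi - lo)


def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average PSNR gain (dB) of the test curve over the reference curve."""
    if not (len(rates_ref) == len(psnr_ref) == len(rates_test) == len(psnr_test) == 4):
        raise ValueError("this sketch expects exactly four points per curve")
    x_ref = [math.log10(r) for r in rates_ref]
    x_test = [math.log10(r) for r in rates_test]
    lo = max(min(x_ref), min(x_test))
    hi = min(max(x_ref), max(x_test))
    if hi <= lo:
        raise ValueError("the two curves have no overlapping rate range")
    return (_mean_over_interval(x_test, psnr_test, lo, hi)
            - _mean_over_interval(x_ref, psnr_ref, lo, hi))


rates = [100.0, 200.0, 400.0, 800.0]          # made-up bitrates in kbit/s
ref_curve = [30.0, 33.0, 35.0, 36.0]          # made-up PSNR values in dB
test_curve = [p + 1.0 for p in ref_curve]     # uniformly 1 dB better
gain = bd_psnr(rates, ref_curve, rates, test_curve)
```

A negative BDPSNR therefore means that the tested scheme loses quality at equal bitrate relative to the reference. BDBR follows the same pattern with the roles of log-rate and PSNR exchanged.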

Table 7 Performance for the modified LH scheme (shared MV of LBD) with respect to the LH scheme
Table 8 Performance for the method in [15] with respect to the LH scheme

On the other hand, we also evaluate a modified LH scheme in which the motion information of the high bit-depth layer is shared with the low bit-depth layer; the performance is presented in Table 9. The results reveal that the modified LH scheme with the MV shared from HBD performs worse than the original LH scheme. In fact, the residual data of the low bit-depth layer increase considerably in this modified scheme because of the inaccurate MV. From Tables 7, 8, 9, and 10, we conclude that the LH scheme outperforms the approach in [15] because of two factors: (1) in addition to IPLB mode, residual prediction is employed in the high bit-depth layer, and (2) motion estimation is performed individually for each bit-depth layer.

Table 9 Performance for the modified LH scheme (shared MV of HBD) with respect to the LH scheme
Table 10 Performance for the modified LH scheme (PMV from LBD) with respect to the LH scheme

4.3.2. Modified LH scheme with PMV from LBD

To exploit the correlation between the MVs of the high bit-depth and the low bit-depth layers, we conduct another experiment in which the MV of the low bit-depth MB serves as the predicted motion vector (PMV) of the corresponding high bit-depth MB. Table 10 lists the rate-distortion performance in terms of BDBR and BDPSNR [32] for this modified LH scheme, with respect to the original LH scheme. The R-D performance of this modified scheme is similar to that of the original LH scheme.

4.3.3. Modified LH scheme with single-loop MC

To avoid multi-loop motion compensation, we modify the LH scheme so that IPLB mode is applicable only to high bit-depth MBs whose co-located low bit-depth MBs are intra-coded; in this way, single-loop motion compensation is achievable. The performance of the modified scheme is shown in Table 11. As indicated in this table, the PSNR loss under the single-loop MC constraint is in the range of 0.54 to 0.76 dB.
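The single-loop constraint amounts to a simple gate on the mode decision, sketched below. IPLB needs the reconstructed low bit-depth MB; if that MB is inter-coded, reconstructing it requires motion compensation in the low bit-depth layer as well, which is what creates the second MC loop. The function name and flags are hypothetical.

```python
def iplb_allowed(lbd_mb_is_intra, single_loop_mc):
    """Return True if IPLB may be tested for a high bit-depth MB.

    lbd_mb_is_intra : True if the co-located low bit-depth MB is intra-coded
    single_loop_mc  : True if the decoder is restricted to one MC loop
    """
    if not single_loop_mc:
        # Multi-loop decoding: the low bit-depth MB can always be reconstructed.
        return True
    # Single-loop decoding: only intra-coded low bit-depth MBs can be
    # reconstructed without motion compensation in the low bit-depth layer.
    return lbd_mb_is_intra


# Which of four example MBs may still use IPLB under the constraint.
lbd_intra_flags = [True, False, False, True]
usable = [iplb_allowed(flag, single_loop_mc=True) for flag in lbd_intra_flags]
```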

Table 11 Performance for the modified LH scheme (single-loop MC) with respect to the LH scheme

4.4. Coding performance when the QPs used in both layers are different

In the H.264/AVC standard, an additional QP offset is adopted to modify the QP for inputs with a bit-depth larger than 8 bits. The purpose is to constrain the bitstream size. The adjusted QP is expressed as

(10)

where input QP stands for the initial QP given by the user. In this case, the QP value for the high bit-depth layer is different from that used in the low bit-depth layer.
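As a hedged illustration, the sketch below assumes that the adjustment takes the form of the bit-depth offset defined in the H.264/AVC specification, QpBdOffset = 6 × (bit_depth − 8). Whether Equation 10 has exactly this form is an assumption, and the function names are hypothetical.

```python
def qp_bd_offset(bit_depth):
    """Bit-depth QP offset as defined in H.264/AVC: 6 * (bit_depth - 8)."""
    if bit_depth < 8:
        raise ValueError("bit depth must be at least 8")
    return 6 * (bit_depth - 8)


def adjusted_qp(input_qp, bit_depth):
    """QP actually applied to a layer, assuming offset-style QP scaling."""
    return input_qp + qp_bd_offset(bit_depth)


qp_low = adjusted_qp(32, bit_depth=8)    # low bit-depth layer keeps the input QP
qp_high = adjusted_qp(32, bit_depth=12)  # high bit-depth layer gets a larger QP
```

Since the quantization step size in H.264/AVC doubles for every increase of 6 in QP, an offset of 24 for 12-bit input corresponds to a step size 16 times larger, which matches the four additional bits of sample precision.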

We conduct another experiment to verify the coding efficiency of the schemes when the QP value used in the high bit-depth layer follows the rule expressed in Equation 10. Figures 12 and 13 present the coding performance when QP scaling is carried out for GOP = 8 and GOP = 16, respectively. These two figures indicate that all three schemes perform worse with QP scaling than under the same QP setting. Moreover, the PSNR loss with QP scaling is more severe in the HL and the LH-HL schemes than in the LH scheme.

Figure 12 Performance comparison for the proposed schemes with QP scaling (Sunrise, GOP = 8).

Figure 13 Performance comparison for the proposed schemes with QP scaling (Sunrise, GOP = 16).

Intuitively, a larger QP corresponds to worse image quality. Thus, compared with the same QP setting, the prediction from the high bit-depth layer becomes less reliable for the low bit-depth layer, and the coding efficiency of the HL scheme is degraded. Moreover, in the scheme with QP scaling, although the high bit-depth layer can be predicted from a low bit-depth layer with higher reconstructed quality (due to a smaller QP), which improves the coding efficiency of the high bit-depth layer, the bitrate consumption in the low bit-depth layer is higher than that for the scheme with the same QP setting. This indicates that the bitrate overhead outweighs the benefit brought by a more precise prediction source in the low bit-depth layer.

4.5. Coding performance of low bit-depth video

Figures 14a and 15a show the performance of the low bit-depth representation for the sequence "Sunrise" when the GOP sizes are 4 and 16, respectively, where the single-layer coding for an 8-bit sequence is equivalent to the proposed LH scheme. The figures show that the LH-HL scheme outperforms the other two schemes at most bitrates. Because the LH-HL and the LH schemes adopt the same intra-coding method, this gain demonstrates that inter-coding in the LH-HL scheme achieves better R-D performance than that in the LH scheme.

Figure 14 Performance comparison for "8-bit Sunrise" (GOP = 4): (a) with bitstream truncation and (b) without bitstream truncation.

Figure 15 Performance comparison for "8-bit Sunrise" (GOP = 16): (a) with bitstream truncation and (b) without bitstream truncation.

We know that coding efficiency depends mainly on the amount of residual data after motion compensation. For inter-coding in the LH-HL scheme, the motion information derived from the high bit-depth layer is shared by the low bit-depth layer. Figures 14a and 15a indicate that the shared MV from the high bit-depth layer, in conjunction with the tone-mapped residual from the high bit-depth layer, results in a better reconstructed inter-MB in the LH-HL scheme than in the LH scheme. In addition, the superiority of the HL scheme over the LH scheme at moderate-to-high bitrates has one primary cause: the HL scheme offers better reconstructed low bit-depth intra-frames. Table 12 lists the PSNR of the low bit-depth intra-frames for the HL and the LH schemes and confirms this observation. Figure 16 presents the PSNR over a number of frames for both bit-depth layers in the HL scheme when the GOP size is 16 and the QP is 32.

Table 12 PSNRs (dB) of intra-frames for the HL scheme and the LH scheme
Figure 16 PSNR of each frame in the proposed HL scheme for "Sunrise".

We are also interested in the performance of low bit-depth representation when the entire bitstream is received perfectly. Figures 14b and 15b show the performances when GOP sizes are 4 and 16, respectively. We can see that the PSNRs for the 8-bit video are the same in the two subfigures in Figures 14 and 15, and the bitrate in subfigure (a) is much lower than that in subfigure (b) because only the bitrate of the low bit-depth layer is counted.

The HL scheme outperforms the LH scheme by up to 6.2 dB and 4.5 dB in Figures 14b and 15b, respectively. Thus, we conclude that if the whole bitstream is delivered successfully without any truncation, the HL scheme provides both high bit-depth and low bit-depth images with better quality.

5. Conclusion

We have proposed three H.264/AVC-based bit-depth scalable video-coding schemes. The LH scheme is similar to most existing approaches because the high bit-depth layer is encoded by considering inter-layer prediction from the corresponding low bit-depth layer. The scheme provides an embedded encoding architecture that is fully backward compatible with H.264/AVC. On the other hand, the proposed HL scheme yields better coding efficiency in applications where only the high bit-depth layer, or both layers, are requested at the destination. The inter-layer prediction adopted in the HL scheme can be directed from the high bit-depth layer to the low bit-depth layer, as well as vice versa. To resolve the backward compatibility problem in the HL scheme, we propose a combined LH-HL scheme in which the LH scheme complements the HL scheme. Our experimental results demonstrate the efficacy of the proposed algorithms. In particular, the HL scheme achieves the best R-D performance if the decoder requests high bit-depth content. We have shown that the proposed HL scheme is effective when the high bit-depth layer is processed first; the low bit-depth layer can then be encoded by using information provided by the high bit-depth layer, such as the MV and the residual. In addition, the combined LH-HL scheme, which differs from the LH scheme only in the method of inter-MB encoding, outperforms the LH scheme in all the simulations. From the results, we conclude that the information in the high bit-depth layer can be exploited to remove redundancy in both the low and high bit-depth layers, which leads to better R-D performance.

Abbreviations

BDBR: Bjontegaard delta bitrate

BDPSNR: Bjontegaard delta PSNR

CGS: coarse grain scalability

DCT: discrete cosine transform

GOP: group of pictures

HBD: high bit-depth

HDR: high-dynamic range

HDTV: high-definition television

HL scheme: high bit-depth to low bit-depth

IPLB: intra-prediction from low bit-depth

ITM_R: inverse tone mapping of residual

JVT: joint video team

LBD: low bit-depth

LDR: low-dynamic range

LH scheme: low bit-depth to high bit-depth

MB: macroblock

MC: motion compensation

ME: motion estimation

MEMC: operation of motion estimation followed by motion compensation

MGS: medium grain scalability

MSE: mean squared error

MV: motion vector

PMV: predicted motion vector

PSNR: peak signal-to-noise ratio

QP: quantization parameter

R-D: rate-distortion

SVC: scalable video coding

References

  1. Reichel J, Schwarz H, Wien M, Eds: Scalable video coding-joint draft 9. Joint Video Team, Doc JVT-V201, Marrakech, Morocco 2007.

  2. Vieron J, Wien M, Schwarz H: Draft reference software for SVC. Joint Video Team, Doc JVT-AC203, Busan, Korea 2008.

  3. Schwarz H, Marpe D, Wiegand T: Overview of the scalable video coding extension of the H.264/AVC standard. IEEE Trans Circ Syst Video Technol 2007,17(9):1103-1120.

  4. Wiegand T, Sullivan G, Bjontegaard G, Luthra A: Overview of the H.264/AVC video coding standard. IEEE Trans Circ Syst Video Technol 2003,13(7):560-576.

  5. Segall A: On the requirement for bit-depth and chroma format scalability. Joint Video Team, Doc JVT-Z036, Antalya, Turkey 2008.

  6. Gao Y, Wu Y: Applications and requirement for color bit depth scalability. Joint Video Team, Doc JVT-U049, Hangzhou, China 2006.

  7. Sullivan G, Luthra A, Wiegand T: Call for proposals for extended sample bit depth and chroma format support in the advanced video coding standard. Joint Video Team, Doc JVT-G048, Pattaya II, Thailand 2003.

  8. Ward G, Simmons M: JPEG-HDR: a backward-compatible, high dynamic range extension to JPEG. Proceedings of the 13th Color Imaging Conference 2005.

  9. Okuda M, Adami N: Two-layer coding algorithm for high dynamic range images based on luminance compensation. J Vis Commun Image R 2007, 17: 377-386.

  10. Liu S, Vetro A, Kim WS: Inter-layer prediction for SVC bit-depth scalability. Joint Video Team, Doc JVT-X075, Geneva, Switzerland 2007.

  11. Winken M, Schwarz H, Marpe D, Wiegand T: SVC bit depth scalability. Joint Video Team, Doc JVT-V078, Marrakech, Morocco 2007.

  12. Segall A, Su Y: System for bit-depth scalable coding. Joint Video Team, Doc JVT-W113, San Jose, California, USA 2007.

  13. Ye Y, Chung H, Karczewicz M, Chong IS: Improvement to bit depth scalability coding. Joint Video Team, Doc JVT-Y048, Shenzhen, China 2007.

  14. Yu Y, Gordon S, Yang M: Improving compression performance in bit depth SVC with a prediction filter. Joint Video Team, Doc JVT-Z045, Antalya, Turkey 2008.

  15. Segall A: Scalable coding of high dynamic range video. Proceedings of IEEE International Conference On Image Processing, San Antonio, USA 2007, 1-4.

  16. Winken M, Marpe D, Schwarz H, Wiegand T: Bit-depth scalable video coding. Proceedings of IEEE International Conference on Image Processing, San Antonio, USA 2007, 5-8.

  17. Wu Y, Gao Y, Chen Y: Bit depth scalable coding. Proceedings of IEEE International Conference on Multimedia and Expo., Beijing, China 2007, 1139-1142.

  18. Wu Y, Gao Y, Chen Y: Bit-depth scalable coding based on macroblock level inter-layer prediction. Proceedings of IEEE Symposium Conference on Circuits and Systems, Seattle, USA 2008, 3442-3445.

  19. Mantiuk R, Efremov A, Myszkowski K, Seidel HP: Backward compatible high dynamic range MPEG video compression. Proceedings of ACM SIGGRAPH, Boston, USA 2006, 713-723.

  20. Chiang JC, Kuo WT: Bit-depth scalable video coding using inter-layer prediction from high bit-depth layer. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan 2009, 649-652.

  21. Reinhard E, Pattanaik S, Ward G, Debevec P: High Dynamic Range Imaging: Acquisition, Display, And Image-Based Lighting. Morgan Kaufmann, San Francisco, CA; 2006.

  22. Debevec P, Malik J: Recovering high dynamic range radiance maps from photographs. Proceedings of ACM SIGGRAPH, Los Angeles, USA 1997, 369-378.

  23. Ward G: Real Pixels. In Graphic Gems II. Edited by: Arvo J. Academic Press, San Diego, CA; 1991:80-83.

  24. Ward G: The LogLuv encoding for full gamut, high dynamic range images. JGT 1998,3(1):15-31.

  25. Kainz F, Bogart R, Hess D: The OpenEXR image file format. SIGGRAPH Technical Sketches 2003. [http://www.openexr.com]

  26. Reinhard E, Stark M, Shirley P, Ferwerda J: Photographic tone reproduction for digital images. ACM T Graphic 2002,23(3):267-276.

  27. Adams A: The Print: The Ansel Adams Photography Series. Little, Brown and Company, New York, USA; 1983.

  28. Reinhard E: Parameter estimation for photographic tone reproduction. JGT 2003,7(1):45-51.

  29. Sugiyama N, Kaida H, Xue X, Jinno T, Adami N, Okuda M: HDR image compression using optimized tone mapping model. Proceedings of IEEE International Conference on Acoustic, Speech And Signal Processing, Taipei, Taiwan 2009, 1001-1004.

  30. Smith B: Instantaneous companding of quantized signals. Bell Syst Tech J 1957, 36: 653-709.

  31. Segall A: Donation of tone mapped image sequences. Joint Video Team, Doc. JVT-Y072, Shenzhen, China 2007.

  32. Bjontegaard G: Calculation of average PSNR difference between RD-curves. document VCEG-M33.doc, ITU-T SG16/Q.6, Austin, TX 2001.

Author information

Corresponding author

Correspondence to Jui-Chiu Chiang.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Chiang, JC., Kuo, WT. & Kao, PH. Bit-depth scalable video coding with new inter-layer prediction. EURASIP J. Adv. Signal Process. 2011, 23 (2011). https://doi.org/10.1186/1687-6180-2011-23

  • DOI: https://doi.org/10.1186/1687-6180-2011-23

Keywords