Bitrate control using a heuristic spatial resolution adjustment for a real-time H.264/AVC encoder

Rhee, Chae Eun; Kim, Jin-Sung; Lee, Hyuk-Jae

doi:10.1186/1687-6180-2012-87

Research
Open access
Published: 23 April 2012

Bitrate control using a heuristic spatial resolution adjustment for a real-time H.264/AVC encoder

Chae Eun Rhee¹,
Jin-Sung Kim² &
Hyuk-Jae Lee¹

EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 87 (2012) Cite this article

3561 Accesses
2 Citations
Metrics details

Abstract

Conventional bitrate control algorithms that change only the quantization parameter (QP) often suffer from quality degradation when the target bitrate is very low. Therefore, rate control algorithms that adjust spatial resolution in addition to QP control have recently been proposed, but their computations are too complex to be processed in real time. This research proposes a very simple, but effective, rate control algorithm that employs spatial resolution control as well as the existing QP-based bitrate control. The spatial resolution ratio for the best peak signal-to-noise ratio (PSNR) is calculated using a simple estimation model which defines the relationship between the PSNR and the spatial resolution at very low bitrate compression. In the proposed bitrate control algorithm, two scalability tools for adjusting the QP and the spatial resolution ratio are used sequentially to reach the target PSNR and the control decision is made for a group of pictures. Experimental results show that the proposed bitrate control algorithm approximates an optimal solution and yields a better subjective quality as well as objective quality at various bitrates compared to the conventional QP-based bitrate control algorithm. The decision of the control parameters requires very small computational complexity and is made in a completely automatic manner so that the proposed algorithm is well suited for real-time applications.

1. Introduction

H.264/AVC standard is widely used in video streaming, video communication, and various mobile video applications due to its high compression efficiency through the use of many advanced tools. In recent video applications, the video resolution tends to be larger and, thus, the bandwidth requirement for video transmission also increases. As a result, the importance of the bitrate control is growing as it is necessary to regulate the bitrate of a video stream in order to achieve the target bitrate. In bitrate control algorithms, the target bits are allocated at the frame-level or the macroblock-level by considering the fullness of the output buffer and the encoding complexity. While achieving the allocated bit budget, rate control attempts to make quality compromises in ways that would have a minimum degradation on perceived through controlling quantization parameter (QP) values. When the QP is small, much of the information is preserved and when the QP is large, much of the information is discarded to reduce the bitrate at the cost of an increased distortion.

To achieve a target bitrate in real constraint situations, such as within a specific channel bandwidth or within defined encoder and decoder buffer sizes, QP values must vary dynamically based on the complexity of the input video and the current bitrate. QP values are determined using a rate-quantization (R-Q) model where the generated bitrate is modeled as a function of the QP and the complexity of the residual signal such as the mean absolute difference (MAD). In H.264/AVC, QPs are used not only in rate control, but also in rate distortion optimization (RDO). However, the famous "chicken-egg" dilemma complicates the selection of the QP value. The function for the RDO needs a pre-determined QP for its lambda factor. However, the QP can be determined by the MAD, which is only available after the RDO is obtained. To resolve that dilemma, the MAD is predicted from the previous frame using a linear function [1]. The estimated MAD from the previous frame is often different from the actual MAD in the current frame. Thus, an inadequate QP may be selected. For improved coding efficiency, enhanced distortion models [2–6] are developed and rate-distortion (R-D) model is optimized or contents-aware bit allocation is proposed [5–10]. A number of recent approaches incorporate characteristics of the human visual system (HVS) into bitrate control. Moreover, there are several reports on region-of-interest (ROI)-based bit allocation [11–13]; such approaches can potentially improve the perceived visual quality of images. In addition to ROI-based methods, a new R-D model, along with frame skipping and bit-allocation schemes, using various perceptual metrics that are based on the characteristics of HVS, is proposed [14–19]. However, the conventional bitrate control algorithms based on QP control often suffer from perceptible image quality degradations, such as blocking, ringing, or texture-deviation artifacts, when the target bitrate is very low. Furthermore, when the target bitrate is not satisfied, even with the maximum QP value, the conventional rate control cannot avoid a sudden frame drop which results in video quality degradation.

This article proposes a simple but effective bitrate control algorithm that applies spatial resolution controls to the conventional QP-based bitrate control. In the proposed algorithm, a new model that represents the relationship between the spatial resolution and the peak signal-to-noise ratio (PSNR) for low bitrate coding is proposed. By using the proposed model, the spatial resolution that gives the highest PSNR at a given bitrate is estimated. In the proposed model, the computational complexity is very low as it only requires a small number of parameters and the value of parameters is obtained in a heuristic manner. The two scalability tools, the QP and the spatial resolution, are processed sequentially to achieve the target PSNR. The proposed spatial resolution adjustment which is applied in the group of pictures (GOP)-level is used as a coarse-grain bitrate control. Inside a GOP, QP is changed by a conventional bitrate control to meet the allocated bits in a fine-grain manner. Thus, the target bitrate is satisfied with a combination of two control methods. To estimate the perceptual quality of encoded video sequences with reduced spatial resolution, Video Quality Metric (VQM) software [20] is used to measure the subjective quality in addition to the PSNR which measures the objective quality. The VQM computes the perceptual effects of video impairments including blurring, jerky/unnatural motion, global noise, block distortion and color distortion, and combines them into a single metric. Experimental results show that proposed bitrate control scheme outperforms the conventional QP-based bitrate control algorithm at a variety of bitrates. At a low bitrate, the PSNR and VQM values with the proposed spatial resolution control scheme are improved up to 1.85 and 5.15 dB, respectively, when compared to that with the conventional QP-only control. There is only a small difference between the real optimal spatial resolution and the spatial resolution obtained using the proposed scheme.

This article is organized as follows. In Section 2, background is presented and Section 3 explains the proposed bitrate control algorithm. Experimental results are presented in Section 4 and conclusions are given in Section 5.

2. Background

2.1. Previous study on spatial and temporal resolution controls

To improve flexibility in bit allocation, some rate control algorithms adjust frame rate or spatial resolution. When the frame rate decreases, additional bits can be allocated to each frame and frame image quality can be improved. However, frame skipping should be done very carefully because motion artifacts such as flickering or motion jerkiness may degrade subjective video quality. In [21, 22], the decision for the frame skip is based on buffer fullness and the spatial and temporal quality of the video. In [23], the similarity between successive frames measured by the PSNR is used to skip frames adaptively. To optimize coding performance, frames to be skipped are determined based on an R-D model [24, 25], but these works cannot be used in real-time applications. In [26–28], motion artifacts are reduced by adjusting the frame rate gradually based on the motion activity of the previous sub-GOP which is expressed by the histogram of difference image, thereby preserving motion smoothness. Even though a number of previous works have contributed to frame rate controls, the effect of providing bitrate control through frame rate adjustment is somewhat limited because, to avoid motion artifacts, the frame rate cannot be reduced below a certain value. In addition, temporal scalability may not be very effective for increasing subjective quality by temporal to spatial bitrate exchange. This is because the quality degradation due to dropping a frame is easily perceived, especially in low frame rate communications such as that used in two-way multimedia communication for mobile devices [29].

Spatial resolution control is another approach used for bit allocation when the target bitrate is very low. The dynamic resolution conversion (DRC) mode, which is supported in the advanced simple profile of MPEG-4, enables the video object plane to be encoded with reduced spatial resolution [30]. Similarly, a reduced resolution update (RRU) coding tool is adopted in Annex Q of the H.263 standard [31]. The RRU reduces the bitrate by coding the prediction error residuals at a reduced spatial resolution. However, the DRC and RRU techniques are not included in the H.264/AVC standard. Meanwhile, a number of previous studies have reported a relationship between down-sampling and video quality at a low bitrate. In [32], the optimum down-sampling ratio is determined according to the bitrate. In [33], it is reported that a down-sampled video, prior to compression and later up-sampled, visually outperforms that video compressed directly at high resolution with the same number of bits, when the target bitrate is very low. With this observation, a method to find the optimal down-sampling rate is suggested. These schemes are exploited by JPEG image compression standard not by video compression standards [29]. In [34, 35], discrete cosine transform (DCT) coefficients are decimated prior to quantization to reduce spatial resolution, but modifying the coding loop loses the conformation to the syntax compatibility for the video coding standard. In video transcoders, spatial resolution control has been an important factor in meeting a different target bitrate [36]. However, many previous works have focused on simplifying the computations in transcoding. In [37], the linear R-Q model is proposed to select the proper frame size, but it is not applicable to practical applications. Recently, in [38], the overall distortion is analyzed and the optimal spatial resolution is derived for a given bitrate. In [39], the spatial resolution ratio is appropriately selected, according to picture quality, bit rate, and power consumption. As shown in the above-mentioned works, in order to accomplish bitrate control by spatial resolution control, it is very important to select the optimal spatial resolution. When the selected spatial resolution is greater than the optimal one, the image quality can be degraded by using an excessively high QP value, while the quality can be degraded by aliasing artifacts when the selected spatial resolution is less than the optimal one. Nonetheless, it is difficult to find the optimal resolution because complex estimation models, which define the relationships among picture quality, bitrate, spatial resolution, and power consumption, are used in those previous works. Moreover, parameters used in the previous methods depend on the characteristics of the video content and the specific coding methods; thus, they cannot be calculated in real time. Therefore, the previous works cannot be applied to "on the fly", real-time rate control.

2.2. Comparison between spatial and temporal resolution controls

Rate control plays a critical role in the video encoder. However, the rate control algorithm is not standardized because it is independent of the decoder. For an enhanced rate control, additional control factors, such as frame rate and spatial resolution, can be used. The HVS is less sensitive to temporal details and more sensitive to spatial details, if a video is stationary. However, for video with high motion, the opposite is true. Empirically, a subjective quality degradation caused by a frame drop is more serious than that caused by spatial down-sampling at low bitrate. In Table 1, high definition (HD) video sequences are encoded at the target bitrates of 400 and 250 kbps and the spatial and temporal resolution controls are compared. When the target bitrate cannot be satisfied, even with a maximum QP value, a dyadic spatial resolution reduction (denoted by s_drop) or a dyadic temporal resolution reduction (denoted by t_drop) is applied. To implement t_drop, every alternate frame is dropped; subsequently, the undropped frames are repeated twice to replace the dropped frames. The VQM value of the Y component is measured to evaluate the subjective quality where lower VQM values represent a better subjective quality. A sequence-to-sequence comparison is made. To get the VQM of t_drop, the sequence which consists of the dropped and repeated frames as well as the undropped frames is compared with the original sequence. As shown in Table 1, s_drop always outperforms t_drop. This experiment shows that increasing the bitrate per frame by lowering the frame rate for a spatial quality does not lead to a higher subjective quality than that from a method using spatial scalability for any sequence.

Table 1 Comparison of VQM values between spatial and temporal drops at low target bitrates

Full size table

3. The proposed bitrate control with spatial scalability

3.1. Architecture of the target system

Figure 1 shows a block diagram of the target system used in this research. The input video is encoded by an H.264/AVC encoder and the reconstructed frames are stored for the next frames. It is assumed that both the encoder and decoder sides include modules to adjust the spatial resolution of the input and displaying videos, respectively. Based on the target bitrate and other encoding results, such as the current bitrate and PSNR information, the bitrate control module determines the proper QP value and the spatial resolution ratios for the encoder and the resolution conversion module, respectively. For implementation of the proposed bitrate control module in Figure 1, the bitrate control algorithm in the JM 13.2 reference software is used for the control of the QP value, whereas a new algorithm, described next, is proposed for spatial resolution control. In this article, the spatial resolution for the best video quality is determined by considering the PSNR which indicates objective quality. PSNR values are obtained from the difference between the original video and the up-sampled reconstructed video. The reason why the PSNR is chosen over VQM is that the VQM calculation is computationally expensive and needs the buffering of a few frames whereas the PSNR computation is quite simple.

3.2. Spatial resolution control

The reconstructed and up-sampled video includes two kinds of distortion. One is generated from the encoding process and the other is caused by spatial up/down-sampling. When spatial resolution control is used for bitrate control, it is important to find a resolution ratio that shows the best quality video under a given target bitrate. According to [39], experiments have shown that the PSNR degradation due to down-sampling and up-sampling operations increases approximately in proportion to the bitrate and the extent of the reduction in spatial resolution. Let PSNR_{coding_down} denote the PSNR of a video that has been down-sampled and encoded. Then, PSNR_{coding_down} is formulated as

PSN R_{coding_down} = q_{1} \cdot log R + q_{2} - q_{3} \cdot (\frac{1}{s a} - 1) \cdot R

(1)

where q 1, q 2, and q 3 are constants which depend on the video content and R is the bitrate of the encoded stream. The term sa, referred to here as the spatial resolution ratio, represents the ratio of the down-sampled frame area to the original frame area. When sa is smaller than 1, the frame is down-sampled. Equation (1) describes the relationship between sa and PSNR and thus is used for calculating the optimal spatial resolution. However, the parameters such as q 1, q 2, and q 3 in Equation (1) depend on the video content and cannot be known prior to encoding. Thus, this optimal solution cannot be applied to real-time systems on the fly.

Figure 2 shows the PSNR of HD-sized video sequences, Station2, at various spatial resolution ratios and at three bitrates: 600 kbps, 1 Mbps, and 2 Mbps. In Figure 2, the solid curves show the PSNR values obtained by simulation. Each graph has a peak PSNR value at a certain sa. The initial PSNR without spatial down-sampling is denoted by PSNR_full (when sa = 1). The peak PSNR of each graph is denoted by PSNR_peak and the sa that gives PSNR_peak is denoted by sa_peak. In the graph for the 600 kbps bitrate in Figure 2, PSNR_full is marked with a circle while PSNR_peak and sa_peak are marked with a triangle and a rectangle, respectively. If the spatial resolution ratio is adjusted to sa_peak, then the highest PSNR is achieved for a given bitrate.

In order to reduce the complexity of calculating sa_peak, a method for finding sa_peak based on a simplified model obtained from Figure 2 is proposed. In Figure 2, dotted lines connect PSNR_full and PSNR_peak. Within the range of sa from 1 to sa_peak, the dotted lines are very close to the measured data. Based on this proximity, the equation for PSNR_{coding_down} in Equation (1) is reformulated as a simplified model for low bitrate control as shown in Equation (2) in which α and β are positive and α represents the slope in the modeled graphs. If the information for α, PSNR_full, and PSNR_peak are given, the sa_peak can be estimated using the linear model in Equation (2).

PSN R_{coding_down_low R} = - α \cdot s a + β .

(2)

The slope of the model, α, can be estimated using the PSNR and sa values obtained from encoded frames. Let PSNR_prev and sa_prev denote the PSNR and sa of the previous GOP, respectively, and let PSNR_full be obtained when the first GOP is encoded with sa = 1. Based on the linear model in Equation (2), PSNR_prev and PSNR_full are expressed as given in Equations (3) and (4), respectively.

{PSNR}_{prev} = - α \cdot {s a}_{prev} + β

(3)

{PSNR}_{full} = - α \cdot 1 + β

(4)

If (4) is subtracted from (3), an estimate of α, denoted by α_est, is obtained by (5). The slope α_est obtained from (5) is the α value delayed by one GOP. It is based on the assumption that spatial resolution of the current frame can be applied to the next GOP, due to similarity between successive GOPs.

α_{est} = \frac{(PSN R_{prev} - PSN R_{full})}{1 - s a_{prev}}

(5)

PSNR_peak cannot be simply determined because it varies with both the image content and the target bitrate. In order to estimate PSNR_peak, PSNR_peak/PSNR_full ratios of 12 video sequences are measured under various target bitrates and at various values of slope α. The used videos are as follows: Akiyo and Coast Guard, with CIF (352 × 288) resolution, City, Crew, and Ice, with 4CIF (704 × 576) resolution, Aspen, Factory, Old Town Cross, Parkrun, and Pedestrian Area with HD (1280 × 720) resolution and West Windy Easy and Touchdown Pass, both with full HD (1920 × 1080) resolution are used in the evaluation. A sample of such results is shown in Figure 3 which shows that PSNR_peak/PSNR_full ratio is approximately proportional to the slope α regardless of video sequence types and bitrates. From this observation, PSNR_peak is calculated as given in (6), with the coefficients being chosen experimentally.

PSN R_{peak} = (0.03 \times PSN R_{full} \times (α - 0.5)) + PSN R_{full}

(6)

Based on the linear model in (2), sa_peak is calculated within the range from 0.1 to 1 as

s a_{peak} = 1 - \frac{(PSN R_{peak} - PSN R_{full})}{α} (0 .1 \leq s a_{peak} \leq 1),

(7)

where α and PSNR_peak are obtained from (5) and (6), respectively.

Slope α cannot be estimated before the second GOP because PSNR_prev and sa_prev in (5) are not given yet. In this case, the initial value of α is set to a small value from 0.1 to 0.3. This initial α is referred to as α_init. If PSNR_peak in (6) is substituted in (7), sa_peak is in inverse proportional to α. Thus, if a small α_init is used in (7), sa_peak value is relatively large and thus, the PSNR degradation caused by an excessive down-sampling operation can be avoided. Once PSNR_prev and sa_prev are obtained from the results for the second GOP encoded with α_init, then α_est calculated from (5) is used as follows:

α = \{\begin{matrix} α_{init} & for the second GOP \\ α_{est}, & after the second GOP \end{matrix}

(8)

The target spatial resolution ratio, sa_target, is adjusted only when the video quality is lower than an acceptable level. Let PSNR_target denote the PSNR for the acceptable image quality as required by users or applications. It is assumed that the reconstructed image is visually indistinguishable from the original one if the PSNR is greater than from 35 to 40 dB [40–43]. Thus, PSNR_target is set to 40 dB in this article. If PSNR_full is larger than PSNR_target, no spatial resolution adjustment is necessary, that is, the target spatial resolution ratio is set to 1. Otherwise, sa_target is adjusted to the sa_peak as in (7). Thus, sa_target is given by

s a_{target} = \{\begin{matrix} 1, & if (PSN R_{full} \geq {PSNR}_{target}) \\ s a_{peak}, & else \end{matrix}

(9)

3.3. The proposed bitrate control algorithm

Figure 4 shows the proposed bitrate control algorithm in which the QP and the spatial resolution ratio are determined sequentially in order to reach the PSNR_target. The QP value is decided frame-by-frame, whereas the spatial resolution ratios are determined for each GOP. At initialization, the PSNR_target is defined, and the algorithm step is started at Step 1. In the first GOP of Step 1, the spatial resolution ratio is not changed but only the QP is controlled to meet the target bitrate like a conventional QP-based rate control algorithm. If the target bitrate denoted by bitrate_target in Figure 4 cannot be satisfied in Step 1, the spatial resolution for the next GOP is simply down-sampled by a factor of 2, compared to that of the current GOP because meeting the bitrate_target is paramount. The encoding for the next GOP is started from Step 1 with a half-reduced spatial resolution. If the generated bitrate of the GOP in Step 1 meets the bitrate_target in Figure 4, Step 1 proceeds to Step 2. In Step 2, the sa_peak and PSNR_peak for the GOP to be encoded are calculated from (6) and (7), where the PSNR obtained from Step 1 and α_init are used. The sa_target is then determined from (9). In Step 3, the PSNR obtained from Step 2 is evaluated and used as PSNR_prev in (5) to adjust the proper α. Using the adjusted α denoted by α_est, the sa_peak, PSNR_peak, and sa_target for the GOP in Step 3 are calculated from (6), (7), and (9), respectively. The sa_target, determined once in Step 3, is used continually for the subsequent GOPs in Step 4. As long as the R-D characteristics of successive frames are similar, encoding with a sa_target works well. To cope with varying R-D characteristics, actions for bitrate change and QP change are described in Figure 4. If bitrate_target is changed, the relation between the spatial resolution and the PSNR becomes different, thus the slope α_est value needs to be refreshed through Steps 1, 2, and 3. Before going back to Step 1, the full size is set to 1 which is a full resolution with no reduction of spatial resolution when the increase of bitrate_target is greater than TH_BR. If the decrease of bitrate_target is greater than TH_BR, the full size is set to the current sa_target. A new sa_target for the decreased target bitrate will be determined as a value less than the current sa_target. In this article, TH_BR is calculated by using 0.02 × frame rate × original spatial resolution. Even though the bitrate_target is the same, the motion characteristics of video can be changed. If the motion is faster, the average QP value of the recent frames becomes higher than that of the previous ones, and vice versa. Therefore, sa_target is adjusted in a fine-grain manner by the change in the average QP. As shown in Figure 4, the whole flow to decide the proper spatial resolution is processed automatically and does not depend on advanced information about characteristics of the video content and the specific coding methods. Therefore, the proposed algorithm can easily be applied to real-time applications.

4. Experimental results

The proposed bitrate control scheme is implemented and integrated into the JM 13.2 reference software which adopts the QP-based bitrate control. To resize the spatial resolution, the up/down-sampling algorithms recommended in the SVC are used. For down-sampling, the algorithm based on the Sine-windowed Sinc-function is applied where a set of seven filters is used to support the extended range of the spatial scaling ratio. For up-sampling, the SVC normative up-sampling algorithm is applied which is based on a set of 6-taps filters derived from the Lanczos-3 filter. In this experiment, five HD video sequences, Pedestrian Area, Tractor, Station2, Sunflower, and Blue Sky, two full HD video sequences, Speed Bag and Life and two 4CIF videos, Harbor and Soccer, are used. The chosen length of a GOP is 30 frames and 150 frames are encoded. The GOP structure is IPPP.

Table 2 shows the average PSNR and VQM values of Y component for the fourth and the fifth GOPs. In the second column, 'Conventional' represents that the nine video sequences are encoded in the full size without spatial resolution control. When the target bitrate cannot be satisfied even with the maximum QP value, the frame rate is decreased by a half. The frame drop is realized by a factor of 2 in the encoder side which encodes all the macroblocks as the SKIP mode in the dropped frame. Thus, in the decoder side, frames are displayed at 30 fps with the dropped frame by a repetition of the previous frame. 'Proposed' and 'Optimal' represent, respectively, the proposed spatial resolution control and the optimal spatial resolution values. The optimal values are obtained experimentally by changing the resolution ratios from 0.1 to 1 and by measuring the PSNR values. Unlike a full size encoding, the rate control schemes represented by 'Proposed' and 'Optimal' do not use a frame-drop to meet the target bitrate. Note that the optimal resolution is a theoretical upper-bound of the spatial control and cannot be calculated on the fly. Experiments are conducted using these three rate control schemes with various bitrates: 250, 400, 600, 800, and 1000 kbps for HD, 400, 800, 1500, and 2000 kbps for full HD and 150, 300, 400, and 1000 kbps for 4CIF. In the result of HD with the high target bitrates of 1000 kbps, PSNR and VQM enhancements by the proposed rate control are 0.82 and 0.12 dB, respectively. As the target bitrate decreases, the VQM enhancement by the proposed rate control is increased by 5.15. Note that the PSNR difference between the proposed rate control and the conventional QP control is not large at the target bitrates of 400 and 250 kbps, whereas the PSNR enhancement by the proposed algorithm is 1.85 dB at the target bitrate of 600 kbps. This is because the frame rate is decreased in the ultra-low bitrate in the case of the conventional algorithm. Thus, at the target bitrates of 400 and 250 kbps, the PSNR of each frame encoded with the conventional algorithm is a little enhanced because the allocated bits per frame are increased. In this experiment, the optimal spatial resolution is chosen because it has the highest PSNR value than the other resolutions. The VQM value with the optimal spatial resolution could be a little worse than the one calculated from the proposed bitrate control like the cases at bitrate 250 and 800 kbps because of the difference between the two calculation methods, PSNR and VQM. However, the optimal PSNR values are always higher than the results from the conventional or the proposed bitrate control. For full HD and 4CIF as well as HD, the PSNR and VQM of the proposed spatial resolution are very close to that of the optimal one as shown in Table 2.

Table 2 Comparison of the PSNR and VQM among the conventional control, the proposed spatial resolution and the optimal spatial resolution at various target bitrates

Full size table

Figure 5 presents the R-D performance for three video sequences when using the proposed spatial resolution control. Both objective (PSNR) and subjective (VQM) results are presented to show the performance of the proposed algorithm. The gray solid curve represents the results of encoding with optimal spatial resolution whereas the black solid curve represents the results when the proposed algorithm is applied. The conventional rate control is represented by the black dotted curve. In Figure 5a-c, at the target bitrate of 600 kbps, the proposed algorithm enhances PSNR by 2.2, 3.1, and 3.4 dB, respectively, compared to results using the conventional rate control. Furthermore, the differences between 'Optimal' and 'Proposed' are negligible. As the target bitrate increases, PSNR improvement decreases. When the target bitrate is extremely low, such as 250 and 400 kbps, the allocated bits per pixel (bpp) are just 0.009 and 0.014 bpp, respectively. A conventional QP-based bitrate control cannot meet the target bitrate, even when using the maximum QP value. In the 'Conventional' method of Figure 5, frames are dropped to meet the target bitrate. Thus, video sequences are encoded at 15 and 7.5 fps for 400 and 250 kbps, respectively, whereas the frame rate is 30 fps in 600, 800, and 1000 kbps. For three video sequences, Sunflower, Pedestrian Area, and Tractor at 400 kbps, frame skips work for PSNR enhancement, to a limited amount, because the allocated bits per pixel are increased. However, additional frame skips, used to meet the target bitrate of 250 kbps, do not help the quality enhancement of each frame for the Sunflower and Tractor videos. Because the temporal correlation between frames becomes low, it results in low compression efficiency. The PSNR of the Pedestrian Area video is increased a little more at 250 kbps. In general, the results depend on the characteristic of the video sequence. In Figure 5d-f, VQMs clearly show that the proposed algorithm produces a significant improvement compared to the conventional rate control. The proposed rate control algorithm maintains a similar VQM quality from 1000 to 250 kbps, while the VQM of the conventional rate control increases drastically as the target bitrate decreases.

In Table 3, the average values of sa, as determined in the experiments summarized in Table 2, are shown. As the target bitrate increases, the 'Optimal' sa also increases. The values of sa determined by the 'Proposed' algorithm are very similar to those from the 'Optimal' one. In this experiment, the difference between 'Optimal' and 'Proposed' sa values is, on average, just 0.05.

Table 3 Comparison of the SA between the proposed spatial resolution and the optimal spatial resolution at various target bitrates

Full size table

In Figure 6, the values of sa for each GOP are presented for three HD video sequences, Sunflower, Pedestrian Area, and Tractor, which are encoded with various bitrates. In Figure 6a-c, the vertical axis shows sa while the horizontal axis shows the GOP number. Step 1 of Figure 4 starts from GOP 1. For 400 and 250 kbps, the extremely low bitrate control is already carried out to meet the target bitrate, as explained in Figure 4. Thus, the full sizes in GOP 1 for 400 and 250 kbps are 0.5 and 0.25, respectively. Through the proposed spatial resolution adjustment from GOP 1 to GOP 3, the sa is determined and stabilized. In Figure 6d-f, the sa for each GOP is compared to the optimal one. The vertical axis in Figure 6d-f shows the difference between the proposed and optimal sa values. To determine the difference, the optimal sa is subtracted from the proposed one. As shown in these graphs, the differences are very small.

Figures 7 and 8 show the 76th frame of the Sunflower sequence and the 135th frame of the Pedestrian Area sequence, respectively. The result of applying the conventional bitrate control, which uses only QP change, is shown in Figures 7a and 8a. As observed in these figures, much of the details in the frame is destroyed or decreased. In Figures 7b and 8b, QP and spatial resolution are controlled by the proposed rate control and, subjectively, the quality is better than the quality in Figures 7a and 8a.

Figure 9 shows the performance of the proposed algorithm when the motion characteristics are changed during encoding operations. For that figure, 300 frames of the 4CIF Soccer video sequence are encoded. The length of the GOP is 30. The values of sa and PSNR for each GOP are presented in Figure 9a, b, respectively. In that figure, the dotted graphs represent the results obtained from the proposed algorithm while the solid graphs represent the optimal values. In the 300 frames of Soccer, the first half has a slower motion than the second half. Until GOP 5, the sa value is determined to be 0.26 by the proposed algorithm. After that, the motion becomes faster, and consequently, the QP values are increased. Therefore, the sa value for the second half of the sequence is adjusted downward. In Figure 9a, sa is set to 0.19 for GOPs 6, 7, 8, and 9 and 0.13 for GOP 10. From GOP 3 to GOP 10, the differences between 'Proposed' and 'Optimal' values are just 0.02, on average. In Figure 9b, the PSNR values for 'Proposed' and 'Optimal' cases are compared and the differences are negligible.

In this experiment, the performance of the proposed algorithm is tested when the target bitrate is changed in the middle of the encoding process. The 4CIF Harbor video sequence is used as an example. In Figure 10a, b, the target bitrate for the first five GOPs is 300 kbps, whereas the target bitrate for the last five GOPs is 1500 kbps. In Figure 10a, the optimal spatial resolutions are 0.25 and 0.66 for the first and last half, respectively. In the proposed algorithm, sa is chosen initially as 0.5 because the 300 kbps target bitrate is extremely low; subsequently, the sa is determined to be 0.2 by the proposed algorithm. From GOP 5 onward, the target bitrate is increased to 1500 kbps. Thus, sa is set to 1 in GOP 5 and the process to determine the proper spatial resolution is restarted. On the basis of the bitrate change action explained in Figure 4, sa for the 1500 kbps target bitrate is determined to be 0.5. In Figure 10b, the PSNR values for the 'Proposed' and 'Optimal' cases are compared and the difference is 0.5 dB, on average. In Figure 10c, d, the target bitrate is reduced from 1000 to 500 kbps. For the first five GOPs, sa is decreased to 0.5 for 1000 kbps by the proposed algorithm. For the following five GOPs, sa is reduced because the target bitrate is reduced to 500 kbps. As shown in Figure 10c, the shape of the dotted graph obtained from the proposed algorithm closely follows the trend of the optimal graph. The difference between the two PSNR values in Figure 10d is just 0.28 dB.

5. Conclusion

The main contribution of this article is a real-time bitrate control algorithm using spatial down-sampling for the low bitrate encoding. The previous resolution control schemes are too complex to be processed at run time. In this article, a simple estimation model which defines the relationship between the PSNR and the spatial resolution ratio is presented for low bitrate compression. This estimation model is used to find the resolution ratio for acceptable quality on the fly for real-time systems. Two scalability tools for the QP and spatial resolution ratio are determined sequentially to reach the target PSNR. Experimental results show that the proposed bitrate control algorithm is close to the optimal solution and yields the better PSNR and VQM quality at various bitrates compared to the conventional QP-based bitrate control algorithm.

References

Li ZG, Pan F, Lim KP, Rahardja S: Adaptive rate control for H.264. Proc IEEE Int Conf Image Processing Singapore 2004, 2: 745-748.
Google Scholar
Kamaci N, Altunbasak Y, Mersereau RM: Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models. IEEE Trans Circ Syst Video Technol 2005, 15(8):994-1006.
Article Google Scholar
Ma S, Gao W, Lu Y: Rate-distortion analysis for H.264/AVC video coding and its application to rate control. IEEE Trans Circ Syst Video Technol 2005, 15(12):, 1533-1544.
Article Google Scholar
He Z, Kim Y-K, Mitra SK: Low-delay rate control for DCT video coding via ρ-domain source modeling. IEEE Trans Circ Syst Video Technol 2001, 11(8):928-940. 10.1109/76.937431
Article Google Scholar
Yuan W, Lin S, Zhang Y, Yuan W, Luo H: Optimum bit allocation and rate control for H.264/AVC. IEEE Trans Circ Syst Video Technol 2006, 16(6):705-715.
Article Google Scholar
Kwon D-K, Shen M-Y, Jay Kuo C-C: Rate control for H.264 video with enhanced rate and distortion models. IEEE Trans Circ Syst Video Technol 2007, 17(5):517-529.
Article Google Scholar
An C, Nguyen TQ: Iterative rate-distortion optimization of H.264 with constant bit rate constraint. IEEE Trans Image Process 2008, 17(9):1605-1615.
Article MathSciNet Google Scholar
Sullivan G, Wiegand T, Lim K-P: Joint model reference encoding methods and decoding concealment methods. Section 2.6 rate control JVT-I049 2003.
Google Scholar
Ma S, Gao W, Wu F, Lu Y: Rate control for JVT video coding scheme with HRD considerations. Proc IEEE ICIP Spain 2003, 3: 793-796.
Google Scholar
Yu HT, Pan F, Lin ZP: A new bit estimation scheme for H.264 rate control. Proc IEEE Int Symp Consumer Electronics, UK 2004, 396-399.
Google Scholar
Yang X, Lin W, Lu Z, Lin X, Rahardja S, Ong E, Yao S: Rate control for videophone using local perceptual cues. IEEE Trans Circ Syst Video Technol 2005, 15(4):496-507.
Article Google Scholar
Liu Y, Li ZG, Soh YC: Region-of-interest based resource allocation for conversational video communication of H.264/AVC. IEEE Trans Circ Syst Video Technol 2008, 18(1):134-139.
Article Google Scholar
Li H, Wang Z, Cui H, Tang K: An improved ROI-based rate control algorithm for H.264/AVC. Proc Int Conf Signal Processing China 2006, 2: 16-20.
Google Scholar
Hrarti M, Saadane H, Larabi M, Tamtaoui A, Aboutajdine D: A macroblock-based perceptually adaptive bit allocation for H264 rate control. Proc Int Symposium on I/V Communications and Mobile Network, Morocco 2010, 1-4.
Google Scholar
Huang C-M, Lin C-W: A novel 4-D perceptual quantization modeling for H.264 bit-rate control. IEEE Trans Multimed 2007, 9(6):1113-1124.
Article Google Scholar
Meng Q, Meng Q: Improved macroblock-level rate control algorithm with visual properties. Proc Int Workshop Intelligent Systems and Applications, China 2010, 1-5.
Google Scholar
Cui Z, Zhu X: SSIM-based content adaptive frame skipping for low bit rate H.264 video coding. Proc Int Conf Communication Technology, China 2010, 484-487.
Google Scholar
Ou T-S, Huang Y-H, Chen HH: SSIM-based perceptual rate control for video coding. IEEE Trans Circ Syst Video Technol 2011, 21(5):682-691.
Article Google Scholar
Jin R, Chen J: The coding rate control of consistent perceptual video quality in H.264 ROI. Proc Int Symposium Computer Network and Multimedia Technology, China 2009, 1-4.
Google Scholar
Wolf S, Pinson M: VQM software and measurement techniques. National Telecommunications and Information Administration Report 2002.
Google Scholar
Pan F, Lin X, Rahardja S, Lim KP, Li ZG, Wu DJ, Wu S: Proactive frame-skipping decision scheme for variable frame rate video coding. Proc Int Conf Multimedia and Expo Taiwan 2004, 3: 1903-1906.
Google Scholar
Pan F, Lin ZP, Lin X, Rahardja S, Juwono W, Slamet F: Adaptive frame skipping based on spatio-temporal complexity for low bit-rate video coding. J Vis Commun Image R 2006, 17(3):554-563. 10.1016/j.jvcir.2005.07.006
Article Google Scholar
Jun J, Lee S, He Z, Lee M, Jang ES: Adaptive key frame selection for efficient video coding. LNCS 2007, 4872: 853-866.
Google Scholar
Liu S, Kuo CJ: Joint temporal-spatial bit allocation for video coding with dependency. IEEE Trans Circ Syst Video Technol 2005, 15(1):15-26.
Article Google Scholar
Vetro A, Wang Y, Sun HF: Rate-distortion optimized video coding considering frameskip. Proc Int Conf Image Processing Greece 2001, 3: 534-537.
Google Scholar
Kim J, Kim Y-G, Song H, Kuo T-Y, Chung YJ, Kuo C-CJ: TCP-friendly Internet video streaming employing variable frame-rate encoding and interpolation. IEEE Trans Circ Syst Video Technol 2000, 10(7):1164-1177. 10.1109/76.875520
Article Google Scholar
Song H, Kuo C-CJ: Rate control for low-bit-rate video via variable-encoding frame rates. IEEE Trans Circ Syst Video Technol 2001, 11(4):512-521. 10.1109/76.915357
Article Google Scholar
Thaipanich T, Wu P-H, Kuo C-CJ: Low complexity algorithm for robust video frame rate up-conversion (FRUC) technique. IEEE Trans Consum Electron 2009, 55(1):220-228.
Article Google Scholar
Jackson AHAM, McEwan R, Mullin J: Impact of video frame rate on communicative behavior in two and four party groups. Proc ACM Conf Comput Supported Cooperative Work, Philadelphia, PA 2000, 11-20.
Google Scholar
Information technology - Coding of Audio-visual Objects - Part 2: Visual International Organization for Standardization 2000. ISO/IEC 14496-2:1999/Amd.1:2000(E)
Cote G, Erol B, Gallant M, Kossentini F: H.263+: video coding at low bit rates. IEEE Trans Circ Syst Video Technol 1998, 8(7):849-866. 10.1109/76.735381
Article Google Scholar
Segall CA, Elad M, Milanfar P, Webb R, Fogg C: Improved high-definition video by encoding at an intermediate resolution. Proc Conf Visual Communications and Image Processing USA 2004, 5308: 1007-1018.
Google Scholar
Bruckstein AM, Elad M, Kimmel R: Down-scaling for better transform compression. IEEE Trans Image Process 2003, 12(9):1132-1145. 10.1109/TIP.2003.816023
Article MathSciNet MATH Google Scholar
Ilgin HA, Chaparro LF: Low bit rate video coding using DCT based fast decimation/interpolation and embedded zero tree coding. IEEE Trans Circ Syst Video Technol 2007, 17(7):833-844.
Article Google Scholar
Nguyen VA, Tan YP, Lin WS: Adaptive downsampling/upsampling for better video compression at low bit rate. Proc of Int Symposium on Circuits and Systems, USA 2008, 1624-1627.
Google Scholar
Shu HY, Chau LP: The realization of arbitrary downsizing video transcoding. IEEE Trans Circ Syst Video Technol 2006, 16(4):540-546.
Article Google Scholar
Tan Y-P, Liang Y, Sun H: On the methods and performances of rational downsizing video transcoding. Signal Process Image Commun 2004, 19: 47-65. 10.1016/j.image.2003.08.017
Article Google Scholar
Wang R-J, Chien M-C, Chang P-C: Adaptive down-sampling video coding. Proc SPIE Multimedia on Mobile Devices 2010, 7542: 1-8.
Google Scholar
Lee H, Lee Y, Lee J, Lee D, Shin H: Design of a mobile video streaming system using adaptive spatial resolution control. IEEE Trans Consum Electron 2009, 55(3):1682-1689.
Article Google Scholar
Sirhindi R, Murtaza S, Afzal M: Improved data hiding technique for shares in extended visual secret sharing schemes. LNCS Inf Commun Secur 2008, 5308: 376-386. 10.1007/978-3-540-88625-9_25
Google Scholar
Samuel S, Penzhorn WT: Digital watermarking for copyright protection. Proc Conf AFRICON 2004, 2: 953-957.
Google Scholar
Kasmani SA, Naghsh-Nilchi A: A new robust digital image watermarking technique based on joint DWT-DCT transformation. Proc Int Conf on Convergence and Hybrid Information Technology, Korea 2008, 539-544.
Google Scholar
Liu R, Wang G, Wang P, Huang W: An image authentication scheme based on sliding window. Proc Conf Control and Decision, China 2008, 2937-2940.
Google Scholar

Download references

Acknowledgements

This study was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST) (2011-0027502).

Author information

Authors and Affiliations

Inter-university Semiconductor Research Center (ISRC), Department of Electrical Engineering and Computer Science, Seoul National University, Seoul, Korea
Chae Eun Rhee & Hyuk-Jae Lee
Department of Electronic Engineering of Sun Moon University, Asan, 336-708, Korea
Jin-Sung Kim

Authors

Chae Eun Rhee
View author publications
You can also search for this author in PubMed Google Scholar
Jin-Sung Kim
View author publications
You can also search for this author in PubMed Google Scholar
Hyuk-Jae Lee
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jin-Sung Kim.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Authors’ original file for figure 7

Authors’ original file for figure 8

Authors’ original file for figure 9

Authors’ original file for figure 10

Authors’ original file for figure 12

Authors’ original file for figure 13

Authors’ original file for figure 14

Authors’ original file for figure 15

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Rhee, C.E., Kim, JS. & Lee, HJ. Bitrate control using a heuristic spatial resolution adjustment for a real-time H.264/AVC encoder. EURASIP J. Adv. Signal Process. 2012, 87 (2012). https://doi.org/10.1186/1687-6180-2012-87

Download citation

Received: 19 July 2011
Accepted: 23 April 2012
Published: 23 April 2012
DOI: https://doi.org/10.1186/1687-6180-2012-87

Bitrate control using a heuristic spatial resolution adjustment for a real-time H.264/AVC encoder

Abstract

1. Introduction

2. Background

2.1. Previous study on spatial and temporal resolution controls

2.2. Comparison between spatial and temporal resolution controls

3. The proposed bitrate control with spatial scalability

3.1. Architecture of the target system

3.2. Spatial resolution control

3.3. The proposed bitrate control algorithm

4. Experimental results

5. Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors’ original submitted files for images

Rights and permissions

About this article

Cite this article

Share this article

Keywords