
Bitrate control using a heuristic spatial resolution adjustment for a real-time H.264/AVC encoder


Conventional bitrate control algorithms that change only the quantization parameter (QP) often suffer from quality degradation when the target bitrate is very low. Therefore, rate control algorithms that adjust spatial resolution in addition to QP control have recently been proposed, but their computations are too complex to be processed in real time. This research proposes a very simple, but effective, rate control algorithm that employs spatial resolution control as well as the existing QP-based bitrate control. The spatial resolution ratio for the best peak signal-to-noise ratio (PSNR) is calculated using a simple estimation model which defines the relationship between the PSNR and the spatial resolution at very low bitrate compression. In the proposed bitrate control algorithm, two scalability tools for adjusting the QP and the spatial resolution ratio are used sequentially to reach the target PSNR and the control decision is made for a group of pictures. Experimental results show that the proposed bitrate control algorithm approximates an optimal solution and yields a better subjective quality as well as objective quality at various bitrates compared to the conventional QP-based bitrate control algorithm. The decision of the control parameters requires very small computational complexity and is made in a completely automatic manner so that the proposed algorithm is well suited for real-time applications.

1. Introduction

The H.264/AVC standard is widely used in video streaming, video communication, and various mobile video applications due to its high compression efficiency, which is achieved through many advanced coding tools. In recent video applications, video resolution tends to be larger and, thus, the bandwidth requirement for video transmission also increases. As a result, bitrate control is growing in importance because the bitrate of a video stream must be regulated to achieve the target bitrate. In bitrate control algorithms, the target bits are allocated at the frame level or the macroblock level by considering the fullness of the output buffer and the encoding complexity. While achieving the allocated bit budget, rate control attempts to make quality compromises that cause minimum degradation in perceived quality by controlling the quantization parameter (QP) values. When the QP is small, much of the information is preserved; when the QP is large, much of the information is discarded to reduce the bitrate at the cost of increased distortion.

To achieve a target bitrate under real constraints, such as a specific channel bandwidth or defined encoder and decoder buffer sizes, QP values must vary dynamically based on the complexity of the input video and the current bitrate. QP values are determined using a rate-quantization (R-Q) model in which the generated bitrate is modeled as a function of the QP and the complexity of the residual signal, such as the mean absolute difference (MAD). In H.264/AVC, QPs are used not only in rate control but also in rate distortion optimization (RDO). However, the well-known "chicken-egg" dilemma complicates the selection of the QP value: the RDO function needs a pre-determined QP for its lambda factor, whereas the QP is determined from the MAD, which is only available after the RDO has been performed. To resolve that dilemma, the MAD is predicted from the previous frame using a linear function [1]. The MAD estimated from the previous frame often differs from the actual MAD of the current frame, so an inadequate QP may be selected. For improved coding efficiency, enhanced distortion models [2-6] have been developed, and optimized rate-distortion (R-D) models and content-aware bit allocation have been proposed [5-10]. A number of recent approaches incorporate characteristics of the human visual system (HVS) into bitrate control. Moreover, there are several reports on region-of-interest (ROI)-based bit allocation [11-13]; such approaches can potentially improve the perceived visual quality of images. In addition to ROI-based methods, a new R-D model, along with frame skipping and bit-allocation schemes that use various perceptual metrics based on the characteristics of the HVS, has been proposed [14-19]. However, conventional bitrate control algorithms based on QP control often suffer from perceptible image quality degradations, such as blocking, ringing, or texture-deviation artifacts, when the target bitrate is very low.
Furthermore, when the target bitrate is not satisfied, even with the maximum QP value, the conventional rate control cannot avoid a sudden frame drop which results in video quality degradation.

This article proposes a simple but effective bitrate control algorithm that adds spatial resolution control to the conventional QP-based bitrate control. The algorithm introduces a new model that represents the relationship between the spatial resolution and the peak signal-to-noise ratio (PSNR) for low bitrate coding. Using this model, the spatial resolution that gives the highest PSNR at a given bitrate is estimated. The computational complexity of the model is very low because it requires only a small number of parameters, whose values are obtained heuristically. The two scalability tools, the QP and the spatial resolution, are processed sequentially to achieve the target PSNR. The proposed spatial resolution adjustment, which is applied at the group of pictures (GOP) level, serves as a coarse-grain bitrate control. Inside a GOP, the QP is changed by a conventional bitrate control to meet the allocated bits in a fine-grain manner. Thus, the target bitrate is satisfied with a combination of the two control methods. To estimate the perceptual quality of encoded video sequences with reduced spatial resolution, Video Quality Metric (VQM) software [20] is used to measure the subjective quality in addition to the PSNR, which measures the objective quality. The VQM computes the perceptual effects of video impairments including blurring, jerky/unnatural motion, global noise, block distortion, and color distortion, and combines them into a single metric. Experimental results show that the proposed bitrate control scheme outperforms the conventional QP-based bitrate control algorithm at a variety of bitrates. At a low bitrate, the PSNR and VQM values with the proposed spatial resolution control scheme are improved by up to 1.85 dB and 5.15, respectively, compared with those of the conventional QP-only control.
There is only a small difference between the real optimal spatial resolution and the spatial resolution obtained using the proposed scheme.

This article is organized as follows. In Section 2, background is presented and Section 3 explains the proposed bitrate control algorithm. Experimental results are presented in Section 4 and conclusions are given in Section 5.

2. Background

2.1. Previous study on spatial and temporal resolution controls

To improve flexibility in bit allocation, some rate control algorithms adjust the frame rate or the spatial resolution. When the frame rate decreases, additional bits can be allocated to each frame and the image quality of each frame can be improved. However, frame skipping should be done very carefully because motion artifacts such as flickering or motion jerkiness may degrade subjective video quality. In [21, 22], the frame-skip decision is based on buffer fullness and the spatial and temporal quality of the video. In [23], the similarity between successive frames, measured by the PSNR, is used to skip frames adaptively. To optimize coding performance, the frames to be skipped are determined based on an R-D model [24, 25], but these methods cannot be used in real-time applications. In [26-28], motion artifacts are reduced by adjusting the frame rate gradually based on the motion activity of the previous sub-GOP, which is expressed by the histogram of the difference image, thereby preserving motion smoothness. Even though a number of previous works have contributed to frame rate control, the effect of bitrate control through frame rate adjustment is somewhat limited because, to avoid motion artifacts, the frame rate cannot be reduced below a certain value. In addition, temporal scalability may not be very effective for increasing subjective quality by exchanging temporal resolution for spatial quality. This is because the quality degradation due to dropping a frame is easily perceived, especially in low frame rate communications such as two-way multimedia communication for mobile devices [29].

Spatial resolution control is another approach used for bit allocation when the target bitrate is very low. The dynamic resolution conversion (DRC) mode, which is supported in the advanced simple profile of MPEG-4, enables the video object plane to be encoded with reduced spatial resolution [30]. Similarly, a reduced resolution update (RRU) coding tool is adopted in Annex Q of the H.263 standard [31]. The RRU reduces the bitrate by coding the prediction error residuals at a reduced spatial resolution. However, the DRC and RRU techniques are not included in the H.264/AVC standard. Meanwhile, a number of previous studies have reported a relationship between down-sampling and video quality at a low bitrate. In [32], the optimum down-sampling ratio is determined according to the bitrate. In [33], it is reported that a down-sampled video, prior to compression and later up-sampled, visually outperforms that video compressed directly at high resolution with the same number of bits, when the target bitrate is very low. With this observation, a method to find the optimal down-sampling rate is suggested. These schemes are exploited by JPEG image compression standard not by video compression standards [29]. In [34, 35], discrete cosine transform (DCT) coefficients are decimated prior to quantization to reduce spatial resolution, but modifying the coding loop loses the conformation to the syntax compatibility for the video coding standard. In video transcoders, spatial resolution control has been an important factor in meeting a different target bitrate [36]. However, many previous works have focused on simplifying the computations in transcoding. In [37], the linear R-Q model is proposed to select the proper frame size, but it is not applicable to practical applications. Recently, in [38], the overall distortion is analyzed and the optimal spatial resolution is derived for a given bitrate. 
In [39], the spatial resolution ratio is appropriately selected, according to picture quality, bit rate, and power consumption. As shown in the above-mentioned works, in order to accomplish bitrate control by spatial resolution control, it is very important to select the optimal spatial resolution. When the selected spatial resolution is greater than the optimal one, the image quality can be degraded by using an excessively high QP value, while the quality can be degraded by aliasing artifacts when the selected spatial resolution is less than the optimal one. Nonetheless, it is difficult to find the optimal resolution because complex estimation models, which define the relationships among picture quality, bitrate, spatial resolution, and power consumption, are used in those previous works. Moreover, parameters used in the previous methods depend on the characteristics of the video content and the specific coding methods; thus, they cannot be calculated in real time. Therefore, the previous works cannot be applied to "on the fly", real-time rate control.

2.2. Comparison between spatial and temporal resolution controls

Rate control plays a critical role in the video encoder. However, the rate control algorithm is not standardized because it is independent of the decoder. For an enhanced rate control, additional control factors, such as frame rate and spatial resolution, can be used. The HVS is less sensitive to temporal details and more sensitive to spatial details, if a video is stationary. However, for video with high motion, the opposite is true. Empirically, a subjective quality degradation caused by a frame drop is more serious than that caused by spatial down-sampling at low bitrate. In Table 1, high definition (HD) video sequences are encoded at the target bitrates of 400 and 250 kbps and the spatial and temporal resolution controls are compared. When the target bitrate cannot be satisfied, even with a maximum QP value, a dyadic spatial resolution reduction (denoted by s_drop) or a dyadic temporal resolution reduction (denoted by t_drop) is applied. To implement t_drop, every alternate frame is dropped; subsequently, the undropped frames are repeated twice to replace the dropped frames. The VQM value of the Y component is measured to evaluate the subjective quality where lower VQM values represent a better subjective quality. A sequence-to-sequence comparison is made. To get the VQM of t_drop, the sequence which consists of the dropped and repeated frames as well as the undropped frames is compared with the original sequence. As shown in Table 1, s_drop always outperforms t_drop. This experiment shows that increasing the bitrate per frame by lowering the frame rate for a spatial quality does not lead to a higher subjective quality than that from a method using spatial scalability for any sequence.
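The t_drop operation described above can be sketched in a few lines. This is only an illustration: the function name `t_drop` is borrowed from the text, frames are represented by arbitrary Python objects, and the choice to drop the odd-indexed frames (rather than the even-indexed ones) is an assumption, since the text only says that every alternate frame is dropped.

```python
def t_drop(frames):
    """Dyadic temporal reduction: drop every alternate frame, then repeat
    each kept frame once so the output has the original frame count.

    Assumption: frames at odd indices are the ones dropped.
    """
    # Index i maps to the nearest kept (even) index at or before i.
    return [frames[i - (i % 2)] for i in range(len(frames))]


if __name__ == "__main__":
    print(t_drop(["f0", "f1", "f2", "f3", "f4"]))
```

The repeated sequence has the same length as the original, which is what allows a sequence-to-sequence VQM comparison against the original video.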

Table 1 Comparison of VQM values between spatial and temporal drops at low target bitrates

3. The proposed bitrate control with spatial scalability

3.1. Architecture of the target system

Figure 1 shows a block diagram of the target system used in this research. The input video is encoded by an H.264/AVC encoder and the reconstructed frames are stored for the next frames. It is assumed that both the encoder and decoder sides include modules to adjust the spatial resolution of the input and displaying videos, respectively. Based on the target bitrate and other encoding results, such as the current bitrate and PSNR information, the bitrate control module determines the proper QP value and the spatial resolution ratios for the encoder and the resolution conversion module, respectively. For implementation of the proposed bitrate control module in Figure 1, the bitrate control algorithm in the JM 13.2 reference software is used for the control of the QP value, whereas a new algorithm, described next, is proposed for spatial resolution control. In this article, the spatial resolution for the best video quality is determined by considering the PSNR which indicates objective quality. PSNR values are obtained from the difference between the original video and the up-sampled reconstructed video. The reason why the PSNR is chosen over VQM is that the VQM calculation is computationally expensive and needs the buffering of a few frames whereas the PSNR computation is quite simple.
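For reference, the PSNR between the original frame and the up-sampled reconstructed frame can be computed as in the following minimal sketch. It assumes 8-bit luma samples (peak value 255) stored as flat sequences of equal length; the function name `psnr` and the data layout are illustrative and are not taken from the JM software.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """PSNR in dB between two equally sized sequences of samples.

    Assumption: 8-bit samples, so the default peak value is 255.
    """
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    print(psnr([100, 100], [110, 90]))
```

The computation is a single pass over the samples, which is consistent with the statement that PSNR is much cheaper to evaluate than VQM.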

Figure 1 Architecture of a target system.

3.2. Spatial resolution control

The reconstructed and up-sampled video includes two kinds of distortion. One is generated from the encoding process and the other is caused by spatial up/down-sampling. When spatial resolution control is used for bitrate control, it is important to find a resolution ratio that shows the best quality video under a given target bitrate. According to [39], experiments have shown that the PSNR degradation due to down-sampling and up-sampling operations increases approximately in proportion to the bitrate and the extent of the reduction in spatial resolution. Let PSNRcoding_down denote the PSNR of a video that has been down-sampled and encoded. Then, PSNRcoding_down is formulated as

$$\mathrm{PSNR}_{\text{coding\_down}} = q_1 \log R + q_2 - q_3 \left( \frac{1}{s_a} - 1 \right) R \tag{1}$$

where q 1, q 2, and q 3 are constants which depend on the video content and R is the bitrate of the encoded stream. The term sa, referred to here as the spatial resolution ratio, represents the ratio of the down-sampled frame area to the original frame area. When sa is smaller than 1, the frame is down-sampled. Equation (1) describes the relationship between sa and PSNR and thus is used for calculating the optimal spatial resolution. However, the parameters such as q 1, q 2, and q 3 in Equation (1) depend on the video content and cannot be known prior to encoding. Thus, this optimal solution cannot be applied to real-time systems on the fly.

Figure 2 shows the PSNR of HD-sized video sequences, Station2, at various spatial resolution ratios and at three bitrates: 600 kbps, 1 Mbps, and 2 Mbps. In Figure 2, the solid curves show the PSNR values obtained by simulation. Each graph has a peak PSNR value at a certain sa. The initial PSNR without spatial down-sampling is denoted by PSNRfull (when sa = 1). The peak PSNR of each graph is denoted by PSNRpeak and the sa that gives PSNRpeak is denoted by sapeak. In the graph for the 600 kbps bitrate in Figure 2, PSNRfull is marked with a circle while PSNRpeak and sapeak are marked with a triangle and a rectangle, respectively. If the spatial resolution ratio is adjusted to sapeak, then the highest PSNR is achieved for a given bitrate.

Figure 2 PSNR depending on the spatial resolution ratio at various low bitrates obtained with HD-sized Station2 sequence.

In order to reduce the complexity of calculating sapeak, a method for finding sapeak based on a simplified model obtained from Figure 2 is proposed. In Figure 2, dotted lines connect PSNRfull and PSNRpeak. Within the range of sa from 1 to sapeak, the dotted lines are very close to the measured data. Based on this proximity, the equation for PSNRcoding_down in Equation (1) is reformulated as a simplified model for low bitrate control as shown in Equation (2) in which α and β are positive and α represents the slope in the modeled graphs. If the information for α, PSNRfull, and PSNRpeak are given, the sapeak can be estimated using the linear model in Equation (2).

$$\mathrm{PSNR}_{\text{coding\_down\_low\,R}} = -\alpha\, s_a + \beta \tag{2}$$

The slope of the model, α, can be estimated using the PSNR and sa values obtained from encoded frames. Let PSNRprev and saprev denote the PSNR and sa of the previous GOP, respectively, and let PSNRfull be obtained when the first GOP is encoded with sa = 1. Based on the linear model in Equation (2), PSNRprev and PSNRfull are expressed as given in Equations (3) and (4), respectively.

$$\mathrm{PSNR}_{\text{prev}} = -\alpha\, s_{a,\text{prev}} + \beta \tag{3}$$
$$\mathrm{PSNR}_{\text{full}} = -\alpha \cdot 1 + \beta \tag{4}$$

If (4) is subtracted from (3), β cancels and PSNRprev − PSNRfull = α(1 − saprev), so an estimate of α, denoted by αest, is obtained by (5). The slope αest obtained from (5) is the α value delayed by one GOP. Using it relies on the assumption that the spatial resolution derived from the current GOP can be applied to the next GOP, owing to the similarity between successive GOPs.

$$\alpha_{\text{est}} = \frac{\mathrm{PSNR}_{\text{prev}} - \mathrm{PSNR}_{\text{full}}}{1 - s_{a,\text{prev}}} \tag{5}$$
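As a small illustration of Equation (5), the slope estimate can be written as follows. The function name `estimate_alpha` and the input check are assumptions made for this sketch; the numbers in the usage line are arbitrary and are not measurements from the article.

```python
def estimate_alpha(psnr_prev, psnr_full, sa_prev):
    """Slope estimate from Equation (5).

    psnr_prev, sa_prev: PSNR (dB) and spatial resolution ratio of the previous GOP.
    psnr_full: PSNR (dB) of the GOP encoded at full resolution (sa = 1).
    """
    # Equation (5) is undefined at sa_prev = 1 (division by zero).
    if not 0.0 < sa_prev < 1.0:
        raise ValueError("sa_prev must lie strictly between 0 and 1")
    return (psnr_prev - psnr_full) / (1.0 - sa_prev)


if __name__ == "__main__":
    # Arbitrary example: a 1 dB gain after halving the frame area gives a slope of 2.
    print(estimate_alpha(31.0, 30.0, 0.5))
```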

PSNRpeak cannot be simply determined because it varies with both the image content and the target bitrate. In order to estimate PSNRpeak, the PSNRpeak/PSNRfull ratios of 12 video sequences are measured under various target bitrates and at various values of the slope α. The following videos are used in the evaluation: Akiyo and Coast Guard, with CIF (352 × 288) resolution; City, Crew, and Ice, with 4CIF (704 × 576) resolution; Aspen, Factory, Old Town Cross, Parkrun, and Pedestrian Area, with HD (1280 × 720) resolution; and West Windy Easy and Touchdown Pass, both with full HD (1920 × 1080) resolution. A sample of the results is shown in Figure 3, which shows that the PSNRpeak/PSNRfull ratio is approximately proportional to the slope α regardless of the video sequence type and the bitrate. From this observation, PSNRpeak is calculated as given in (6), with the coefficients being chosen experimentally.

Figure 3 The ratio of PSNRpeak to PSNRfull depending on α.

$$\mathrm{PSNR}_{\text{peak}} = \left( 0.03 \times \mathrm{PSNR}_{\text{full}} \times (\alpha - 0.5) \right) + \mathrm{PSNR}_{\text{full}} \tag{6}$$

Based on the linear model in (2), sapeak is calculated within the range from 0.1 to 1 as

$$s_{a,\text{peak}} = 1 - \frac{\mathrm{PSNR}_{\text{peak}} - \mathrm{PSNR}_{\text{full}}}{\alpha} \qquad (0.1 \le s_{a,\text{peak}} \le 1), \tag{7}$$

where α and PSNRpeak are obtained from (5) and (6), respectively.
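Equations (6) and (7) can be combined into a short routine, sketched below. The function names are illustrative, and treating the stated range of 0.1 to 1 as a clamp on the computed value is an assumption of this sketch; the article only states that sapeak is calculated within that range.

```python
def psnr_peak(psnr_full, alpha):
    """Equation (6): estimated peak PSNR (dB) from PSNR_full and the slope alpha."""
    return 0.03 * psnr_full * (alpha - 0.5) + psnr_full


def sa_peak(psnr_full, alpha, sa_min=0.1, sa_max=1.0):
    """Equation (7): spatial resolution ratio expected to give the peak PSNR.

    Assumption: values outside [sa_min, sa_max] are clamped to that range.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    raw = 1.0 - (psnr_peak(psnr_full, alpha) - psnr_full) / alpha
    return min(sa_max, max(sa_min, raw))


if __name__ == "__main__":
    # Arbitrary illustrative inputs, not values reported in the article.
    print(psnr_peak(30.0, 1.0), sa_peak(30.0, 1.0))
```

Under this reading, a slope below 0.5 makes the estimated PSNRpeak smaller than PSNRfull, so the raw ratio exceeds 1 and the clamp keeps the frame at full resolution.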

The slope α cannot be estimated before the second GOP because PSNRprev and saprev in (5) are not yet available. In this case, α is initialized to a small value between 0.1 and 0.3, referred to as αinit. If PSNRpeak in (6) is substituted into (7), sapeak is inversely proportional to α. Thus, if a small αinit is used in (7), the sapeak value is relatively large and the PSNR degradation caused by excessive down-sampling can be avoided. Once PSNRprev and saprev are obtained from the results for the second GOP encoded with αinit, αest calculated from (5) is used as follows:

$$\alpha = \begin{cases} \alpha_{\text{init}}, & \text{for the second GOP} \\ \alpha_{\text{est}}, & \text{after the second GOP} \end{cases} \tag{8}$$

The target spatial resolution ratio, satarget, is adjusted only when the video quality is lower than an acceptable level. Let PSNRtarget denote the PSNR for the acceptable image quality as required by users or applications. It is assumed that the reconstructed image is visually indistinguishable from the original one if the PSNR exceeds a threshold in the range of 35 to 40 dB [40-43]. Thus, PSNRtarget is set to 40 dB in this article. If PSNRfull is larger than PSNRtarget, no spatial resolution adjustment is necessary, that is, the target spatial resolution ratio is set to 1. Otherwise, satarget is set to sapeak as in (7). Thus, satarget is given by

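$$s_{a,\text{target}} = \begin{cases} 1, & \text{if } \mathrm{PSNR}_{\text{full}} \ge \mathrm{PSNR}_{\text{target}} \\ s_{a,\text{peak}}, & \text{else} \end{cases} \tag{9}$$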
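The two selection rules, Equation (8) for α and Equation (9) for satarget, amount to simple case distinctions. The sketch below uses hypothetical function names and a 1-based GOP index; the comparison operator in Equation (9) is taken here as "greater than or equal", which is an assumption because the text only says "larger than".

```python
def select_alpha(gop_index, alpha_init, alpha_est):
    """Equation (8): alpha_init for the second GOP, alpha_est afterwards.

    Assumption: gop_index is 1-based. The first GOP is encoded with sa = 1,
    so the value returned for gop_index = 1 is not actually needed.
    """
    return alpha_init if gop_index <= 2 else alpha_est


def sa_target(psnr_full, psnr_target, sa_peak_value):
    """Equation (9): keep full resolution when quality is already acceptable."""
    return 1.0 if psnr_full >= psnr_target else sa_peak_value


if __name__ == "__main__":
    # PSNR_target = 40 dB as in the article; the other numbers are arbitrary.
    print(select_alpha(3, 0.2, 1.3), sa_target(30.0, 40.0, 0.55))
```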

3.3. The proposed bitrate control algorithm

Figure 4 shows the proposed bitrate control algorithm in which the QP and the spatial resolution ratio are determined sequentially in order to reach the PSNRtarget. The QP value is decided frame-by-frame, whereas the spatial resolution ratios are determined for each GOP. At initialization, the PSNRtarget is defined, and the algorithm step is started at Step 1. In the first GOP of Step 1, the spatial resolution ratio is not changed but only the QP is controlled to meet the target bitrate like a conventional QP-based rate control algorithm. If the target bitrate denoted by bitratetarget in Figure 4 cannot be satisfied in Step 1, the spatial resolution for the next GOP is simply down-sampled by a factor of 2, compared to that of the current GOP because meeting the bitratetarget is paramount. The encoding for the next GOP is started from Step 1 with a half-reduced spatial resolution. If the generated bitrate of the GOP in Step 1 meets the bitratetarget in Figure 4, Step 1 proceeds to Step 2. In Step 2, the sapeak and PSNRpeak for the GOP to be encoded are calculated from (6) and (7), where the PSNR obtained from Step 1 and αinit are used. The satarget is then determined from (9). In Step 3, the PSNR obtained from Step 2 is evaluated and used as PSNRprev in (5) to adjust the proper α. Using the adjusted α denoted by αest, the sapeak, PSNRpeak, and satarget for the GOP in Step 3 are calculated from (6), (7), and (9), respectively. The satarget, determined once in Step 3, is used continually for the subsequent GOPs in Step 4. As long as the R-D characteristics of successive frames are similar, encoding with a satarget works well. To cope with varying R-D characteristics, actions for bitrate change and QP change are described in Figure 4. If bitratetarget is changed, the relation between the spatial resolution and the PSNR becomes different, thus the slope αest value needs to be refreshed through Steps 1, 2, and 3. 
Before going back to Step 1, the full size is set to 1 which is a full resolution with no reduction of spatial resolution when the increase of bitratetarget is greater than THBR. If the decrease of bitratetarget is greater than THBR, the full size is set to the current satarget. A new satarget for the decreased target bitrate will be determined as a value less than the current satarget. In this article, THBR is calculated by using 0.02 × frame rate × original spatial resolution. Even though the bitratetarget is the same, the motion characteristics of video can be changed. If the motion is faster, the average QP value of the recent frames becomes higher than that of the previous ones, and vice versa. Therefore, satarget is adjusted in a fine-grain manner by the change in the average QP. As shown in Figure 4, the whole flow to decide the proper spatial resolution is processed automatically and does not depend on advanced information about characteristics of the video content and the specific coding methods. Therefore, the proposed algorithm can easily be applied to real-time applications.
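The handling of a changed target bitrate can be sketched as follows. Several points are assumptions made for illustration: the "original spatial resolution" in THBR is interpreted as the pixel count of the original frame, bitrates are expressed in bits per second, and the function names are hypothetical. The article does not state what happens to the full size when the change is smaller than THBR, so the sketch returns `None` in that case instead of guessing.

```python
def bitrate_threshold(frame_rate, width, height, factor=0.02):
    """TH_BR = 0.02 x frame rate x original spatial resolution.

    Assumption: spatial resolution means width x height in pixels.
    """
    return factor * frame_rate * width * height


def full_size_after_bitrate_change(old_bitrate, new_bitrate, current_sa_target, th_br):
    """Full size to use before returning to Step 1 after a target bitrate change.

    Returns 1.0 for a large increase, the current sa_target for a large
    decrease, and None when the change does not exceed th_br (unspecified
    in the article).
    """
    delta = new_bitrate - old_bitrate
    if delta > th_br:
        return 1.0  # restart from full resolution
    if -delta > th_br:
        return current_sa_target  # search below the current ratio
    return None


if __name__ == "__main__":
    th = bitrate_threshold(30, 1280, 720)
    print(th, full_size_after_bitrate_change(600_000, 1_500_000, 0.3, th))
```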

Figure 4 Flow chart representing the steps in the proposed bitrate control algorithm.

4. Experimental results

The proposed bitrate control scheme is implemented and integrated into the JM 13.2 reference software which adopts the QP-based bitrate control. To resize the spatial resolution, the up/down-sampling algorithms recommended in the SVC are used. For down-sampling, the algorithm based on the Sine-windowed Sinc-function is applied where a set of seven filters is used to support the extended range of the spatial scaling ratio. For up-sampling, the SVC normative up-sampling algorithm is applied which is based on a set of 6-taps filters derived from the Lanczos-3 filter. In this experiment, five HD video sequences, Pedestrian Area, Tractor, Station2, Sunflower, and Blue Sky, two full HD video sequences, Speed Bag and Life and two 4CIF videos, Harbor and Soccer, are used. The chosen length of a GOP is 30 frames and 150 frames are encoded. The GOP structure is IPPP.

Table 2 shows the average PSNR and VQM values of Y component for the fourth and the fifth GOPs. In the second column, 'Conventional' represents that the nine video sequences are encoded in the full size without spatial resolution control. When the target bitrate cannot be satisfied even with the maximum QP value, the frame rate is decreased by a half. The frame drop is realized by a factor of 2 in the encoder side which encodes all the macroblocks as the SKIP mode in the dropped frame. Thus, in the decoder side, frames are displayed at 30 fps with the dropped frame by a repetition of the previous frame. 'Proposed' and 'Optimal' represent, respectively, the proposed spatial resolution control and the optimal spatial resolution values. The optimal values are obtained experimentally by changing the resolution ratios from 0.1 to 1 and by measuring the PSNR values. Unlike a full size encoding, the rate control schemes represented by 'Proposed' and 'Optimal' do not use a frame-drop to meet the target bitrate. Note that the optimal resolution is a theoretical upper-bound of the spatial control and cannot be calculated on the fly. Experiments are conducted using these three rate control schemes with various bitrates: 250, 400, 600, 800, and 1000 kbps for HD, 400, 800, 1500, and 2000 kbps for full HD and 150, 300, 400, and 1000 kbps for 4CIF. In the result of HD with the high target bitrates of 1000 kbps, PSNR and VQM enhancements by the proposed rate control are 0.82 and 0.12 dB, respectively. As the target bitrate decreases, the VQM enhancement by the proposed rate control is increased by 5.15. Note that the PSNR difference between the proposed rate control and the conventional QP control is not large at the target bitrates of 400 and 250 kbps, whereas the PSNR enhancement by the proposed algorithm is 1.85 dB at the target bitrate of 600 kbps. This is because the frame rate is decreased in the ultra-low bitrate in the case of the conventional algorithm. 
Thus, at the target bitrates of 400 and 250 kbps, the PSNR of each frame encoded with the conventional algorithm is slightly enhanced because the allocated bits per frame are increased. In this experiment, the optimal spatial resolution is chosen because it yields a higher PSNR value than any other resolution. The VQM value with the optimal spatial resolution can be slightly worse than the one obtained with the proposed bitrate control, as in the cases at 250 and 800 kbps, because PSNR and VQM are calculated differently. However, the optimal PSNR values are always higher than the results from the conventional or the proposed bitrate control. For full HD and 4CIF as well as HD, the PSNR and VQM of the proposed spatial resolution are very close to those of the optimal one, as shown in Table 2.

Table 2 Comparison of the PSNR and VQM among the conventional control, the proposed spatial resolution and the optimal spatial resolution at various target bitrates

Figure 5 presents the R-D performance for three video sequences when using the proposed spatial resolution control. Both objective (PSNR) and subjective (VQM) results are presented to show the performance of the proposed algorithm. The gray solid curve represents the results of encoding with optimal spatial resolution whereas the black solid curve represents the results when the proposed algorithm is applied. The conventional rate control is represented by the black dotted curve. In Figure 5a-c, at the target bitrate of 600 kbps, the proposed algorithm enhances PSNR by 2.2, 3.1, and 3.4 dB, respectively, compared to results using the conventional rate control. Furthermore, the differences between 'Optimal' and 'Proposed' are negligible. As the target bitrate increases, PSNR improvement decreases. When the target bitrate is extremely low, such as 250 and 400 kbps, the allocated bits per pixel (bpp) are just 0.009 and 0.014 bpp, respectively. A conventional QP-based bitrate control cannot meet the target bitrate, even when using the maximum QP value. In the 'Conventional' method of Figure 5, frames are dropped to meet the target bitrate. Thus, video sequences are encoded at 15 and 7.5 fps for 400 and 250 kbps, respectively, whereas the frame rate is 30 fps in 600, 800, and 1000 kbps. For three video sequences, Sunflower, Pedestrian Area, and Tractor at 400 kbps, frame skips work for PSNR enhancement, to a limited amount, because the allocated bits per pixel are increased. However, additional frame skips, used to meet the target bitrate of 250 kbps, do not help the quality enhancement of each frame for the Sunflower and Tractor videos. Because the temporal correlation between frames becomes low, it results in low compression efficiency. The PSNR of the Pedestrian Area video is increased a little more at 250 kbps. In general, the results depend on the characteristic of the video sequence. 
In Figure 5d-f, VQMs clearly show that the proposed algorithm produces a significant improvement compared to the conventional rate control. The proposed rate control algorithm maintains a similar VQM quality from 1000 to 250 kbps, while the VQM of the conventional rate control increases drastically as the target bitrate decreases.
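The bits-per-pixel figures quoted above can be reproduced with a one-line calculation, assuming HD frames of 1280 × 720 pixels at 30 fps and 1 kbps = 1000 bits per second. The function name is illustrative.

```python
def bits_per_pixel(bitrate_bps, width, height, frame_rate):
    """Average number of bits available per pixel per frame."""
    return bitrate_bps / (width * height * frame_rate)


if __name__ == "__main__":
    for kbps in (250, 400):
        bpp = bits_per_pixel(kbps * 1000, 1280, 720, 30)
        print(f"{kbps} kbps: {bpp:.3f} bpp")
```

Rounded to three decimals, these evaluate to 0.009 and 0.014 bpp, matching the values in the text.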

Figure 5 R-D performance of the proposed spatial resolution control: (a) PSNR for Sunflower, (b) PSNR for Pedestrian Area, (c) PSNR for Tractor, (d) VQM for Sunflower, (e) VQM for Pedestrian Area, and (f) VQM for Tractor.

In Table 3, the average values of sa, as determined in the experiments summarized in Table 2, are shown. As the target bitrate increases, the 'Optimal' sa also increases. The values of sa determined by the 'Proposed' algorithm are very similar to those from the 'Optimal' one. In this experiment, the difference between 'Optimal' and 'Proposed' sa values is, on average, just 0.05.

Table 3 Comparison of the SA between the proposed spatial resolution and the optimal spatial resolution at various target bitrates

In Figure 6, the values of sa for each GOP are presented for three HD video sequences, Sunflower, Pedestrian Area, and Tractor, which are encoded with various bitrates. In Figure 6a-c, the vertical axis shows sa while the horizontal axis shows the GOP number. Step 1 of Figure 4 starts from GOP 1. For 400 and 250 kbps, the extremely low bitrate control is already carried out to meet the target bitrate, as explained in Figure 4. Thus, the full sizes in GOP 1 for 400 and 250 kbps are 0.5 and 0.25, respectively. Through the proposed spatial resolution adjustment from GOP 1 to GOP 3, the sa is determined and stabilized. In Figure 6d-f, the sa for each GOP is compared to the optimal one. The vertical axis in Figure 6d-f shows the difference between the proposed and optimal sa values. To determine the difference, the optimal sa is subtracted from the proposed one. As shown in these graphs, the differences are very small.

Figure 6

Spatial resolution for each GOP, and a comparison between the proposed and optimal spatial resolutions at various bitrates: (a) sa for Sunflower, (b) sa for Pedestrian Area, (c) sa for Tractor, (d) difference in sa for Sunflower, (e) difference in sa for Pedestrian Area, and (f) difference in sa for Tractor.

Figures 7 and 8 show the 76th frame of the Sunflower sequence and the 135th frame of the Pedestrian Area sequence, respectively. Figures 7a and 8a show the result of the conventional bitrate control, which changes only the QP. As observed in these figures, much of the detail in the frame is lost or degraded. In Figures 7b and 8b, both the QP and the spatial resolution are controlled by the proposed rate control and, subjectively, the quality is better than that in Figures 7a and 8a.

Figure 7

The 76th frame of the Sunflower sequence at 600 kbps: (a) conventional rate control, (b) proposed rate control.

Figure 8

The 135th frame of the Pedestrian Area sequence at 600 kbps: (a) conventional rate control, (b) proposed rate control.

Figure 9 shows the performance of the proposed algorithm when the motion characteristics change during encoding. For this figure, 300 frames of the 4CIF Soccer video sequence are encoded with a GOP length of 30. The values of sa and PSNR for each GOP are presented in Figure 9a and b, respectively; the dotted graphs represent the results of the proposed algorithm and the solid graphs represent the optimal values. In the 300 frames of Soccer, the first half has slower motion than the second half. Up to GOP 5, the sa value is determined to be 0.26 by the proposed algorithm. After that, the motion becomes faster and, consequently, the QP values increase. Therefore, the sa value for the second half of the sequence is adjusted downward: in Figure 9a, sa is set to 0.19 for GOPs 6, 7, 8, and 9 and to 0.13 for GOP 10. From GOP 3 to GOP 10, the difference between the 'Proposed' and 'Optimal' values is just 0.02 on average. In Figure 9b, the PSNR values for the 'Proposed' and 'Optimal' cases are compared, and the differences are negligible.
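The PSNR values compared above follow the standard definition, PSNR = 10 log10(peak^2 / MSE). The sketch below is illustrative only. It assumes 8-bit samples (peak value 255) and assumes that the decoded frame has already been up-sampled back to the original resolution before the comparison; neither assumption is stated in this section, and the function name is hypothetical.

```python
import math


def psnr(reference, decoded, peak=255.0):
    """PSNR in dB between two equally sized sequences of pixel samples.

    `peak` is the maximum sample value (255 is an assumption for 8-bit video).
    """
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


# Hypothetical four-sample "frames", for illustration only.
print(psnr([16, 128, 200, 235], [18, 126, 203, 230]))
```

For a per-GOP curve such as the one in Figure 9b, one option is to average this value over the frames of each GOP; whether the authors average PSNR or MSE is not specified here.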

Figure 9

Performance of the proposed spatial resolution control when the motion characteristics are changed for Soccer in terms of (a) sa and (b) PSNR.

In a further experiment, the performance of the proposed algorithm is tested when the target bitrate is changed in the middle of the encoding process, using the 4CIF Harbor video sequence as an example. In Figure 10a, b, the target bitrate is 300 kbps for the first five GOPs and 1500 kbps for the last five GOPs. In Figure 10a, the optimal spatial resolutions are 0.25 and 0.66 for the first and second halves, respectively. In the proposed algorithm, sa is initially chosen as 0.5 because the 300 kbps target bitrate is extremely low; subsequently, sa is determined to be 0.2. From GOP 5 onward, the target bitrate is increased to 1500 kbps. Thus, sa is set to 1 in GOP 5 and the process of determining the proper spatial resolution is restarted. On the basis of the bitrate change action explained in Figure 4, sa for the 1500 kbps target bitrate is determined to be 0.5. In Figure 10b, the PSNR values for the 'Proposed' and 'Optimal' cases are compared, and the difference is 0.5 dB on average. In Figure 10c, d, the target bitrate is reduced from 1000 to 500 kbps. For the first five GOPs, sa is decreased to 0.5 for 1000 kbps by the proposed algorithm. For the following five GOPs, sa is reduced further because the target bitrate drops to 500 kbps. As shown in Figure 10c, the dotted graph obtained from the proposed algorithm closely follows the trend of the optimal graph. The difference between the two PSNR values in Figure 10d is just 0.28 dB.
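The restart behaviour described above can be sketched as follows. This is a simplified illustration, not the procedure of Figure 4: the threshold in `initial_sa` is a hypothetical value chosen only so that the 300 kbps case starts at 0.5 and the 1500 kbps case starts at 1, as in the Harbor example, and the subsequent per-GOP refinement of sa is not modelled.

```python
def initial_sa(target_kbps, low_bitrate_kbps=300):
    """Starting ratio when the search for sa (re)starts.

    HYPOTHETICAL rule: the 300 kbps threshold is an illustrative assumption,
    not a value given in the text.
    """
    return 0.5 if target_kbps <= low_bitrate_kbps else 1.0


def restart_points(targets_kbps):
    """Return (position, starting sa) wherever the search for sa starts over.

    `position` is the zero-based index into `targets_kbps`; how it maps to
    the GOP numbers used in the figures is not assumed here.
    """
    restarts = []
    previous = None
    for position, target in enumerate(targets_kbps):
        if target != previous:  # first GOP, or the target bitrate changed
            restarts.append((position, initial_sa(target)))
            previous = target
    return restarts


# Ten GOPs: 300 kbps for the first five, 1500 kbps for the last five.
print(restart_points([300] * 5 + [1500] * 5))
```

Detecting the change at a GOP boundary matches the statement that the control decision is made for a group of pictures.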

Figure 10

Performance of the proposed spatial resolution control when the target bitrate is changed for Harbor in terms of (a) sa change when the target bitrate is changed from 300 to 1500 kbps, (b) PSNR when the target bitrate is changed from 300 to 1500 kbps, (c) sa change when the target bitrate is changed from 1000 to 500 kbps, and (d) PSNR when the target bitrate is changed from 1000 to 500 kbps.

5. Conclusion

The main contribution of this article is a real-time bitrate control algorithm that uses spatial down-sampling for low bitrate encoding. Previous resolution control schemes are too complex to be processed at run time. In this article, a simple estimation model that defines the relationship between the PSNR and the spatial resolution ratio is presented for low bitrate compression. This estimation model is used to find, on the fly, the resolution ratio that gives acceptable quality in real-time systems. Two scalability tools, the QP and the spatial resolution ratio, are adjusted sequentially to reach the target PSNR. Experimental results show that the proposed bitrate control algorithm is close to the optimal solution and yields better PSNR and VQM quality at various bitrates than the conventional QP-based bitrate control algorithm.


  1. Li ZG, Pan F, Lim KP, Rahardja S: Adaptive rate control for H.264. Proc IEEE Int Conf Image Processing Singapore 2004, 2: 745-748.

    Google Scholar 

  2. Kamaci N, Altunbasak Y, Mersereau RM: Frame bit allocation for the H.264/AVC video coder via Cauchy-density-based rate and distortion models. IEEE Trans Circ Syst Video Technol 2005, 15(8):994-1006.

    Article  Google Scholar 

  3. Ma S, Gao W, Lu Y: Rate-distortion analysis for H.264/AVC video coding and its application to rate control. IEEE Trans Circ Syst Video Technol 2005, 15(12):, 1533-1544.

    Article  Google Scholar 

  4. He Z, Kim Y-K, Mitra SK: Low-delay rate control for DCT video coding via ρ-domain source modeling. IEEE Trans Circ Syst Video Technol 2001, 11(8):928-940. 10.1109/76.937431

    Article  Google Scholar 

  5. Yuan W, Lin S, Zhang Y, Yuan W, Luo H: Optimum bit allocation and rate control for H.264/AVC. IEEE Trans Circ Syst Video Technol 2006, 16(6):705-715.

    Article  Google Scholar 

  6. Kwon D-K, Shen M-Y, Jay Kuo C-C: Rate control for H.264 video with enhanced rate and distortion models. IEEE Trans Circ Syst Video Technol 2007, 17(5):517-529.

    Article  Google Scholar 

  7. An C, Nguyen TQ: Iterative rate-distortion optimization of H.264 with constant bit rate constraint. IEEE Trans Image Process 2008, 17(9):1605-1615.

    Article  MathSciNet  Google Scholar 

  8. Sullivan G, Wiegand T, Lim K-P: Joint model reference encoding methods and decoding concealment methods. Section 2.6 rate control JVT-I049 2003.

    Google Scholar 

  9. Ma S, Gao W, Wu F, Lu Y: Rate control for JVT video coding scheme with HRD considerations. Proc IEEE ICIP Spain 2003, 3: 793-796.

    Google Scholar 

  10. Yu HT, Pan F, Lin ZP: A new bit estimation scheme for H.264 rate control. Proc IEEE Int Symp Consumer Electronics, UK 2004, 396-399.

    Google Scholar 

  11. Yang X, Lin W, Lu Z, Lin X, Rahardja S, Ong E, Yao S: Rate control for videophone using local perceptual cues. IEEE Trans Circ Syst Video Technol 2005, 15(4):496-507.

    Article  Google Scholar 

  12. Liu Y, Li ZG, Soh YC: Region-of-interest based resource allocation for conversational video communication of H.264/AVC. IEEE Trans Circ Syst Video Technol 2008, 18(1):134-139.

    Article  Google Scholar 

  13. Li H, Wang Z, Cui H, Tang K: An improved ROI-based rate control algorithm for H.264/AVC. Proc Int Conf Signal Processing China 2006, 2: 16-20.

    Google Scholar 

  14. Hrarti M, Saadane H, Larabi M, Tamtaoui A, Aboutajdine D: A macroblock-based perceptually adaptive bit allocation for H264 rate control. Proc Int Symposium on I/V Communications and Mobile Network, Morocco 2010, 1-4.

    Google Scholar 

  15. Huang C-M, Lin C-W: A novel 4-D perceptual quantization modeling for H.264 bit-rate control. IEEE Trans Multimed 2007, 9(6):1113-1124.

    Article  Google Scholar 

  16. Meng Q, Meng Q: Improved macroblock-level rate control algorithm with visual properties. Proc Int Workshop Intelligent Systems and Applications, China 2010, 1-5.

    Google Scholar 

  17. Cui Z, Zhu X: SSIM-based content adaptive frame skipping for low bit rate H.264 video coding. Proc Int Conf Communication Technology, China 2010, 484-487.

    Google Scholar 

  18. Ou T-S, Huang Y-H, Chen HH: SSIM-based perceptual rate control for video coding. IEEE Trans Circ Syst Video Technol 2011, 21(5):682-691.

    Article  Google Scholar 

  19. Jin R, Chen J: The coding rate control of consistent perceptual video quality in H.264 ROI. Proc Int Symposium Computer Network and Multimedia Technology, China 2009, 1-4.

    Google Scholar 

  20. Wolf S, Pinson M: VQM software and measurement techniques. National Telecommunications and Information Administration Report 2002.

    Google Scholar 

  21. Pan F, Lin X, Rahardja S, Lim KP, Li ZG, Wu DJ, Wu S: Proactive frame-skipping decision scheme for variable frame rate video coding. Proc Int Conf Multimedia and Expo Taiwan 2004, 3: 1903-1906.

    Google Scholar 

  22. Pan F, Lin ZP, Lin X, Rahardja S, Juwono W, Slamet F: Adaptive frame skipping based on spatio-temporal complexity for low bit-rate video coding. J Vis Commun Image R 2006, 17(3):554-563. 10.1016/j.jvcir.2005.07.006

    Article  Google Scholar 

  23. Jun J, Lee S, He Z, Lee M, Jang ES: Adaptive key frame selection for efficient video coding. LNCS 2007, 4872: 853-866.

    Google Scholar 

  24. Liu S, Kuo CJ: Joint temporal-spatial bit allocation for video coding with dependency. IEEE Trans Circ Syst Video Technol 2005, 15(1):15-26.

    Article  Google Scholar 

  25. Vetro A, Wang Y, Sun HF: Rate-distortion optimized video coding considering frameskip. Proc Int Conf Image Processing Greece 2001, 3: 534-537.

    Google Scholar 

  26. Kim J, Kim Y-G, Song H, Kuo T-Y, Chung YJ, Kuo C-CJ: TCP-friendly Internet video streaming employing variable frame-rate encoding and interpolation. IEEE Trans Circ Syst Video Technol 2000, 10(7):1164-1177. 10.1109/76.875520

    Article  Google Scholar 

  27. Song H, Kuo C-CJ: Rate control for low-bit-rate video via variable-encoding frame rates. IEEE Trans Circ Syst Video Technol 2001, 11(4):512-521. 10.1109/76.915357

    Article  Google Scholar 

  28. Thaipanich T, Wu P-H, Kuo C-CJ: Low complexity algorithm for robust video frame rate up-conversion (FRUC) technique. IEEE Trans Consum Electron 2009, 55(1):220-228.

    Article  Google Scholar 

  29. Jackson AHAM, McEwan R, Mullin J: Impact of video frame rate on communicative behavior in two and four party groups. Proc ACM Conf Comput Supported Cooperative Work, Philadelphia, PA 2000, 11-20.

    Google Scholar 

  30. Information technology - Coding of Audio-visual Objects - Part 2: Visual International Organization for Standardization 2000. ISO/IEC 14496-2:1999/Amd.1:2000(E)

  31. Cote G, Erol B, Gallant M, Kossentini F: H.263+: video coding at low bit rates. IEEE Trans Circ Syst Video Technol 1998, 8(7):849-866. 10.1109/76.735381

    Article  Google Scholar 

  32. Segall CA, Elad M, Milanfar P, Webb R, Fogg C: Improved high-definition video by encoding at an intermediate resolution. Proc Conf Visual Communications and Image Processing USA 2004, 5308: 1007-1018.

    Google Scholar 

  33. Bruckstein AM, Elad M, Kimmel R: Down-scaling for better transform compression. IEEE Trans Image Process 2003, 12(9):1132-1145. 10.1109/TIP.2003.816023

    Article  MathSciNet  MATH  Google Scholar 

  34. Ilgin HA, Chaparro LF: Low bit rate video coding using DCT based fast decimation/interpolation and embedded zero tree coding. IEEE Trans Circ Syst Video Technol 2007, 17(7):833-844.

    Article  Google Scholar 

  35. Nguyen VA, Tan YP, Lin WS: Adaptive downsampling/upsampling for better video compression at low bit rate. Proc of Int Symposium on Circuits and Systems, USA 2008, 1624-1627.

    Google Scholar 

  36. Shu HY, Chau LP: The realization of arbitrary downsizing video transcoding. IEEE Trans Circ Syst Video Technol 2006, 16(4):540-546.

    Article  Google Scholar 

  37. Tan Y-P, Liang Y, Sun H: On the methods and performances of rational downsizing video transcoding. Signal Process Image Commun 2004, 19: 47-65. 10.1016/j.image.2003.08.017

    Article  Google Scholar 

  38. Wang R-J, Chien M-C, Chang P-C: Adaptive down-sampling video coding. Proc SPIE Multimedia on Mobile Devices 2010, 7542: 1-8.

    Google Scholar 

  39. Lee H, Lee Y, Lee J, Lee D, Shin H: Design of a mobile video streaming system using adaptive spatial resolution control. IEEE Trans Consum Electron 2009, 55(3):1682-1689.

    Article  Google Scholar 

  40. Sirhindi R, Murtaza S, Afzal M: Improved data hiding technique for shares in extended visual secret sharing schemes. LNCS Inf Commun Secur 2008, 5308: 376-386. 10.1007/978-3-540-88625-9_25

    Google Scholar 

  41. Samuel S, Penzhorn WT: Digital watermarking for copyright protection. Proc Conf AFRICON 2004, 2: 953-957.

    Google Scholar 

  42. Kasmani SA, Naghsh-Nilchi A: A new robust digital image watermarking technique based on joint DWT-DCT transformation. Proc Int Conf on Convergence and Hybrid Information Technology, Korea 2008, 539-544.

    Google Scholar 

  43. Liu R, Wang G, Wang P, Huang W: An image authentication scheme based on sliding window. Proc Conf Control and Decision, China 2008, 2937-2940.

    Google Scholar 

Download references


This study was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST) (2011-0027502).

Author information

Corresponding author

Correspondence to Jin-Sung Kim.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Rhee, C.E., Kim, JS. & Lee, HJ. Bitrate control using a heuristic spatial resolution adjustment for a real-time H.264/AVC encoder. EURASIP J. Adv. Signal Process. 2012, 87 (2012).
