
# Bitrate control using a heuristic spatial resolution adjustment for a real-time H.264/AVC encoder

Chae Eun Rhee$^{1}$, Jin-Sung Kim$^{2}$ and Hyuk-Jae Lee$^{1}$

**2012**:87

https://doi.org/10.1186/1687-6180-2012-87

© Rhee et al; licensee Springer. 2012

**Received:** 19 July 2011 **Accepted:** 23 April 2012 **Published:** 23 April 2012

## Abstract

Conventional bitrate control algorithms that change only the quantization parameter (QP) often suffer from quality degradation when the target bitrate is very low. Rate control algorithms that adjust the spatial resolution in addition to the QP have therefore been proposed recently, but their computations are too complex to be performed in real time. This research proposes a very simple but effective rate control algorithm that adds spatial resolution control to the existing QP-based bitrate control. The spatial resolution ratio that gives the best peak signal-to-noise ratio (PSNR) is calculated using a simple estimation model that defines the relationship between the PSNR and the spatial resolution in very low bitrate compression. In the proposed algorithm, the two scalability tools, the QP and the spatial resolution ratio, are adjusted sequentially to reach the target PSNR, and the control decision is made once per group of pictures. Experimental results show that the proposed algorithm approximates the optimal solution and yields better subjective and objective quality at various bitrates than the conventional QP-based bitrate control algorithm. Deciding the control parameters requires very little computation and is fully automatic, so the proposed algorithm is well suited to real-time applications.

## Keywords

- H.264/AVC
- bitrate control
- down-sampling
- spatial resolution control

## 1. Introduction

The H.264/AVC standard is widely used in video streaming, video communication, and various mobile video applications because its many advanced coding tools give it high compression efficiency. In recent video applications, video resolutions tend to be larger, and the bandwidth required for video transmission increases accordingly. As a result, bitrate control, which regulates the bitrate of a video stream so that it achieves the target bitrate, is becoming increasingly important. In bitrate control algorithms, the target bits are allocated at the frame level or the macroblock level by considering the fullness of the output buffer and the encoding complexity. While meeting the allocated bit budget, rate control tries to make quality compromises that cause the minimum perceived degradation, by controlling the quantization parameter (QP) values. When the QP is small, most of the information is preserved; when the QP is large, much of the information is discarded to reduce the bitrate at the cost of increased distortion.

To achieve a target bitrate under real constraints, such as a specific channel bandwidth or given encoder and decoder buffer sizes, QP values must vary dynamically with the complexity of the input video and the current bitrate. QP values are determined using a rate-quantization (R-Q) model, in which the generated bitrate is modeled as a function of the QP and of the complexity of the residual signal, such as the mean absolute difference (MAD). In H.264/AVC, QPs are used not only in rate control but also in rate-distortion optimization (RDO). However, the well-known "chicken-and-egg" dilemma complicates the selection of the QP value: the RDO cost function needs a predetermined QP for its lambda factor, whereas the QP is determined from the MAD, which is available only after the RDO has been performed. To resolve this dilemma, the MAD is predicted from the previous frame using a linear function [1]. Because the MAD estimated from the previous frame often differs from the actual MAD of the current frame, an inadequate QP may be selected. To improve coding efficiency, enhanced distortion models have been developed [2–6], and optimized rate-distortion (R-D) models and content-aware bit allocation have been proposed [5–10]. A number of recent approaches incorporate characteristics of the human visual system (HVS) into bitrate control. Several studies report region-of-interest (ROI)-based bit allocation [11–13], which can potentially improve the perceived visual quality of images. In addition to ROI-based methods, new R-D models, together with frame skipping and bit-allocation schemes that use various HVS-based perceptual metrics, have been proposed [14–19]. However, conventional bitrate control algorithms based on QP control often suffer from perceptible image quality degradation, such as blocking, ringing, or texture-deviation artifacts, when the target bitrate is very low.
Furthermore, when the target bitrate cannot be met even with the maximum QP value, conventional rate control cannot avoid sudden frame drops, which degrade video quality.

This article proposes a simple but effective bitrate control algorithm that adds spatial resolution control to conventional QP-based bitrate control. A new model is proposed that represents the relationship between the spatial resolution and the peak signal-to-noise ratio (PSNR) in low bitrate coding, and it is used to estimate the spatial resolution that gives the highest PSNR at a given bitrate. The model has very low computational complexity because it requires only a small number of parameters, whose values are obtained heuristically. The two scalability tools, the QP and the spatial resolution, are adjusted sequentially to achieve the target PSNR. The proposed spatial resolution adjustment is applied at the group-of-pictures (GOP) level as a coarse-grain bitrate control, while inside a GOP the QP is changed by a conventional bitrate control to meet the allocated bits in a fine-grain manner. The target bitrate is thus met by a combination of the two control methods. To estimate the perceptual quality of encoded video sequences with reduced spatial resolution, the Video Quality Metric (VQM) software [20] is used to measure subjective quality, in addition to the PSNR, which measures objective quality. VQM computes the perceptual effects of video impairments, including blurring, jerky or unnatural motion, global noise, block distortion, and color distortion, and combines them into a single metric. Experimental results show that the proposed bitrate control scheme outperforms the conventional QP-based bitrate control algorithm at a variety of bitrates. For the HD test sequences at low bitrates, the proposed spatial resolution control improves the PSNR by up to 1.85 dB and the VQM value by up to 5.15 compared with conventional QP-only control.
There is only a small difference between the truly optimal spatial resolution and the spatial resolution obtained with the proposed scheme.

This article is organized as follows. Section 2 presents the background, and Section 3 explains the proposed bitrate control algorithm. Experimental results are presented in Section 4, and conclusions are given in Section 5.

## 2. Background

### 2.1. Previous study on spatial and temporal resolution controls

To improve flexibility in bit allocation, some rate control algorithms adjust the frame rate or the spatial resolution. When the frame rate decreases, more bits can be allocated to each frame, and the quality of each frame can be improved. However, frames must be skipped very carefully, because motion artifacts such as flickering or motion jerkiness may degrade subjective video quality. In [21, 22], the frame-skip decision is based on buffer fullness and on the spatial and temporal quality of the video. In [23], the similarity between successive frames, measured by the PSNR, is used to skip frames adaptively. To optimize coding performance, the frames to be skipped are determined based on an R-D model in [24, 25], but these methods cannot be used in real-time applications. In [26–28], motion artifacts are reduced, and motion smoothness is thereby preserved, by adjusting the frame rate gradually according to the motion activity of the previous sub-GOP, expressed by the histogram of the difference image. Although many previous works have contributed to frame rate control, the effectiveness of bitrate control through frame rate adjustment is limited, because the frame rate cannot be reduced below a certain value without causing motion artifacts. In addition, trading temporal resolution for spatial quality may not be very effective in increasing subjective quality, because the quality degradation due to dropped frames is easily perceived, especially in low frame rate communication such as two-way multimedia communication on mobile devices [29].

Spatial resolution control is another approach to bit allocation when the target bitrate is very low. The dynamic resolution conversion (DRC) mode, supported in the Advanced Simple Profile of MPEG-4, enables a video object plane to be encoded at reduced spatial resolution [30]. Similarly, a reduced-resolution update (RRU) coding tool is adopted in Annex Q of the H.263 standard [31]; RRU reduces the bitrate by coding the prediction error residuals at reduced spatial resolution. However, neither DRC nor RRU is included in the H.264/AVC standard. Meanwhile, a number of previous studies have reported a relationship between down-sampling and video quality at low bitrates. In [32], the optimum down-sampling ratio is determined according to the bitrate. In [33], it is reported that, when the target bitrate is very low, a video that is down-sampled before compression and up-sampled afterwards is visually better than the same video compressed directly at high resolution with the same number of bits; based on this observation, a method for finding the optimal down-sampling rate is suggested. These schemes were applied to the JPEG image compression standard rather than to video compression standards [29]. In [34, 35], discrete cosine transform (DCT) coefficients are decimated prior to quantization to reduce the spatial resolution, but modifying the coding loop in this way breaks syntax compatibility with the video coding standard. In video transcoders, spatial resolution control has been an important means of meeting a different target bitrate [36]; however, many of these works focus on simplifying the transcoding computations. In [37], a linear R-Q model is proposed to select the proper frame size, but it is not applicable to practical applications. Recently, in [38], the overall distortion is analyzed and the optimal spatial resolution is derived for a given bitrate.
In [39], the spatial resolution ratio is selected according to picture quality, bitrate, and power consumption. As these works show, bitrate control through spatial resolution control depends critically on selecting the optimal spatial resolution. When the selected spatial resolution is higher than the optimal one, image quality is degraded by an excessively high QP value, whereas when it is lower than the optimal one, quality is degraded by aliasing artifacts. Nonetheless, finding the optimal resolution is difficult, because the previous works rely on complex estimation models that define the relationships among picture quality, bitrate, spatial resolution, and power consumption. Moreover, the parameters used in these methods depend on the characteristics of the video content and on the specific coding method, so they cannot be calculated in real time. Therefore, the previous works cannot be applied to on-the-fly, real-time rate control.

### 2.2. Comparison between spatial and temporal resolution controls

In Table 1, the five HD test sequences are encoded at low target bitrates of 400 and 250 kbps, and either a spatial resolution reduction (denoted by *s_drop*) or a dyadic temporal resolution reduction (denoted by *t_drop*) is applied. To implement *t_drop*, every other frame is dropped, and each remaining frame is displayed twice so that it replaces the dropped frame that follows it. The VQM value of the Y component is measured to evaluate subjective quality; lower VQM values indicate better subjective quality. The comparison is made sequence to sequence: to obtain the VQM of *t_drop*, the sequence consisting of the undropped frames and their repetitions is compared with the original sequence. As shown in Table 1, *s_drop* always outperforms *t_drop*. This experiment shows that, for none of the sequences, does increasing the bits per frame by lowering the frame rate yield higher subjective quality than spatial scalability does.
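To make the *t_drop* procedure concrete, the following minimal Python sketch shows the displayed frame pattern it produces and why it doubles the bit budget per coded frame. It assumes frames are arbitrary Python objects and a constant target bitrate; the function names `t_drop` and `bits_per_encoded_frame` are illustrative and not part of the original experiment.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def t_drop(frames: Sequence[Frame]) -> List[Frame]:
    """Dyadic temporal drop: encode every other frame and show each kept frame twice.

    The output has the same length as the input, so it can be compared
    frame by frame with the original sequence (as for the VQM measurement).
    """
    shown: List[Frame] = []
    for kept in frames[::2]:          # frames 0, 2, 4, ... survive
        shown.extend([kept, kept])    # the repetition replaces the dropped neighbour
    return shown[: len(frames)]       # trim the extra copy when the length is odd


def bits_per_encoded_frame(bitrate_bps: float, fps: float, temporal_drop: bool) -> float:
    """Average bit budget per encoded frame; halving the frame rate doubles it."""
    encoded_fps = fps / 2 if temporal_drop else fps
    return bitrate_bps / encoded_fps


if __name__ == "__main__":
    print(t_drop(list(range(6))))                     # [0, 0, 2, 2, 4, 4]
    print(bits_per_encoded_frame(400_000, 30, True))  # about 26666.7 bits per frame
```

The extra bits per frame are what *t_drop* buys; the experiment above indicates that spending them on fewer frames is less effective, in VQM terms, than lowering the spatial resolution.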

Table 1. Comparison of VQM values (Y component) between spatial and temporal drops at low target bitrates

| | Target bitrate (kbps) | Bluesky | Pedestrian area | Station 2 | Sunflower | Tractor |
|---|---|---|---|---|---|---|
| | 400 | 6.94 | 4.75 | 2.48 | 3.78 | 4.29 |
| | 250 | 11.33 | 8.02 | 4.78 | 7.9 | 6.86 |
| | 400 | 3.03 | 2.51 | 2.01 | 2.16 | 3.25 |
| | 250 | 3.46 | 2.69 | 2.44 | 2.4 | 3.18 |
| | 400 | 3.49 | 2.06 | 0.04 | 1.38 | 1.12 |
| | 250 | 7.84 | 5.96 | 4.74 | 6.52 | 5.75 |

## 3. The proposed bitrate control with spatial scalability

### 3.1. Architecture of the target system

### 3.2. Spatial resolution control

Let $\mathrm{PSNR}_{\mathrm{coding\_down}}$ denote the PSNR of a video that has been down-sampled and encoded. $\mathrm{PSNR}_{\mathrm{coding\_down}}$ is then formulated as in Equation (1), where $q_1$, $q_2$, and $q_3$ are constants that depend on the video content and $R$ is the bitrate of the encoded stream. The term $sa$, referred to here as the spatial resolution ratio, is the ratio of the down-sampled frame area to the original frame area; when $sa$ is smaller than 1, the frame is down-sampled. Equation (1) describes the relationship between $sa$ and the PSNR and can therefore be used to calculate the optimal spatial resolution. However, parameters such as $q_1$, $q_2$, and $q_3$ depend on the video content and cannot be known prior to encoding, so this optimal solution cannot be applied on the fly in real-time systems.

Figure 2 shows the PSNR of the *Station2* sequence at various spatial resolution ratios and at three bitrates: 600 kbps, 1 Mbps, and 2 Mbps. In Figure 2, the solid curves show the PSNR values obtained by simulation. Each graph has a peak PSNR value at a certain $sa$. The initial PSNR without spatial down-sampling (that is, when $sa = 1$) is denoted by $\mathrm{PSNR}_{\mathrm{full}}$. The peak PSNR of each graph is denoted by $\mathrm{PSNR}_{\mathrm{peak}}$, and the $sa$ that gives $\mathrm{PSNR}_{\mathrm{peak}}$ is denoted by $sa_{\mathrm{peak}}$. In the graph for the 600 kbps bitrate in Figure 2, $\mathrm{PSNR}_{\mathrm{full}}$ is marked with a circle, while $\mathrm{PSNR}_{\mathrm{peak}}$ and $sa_{\mathrm{peak}}$ are marked with a triangle and a rectangle, respectively. If the spatial resolution ratio is adjusted to $sa_{\mathrm{peak}}$, the highest PSNR is achieved for the given bitrate.

To find $sa_{\mathrm{peak}}$ without the content-dependent parameters of Equation (1), a method based on a simplified model derived from Figure 2 is proposed. In Figure 2, dotted lines connect $\mathrm{PSNR}_{\mathrm{full}}$ and $\mathrm{PSNR}_{\mathrm{peak}}$. Within the range of $sa$ from 1 to $sa_{\mathrm{peak}}$, the dotted lines are very close to the measured data. Based on this proximity, $\mathrm{PSNR}_{\mathrm{coding\_down}}$ in Equation (1) is reformulated as the simplified linear model for low bitrate control in Equation (2), in which $\alpha$ and $\beta$ are positive and $\alpha$ represents the slope of the modeled graph. If $\alpha$, $\mathrm{PSNR}_{\mathrm{full}}$, and $\mathrm{PSNR}_{\mathrm{peak}}$ are known, $sa_{\mathrm{peak}}$ can be estimated using the linear model in Equation (2).
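Equation (2) is not reproduced in this text, so the following sketch assumes one form that matches the description: a straight line through $(1, \mathrm{PSNR}_{\mathrm{full}})$ whose PSNR rises with slope magnitude $\alpha$ as $sa$ decreases, that is, $\mathrm{PSNR}(sa) = \beta - \alpha \cdot sa$ with $\beta = \mathrm{PSNR}_{\mathrm{full}} + \alpha$. Inverting this assumed line gives an estimate of $sa_{\mathrm{peak}}$ from $\alpha$, $\mathrm{PSNR}_{\mathrm{full}}$, and $\mathrm{PSNR}_{\mathrm{peak}}$. The function names are hypothetical.

```python
def linear_psnr(sa: float, alpha: float, psnr_full: float) -> float:
    """Assumed linear model PSNR(sa) = beta - alpha * sa, with beta = psnr_full + alpha.

    Only meaningful for sa between sa_peak and 1, where the dotted lines
    of Figure 2 follow the measured curves.
    """
    beta = psnr_full + alpha
    return beta - alpha * sa


def invert_linear_model(alpha: float, psnr_full: float, psnr_peak: float) -> float:
    """Solve linear_psnr(sa) == psnr_peak for sa, giving an estimate of sa_peak."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 - (psnr_peak - psnr_full) / alpha
```

For example, with the illustrative values $\alpha = 4$, $\mathrm{PSNR}_{\mathrm{full}} = 28$ dB, and $\mathrm{PSNR}_{\mathrm{peak}} = 30$ dB, the inverted line gives $sa_{\mathrm{peak}} = 1 - 2/4 = 0.5$. These numbers are not taken from the experiments.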

The slope $\alpha$ is estimated from the PSNR and $sa$ values obtained from encoded frames. Let $\mathrm{PSNR}_{\mathrm{prev}}$ and $sa_{\mathrm{prev}}$ denote the PSNR and $sa$ of the previous GOP, respectively, and let $\mathrm{PSNR}_{\mathrm{full}}$ be obtained when the first GOP is encoded with $sa = 1$. Based on the linear model in Equation (2), $\mathrm{PSNR}_{\mathrm{prev}}$ and $\mathrm{PSNR}_{\mathrm{full}}$ are expressed as in Equations (3) and (4), respectively.

From (3) and (4), the estimated slope, denoted by $\alpha_{\mathrm{est}}$, is obtained by (5). The slope $\alpha_{\mathrm{est}}$ obtained from (5) lags the true $\alpha$ by one GOP. This relies on the assumption that, owing to the similarity between successive GOPs, the spatial resolution chosen for the current GOP is also suitable for the next GOP.
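If the linear model is assumed to take the form $\mathrm{PSNR}(sa) = \beta - \alpha \cdot sa$, writing it at the previous GOP and at full resolution gives $\mathrm{PSNR}_{\mathrm{prev}} = \beta - \alpha\, sa_{\mathrm{prev}}$ and $\mathrm{PSNR}_{\mathrm{full}} = \beta - \alpha$, and subtracting the two eliminates $\beta$. The sketch below computes the resulting slope. It illustrates how (3) to (5) could be combined under that assumption; it is not a reproduction of the published equations, and `estimate_alpha` is a hypothetical name.

```python
def estimate_alpha(psnr_prev: float, sa_prev: float, psnr_full: float) -> float:
    """Slope of the assumed line PSNR(sa) = beta - alpha * sa through two measured points.

    (sa_prev, psnr_prev) comes from the previous GOP and (1, psnr_full) from the
    first GOP encoded at full resolution; beta cancels in the subtraction.
    A non-positive result means down-sampling did not raise the PSNR, which
    the linear model does not describe.
    """
    if not 0.0 < sa_prev < 1.0:
        raise ValueError("sa_prev must lie strictly between 0 and 1")
    return (psnr_prev - psnr_full) / (1.0 - sa_prev)
```

Because only two encoded GOPs are needed, the estimate costs a subtraction and a division per GOP, which is consistent with the real-time goal stated in the text.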

However, $\mathrm{PSNR}_{\mathrm{peak}}$ cannot be determined simply, because it varies with both the image content and the target bitrate. To estimate $\mathrm{PSNR}_{\mathrm{peak}}$, the ratio $\mathrm{PSNR}_{\mathrm{peak}}/\mathrm{PSNR}_{\mathrm{full}}$ is measured for 12 video sequences under various target bitrates and at various values of the slope $\alpha$. The sequences used in the evaluation are:

- *Akiyo* and *Coast Guard*, with CIF (352 × 288) resolution;
- *City*, *Crew*, and *Ice*, with 4CIF (704 × 576) resolution;
- *Aspen*, *Factory*, *Old Town Cross*, *Parkrun*, and *Pedestrian Area*, with HD (1280 × 720) resolution;
- *West Windy Easy* and *Touchdown Pass*, with full HD (1920 × 1080) resolution.

A sample of the results is shown in Figure 3, which shows that the ratio $\mathrm{PSNR}_{\mathrm{peak}}/\mathrm{PSNR}_{\mathrm{full}}$ is approximately proportional to the slope $\alpha$, regardless of the video sequence and the bitrate. From this observation, $\mathrm{PSNR}_{\mathrm{peak}}$ is calculated as in (6), with the coefficients chosen experimentally.

Then, $sa_{\mathrm{peak}}$ is calculated within the range from 0.1 to 1 as in (7), where $\alpha$ and $\mathrm{PSNR}_{\mathrm{peak}}$ are obtained from (5) and (6), respectively.

When the second GOP is to be encoded, $\mathrm{PSNR}_{\mathrm{prev}}$ and $sa_{\mathrm{prev}}$ in (5) are not yet available. In this case, $\alpha$ is initialized to a small value between 0.1 and 0.3, referred to as $\alpha_{\mathrm{init}}$. If $\mathrm{PSNR}_{\mathrm{peak}}$ from (6) is substituted into (7), $sa_{\mathrm{peak}}$ becomes inversely proportional to $\alpha$. Thus, if a small $\alpha_{\mathrm{init}}$ is used in (7), the resulting $sa_{\mathrm{peak}}$ is relatively large, and the PSNR degradation caused by excessive down-sampling is avoided. Once $\mathrm{PSNR}_{\mathrm{prev}}$ and $sa_{\mathrm{prev}}$ are obtained from the results of the second GOP, encoded with $\alpha_{\mathrm{init}}$, the $\alpha_{\mathrm{est}}$ calculated from (5) is used as given in (8).
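The following sketch shows how (6) and (7) could be chained for the next GOP, assuming the linear form $\mathrm{PSNR}(sa) = \mathrm{PSNR}_{\mathrm{full}} + \alpha(1 - sa)$. It uses $\alpha_{\mathrm{init}}$ until a slope estimate exists, computes $\mathrm{PSNR}_{\mathrm{peak}}$ through a caller-supplied function standing in for (6), whose experimentally chosen coefficients are not given in this text, and clamps $sa_{\mathrm{peak}}$ to the stated range of 0.1 to 1. The default $\alpha_{\mathrm{init}} = 0.2$ is an assumption picked from the 0.1 to 0.3 range, and all names are hypothetical.

```python
from typing import Callable, Optional

SA_MIN, SA_MAX = 0.1, 1.0   # range for sa_peak stated in the text
ALPHA_INIT = 0.2            # assumed value from the suggested 0.1-0.3 range

# (alpha, psnr_full) -> psnr_peak; stands in for Equation (6), coefficients unknown here
PsnrPeakFn = Callable[[float, float], float]


def next_sa_peak(psnr_full: float, psnr_peak_fn: PsnrPeakFn,
                 alpha_est: Optional[float] = None,
                 alpha_init: float = ALPHA_INIT) -> float:
    """Estimate sa_peak for the next GOP from the assumed line PSNR_full + alpha * (1 - sa)."""
    alpha = alpha_init if alpha_est is None else alpha_est   # alpha_init before (5) is usable
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    psnr_peak = psnr_peak_fn(alpha, psnr_full)
    sa = 1.0 - (psnr_peak - psnr_full) / alpha
    return min(SA_MAX, max(SA_MIN, sa))                      # clamp to [0.1, 1]
```

Passing (6) in as a function keeps the sketch honest about what is unknown here: any experimentally fitted relation between $\alpha$ and $\mathrm{PSNR}_{\mathrm{peak}}/\mathrm{PSNR}_{\mathrm{full}}$ can be plugged in without changing the rest.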

The target spatial resolution ratio, denoted by $sa_{\mathrm{target}}$, is adjusted only when the video quality is below an acceptable level. Let $\mathrm{PSNR}_{\mathrm{target}}$ denote the PSNR of the acceptable image quality required by users or applications. It is assumed that a reconstructed image is visually indistinguishable from the original if its PSNR is greater than 35 to 40 dB [40–43]; thus, $\mathrm{PSNR}_{\mathrm{target}}$ is set to 40 dB in this article. If $\mathrm{PSNR}_{\mathrm{full}}$ is larger than $\mathrm{PSNR}_{\mathrm{target}}$, no spatial resolution adjustment is necessary, and the target spatial resolution ratio is set to 1. Otherwise, $sa_{\mathrm{target}}$ is set to the $sa_{\mathrm{peak}}$ of (7). Thus, $sa_{\mathrm{target}}$ is given by

$$
sa_{\mathrm{target}} =
\begin{cases}
1, & \mathrm{PSNR}_{\mathrm{full}} > \mathrm{PSNR}_{\mathrm{target}} \\
sa_{\mathrm{peak}}, & \text{otherwise}
\end{cases}
\qquad (9)
$$
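Equation (9) is a simple threshold rule; the sketch below restates it in code. The name `sa_target` is hypothetical, and the 40 dB default is the value chosen in the text.

```python
PSNR_TARGET_DB = 40.0  # acceptable-quality threshold used in this article


def sa_target(psnr_full: float, sa_peak: float, psnr_target: float = PSNR_TARGET_DB) -> float:
    """Keep full resolution when it already exceeds the target PSNR; otherwise use sa_peak."""
    return 1.0 if psnr_full > psnr_target else sa_peak
```

A GOP measured at 41 dB at full resolution keeps $sa_{\mathrm{target}} = 1$, whereas one measured at 32 dB takes the estimated $sa_{\mathrm{peak}}$.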

### 3.3. The proposed bitrate control algorithm

Figure 4 shows the flow of the proposed bitrate control algorithm, in which the QP and the spatial resolution ratio are adjusted sequentially to reach $\mathrm{PSNR}_{\mathrm{target}}$. The QP value is decided frame by frame, whereas the spatial resolution ratio is determined for each GOP. At initialization, $\mathrm{PSNR}_{\mathrm{target}}$ is defined, and the algorithm starts at Step 1. In the first GOP of Step 1, the spatial resolution ratio is not changed; only the QP is controlled to meet the target bitrate, as in a conventional QP-based rate control algorithm. If the target bitrate, denoted by $\mathrm{bitrate}_{\mathrm{target}}$ in Figure 4, cannot be met in Step 1, the spatial resolution of the next GOP is simply reduced by a factor of 2 relative to that of the current GOP, because meeting $\mathrm{bitrate}_{\mathrm{target}}$ is paramount, and encoding of the next GOP restarts from Step 1 at the halved spatial resolution. If the bitrate generated for the GOP in Step 1 meets $\mathrm{bitrate}_{\mathrm{target}}$, the algorithm proceeds to Step 2. In Step 2, $\mathrm{PSNR}_{\mathrm{peak}}$ and $sa_{\mathrm{peak}}$ for the GOP to be encoded are calculated from (6) and (7) using the PSNR obtained in Step 1 and $\alpha_{\mathrm{init}}$, and $sa_{\mathrm{target}}$ is then determined from (9). In Step 3, the PSNR obtained in Step 2 is used as $\mathrm{PSNR}_{\mathrm{prev}}$ in (5) to adjust $\alpha$. Using the adjusted slope $\alpha_{\mathrm{est}}$, $\mathrm{PSNR}_{\mathrm{peak}}$, $sa_{\mathrm{peak}}$, and $sa_{\mathrm{target}}$ for the GOP in Step 3 are calculated from (6), (7), and (9), respectively. The $sa_{\mathrm{target}}$ determined in Step 3 is then used for all subsequent GOPs in Step 4. As long as the R-D characteristics of successive frames are similar, encoding with this $sa_{\mathrm{target}}$ works well. To cope with varying R-D characteristics, Figure 4 also defines actions for bitrate changes and QP changes. If $\mathrm{bitrate}_{\mathrm{target}}$ changes, the relationship between the spatial resolution and the PSNR changes as well, so $\alpha_{\mathrm{est}}$ must be refreshed through Steps 1, 2, and 3. Before returning to Step 1, the "full size", that is, the resolution ratio with which Step 1 starts, is reset: if $\mathrm{bitrate}_{\mathrm{target}}$ increases by more than $\mathrm{TH}_{\mathrm{BR}}$, the full size is set to 1 (full resolution, with no spatial reduction); if it decreases by more than $\mathrm{TH}_{\mathrm{BR}}$, the full size is set to the current $sa_{\mathrm{target}}$, and the new $sa_{\mathrm{target}}$ for the reduced target bitrate is then determined as a value less than the current one. In this article, $\mathrm{TH}_{\mathrm{BR}}$ is set to 0.02 × frame rate × original spatial resolution. Even when $\mathrm{bitrate}_{\mathrm{target}}$ is unchanged, the motion characteristics of the video may change. If the motion becomes faster, the average QP of the recent frames becomes higher than that of the earlier ones, and vice versa. Therefore, $sa_{\mathrm{target}}$ is fine-tuned according to the change in the average QP. As shown in Figure 4, the whole process of deciding the proper spatial resolution is automatic and does not depend on prior knowledge of the video content characteristics or of the specific coding method. Therefore, the proposed algorithm can easily be applied to real-time applications.
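The GOP-level flow of Steps 1 to 4 and the bitrate-change action can be sketched as a small state machine. This is a hypothetical illustration, not the authors' implementation, and it rests on several assumptions: QP control inside each GOP is left to the encoder; the fine-grain adjustment driven by the average QP is omitted; Equation (6) is supplied by the caller; the linear model $\mathrm{PSNR}(sa) = \mathrm{PSNR}_{\mathrm{full}} + \alpha(\mathit{sa\_start} - sa)$ is anchored at the starting ratio `sa_start`, which equals 1 in the text's formulation; and $\mathrm{TH}_{\mathrm{BR}}$ interprets "original spatial resolution" as the pixel count, giving a threshold in bits per second.

```python
from dataclasses import dataclass
from typing import Callable, Optional

PsnrPeakFn = Callable[[float, float], float]  # (alpha, psnr_full) -> PSNR_peak, stands in for Eq. (6)


@dataclass
class SpatialRateController:
    """Hypothetical controller choosing the spatial ratio sa for the next GOP (Steps 1-4)."""
    width: int
    height: int
    fps: float
    bitrate_target: float          # bits per second
    psnr_peak_fn: PsnrPeakFn
    psnr_target: float = 40.0
    alpha_init: float = 0.2
    step: int = 1                  # step of the GOP that is encoded next
    sa: float = 1.0                # ratio used for the next GOP
    sa_start: float = 1.0          # "full size" that Step 1 starts from
    psnr_full: Optional[float] = None

    def th_br(self) -> float:
        # Assumption: "original spatial resolution" = pixel count, so the result is in bit/s.
        return 0.02 * self.fps * self.width * self.height

    def _choose(self, alpha: float) -> float:
        """Eqs. (6), (7), (9) under the assumed linear model anchored at sa_start."""
        if self.psnr_full > self.psnr_target:
            return self.sa_start                      # quality already acceptable
        psnr_peak = self.psnr_peak_fn(alpha, self.psnr_full)
        sa = self.sa_start - (psnr_peak - self.psnr_full) / alpha
        return min(self.sa_start, max(0.1, sa))

    def after_gop(self, met_bitrate: bool, psnr: float) -> float:
        """Record the result of the GOP just encoded and return sa for the next GOP."""
        if self.step == 1:                            # QP-only GOP at sa_start
            if not met_bitrate:
                self.sa_start *= 0.5                  # halve the area ratio, stay in Step 1
                self.sa = self.sa_start
                return self.sa
            self.psnr_full = psnr
            self.sa = self._choose(self.alpha_init)   # Step 2 uses alpha_init
            self.step = 2
        elif self.step == 2:                          # GOP encoded with the alpha_init choice
            if self.sa < self.sa_start:
                alpha_est = (psnr - self.psnr_full) / (self.sa_start - self.sa)
                if alpha_est > 0:
                    self.sa = self._choose(alpha_est)  # Step 3 uses alpha_est
            self.step = 3
        else:                                         # Steps 3 and 4 keep sa_target
            self.step = 4
        return self.sa

    def change_bitrate(self, new_target: float) -> None:
        """Bitrate-change action: restart from Step 1 when the change exceeds TH_BR."""
        delta = new_target - self.bitrate_target
        self.bitrate_target = new_target
        if abs(delta) > self.th_br():
            self.sa_start = 1.0 if delta > 0 else self.sa
            self.sa = self.sa_start
            self.step = 1
```

Calling `after_gop` once per encoded GOP follows the pattern described above: Step 1 may halve the starting ratio, Step 2 applies the $\alpha_{\mathrm{init}}$ choice, and from the third GOP onward the ratio stays fixed until `change_bitrate` detects a change larger than $\mathrm{TH}_{\mathrm{BR}}$. Under the pixel-count interpretation, the threshold for 1280 × 720 at 30 fps is 552,960 bit/s.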

## 4. Experimental results

The proposed bitrate control scheme is implemented in the JM 13.2 reference software, which uses QP-based bitrate control. To resize frames, the up- and down-sampling algorithms recommended in SVC (the scalable video coding extension of H.264/AVC) are used. For down-sampling, an algorithm based on the sine-windowed sinc function is applied, in which a set of seven filters supports an extended range of spatial scaling ratios. For up-sampling, the SVC normative up-sampling algorithm is applied, which is based on a set of 6-tap filters derived from the Lanczos-3 filter. Nine video sequences are used in this experiment: five HD sequences (*Pedestrian Area*, *Tractor*, *Station2*, *Sunflower*, and *Blue Sky*), two full HD sequences (*Speed Bag* and *Life*), and two 4CIF sequences (*Harbor* and *Soccer*). The GOP length is 30 frames, 150 frames are encoded, and the GOP structure is IPPP.
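The up-sampling described above relies on 6-tap filters derived from the Lanczos-3 kernel. The sketch below only shows how six normalized tap weights can be derived from that kernel for a fractional output phase; it uses floating-point weights and is not the normative SVC filter, whose integer coefficients and phase table are defined in the standard. The function names are hypothetical.

```python
import math
from typing import List


def lanczos3(x: float) -> float:
    """Lanczos-3 kernel: sinc(x) * sinc(x / 3) for |x| < 3, and 0 elsewhere."""
    if x == 0.0:
        return 1.0
    if abs(x) >= 3.0:
        return 0.0
    px = math.pi * x
    return 3.0 * math.sin(px) * math.sin(px / 3.0) / (px * px)


def six_tap_weights(phase: float) -> List[float]:
    """Normalized weights for the six samples at offsets -2..3 around an output
    position that lies `phase` (0 <= phase < 1) to the right of sample 0."""
    if not 0.0 <= phase < 1.0:
        raise ValueError("phase must be in [0, 1)")
    raw = [lanczos3(phase - k) for k in range(-2, 4)]
    total = sum(raw)
    return [w / total for w in raw]   # unit DC gain, so flat areas stay flat


if __name__ == "__main__":
    print([round(w, 4) for w in six_tap_weights(0.5)])
```

At phase 0 the weights collapse to a single tap of 1 on sample 0, so integer positions are reproduced exactly; at phase 0.5 they are symmetric, with negative lobes at distance 1.5 that give Lanczos filters their sharpness.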

In Table 2, *'Conventional'* denotes encoding the nine video sequences at full size without spatial resolution control. When the target bitrate cannot be met even with the maximum QP value, the frame rate is halved. The frame drop by a factor of 2 is realized in the encoder by encoding all macroblocks of each dropped frame in SKIP mode. Thus, on the decoder side, frames are still displayed at 30 fps, with each dropped frame replaced by a repetition of the previous frame. *'Proposed'* and *'Optimal'* denote the proposed spatial resolution control and the optimal spatial resolution, respectively. The optimal values are obtained experimentally by varying the resolution ratio from 0.1 to 1 and measuring the PSNR. Unlike full-size encoding, the *'Proposed'* and *'Optimal'* schemes do not use frame drops to meet the target bitrate. Note that the optimal resolution is a theoretical upper bound of spatial control and cannot be calculated on the fly. Experiments are conducted with these three rate control schemes at various bitrates: 250, 400, 600, 800, and 1000 kbps for HD; 400, 800, 1500, and 2000 kbps for full HD; and 150, 300, 400, and 1000 kbps for 4CIF. For HD at the high target bitrate of 1000 kbps, the proposed rate control improves the PSNR by 0.82 dB and the VQM value by 0.12. As the target bitrate decreases, the VQM improvement of the proposed rate control grows, reaching 5.15 at 250 kbps. Note that the PSNR difference between the proposed rate control and conventional QP control is small at the target bitrates of 400 and 250 kbps, whereas the PSNR improvement from the proposed algorithm is 1.85 dB at 600 kbps. This is because the conventional algorithm lowers the frame rate at ultra-low bitrates; at 400 and 250 kbps, the PSNR of each frame encoded with the conventional algorithm is therefore slightly improved, since more bits are allocated to each frame. In this experiment, the optimal spatial resolution is chosen as the one with the highest PSNR among all tested resolutions. Because PSNR and VQM are calculated differently, the VQM with the optimal spatial resolution can be slightly worse than that of the proposed bitrate control, as at 250 and 800 kbps. However, the optimal PSNR values are always higher than those of both the conventional and the proposed bitrate control. As shown in Table 2, for full HD and 4CIF as well as HD, the PSNR and VQM of the proposed spatial resolution control are very close to those of the optimal one.

Table 2. Comparison of the PSNR and VQM among the conventional control, the proposed spatial resolution, and the optimal spatial resolution at various target bitrates (kbps). HD = 1280 × 720, full HD = 1920 × 1072, 4CIF = 704 × 576. PSNR in dB; lower VQM values are better.

| Measure | Rate control method | HD 250 | HD 400 | HD 600 | HD 800 | HD 1000 | Full HD 400 | Full HD 800 | Full HD 1500 | Full HD 2000 | 4CIF 150 | 4CIF 300 | 4CIF 400 | 4CIF 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PSNR (Y) | Conventional | 29.14 | 30.88 | 30.49 | 32.53 | 33.75 | 19.56 | 26.99 | 29.75 | 30.90 | 23.11 | 27.43 | 27.20 | 30.25 |
| PSNR (Y) | Proposed | 30.14 | 31.08 | 32.35 | 33.74 | 34.57 | 31.70 | 33.55 | 35.36 | 36.11 | 25.40 | 27.20 | 27.97 | 30.70 |
| PSNR (Y) | Optimal | 30.36 | 31.54 | 32.74 | 33.88 | 34.68 | 32.07 | 33.90 | 35.75 | 36.57 | 25.78 | 27.45 | 27.99 | 31.06 |
| PSNR (Y) | Proposed − Conventional | 1 | 0.2 | 1.85 | 1.21 | 0.82 | 12.15 | 6.56 | 5.60 | 5.20 | 2.29 | -0.24 | 0.77 | 0.45 |
| PSNR (Y) | Optimal − Proposed | 0.22 | 0.46 | 0.39 | 0.14 | 0.12 | 0.37 | 0.35 | 0.39 | 0.46 | 0.39 | 0.25 | 0.02 | 0.37 |
| VQM (Y) | Conventional | 7.78 | 4.45 | 2.65 | 2.04 | 1.8 | 9.61 | 5.46 | 2.82 | 2.45 | 7.32 | 4.99 | 2.94 | 2.16 |
| VQM (Y) | Proposed | 2.63 | 2.36 | 2.13 | 1.81 | 1.68 | 2.82 | 2.20 | 1.78 | 1.59 | 3.50 | 2.89 | 2.76 | 2.02 |
| VQM (Y) | Optimal | 2.65 | 2.35 | 2.07 | 1.84 | 1.68 | 2.63 | 2.11 | 1.65 | 1.50 | 3.46 | 2.82 | 2.75 | 2.00 |
| VQM (Y) | Conventional − Proposed | 5.15 | 2.09 | 0.51 | 0.23 | 0.12 | 6.79 | 3.26 | 1.04 | 0.86 | 3.82 | 2.10 | 0.18 | 0.14 |
| VQM (Y) | Proposed − Optimal | -0.02 | 0.01 | 0.06 | -0.03 | 0 | 0.19 | 0.09 | 0.13 | 0.10 | 0.04 | 0.07 | 0.02 | 0.02 |

In Figure 5, the differences between *'Optimal'* and *'Proposed'* are negligible. As the target bitrate increases, the PSNR improvement decreases. When the target bitrate is extremely low, such as 250 and 400 kbps, only 0.009 and 0.014 bits per pixel (bpp), respectively, are allocated. A conventional QP-based bitrate control cannot meet such target bitrates, even with the maximum QP value. In the *'Conventional'* method of Figure 5, frames are therefore dropped to meet the target bitrate: the video sequences are encoded at 15 and 7.5 fps for 400 and 250 kbps, respectively, whereas the frame rate is 30 fps at 600, 800, and 1000 kbps. For three video sequences, *Sunflower*, *Pedestrian Area*, and *Tractor*, frame skipping at 400 kbps improves the PSNR to a limited extent because more bits are allocated per pixel. However, the additional frame skipping needed to meet the target bitrate of 250 kbps does not improve the quality of each frame for the *Sunflower* and *Tractor* videos, because the temporal correlation between the remaining frames becomes low, which reduces compression efficiency. The PSNR of the *Pedestrian Area* video increases slightly more at 250 kbps. In general, the results depend on the characteristics of the video sequence. In Figure 5d-f, the VQM values clearly show that the proposed algorithm produces a significant improvement over the conventional rate control. The proposed algorithm maintains similar VQM quality from 1000 down to 250 kbps, while the VQM of the conventional rate control increases drastically as the target bitrate decreases.
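The bits-per-pixel figures quoted above follow from dividing the target bitrate by the number of pixels coded per second. Assuming 1280 × 720 frames at 30 fps, as in the HD experiments, a minimal check is:

```python
def bits_per_pixel(bitrate_bps: float, width: int, height: int, fps: float) -> float:
    """Average number of coded bits available per pixel at the given frame rate."""
    return bitrate_bps / (width * height * fps)


if __name__ == "__main__":
    for kbps in (250, 400):
        bpp = bits_per_pixel(kbps * 1000, 1280, 720, 30)
        print(f"{kbps} kbps -> {bpp:.3f} bpp")   # 0.009 and 0.014
```

Halving the frame rate to 15 fps, as the conventional method does at 400 kbps, doubles the per-pixel budget of each coded frame, which is why frame skipping raises the per-frame PSNR somewhat.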

The $sa$ values determined in the experiments summarized in Table 2 are compared in the following table. As the target bitrate increases, the *'Optimal'* $sa$ also increases. The $sa$ values determined by the *'Proposed'* algorithm are very similar to the *'Optimal'* ones; in this experiment, the difference between the *'Optimal'* and *'Proposed'* $sa$ values is, on average, only 0.05.

Comparison of $sa$ between the proposed spatial resolution and the optimal spatial resolution at various target bitrates (kbps). HD = 1280 × 720, full HD = 1920 × 1072, 4CIF = 704 × 576.

| Rate control method | HD 250 | HD 400 | HD 600 | HD 800 | HD 1000 | Full HD 400 | Full HD 800 | Full HD 1500 | Full HD 2000 | 4CIF 150 | 4CIF 300 | 4CIF 400 | 4CIF 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Proposed | 0.06 | 0.10 | 0.33 | 0.36 | 0.41 | 0.19 | 0.36 | 0.73 | 0.73 | 0.08 | 0.23 | 0.51 | 0.51 |
| Optimal | 0.04 | 0.10 | 0.26 | 0.26 | 0.36 | 0.16 | 0.34 | 0.64 | 0.82 | 0.12 | 0.26 | 0.44 | 0.58 |
| Absolute difference | 0.02 | 0.00 | 0.07 | 0.10 | 0.05 | 0.02 | 0.02 | 0.09 | 0.09 | 0.04 | 0.03 | 0.07 | 0.08 |

In Figure 6, the $sa$ values for each GOP are presented for three HD video sequences, *Sunflower*, *Pedestrian Area*, and *Tractor*, encoded at various bitrates. In Figure 6a-c, the vertical axis shows $sa$ and the horizontal axis shows the GOP number. Step 1 of Figure 4 starts from GOP 1. For 400 and 250 kbps, the control for extremely low bitrates in Figure 4 (halving the resolution when the target bitrate cannot be met) has already been carried out, so the full sizes in GOP 1 for 400 and 250 kbps are 0.5 and 0.25, respectively. Through the proposed spatial resolution adjustment from GOP 1 to GOP 3, $sa$ is determined and stabilized. In Figure 6d-f, the $sa$ of each GOP is compared with the optimal one: the vertical axis shows the difference obtained by subtracting the optimal $sa$ from the proposed one. As these graphs show, the differences are very small.

Figures 7 and 8 show a frame of the *Sunflower* sequence and the 135th frame of the *Pedestrian Area* sequence, respectively. Figures 7a and 8a show the result of the conventional bitrate control, which changes only the QP; much of the detail in these frames is lost or blurred. In Figures 7b and 8b, both the QP and the spatial resolution are controlled by the proposed rate control, and the subjective quality is better than in Figures 7a and 8a.

*Soccer* video sequence are encoded. The length of the GOP is 30 frames. The values of $s_a$ and PSNR for each GOP are presented in Figure 9a, b, respectively. In these figures, the dotted graphs represent the results obtained from the proposed algorithm, while the solid graphs represent the optimal values. In the 300 frames of *Soccer*, the first half has slower motion than the second half. Until GOP 5, the proposed algorithm sets $s_a$ to 0.26. After that, the motion becomes faster and, consequently, the QP values increase. Therefore, $s_a$ for the second half of the sequence is adjusted downward. In Figure 9a, $s_a$ is set to 0.19 for GOPs 6, 7, 8, and 9 and to 0.13 for GOP 10. From GOP 3 to GOP 10, the differences between the *'Proposed'* and *'Optimal'* values are just 0.02, on average. In Figure 9b, the PSNR values for the *'Proposed'* and *'Optimal'* cases are compared, and the differences are negligible.
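The PSNR values compared in Figure 9b follow the standard definition for 8-bit samples, $\mathrm{PSNR} = 10\log_{10}(255^2/\mathrm{MSE})$. The Python sketch below, with hypothetical function names, computes it for two equally sized sample sequences and averages a per-GOP PSNR gap, taken here as optimal minus proposed. It assumes that, when $s_a < 1$, the decoded frame has already been up-sampled back to the original resolution before it is compared with the source, since PSNR requires frames of identical size; the exact evaluation procedure is not restated in this section.

```python
import math
from typing import Sequence


def psnr_8bit(ref: Sequence[int], rec: Sequence[int]) -> float:
    """PSNR in dB between two equally sized 8-bit sample sequences (e.g. flattened luma planes)."""
    if len(ref) == 0 or len(ref) != len(rec):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(ref, rec)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(255.0 ** 2 / mse)


def mean_psnr_gap(proposed_db: Sequence[float], optimal_db: Sequence[float]) -> float:
    """Average per-GOP PSNR gap in dB, computed as optimal minus proposed."""
    if len(proposed_db) == 0 or len(proposed_db) != len(optimal_db):
        raise ValueError("need one PSNR value per GOP for both cases")
    return sum(o - p for p, o in zip(proposed_db, optimal_db)) / len(proposed_db)
```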

*Harbor* video sequence is used as an example. In Figure 10a, b, the target bitrate for the first five GOPs is 300 kbps, whereas the target bitrate for the last five GOPs is 1500 kbps. In Figure 10a, the optimal spatial resolutions are 0.25 and 0.66 for the first and last halves, respectively. In the proposed algorithm, $s_a$ is initially chosen as 0.5 because the 300 kbps target bitrate is extremely low; subsequently, the proposed algorithm determines $s_a$ to be 0.2. From GOP 5 onward, the target bitrate is increased to 1500 kbps. Thus, $s_a$ is reset to 1 in GOP 5 and the process to determine the proper spatial resolution is restarted. On the basis of the bitrate change action explained in Figure 4, $s_a$ for the 1500 kbps target bitrate is determined to be 0.5. In Figure 10b, the PSNR values for the *'Proposed'* and *'Optimal'* cases are compared, and the difference is 0.5 dB, on average. In Figure 10c, d, the target bitrate is reduced from 1000 to 500 kbps. For the first five GOPs, the proposed algorithm decreases $s_a$ to 0.5 for 1000 kbps. For the following five GOPs, $s_a$ is reduced further because the target bitrate drops to 500 kbps. As shown in Figure 10c, the dotted graph obtained from the proposed algorithm closely follows the trend of the optimal graph. The difference between the two PSNR values in Figure 10d is just 0.28 dB.
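The GOP-level behavior described for *Harbor* (a reduced starting $s_a$ at an extremely low bitrate, a reset to full resolution when the target bitrate rises, and continued adjustment from the current value when it falls) can be sketched as a small state machine. This Python sketch is an illustration under stated assumptions, not the authors' implementation: `SpatialResolutionController`, `initial_sa`, and `refine` are hypothetical names, the two hooks stand in for the extremely low bitrate step of Figure 4 and for the article's PSNR/spatial-resolution estimation model (neither is specified in this section), and the lower bound `min_sa` is arbitrary.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SpatialResolutionController:
    """Hypothetical GOP-level s_a controller (illustrative sketch only)."""

    # Hypothetical hook: starting s_a for a target bitrate (stands in for the
    # "extremely low bitrate" step of Figure 4; its thresholds are not given here).
    initial_sa: Callable[[float], float]
    # Hypothetical hook: next s_a from the current s_a and target bitrate
    # (stands in for the article's PSNR/spatial-resolution estimation model).
    refine: Callable[[float, float], float]
    min_sa: float = 0.05  # arbitrary lower bound, an assumption of this sketch
    target_kbps: float = 0.0
    sa: float = 1.0
    history: List[float] = field(default_factory=list)

    def _clamp(self, sa: float) -> float:
        # Keep s_a inside [min_sa, 1.0]; 1.0 means full resolution.
        return min(1.0, max(self.min_sa, sa))

    def start(self, target_kbps: float) -> float:
        """Choose s_a for the first GOP."""
        self.target_kbps = target_kbps
        self.sa = self._clamp(self.initial_sa(target_kbps))
        self.history.append(self.sa)
        return self.sa

    def next_gop(self, target_kbps: float) -> float:
        """Choose s_a for the next GOP under a possibly changed target bitrate."""
        if target_kbps > self.target_kbps:
            # Target raised (e.g. 300 -> 1500 kbps): return to full resolution
            # and restart the search, as described for Harbor.
            self.sa = 1.0
        else:
            # Same or lower target (e.g. 1000 -> 500 kbps): keep adjusting
            # from the current s_a instead of restarting.
            self.sa = self._clamp(self.refine(self.sa, target_kbps))
        self.target_kbps = target_kbps
        self.history.append(self.sa)
        return self.sa
```

Keeping the estimation model behind a callback separates the restart policy, which the experiments above describe directly, from the model that actually chooses $s_a$ in each GOP.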

## 5. Conclusion

The main contribution of this article is a real-time bitrate control algorithm that uses spatial down-sampling for low bitrate encoding. Previous resolution control schemes are too complex to be processed at run time. In this article, a simple estimation model that defines the relationship between the PSNR and the spatial resolution ratio is presented for low bitrate compression. This model is used to find, on the fly, the resolution ratio that gives acceptable quality in real-time systems. The two scalability tools, the QP and the spatial resolution ratio, are adjusted sequentially to reach the target PSNR. Experimental results show that the proposed bitrate control algorithm is close to the optimal solution and yields better PSNR and VQM quality at various bitrates than the conventional QP-based bitrate control algorithm.

## Declarations

### Acknowledgements

This study was supported by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MEST) (2011-0027502).



## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.