- Research Article
- Open Access
On Optimizing H.264/AVC Rate Control by Improving R-D Model and Incorporating HVS Characteristics
© Zhongjie Zhu et al. 2010
- Received: 18 March 2010
- Accepted: 24 August 2010
- Published: 29 August 2010
The state-of-the-art JVT-G012 rate control algorithm of H.264 is improved in two respects. First, the quadratic rate-distortion (R-D) model is modified based on both empirical observations and theoretical analysis. Second, based on existing physiological and psychological research findings on human vision, the rate control algorithm is optimized by incorporating the main characteristics of the human visual system (HVS), such as contrast sensitivity, multichannel theory, and the masking effect. Experimental results show that the improved algorithm can simultaneously enhance the overall subjective visual quality and improve the rate control precision.
- Quadratic Model
- Contrast Sensitivity
- Video Code
- Human Visual System
- Quantization Parameter
Rate control plays an important role in video coding and transmission. Strictly speaking, rate control is not a normative part of video coding standards. However, because of its importance, it has been widely studied, and several rate control algorithms have been proposed for inclusion in the reference software of existing video coding standards, such as TM5 for MPEG-2, TMN8 for H.263, VM8 for MPEG-4, and F086 and G012 for H.264 [1–4]. Of all these algorithms, JVT-G012 has attracted great attention over the last few years and is widely used.
R-D models describe the relationship between the bit rate and the distortion of the reconstructed video, which enables an encoder to determine the bit rate required to achieve a target quality. The R-D model is a key part of a rate control algorithm, and its accuracy greatly affects the rate control performance. Hence, improving the prediction accuracy of the R-D model is the key to enhancing the performance of a rate control algorithm. Quite a few R-D models have been proposed, such as the logarithmic model, the exponential model, and the quadratic model. Of these, the quadratic model is adopted by JVT-G012; it is derived under the assumption that the video source follows a Gaussian distribution. However, as analyzed in Section 2, it is not accurate enough in practical applications. Several works have improved the quadratic model [6, 7]. However, almost all of them focus on improving the accuracy of frame complexity prediction, and the structure of their R-D models is very similar to that of the original quadratic model. At the same time, studies on visual physiology and visual psychology indicate that observers have different sensitivities to, and interest in, different contents of a video sequence, and that contents which attract more attention and have higher visual sensitivity are more sensitive to coding errors. Hence, during video coding, regions with high sensitivity can be allocated more bits to achieve a higher overall subjective quality. Many scholars have been working on HVS-based video coding technologies, and many achievements have been made. For example, a novel subjective criterion for video quality evaluation based on foveal characteristics has been developed.
Nguyen and Hwang have presented a rate control approach for low bit-rate streaming video that incorporates the HVS characteristic of smooth pursuit eye movement and can improve the quality of moving scenes in a video sequence. Tang et al. have also proposed a novel rate control algorithm that considers the motion and texture structures of video contents. However, since the HVS is extremely complex, it is very difficult to obtain perfectly perceptually consistent results during rate control and video coding. Thus, this topic should be further studied [11–13].
Based on the above observations, the G012 algorithm is optimized in two respects in this paper. First, the traditional quadratic R-D model is improved based on both empirical observations and theoretical analysis. Second, based on current physiological and psychological research findings on the HVS, the main HVS characteristics, such as contrast sensitivity, multichannel theory, and the masking effect, are analyzed, and the G012 algorithm is then optimized by incorporating these characteristics.
This paper is organized as follows. In Section 2, the quadratic model is analyzed and improved. In Section 3, some main HVS characteristics and the HVS-based rate control optimization are introduced. Experiments are conducted in Section 4, and Section 5 concludes the paper.
where are model parameters, denotes quantization step, and is the residual.
The above derivation is not very precise, for several reasons. First, the assumption that the DCT coefficients follow a Gaussian distribution does not always hold in practical applications. Second, only the first- and second-order terms of the Taylor series expansion are retained in (2). The constant and higher-order terms are discarded entirely, which introduces errors. In addition, it is not accurate to just let .
Suppose that are existing samples from previously coded frames, where is the number of the previously coded frames and is the corresponding bit rate of the i-th frame.
where is the transpose of and is the inverse matrix of .
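The least-squares estimation described above can be sketched as follows. This is a minimal illustration only: it assumes the classic quadratic form R = a/Q + b/Q^2 rather than the modified model proposed in this paper, and the names `fit_quadratic_rq`, `predict_rate`, and the synthetic sample values are hypothetical.

```python
import numpy as np

def fit_quadratic_rq(q_steps, rates):
    """Least-squares fit of R = a/Q + b/Q^2 to (Q, R) samples
    taken from previously coded frames (assumed model form)."""
    q = np.asarray(q_steps, dtype=float)
    r = np.asarray(rates, dtype=float)
    # Design matrix: one row [1/Q_i, 1/Q_i^2] per previously coded frame.
    X = np.column_stack((1.0 / q, 1.0 / q**2))
    # Normal equations: theta = (X^T X)^{-1} X^T R.
    return np.linalg.inv(X.T @ X) @ (X.T @ r)

def predict_rate(theta, q_step):
    """Bit rate predicted by the fitted model at quantization step q_step."""
    a, b = theta
    return a / q_step + b / q_step**2

# Synthetic samples generated from a = 4000, b = 90000 (illustrative values).
q_samples = [8.0, 10.0, 13.0, 16.0, 20.0]
r_samples = [4000.0 / q + 90000.0 / q**2 for q in q_samples]
theta = fit_quadratic_rq(q_samples, r_samples)
```

The explicit inverse mirrors the transpose-and-inverse formulation in the text; in practice `np.linalg.lstsq` is usually the numerically safer way to solve the same problem.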
To evaluate the performance of the proposed R-D model, the actual R-Q data are fitted with both the proposed model and the quadratic model, and the fitted results are compared. Some simulation results are shown in Figure 1, where the actual R-Q curve of each frame and the curves fitted with the two models are drawn on the same plot. The results show that the new model approximates the actual R-Q relation more accurately than the old one.
G012 can be viewed as performing rate control at three different layers: GOP layer, frame layer, and basic unit (BU) layer. Within a BU, all the macroblocks (MB) are encoded with the same quantization parameter, which does not consider the visual sensitivity difference between MBs. Thus, it is hard to obtain perceptually consistent results in practical applications.
The HVS is so complex that it is impossible to incorporate all of its characteristics into rate control and video coding. This paper discusses only several important HVS characteristics: contrast sensitivity, multichannel theory, and the masking effect [15–17]. HVS-related research findings indicate that there exists a threshold contrast for an observer to detect a change in intensity. The inverse of the contrast threshold is usually defined as the contrast sensitivity, which is a function of spatial frequency with a bandpass nature. The multichannel theory of human vision states that different stimuli are processed in different channels of the human visual system. Both physiological and psychophysical experiments on perception have given evidence of the bandpass nature of the cortical cells' response in the spectral domain, and the human brain therefore seems to possess a collection of separate mechanisms, each more sensitive to one portion of the frequency domain. The masking effect can be explained as the interaction among stimuli: the detection threshold of a stimulus may vary due to the presence of another. This variation can be either positive or negative. Due to the masking effect, similar artifacts may be disturbing in certain regions of an image while they are hardly noticeable elsewhere. Hence, during rate control the masking effect can be used to treat different scenarios with different strategies to obtain perceptually consistent results.
In the proposed algorithm, the quantization step for each MB is adjusted according to its visual sensitivity. MBs with high visual sensitivity are encoded with small quantization parameters, and MBs with low visual sensitivity are encoded with large quantization parameters, so that better subjective visual quality can be obtained under the same bit rate constraint. To incorporate HVS characteristics, three new modules are added to the original G012 framework: a preprocessing module, a visual sensitivity calculation module, and a quantization step modification module. The three newly added modules are briefly introduced as follows.
where is the value of the i-th pixel, denotes the image width, and denotes the image height.
where NUM denotes the number of MBs in the image, denotes the spatial frequency of the i-th MB, and denote the horizontal and vertical frequencies, respectively, and denotes the th pixel's value.
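The preprocessing quantities above (average intensity and per-MB spatial frequency) can be sketched as follows. The sketch assumes the commonly used pixel-domain definition of spatial frequency, in which the horizontal and vertical frequencies are root-mean-square differences between adjacent pixels; whether this matches the paper's exact formula is an assumption, and the function names are hypothetical. The result is in pixel-domain units, so converting it to cycles per degree would additionally require the viewing geometry, which is not covered here.

```python
import numpy as np

def mean_intensity(img):
    """Average pixel value over the whole image."""
    return float(np.mean(np.asarray(img, dtype=float)))

def spatial_frequency(block):
    """Pixel-domain spatial frequency of one block (assumed definition):
    sqrt(horizontal_frequency^2 + vertical_frequency^2)."""
    b = np.asarray(block, dtype=float)
    h, w = b.shape
    # Horizontal frequency: differences between horizontally adjacent pixels.
    hf = np.sqrt(np.sum(np.diff(b, axis=1) ** 2) / (h * w))
    # Vertical frequency: differences between vertically adjacent pixels.
    vf = np.sqrt(np.sum(np.diff(b, axis=0) ** 2) / (h * w))
    return float(np.hypot(hf, vf))

def macroblock_spatial_frequencies(img, mb=16):
    """Spatial frequency of every complete mb x mb macroblock, row by row."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    return [spatial_frequency(img[y:y + mb, x:x + mb])
            for y in range(0, h - mb + 1, mb)
            for x in range(0, w - mb + 1, mb)]

# A flat block has no spatial activity; a checkerboard has the maximum
# activity for a given amplitude.
flat = np.full((16, 16), 128.0)
checker = np.indices((16, 16)).sum(axis=0) % 2
```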
3.2. Calculation of Visual Sensitivity
In the proposed algorithm, the quantization step for an MB will be adjusted according to its visual sensitivity. For each MB, we consider its visual sensitivity from four aspects: motion sensitivity, brightness sensitivity, contrast sensitivity, and position sensitivity.
(1) Motion Sensitivity
where denotes the motion sensitivity, and denote the sensitivities of background region and moving object, respectively. In this paper, is set to 0.5, and is set to 1.5. For moving object detection, our earlier proposed object extraction and tracking algorithm based on spatio-temporal information is employed.
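The motion sensitivity assignment can be illustrated with a short sketch. The two sensitivity values (0.5 for background, 1.5 for moving objects) come from the text; the moving-object mask is assumed to be supplied by a separate detection step, and the rule that an MB counts as "moving" when at least half of its pixels are marked (`min_fraction`) is a hypothetical choice made only for this example.

```python
import numpy as np

S_BACKGROUND = 0.5  # sensitivity of background regions (from the text)
S_MOVING = 1.5      # sensitivity of moving objects (from the text)

def motion_sensitivity(object_mask, mb=16, min_fraction=0.5):
    """Per-macroblock motion sensitivity from a binary moving-object mask.

    min_fraction is an assumed threshold: an MB is treated as part of a
    moving object when at least this fraction of its pixels is marked.
    """
    mask = np.asarray(object_mask, dtype=bool)
    rows, cols = mask.shape[0] // mb, mask.shape[1] // mb
    sens = np.full((rows, cols), S_BACKGROUND)
    for r in range(rows):
        for c in range(cols):
            block = mask[r * mb:(r + 1) * mb, c * mb:(c + 1) * mb]
            if block.mean() >= min_fraction:
                sens[r, c] = S_MOVING
    return sens

# Example: a 32x32 frame whose top-left macroblock contains a moving object.
mask = np.zeros((32, 32), dtype=bool)
mask[:16, :16] = True
sens = motion_sensitivity(mask)
```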
(2) Brightness Sensitivity
(3) Contrast Sensitivity
where denotes the peak frequency, denotes the contrast sensitivity, and f denotes the spatial frequency in cycles per degree.
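To make the bandpass behavior of contrast sensitivity concrete, the sketch below evaluates the well-known Mannos-Sakrison contrast sensitivity function, A(f) = 2.6 (0.0192 + 0.114 f) exp(-(0.114 f)^1.1), with f in cycles per degree. It is used here only as a stand-in: the exact function and peak-frequency parameterization used in this paper are not reproduced, and the function name `csf_mannos_sakrison` is hypothetical.

```python
import numpy as np

def csf_mannos_sakrison(f):
    """Mannos-Sakrison contrast sensitivity at spatial frequency f (c/deg).
    Stand-in for the paper's CSF, whose exact form is not reproduced here."""
    f = np.asarray(f, dtype=float)
    return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

# Locate the peak numerically on a fine frequency grid.
freqs = np.linspace(0.1, 60.0, 6000)
peak_frequency = float(freqs[np.argmax(csf_mannos_sakrison(freqs))])
```

Sensitivity rises at low frequencies, peaks near 8 cycles per degree, and falls off at high frequencies, which is the bandpass shape referred to in the text.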
(4) Position Sensitivity
where denotes the center of the current MB, denotes the center of the image, and max represents the maximum distance from the side to the center of the image.
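A position-dependent weight of this kind can be sketched as below. The linear fall-off with distance and the use of the center-to-corner distance as the normalizer are assumptions made for illustration; the paper's exact mapping and its definition of the maximum distance are not reproduced, and `position_sensitivity` is a hypothetical name.

```python
import math

def position_sensitivity(mb_center, image_size):
    """Weight that is 1 at the image center and decreases linearly with
    distance from it (assumed form). Normalized by the center-to-corner
    distance so the result lies in [0, 1]."""
    width, height = image_size
    cx, cy = width / 2.0, height / 2.0
    d = math.hypot(mb_center[0] - cx, mb_center[1] - cy)
    d_max = math.hypot(cx, cy)
    return 1.0 - d / d_max

# CIF-sized frame (352x288): center MB versus a corner position.
center_weight = position_sensitivity((176, 144), (352, 288))
corner_weight = position_sensitivity((0, 0), (352, 288))
```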
3.3. Quantization Parameter Modification Based on Visual Sensitivity
where is a weighting factor of background intensity, and are thresholds for intensity.
where is the quantization step for the whole BU acquired during BU layer rate control. For practical rate control, MB-based quantization adjustment may lead to visual discomfort such as artifacts. Hence, in our actual implementations, the quantization adjustment for each MB is confined to .
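The idea of adjusting the BU-level quantization for each MB and then confining the adjustment can be sketched as follows. The mapping used here is an assumption, not the paper's formula: the quantization step is divided by the MB's visual sensitivity, which in H.264 corresponds to a QP change of -6 log2(s) because the step size doubles every 6 QP. The clamp width `max_delta` is left as a required argument because the paper's actual bound is not reproduced; the value 2 in the example is purely illustrative.

```python
import math

QP_MIN, QP_MAX = 0, 51  # valid H.264 QP range for 8-bit video

def adjust_mb_qp(qp_bu, sensitivity, max_delta):
    """QP for one MB, derived from the BU-level QP and the MB's visual
    sensitivity (assumed mapping: Qstep_mb = Qstep_bu / sensitivity)."""
    if sensitivity <= 0:
        raise ValueError("sensitivity must be positive")
    # Qstep doubles every 6 QP, so scaling Qstep by 1/s shifts QP by -6*log2(s).
    delta = round(-6.0 * math.log2(sensitivity))
    # Confine the per-MB adjustment to avoid visible quality jumps.
    delta = max(-max_delta, min(max_delta, delta))
    return max(QP_MIN, min(QP_MAX, qp_bu + delta))

# Illustrative clamp width of 2 (not taken from the paper).
qp_sensitive = adjust_mb_qp(30, 1.5, 2)    # high sensitivity: finer quantization
qp_insensitive = adjust_mb_qp(30, 0.5, 2)  # low sensitivity: coarser quantization
```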
Number Reference Frames
Rate Control Enable
Number B Frames
4, 5, 6
The average intensities and spatial frequency of test sequences (unit: c/d).
Rate-control results of Claire sequence.
Rate-control results of Alex sequence.
Rate-control results of Train sequence (1).
Rate-control results of Train sequence (2).
However, rate control is a comprehensive technology whose performance is affected by many factors in practical applications. There also exist some limitations for our algorithm. For example, we also used the same temporal Mean Absolute Difference (MAD) prediction method as JVT-G012, which has the following drawbacks: (a) inaccurate MAD prediction for sudden changes; (b) inaccurate bit allocation since the header bits of the current basic unit are predicted from previous encoded basic units. Another limitation is the algorithm complexity. Compared with the original one, the proposed algorithm has higher computational load. Our future work will be on how to handle these limitations.
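Drawback (a) can be illustrated with a small sketch of temporal MAD prediction. JVT-G012 predicts the MAD of the current basic unit linearly from the co-located basic unit of the previous frame; the sketch below keeps the coefficients fixed at a1 = 1 and a2 = 0 for simplicity, whereas the real algorithm updates them by regression, and the MAD values are made-up illustrative numbers.

```python
def predict_mad(prev_mad, a1=1.0, a2=0.0):
    """Linear temporal MAD prediction: MAD_current ~ a1 * MAD_previous + a2.
    Coefficients are held fixed here for illustration only."""
    return a1 * prev_mad + a2

# Illustrative MAD sequence with a sudden change at the fourth frame.
actual = [4.0, 4.1, 4.2, 12.0, 11.8]
errors = [abs(predict_mad(actual[i - 1]) - actual[i])
          for i in range(1, len(actual))]
worst = errors.index(max(errors))  # index of the largest prediction error
```

The prediction error stays small while the content changes slowly and jumps at the sudden change, since the predictor only sees the previous frame.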
Rate control plays an important role in video coding and transmission, and its performance is of great importance to coding effectiveness. In this paper, the classic JVT-G012 rate control algorithm has been improved by adapting the quadratic R-D model and incorporating the main HVS characteristics, which both improves the rate control precision and enhances the overall subjective quality of the reconstructed video. However, rate control is a comprehensive technology whose performance is affected by many factors in practical applications, and the proposed algorithm still has some limitations. Hence, it should be further optimized if better performance is required.
This paper was supported in part by the National Natural Science Foundation of China (no. 60902066, 60872094, 60832003), the Zhejiang Provincial Natural Science Foundation (no. Y107740), the Projects of Chinese Ministry of Education (no. 200816460003), the Open Project Foundation of Ningbo Key Laboratory of DSP (no. 2007A22002), and the Scientific Research Fund of Zhejiang Provincial Education Department (Grant no. Z200909361).
- Merritt L, Vanam R: Improved rate control and motion estimation for H.264 encoder. Proceedings of the 14th IEEE International Conference on Image Processing (ICIP '07), September 2007 V309-V312.
- Kamaci N, Altunbasak Y, Mersereau RM: Frame bit allocation for the H.264/AVC video coder via cauchy-density-based rate and distortion models. IEEE Transactions on Circuits and Systems for Video Technology 2005, 15(8):994-1006.
- Chen Z, Ngan KN: Recent advances in rate control for video coding. Signal Processing: Image Communication 2007, 22(1):19-38. 10.1016/j.image.2006.11.002
- Li ZG, Pan F, Lin KP, Feng G, Lin X, Rahardja S: Adaptive basic unit layer rate control for JVT. Proceedings of the 7th JVT Meeting, 2003, Pattaya II, Thailand.
- Yuan W: Research for the rate control algorithm in H.264, M.S. thesis. Hefei University of Technology, China; 2006.
- Yi X, Ling N: Improved H.264 rate control by enhanced MAD-based frame complexity prediction. Journal of Visual Communication and Image Representation 2006, 17(2):407-424. 10.1016/j.jvcir.2005.04.005
- Kim J-Y, Kim S-H, Ho Y-S: A frame-layer rate control algorithm for H.264 using rate-dependent mode selection. Proceedings of the 6th Pacific Rim Conference on Multimedia (PCM '05), 2005, Lecture Notes in Computer Science 3768: 477-488.
- Lee S, Pattichis MS, Bovik AC: Foveated video compression with optimal rate control. IEEE Transactions on Image Processing 2001, 10(7):977-992. 10.1109/83.931092
- Nguyen AG, Hwang J-N: A novel hybrid HVPC/mathematical model rate control for low bit-rate streaming video. Signal Processing: Image Communication 2002, 17(5):423-440. 10.1016/S0923-5965(02)00011-5
- Tang C-W, Chen C-H, Yu Y-H, Tsai C-J: Visual sensitivity guided bit allocation for video coding. IEEE Transactions on Multimedia 2006, 8(1):11-18.
- Chen ZZ, Ngan KN: A unified framework of unsupervised subjective optimized bit allocation for video coding using visual attention model. Multimedia Systems and Applications VIII, 2005, Boston, Mass, USA, Proceedings of SPIE.
- Liu Z, Karam LJ, Watson AB: JPEG2000 encoding with perceptual distortion control. IEEE Transactions on Image Processing 2006, 15(7):1763-1778.
- Jiang M, Ling N: On lagrange multiplier and quantizer adjustment for H.264 frame-layer video rate control. IEEE Transactions on Circuits and Systems for Video Technology 2006, 16(5):663-669.
- Chiang T, Zhang Y-Q: A new rate control scheme using quadratic rate distortion model. IEEE Transactions on Circuits and Systems for Video Technology 1997, 7(1):246-250. 10.1109/76.554439
- Wang Z, Sheikh RH, Bovik CA: Objective video quality assessment. In The Handbook of Video Databases: Design and Applications. Edited by: Furht B, Marqure O. CRC Press, Boca Raton, Fla, USA; 2003:1041-1078.
- Simoncelli EP, Olshausen BA: Natural image statistics and neural representation. Annual Review of Neuroscience 2001, 24: 1193-1216. 10.1146/annurev.neuro.24.1.1193
- Sheikh HR, Bovik AC: Image information and visual quality. IEEE Transactions on Image Processing 2006, 15(2):430-444.
- Wei CK: Image quality assessment model via HVS, M.S. thesis. National University of Defense Technology, China; 2003.
- Mannos JL, Sakrison DJ: The effects of a visual fidelity criterion on the encoding of images. IEEE Transactions on Information Theory 1974, 20(4):525-536. 10.1109/TIT.1974.1055250
- Cheng W-H, Chu W-T, Kuo J-H, Wu J-L: Automatic video region-of-interest determination based on user attention model. Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS '05), May 2005 3219-3222.
- Osberger W, Maeder A, Bergmann N: A technique for image quality assessment based on a human visual system model. Proceedings of the European Signal Processing Conference, 1998 1049-1052.
- Zhu ZJ, Jiang GY, Yu M, Wu XW: New algorithm for extracting moving object based on spatio-temporal information. Chinese Journal of Image and Graphics 2003, 8(4):422-425.
- Ivkovic G, Sankar R: An algorithm for image quality assessment. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), May 2004 713-716.
- ITU-R: Methodology for the subjective assessment of the quality of television pictures. ITU-R Recommendation BT.500-10, Geneva, Switzerland; 2000.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.