H.264/SVC parameter optimization based on quantization parameter, MGS fragmentation, and user bandwidth distribution
- Xu CHEN^{1},
- Ji-hong ZHANG^{2},
- Wei LIU^{2},
- Yong-sheng LIANG^{2} and
- Ji-qiang FENG^{1}
https://doi.org/10.1186/1687-6180-2013-10
© CHEN et al.; licensee Springer. 2013
- Received: 3 February 2012
- Accepted: 21 December 2012
- Published: 31 January 2013
Abstract
When bandwidth is limited, improving the performance of scalable video coding is an important problem in video coding. Previously proposed scalable video coding optimization schemes concentrate on reducing coding computation or on achieving consistent video quality; however, they do not jointly consider the coding scheme, the transmission environment, and the users’ access patterns. This article proposes an H.264/SVC (scalable video coding) parameter optimization scheme that attempts to make full use of limited bandwidth and achieve better peak signal-to-noise ratio (PSNR), based on a joint measure of the user bandwidth range and its probability density distribution. The algorithm constructs a mapping between the bandwidth range of multiple users and the quantization parameter of the quality increments, QP _{ e }, in order to make effective use of the coded bit-stream. A medium grain scalability (MGS) fragmentation optimization algorithm is also presented that takes into account the user bandwidth probability density distribution, the encoding bit rate, and scalability. Experiments on a public dataset show that this method provides a significant improvement in average quality for streaming video applications.
Keywords
- H.264/SVC
- Bandwidth distribution
- Quality increments
- Quantization parameter
- MGS fragmentation
1. Introduction
Network bandwidth and error rates change frequently in wireless networks because of user mobility and dynamic channel conditions. These changes can critically affect video streaming applications, because video data are generally very sensitive to delay and errors. For this reason, video codecs need to be more aware of network conditions and should support adaptive bit rates [1]. To address these issues, the Telecommunication Standardization Sector of the International Telecommunication Union (ITU-T) and ISO/IEC MPEG (Moving Picture Experts Group) have jointly developed the H.264/SVC (scalable video coding) standard. The main feature of H.264/SVC is that it enables bandwidth-adaptive video streaming: a sub-stream matching current network conditions can be extracted from a single coded bit-stream [2, 3]. H.264/SVC supports three types of scalability: quality, spatial, and temporal. Quality scalability is provided through medium grain or coarse grain scalability (MGS/CGS), which delivers quality refinements to a preceding layer representation. Because of its complexity, fine grain scalability (FGS) has not been adopted in the Scalable Baseline, Scalable High, and Scalable High Intra profiles of SVC [4–6]. In terms of coding mechanisms, MGS is almost identical to CGS, except that with MGS the coded refinement data corresponding to one quantization step size can be fragmented into up to 15 layers (MGS fragments), whereas CGS provides only 1 layer [7].
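The fragment configurations used later in this article (for example, [3 8 5] or [1 3 2 4 1 2 3]) can be read as MGS vectors. Assuming each entry counts the 4×4 transform coefficients (scan positions 0 to 15) carried by one fragment, which is consistent with every configuration in this article summing to 16, the minimal sketch below maps a configuration to the coefficient range of each fragment. The function name is hypothetical and not part of any encoder API.

```python
from itertools import accumulate


def fragment_ranges(mgs_vector: list[int]) -> list[tuple[int, int]]:
    """Map an MGS vector to the inclusive coefficient-index range of each fragment.

    Hypothetical helper: each entry is assumed to be the number of 4x4 transform
    coefficients (scan positions 0..15) carried by one MGS fragment.
    """
    if any(size <= 0 for size in mgs_vector):
        raise ValueError("each fragment must carry at least one coefficient")
    if sum(mgs_vector) != 16:
        raise ValueError("the entries must cover all 16 coefficients of a 4x4 block")
    ends = list(accumulate(mgs_vector))   # cumulative counts, e.g. [3, 10, 16]
    starts = [0] + ends[:-1]              # first scan position of each fragment
    return [(s, e - 1) for s, e in zip(starts, ends)]


for config in ([3, 7, 6], [5, 3, 8], [16]):
    ranges = fragment_ranges(config)
    print(config, "-> n =", len(ranges), ranges)
```

For [3, 7, 6] this gives n = 3 with ranges (0, 2), (3, 9), (10, 15), while [16] collapses to a single fragment, the CGS-like case n = 1.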
The objective of H.264/SVC is to achieve scalability without a significant loss in coding efficiency. As the scalable extension of H.264/AVC, SVC inherits all of the coding tools of H.264/AVC. However, it distinguishes itself from the scalable systems of prior video coding standards by incorporating an adaptive inter-layer prediction mechanism with superior coding efficiency. Because SVC encoder operations are non-normative, the encoder implementation is quite flexible [8, 9], as long as its bit-streams conform to the specification. Thus, selecting an appropriate coding scheme to improve the peak signal-to-noise ratio (PSNR) performance of the SVC codec for different applications is an active research topic in video coding.
Among existing work on MGS configurations, Gupta et al. [10] examined and analyzed the rate-distortion (R-D) performance of MGS with different weights and quantization parameters, but did not provide guidance on choosing a suitable coding scheme for different applications. Burak et al. [11] compared the performance of various MGS fragmentation configurations and recommended using up to five MGS fragments. They showed that extraction performance at low bandwidths improved as the number of MGS fragments in the configuration increased, although a larger number of fragments degraded PSNR performance because of fragmentation overhead. Others have presented R-D models for MGS scalable video streams [12, 13]. Based on the statistical properties of DCT coefficients, Michele et al. [12] estimated the PSNR of an MGS stream for a given rate; their model requires knowledge of the two extreme points of the enhancement part of the stream and a measure of temporal complexity computed over the raw luminance sequence. Long et al. [13], on the other hand, provided a separate R-D model for each picture type or temporal decomposition level, using statistics of the residual frames.
In this article, we translate the problem of selecting a suitable coding scheme and making effective use of the coded bit stream into a problem of optimizing overall quality for the users. Our proposed framework consists of two main modules: parameter calculation and performance analysis. First, we construct a mapping between the bandwidth range and the quantization parameter QP _{ e } to calculate appropriate parameters, and then propose a coding rate calculation module for different MGS fragmentation configurations. We then find an optimal solution by iteratively evaluating a proposed objective function that quantifies the overall quality for the users.
The remainder of this article is organized as follows. Section 2 outlines our proposed framework for parameter optimization. Section 3 describes the calculation of QP _{ e } and the rate estimation for different numbers of MGS fragments. Section 4 analyzes coding performance for parameter selection. Section 5 presents the experimental results of our proposed framework, and Section 6 concludes the article.
2. Problem formulation
SVC provides a good platform for different applications. In the H.264/SVC coding framework, the quantization parameter QP _{ e } of the quality increments is related to the quality fineness that the enhancement layer bit stream can reach, while MGS fragmentation affects scalability and PSNR performance. Therefore, it is important to select a suitable QP _{ e } and an appropriate number of MGS fragments in the codec. Official and operator tests and statistics [17, 18] provide typical user bandwidth ranges and probability density distribution functions for different cities and applications. Because bit streams coded with the same number of MGS fragments have similar scalability and performance (for example, the configurations [3, 7, 6] and [5, 3, 8] both split the MGS stack into 3 MGS fragments), this article concentrates on optimizing the quantization parameter QP _{ e } and n, the number of MGS fragments.
where Q _{ i } is the video quality of user i and N is the number of users in the bandwidth range. A coding scheme that yields a higher Q in (1) is more suitable for the users.
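Assuming (1) defines Q as the mean of the per-user qualities, Q = (1/N) Σ Q _{ i } (a plain sum ranks schemes identically for a fixed N), comparing coding schemes reduces to comparing mean user quality. The sketch below illustrates that comparison; the function name is hypothetical and the PSNR values are made up for illustration, not taken from the experiments.

```python
from statistics import fmean


def overall_quality(user_psnr_db: list[float]) -> float:
    """Overall quality Q, assumed here to be the mean of the per-user qualities Q_i (dB)."""
    if not user_psnr_db:
        raise ValueError("need at least one user")
    return fmean(user_psnr_db)


# Illustrative per-user PSNR values (dB) for two candidate schemes; not data from the paper.
scheme_a = [30.0, 32.0, 34.0]
scheme_b = [31.0, 31.5, 32.5]
best = max(("A", scheme_a), ("B", scheme_b), key=lambda item: overall_quality(item[1]))
print(best[0], overall_quality(best[1]))
```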
However, because QP _{ e } and the number of MGS fragments n can take many values, selecting a suitable coding scheme by encoding the sequence repeatedly (multi-coding) dramatically increases encoding time and computational complexity. To solve this problem, we propose an effective algorithm that finds suitable values of QP _{ e } and n for a given application while keeping complexity low.
3. Calculation of parameters
3.1. QP _{ e } determination
When encoding the first GOP, we set n = 15. The coefficients c _{1}, c _{2}, and c of the I/P/B frame R-Q models are initialized using (2), the coding rate information of the I/P/B frames, and the mean absolute difference (MAD) of the increments. The total coding rate is then obtained by summing the coding rates of the I, P, and B frames.
Here, u, v, and h are the numbers of coded I, P, and B frame increments, respectively, and R _{base} is the coding rate of the base layer, which is closely related to QP _{ b }. QP _{ b } should be suitable for all users; therefore, we select it so that R _{base} is close to the lower bound of the users’ bandwidth distribution. R _{Menh} is the coding rate of the quality increments, equal to the sum of the coding rates of the quality increments of the I, P, and B frames (R _{Ienh}, R _{Penh}, and R _{Benh}). In (4), c _{1d } and c _{2d } are the quadratic R-Q model coefficients for the different coding region units of the I frame, while c _{1q } and c _{2q } are those for the coding region units of the P frame. Finally, QP _{ e } is the quantization step of the quality increments, and d, q, and s are the numbers of coding regions in the I, P, and B frames, respectively.
where QP _{0} is a given quantization parameter, N is the number of temporal layers, and QP _{ e } ^{ i } is the quantization parameter for temporal level i (0 ≤ i ≤ N − 1).
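As an illustration under assumptions, the sketch below models each frame type with the widely used quadratic R-Q form R = c1·MAD/Qstep + c2·MAD/Qstep², uses the H.264 approximation Qstep ≈ 2^((QP − 4)/6), and sums R = R _{base} + u·R _{I} + v·R _{P} + h·R _{B}. It then scans QP _{ e } for the smallest value (finest quality) whose predicted rate stays within the upper bandwidth bound, matching the goal of initializing QP _{ e } so that the n = 15 coding rate is close to that bound. It does not reproduce the per-region terms (c _{1d }, c _{2d }, …) of (4); all coefficients, counts, and function names are hypothetical.

```python
from dataclasses import dataclass


def qstep(qp: int) -> float:
    """Approximate H.264 quantizer step: doubles every 6 QP, equals 1.0 at QP 4."""
    return 2.0 ** ((qp - 4) / 6.0)


@dataclass
class QuadraticRQ:
    """Assumed quadratic R-Q model: kbit per increment = c1*MAD/Qstep + c2*MAD/Qstep**2."""
    c1: float
    c2: float
    mad: float

    def rate(self, qp: int) -> float:
        q = qstep(qp)
        return self.c1 * self.mad / q + self.c2 * self.mad / q ** 2


def total_rate(qp_e: int, r_base: float, counts: dict, models: dict) -> float:
    """R = R_base + u*R_I + v*R_P + h*R_B, with counts = increments per second (u, v, h)."""
    return r_base + sum(counts[t] * models[t].rate(qp_e) for t in ("I", "P", "B"))


def choose_qp_e(upper_kbps: float, r_base: float, counts: dict, models: dict):
    """Smallest QP_e (finest quality) whose predicted rate does not exceed the upper bound."""
    for qp in range(52):  # predicted rate falls as QP grows, so the first fit is closest from below
        if total_rate(qp, r_base, counts, models) <= upper_kbps:
            return qp
    return None  # even QP_e = 51 exceeds the bandwidth bound


# Hypothetical coefficients and frame counts, for illustration only.
models = {
    "I": QuadraticRQ(c1=40.0, c2=200.0, mad=6.0),
    "P": QuadraticRQ(c1=20.0, c2=100.0, mad=4.0),
    "B": QuadraticRQ(c1=10.0, c2=50.0, mad=3.0),
}
counts = {"I": 1, "P": 3, "B": 11}
qp_e = choose_qp_e(upper_kbps=5300.0, r_base=600.0, counts=counts, models=models)
print(qp_e, round(total_rate(qp_e, 600.0, counts, models), 1))
```

A linear scan over the 52 legal QP values is cheap here; a bisection would also work because the modeled rate is monotonic in QP.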
3.2. Rate Estimation for Different n
where R _{ top } is the coding bit rate when coding with n = 15, R _{cgs} is the coding bit rate when adopting the CGS coding mechanism (equivalent to n = 1), and R(Sub(R _{ i })) is the bit rate of the sub-stream that contains all increments with quality level ≤ i.
In Section 3.1, QP _{ e } was initialized so that the coding rate with n = 15 is close to the upper bound of the users’ bandwidth range. Selecting the number of MGS fragments n after QP _{ e } has been determined therefore decreases the coding rate, because fewer fragments incur less fragmentation overhead.
4. Performance for various coding schemes
4.1. Performance analysis
where N is the number of users in the bandwidth range, K is the set of coding configurations, Q(·) is each user’s video quality, and R is the encoding bit rate, with R _{ i } the bandwidth restriction of user i when adopting coding configuration M. To maximize this objective function, we must consider the scalability and video quality obtained with different numbers of MGS fragments n, using the coding bit rate determined in Section 3.2. We do so by calculating the scalability of the bit streams at different temporal levels from the number of coded frames and quality increments. By combining this information with the bandwidth probability density distribution of the users, we optimize the number of MGS fragments n through the following process:
Because of inter-frame prediction, data at different temporal levels differ in importance. T _{ n } is the maximum temporal level. As usual, the data priorities are I > P > B _{ 1 } > B _{ 2 } > …, where B _{ 1 } and B _{ 2 } denote the increments of temporal levels 2 and 3, respectively.
where R _{ i } is the coding bit rate of temporal level i when coding with n = 15, K _{ i } is the fidelity scalability of temporal level i when n = M, T _{ n } is the maximum temporal level, R _{ M } is the bit rate when coding with n = M, R _{ top } is the bit rate when n = 15, and Δ _{ i } is the average interval of fidelity scalability for temporal level i when n = M.
where Q _{ M }(s) is the video quality for n = M when the user’s bandwidth is ≤ s, R _{ b } is the coding bit rate of the base layer, and D is the interval between scalable fidelity points. Stm(v) is the sub-stream whose coding bit rate is closest to v. The algorithm iterates step 3 until it reaches a predefined quality requirement.
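One simple reading of the last step, stated as an assumption rather than the exact form of the missing equations: for a stream coded with n = M, the extractable rates form a ladder R _{ b }, R _{ b } + D, R _{ b } + 2D, … up to the full stream rate, a user with bandwidth s receives Stm(v) for the largest ladder rate v ≤ s, and the quality of that sub-stream is read off a reference PSNR–bit rate curve by linear interpolation. The function names and the sample curve are hypothetical.

```python
from bisect import bisect_right


def interpolate(curve, rate):
    """PSNR (dB) at `rate` by linear interpolation on sorted (kbps, dB) points, clamped at the ends."""
    rates = [r for r, _ in curve]
    if rate <= rates[0]:
        return curve[0][1]
    if rate >= rates[-1]:
        return curve[-1][1]
    k = bisect_right(rates, rate)
    (r0, q0), (r1, q1) = curve[k - 1], curve[k]
    return q0 + (q1 - q0) * (rate - r0) / (r1 - r0)


def quality_at_bandwidth(s, r_b, d, r_full, curve):
    """Assumed Q_M(s): quality of the largest ladder rate R_b + k*D (capped at r_full) not above s."""
    if s < r_b:
        return None                      # not even the base layer fits
    if s >= r_full:
        v = r_full                       # the whole stream fits
    else:
        v = r_b + ((s - r_b) // d) * d   # highest extraction point Stm(v) within s
    return interpolate(curve, v)


# Hypothetical reference PSNR-rate curve from a pre-encoding pass.
curve = [(600.0, 24.0), (3000.0, 30.0), (5300.0, 36.0)]
for s in (500.0, 2000.0, 6000.0):
    print(s, quality_at_bandwidth(s, r_b=600.0, d=500.0, r_full=5300.0, curve=curve))
```

With D = 500 kbps, a 2000 kbps user is served at the 1600 kbps ladder point, so the quality is interpolated there rather than at 2000 kbps; a smaller D (more fragments) narrows this gap.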
4.2. Computational complexity analysis
where T _{ e } is the coding time of an increment and T _{ b } is the coding time of the base layer. M is the number of frames in a GOP, J is the number of GOPs in an intra period, F is the number of frames to be encoded, and L is the number of MGS fragments when coding with the traditional approach.
where K is the number of scalable points, and U and V denote addition and multiplication operations, respectively.
5. Experimental results
In this section, we evaluate our approach using a public video dataset [21] consisting of 28 video clips (QCIF, CIF, and VGA) and the Joint Scalable Video Model (JSVM) platform [22]. As shown in Table I, these clips include sequences such as “Foreman,” “Bus,” and “Mobile.” The main objective of the three experiments conducted on these clips is to evaluate whether our approach finds a coding configuration that improves overall video quality for users. We randomly selected nine video clips from the dataset for the process test and used the “Mobile” and “Bus” sequences for performance testing. In the first experiment, we analyze the coding bit rates of different MGS weight configurations that use the same number of MGS fragments. The second experiment tests the proposed mapping between the sub-stream bit rate and the actual coding bit rate for bit-rate estimation. Finally, in the third experiment, we compare our approach with standard coding configurations to demonstrate the suitability of the new method.
5.1. Coding bit rate comparison with same number of MGS fragments
Coding bit rates (kbps) of different MGS configurations with the same n (QP _{ b } = 45, QP _{ e } = 35, coded frames = 48, frame rate = 30, GOP size = 8)
Sequence | n = 3: [3 8 5] | n = 3: [6 3 7] | n = 5: [5 3 1 4 3] | n = 5: [2 5 6 1 2] | n = 7: [1 3 2 4 1 2 3] | n = 7: [4 1 2 1 1 3 4] |
---|---|---|---|---|---|---|
Foreman | 561.6 | 571.3 | 623.2 | 625 | 685.9 | 677.9 |
Soccer | 692.4 | 702 | 750.2 | 755.5 | 847.2 | 828.4 |
Bus | 1115.3 | 1121.4 | 1176.6 | 1182 | 1299.9 | 1294.9 |
Coastguard | 903.8 | 900.2 | 942.7 | 987.4 | 1054.6 | 1030.9 |
Container | 494.9 | 499.2 | 549.4 | 550.4 | 616.9 | 614.8 |
Stefan | 1169 | 1172.9 | 1222.3 | 1232.3 | 1361.1 | 1363.4 |
Highway | 316.6 | 324.1 | 375.6 | 373.2 | 429.6 | 425.3 |
Mobile | 1491.4 | 1492.5 | 1566.5 | 1557.1 | 1699 | 1712 |
Bridge | 534.8 | 540.8 | 592 | 594.8 | 669.4 | 653.9 |
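The table can be checked against the claim in Section 2 that configurations with the same n code at similar rates. The short script below only re-processes the values above: it computes the relative rate gap between the two configurations sharing each n and tests whether the mean rate grows from n = 3 to 5 to 7 for every sequence. The function names are hypothetical.

```python
# Coding bit rates (kbps) from the table above; for each sequence, two configurations per n.
N_VALUES = (3, 5, 7)
RATES = {
    "Foreman":    [(561.6, 571.3), (623.2, 625.0), (685.9, 677.9)],
    "Soccer":     [(692.4, 702.0), (750.2, 755.5), (847.2, 828.4)],
    "Bus":        [(1115.3, 1121.4), (1176.6, 1182.0), (1299.9, 1294.9)],
    "Coastguard": [(903.8, 900.2), (942.7, 987.4), (1054.6, 1030.9)],
    "Container":  [(494.9, 499.2), (549.4, 550.4), (616.9, 614.8)],
    "Stefan":     [(1169.0, 1172.9), (1222.3, 1232.3), (1361.1, 1363.4)],
    "Highway":    [(316.6, 324.1), (375.6, 373.2), (429.6, 425.3)],
    "Mobile":     [(1491.4, 1492.5), (1566.5, 1557.1), (1699.0, 1712.0)],
    "Bridge":     [(534.8, 540.8), (592.0, 594.8), (669.4, 653.9)],
}


def same_n_spreads(rates):
    """Relative gap |a - b| / min(a, b) between the two configurations sharing each n."""
    return {(seq, n): abs(a - b) / min(a, b)
            for seq, pairs in rates.items()
            for n, (a, b) in zip(N_VALUES, pairs)}


def mean_rate_grows_with_n(rates):
    """True if, for every sequence, the mean rate strictly increases from n = 3 to 5 to 7."""
    for pairs in rates.values():
        means = [(a + b) / 2 for a, b in pairs]
        if any(lo >= hi for lo, hi in zip(means, means[1:])):
            return False
    return True


spreads = same_n_spreads(RATES)
worst = max(spreads, key=spreads.get)
print(f"largest same-n spread: {spreads[worst]:.1%} at {worst}")
print("mean rate grows with n for every sequence:", mean_rate_grows_with_n(RATES))
```

The largest same-n gap is about 4.7% (Coastguard, n = 5), while moving from n = 3 to n = 7 raises the mean rate of every sequence, which is consistent with optimizing n rather than the individual fragment weights.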
5.2. Coding bit-rate estimation with different number of MGS fragments
Our analysis indicates that this linear relationship holds for several reasons. First, the experiment in Section 5.1 and a subsequent correlation analysis show an approximately linear mapping between the MGS fragment coding bit rate and the percentage of fragment quality in the quality increments; thus, truncating payload data during sub-stream extraction reduces the bit rate in a mainly linear manner. Second, the sub-stream and the actual coded stream contain the same NAL units (NALUs), with NAL headers of similar size. Finally, as the percentage of fragment quality in the quality increments increases, the drift error caused by inter-frame prediction also increases linearly.
Figure 6b,d shows the proportional relationship between the sub-stream bit rate and the percentage of fragment quality in the quality increments. Figure 6a,c compares the coding bit rate calculated after the transformation in (7) with the actual coding bit rate; the deviation between them is less than 5%.
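The linear mapping and the 5% deviation check can be sketched as an ordinary least-squares fit of the actual coding bit rate against the sub-stream bit rate, followed by the maximum relative error of the fitted values. This generic fit is an assumption standing in for the transformation in (7), which is not reproduced here; the function names are hypothetical and the sample points are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def max_relative_deviation(xs, ys, a, b):
    """Largest |fitted - actual| / actual over all samples."""
    return max(abs(a * x + b - y) / y for x, y in zip(xs, ys))


# Synthetic (sub-stream kbps, actual coded kbps) pairs, for illustration only.
sub_rates = [800.0, 1200.0, 1600.0, 2000.0, 2400.0]
actual_rates = [850.0, 1255.0, 1640.0, 2050.0, 2445.0]
a, b = fit_line(sub_rates, actual_rates)
dev = max_relative_deviation(sub_rates, actual_rates, a, b)
print(f"a={a:.3f}, b={b:.1f}, max deviation={dev:.2%}, within 5%: {dev < 0.05}")
```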
5.3. QP _{ e } and n determination
In this experiment, we analyze the coding performance of the optimized configuration determined by our method and compare it with that of other coding configurations. The “Mobile” and “Bus” (CIF) sequences are again used as test objects. Coding configuration optimization consists mainly of determining QP _{ e } and n for a given user bandwidth distribution.
After determining a suitable QP _{ e }, we select n under different user bandwidth probability density distributions using the algorithm described in Sections 3.2 and 4, and compare the predicted coding performance with the actual coding performance of other configurations. We conducted six tests using three different bandwidth probability density distributions with multiple users.
Take the Mobile sequence as an example of how our algorithm proceeds. In pre-encoding, we first determine a suitable QP _{ e } = 25 using (2)–(6) when the users’ bandwidth is limited to 600–5300 kbps (Figure 7a). With frame rate = 15, GOP size = 16, and the coding structure IBBBPBBB, the maximum temporal level of the bit stream is 3. The level of GOP quality scalability can then be calculated by (9), where M is the number of quality increments. During pre-encoding with n = 15, sub-stream extraction yields the bit rate of each temporal level. Following the bit-rate calculation for different n in Section 3.2, we build a linear mapping to calculate the bit rate of each temporal level for n = M. From the GOP quality scalability of each temporal level (calculated by (10)), the bit rates of the different temporal levels, and (11), we calculate the average interval Δ _{ i } between two scalable points of each temporal level, and from Δ _{ i } the bit rate D of every scalable point. Taking the PSNR–bit rate curve obtained by pre-encoding with n = 15 as a reference, we use the linear mapping to estimate Q _{ M }(s), the video quality with n = M when the user’s bandwidth is ≤ s. Given that f(x), the probability density function of the users’ bandwidth, follows a normal distribution (mean = 600, standard deviation = 1000), we calculate max(·) with (13), which gives the overall video quality for all users with n = M. Finally, we find the maximum of this value over 1 ≤ M ≤ 15 and take the corresponding M as the optimum n.
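The final search over M can be sketched numerically. Assuming (13) weights each user's quality by the bandwidth density, the overall quality for n = M is approximated as the expectation of Q _{ M }(s) under a normal density (mean 600, standard deviation 1000) truncated to 600–5300 kbps and renormalized; the truncation and the midpoint-rule integration are our assumptions. The candidate curves are hypothetical placeholders for the Q _{ M }(s) estimates, not values from the experiments.

```python
import math


def normal_pdf(x, mean, std):
    """Density of Normal(mean, std) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def expected_quality(q_of_s, lo, hi, mean, std, steps=4000):
    """Midpoint-rule E[Q(s)] for s ~ Normal(mean, std) truncated to [lo, hi] and renormalized."""
    width = (hi - lo) / steps
    num = den = 0.0
    for k in range(steps):
        s = lo + (k + 0.5) * width
        w = normal_pdf(s, mean, std)
        num += q_of_s(s) * w
        den += w
    return num / den


def best_fragment_count(candidates, lo, hi, mean, std):
    """Return (M, expected quality) for the candidate Q_M(s) with the highest expected quality."""
    scores = {m: expected_quality(q, lo, hi, mean, std) for m, q in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


def toy_curve(full_kbps, n_points, base_kbps=600.0):
    """Hypothetical Q_M(s): n_points equal steps from base_kbps to full_kbps, 24 dB up to 36 dB."""
    step = (full_kbps - base_kbps) / n_points

    def q(s):
        k = min(n_points, math.floor((s - base_kbps) / step))
        return 24.0 + 12.0 * k / n_points

    return q


# Placeholder curves: more fragments give finer steps but a higher full-stream rate (overhead).
candidates = {1: toy_curve(4800.0, 1), 3: toy_curve(5000.0, 3), 7: toy_curve(5300.0, 7)}
print(best_fragment_count(candidates, lo=600.0, hi=5300.0, mean=600.0, std=1000.0))
```

With most of the density near the 600 kbps lower bound, the finer-grained placeholder wins because many users can reach its first extraction points; how the real Q _{ M }(s) trade-off comes out depends on the pre-encoded curves.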
Performance of different n, case A (normal distribution): QP _{ e } = 25; Mobile: mean = 600 kbps, standard deviation = 1000 kbps; Bus: mean = 500 kbps, standard deviation = 650 kbps
n | Mobile (600–5300 kbps), calculated (dB) | Mobile, actual (dB) | Bus (500–3600 kbps), calculated (dB) | Bus, actual (dB) |
---|---|---|---|---|
1 | 25.86 | 25.34 | 24.83 | 26.45 |
2 | 26.18 | 25.98 | 25.99 | 27.12 |
3 | 26.24 | 26.04 | 26.69 | 27.28 |
4 | 26.228 | 26.11 | 26.73 | 27.39 |
5 | 26.18 | 26.05 | 27.43 | 27.41 |
6 | 26.11 | 26.06 | 27.42 | 27.444 |
7 | 26.01 | 26.06 | 27.42 | 27.457 |
8 | 25.98 | 26.03 | 27.4 | 27.448 |
Performance of different n, case B (uniform distribution)
n | Mobile (600–5300 kbps), calculated (dB) | Mobile, actual (dB) | Bus (500–3600 kbps), calculated (dB) | Bus, actual (dB) |
---|---|---|---|---|
1 | 31.32 | 31.06 | 32.26 | 31.65 |
2 | 31.43 | 31.38 | 32.47 | 32.29 |
3 | 31.39 | 31.28 | 32.49 | 32.27 |
4 | 31.25 | 31.06 | 32.31 | 32.4 |
5 | 31.12 | 31.07 | 32.46 | 32.17 |
6 | 30.9 | 30.88 | 32.22 | 32.18 |
7 | 30.74 | 30.78 | 32.14 | 32.16 |
8 | 30.63 | 30.63 | 32 | 32.03 |
Performance of different n, case C (normal distribution): QP _{ e } = 25; Mobile: mean = 5300 kbps, standard deviation = 1000 kbps; Bus: mean = 3600 kbps, standard deviation = 650 kbps
n | Mobile (600–5300 kbps), calculated (dB) | Mobile, actual (dB) | Bus (500–3600 kbps), calculated (dB) | Bus, actual (dB) |
---|---|---|---|---|
1 | 36.01 | 36.21 | 39.13 | 38.33 |
2 | 35.96 | 36.11 | 38.75 | 38.16 |
3 | 35.87 | 35.95 | 38.42 | 37.92 |
4 | 35.67 | 35.94 | 38.19 | 38.02 |
5 | 35.49 | 35.55 | 38 | 37.57 |
6 | 35.15 | 35.15 | 37.62 | 37.5 |
7 | 34.92 | 34.96 | 37.42 | 37.44 |
8 | 34.74 | 34.63 | 37.19 | 37.21 |
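Reading the three tables together, the script below re-processes the listed values (n = 1 to 8 only): for each distribution and sequence it finds the n that maximizes the calculated PSNR, the n that maximizes the actual PSNR, and the actual PSNR given up by trusting the calculated choice. The helper name is hypothetical.

```python
# Calculated and actual PSNR (dB) for n = 1..8, transcribed from the three tables above.
TABLES = {
    ("A", "Mobile"): ([25.86, 26.18, 26.24, 26.228, 26.18, 26.11, 26.01, 25.98],
                      [25.34, 25.98, 26.04, 26.11, 26.05, 26.06, 26.06, 26.03]),
    ("A", "Bus"): ([24.83, 25.99, 26.69, 26.73, 27.43, 27.42, 27.42, 27.40],
                   [26.45, 27.12, 27.28, 27.39, 27.41, 27.444, 27.457, 27.448]),
    ("B", "Mobile"): ([31.32, 31.43, 31.39, 31.25, 31.12, 30.90, 30.74, 30.63],
                      [31.06, 31.38, 31.28, 31.06, 31.07, 30.88, 30.78, 30.63]),
    ("B", "Bus"): ([32.26, 32.47, 32.49, 32.31, 32.46, 32.22, 32.14, 32.00],
                   [31.65, 32.29, 32.27, 32.40, 32.17, 32.18, 32.16, 32.03]),
    ("C", "Mobile"): ([36.01, 35.96, 35.87, 35.67, 35.49, 35.15, 34.92, 34.74],
                      [36.21, 36.11, 35.95, 35.94, 35.55, 35.15, 34.96, 34.63]),
    ("C", "Bus"): ([39.13, 38.75, 38.42, 38.19, 38.00, 37.62, 37.42, 37.19],
                   [38.33, 38.16, 37.92, 38.02, 37.57, 37.50, 37.44, 37.21]),
}


def argmax_n(values):
    """1-based n with the highest PSNR (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__) + 1


results = {}
for key, (calc, actual) in TABLES.items():
    n_calc, n_best = argmax_n(calc), argmax_n(actual)
    loss_db = actual[n_best - 1] - actual[n_calc - 1]  # actual PSNR lost by using the calculated n
    results[key] = (n_calc, n_best, loss_db)
    print(f"{key[0]} {key[1]:6s} calculated n={n_calc} best actual n={n_best} loss={loss_db:.3f} dB")
```

The calculated choice matches the best actual n for Mobile in case B and for both sequences in case C; in the remaining cases it costs at most about 0.13 dB of actual PSNR (Bus, case B).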
6. Conclusion and discussion
In this article, we proposed a parameter optimization scheme that reduces the uncertainty in selecting coding parameters. Compared with existing methods, our approach builds a mapping among the quantization parameter QP _{ e }, the user bandwidth range, and an optimum number of fragments n based on the users’ bandwidth probability density distribution. The algorithm first calculates the bit rate of the coded bit streams from the R-D characteristics of I/P/B frames. Based on this result, it then estimates the enhancement layer bit rates when coding with different n. Finally, an optimized coding configuration is calculated by combining the user bandwidth probability density distribution with a bit-stream fidelity scalability analysis. The experimental results show that our approach can significantly improve average video quality for users. The main features of the scheme are as follows:
- (1) It constructs a mapping between the users’ bandwidth range and the quantization parameter QP _{ e }, using a coding rate calculation module that relates the coding rate to the percentage of fragment quality in the quality increments;
- (2) It takes overall user quality as the metric, translating the problem of coding optimization into a problem of overall user quality optimization and effectively optimizing the number of MGS fragments;
- (3) It provides a coding configuration optimization approach suitable for different bandwidth ranges and user bandwidth probability density distributions.
In future work, we will extend this approach to other coding parameter configurations that take transmission effects and user environments into account. We will also explore ways to reduce computational complexity.
