Research Article | Open
Classified Region Algorithm for Fast Intermode Decision in H.264/AVC Encoder
EURASIP Journal on Advances in Signal Processing, volume 2010, Article number: 150809 (2010)
H.264, also known as MPEG-4 Part 10, is the latest digital video coding standard; it achieves very high data compression by using several new coding features. One of these features is the use of variable block sizes for interframe coding, which increases compression efficiency. To exploit them, however, the H.264 encoder employs a mode decision technique based on rate-distortion optimization (RDO) that significantly increases encoder complexity. In this paper, we propose a classified region algorithm (CRA) that analyzes the spatial and temporal homogeneity of a block using cross differences to reduce the number of modes that require RDO calculation in inter mode decision. The proposed low-complexity algorithm significantly reduces the number of inter modes tested with negligible impact on video quality. Experimental results show that the proposed method reduces complexity by up to 67% with negligible degradation in both objective and subjective quality.
1. Introduction
Compression technology plays a vital role in multimedia devices: it should reduce file size without significantly degrading video quality. The Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG developed the next-generation video coding standard known as H.264 or MPEG-4 Part 10 Advanced Video Coding (MPEG-4 AVC). Compared with the earlier MPEG-2 and MPEG-4 video standards, H.264 uses various new coding tools to improve coding efficiency by up to 50%. One of the novel features of H.264 is the use of different coding modes for a macroblock (MB) in a P slice, namely SKIP, INTER 16 × 16, INTER 16 × 8, INTER 8 × 16, INTER 8 × 8, INTER 8 × 4, INTER 4 × 8, INTER 4 × 4, INTRA 16 × 16, and INTRA 4 × 4, to best represent the temporal and spatial details of an MB. To select the best mode, the encoder employs rate-distortion optimization (RDO): all MB modes are tried, and the one with the lowest RD cost is selected, achieving the best tradeoff between rate and distortion [3, 4]. However, RDO dramatically increases the computational complexity of the H.264 encoder, so a more efficient algorithm that reduces the computation of inter prediction is highly desirable.
Several approaches to fast inter prediction have been proposed. Early SKIP-mode detection conditions are widely used, because once SKIP is detected, all other MB and sub-MB sizes can be skipped [5, 6]. Fast motion estimation algorithms reduce the number of search points. Zhou and Sun introduced a fast adaptive search strategy (ADSS) that combines different search strategies to reduce computation with negligible degradation in rate-distortion (RD) performance. Moon et al. proposed an early termination algorithm based on all-zero 4 × 4 blocks that reduces the computation of the motion estimation process as well as of the discrete cosine transform (DCT) and quantization. Wang et al. reduced the complexity of mode decision by optimizing the ME and mode decision algorithms. Feng et al. used a rate-distortion cost comparison to reduce complexity significantly, but with a significant loss of video quality. Some studies use motion estimation information for inter mode decisions. Kuo et al. proposed a multiresolution motion estimation scheme and an adaptive rate-distortion model with early termination rules to accelerate the search process. You et al. analyzed the results of 16 × 16 motion estimation (ME) of a macroblock with a decision model to estimate whether the macroblock partition should be further divided. Grecos and Yang used neighborhood information with a set of skip mode conditions for enhanced skip mode decisions, and then performed inter mode decision for the remaining macroblocks using a gentle set of smoothness constraints. Kim et al. proposed an algorithm that uses the property of all-zero coefficient blocks, produced by quantization and coefficient thresholding, to eliminate unnecessary inter modes; however, it needs the transform coefficients to decide on the inter mode. Wu et al. used the Sobel operator, together with intra mode decision results, to reduce the total number of inter modes; in this approach, therefore, the inter mode decision partially depends on the intra mode results.
Feature-based intra/inter coding mode selection schemes have also been reported in the literature. Choi et al. used early SKIP mode conditions and selective intra mode decisions to decrease the complexity of inter mode decisions. Salgado and Nieto proposed a sequence-independent fast inter mode decision algorithm that decreases the encoding time; however, its bit-rate increment is extremely high, with significant video quality loss. Unlike the above methods, another approach classifies the homogeneity of a block using the mean absolute difference (MAD) of an MB and the mean absolute frame difference (MAFD). Although this method is useful, the average number of modes tested is still high. In this paper, we propose a classified region algorithm that analyzes the spatial and temporal homogeneity of a block at both the 16 × 16 MB level and the 8 × 8 block level. Because the proposed method reduces the number of modes required for inter mode prediction hierarchically, its regular structure lends itself to hardware implementation. The proposed fast algorithm reduces the total number of modes required for inter mode decisions with negligible degradation of video quality.
The rest of the paper is organized as follows. Section 2 gives an overview of inter and intra prediction in H.264. Section 3 explains the proposed algorithm in detail. Section 4 evaluates the proposed algorithm experimentally. Finally, Section 5 concludes the paper.
2. Overview of Inter-/Intraprediction
H.264/AVC employs two important techniques to exploit the temporal and spatial correlation of frames: inter prediction and intra prediction, respectively. The H.264 standard allows intra modes to be selected for MBs in inter-coded (P) slices, and the mode that produces the lowest RD cost is chosen. The following subsections briefly describe the inter and intra mode decision processes of H.264.
2.1. Intermode Decision
Inter prediction creates a prediction model from one or more previously encoded video frames. To represent scene motion more accurately, H.264 uses seven block sizes: a 16 × 16 luminance macroblock (MB) can be partitioned into variable block sizes (VBSs) of 16 × 16, 16 × 8, 8 × 16, 8 × 8, 8 × 4, 4 × 8, and 4 × 4, as shown in Figure 1. In addition, P slices have a SKIP mode, which is likely to be adopted in regions with stationary content or global motion. In general, large areas with similar motion are likely to be coded using large block sizes, whereas areas containing complex motion are likely to be coded using smaller block sizes. Even if the current macroblock belongs to an inter slice, the RDO-based encoder examines all prediction directions of Intra 4 × 4 and Intra 16 × 16 and finally chooses the mode that produces the lowest RD cost. The residuals of inter prediction modes are obtained through motion estimation, whereas directional prediction is applied for intra prediction modes. The best prediction mode is selected by minimizing the following Lagrangian function:
$$J(s, c, \mathrm{MODE} \mid QP, \lambda_{\mathrm{MODE}}) = \mathrm{SSD}(s, c, \mathrm{MODE} \mid QP) + \lambda_{\mathrm{MODE}} \cdot R(s, c, \mathrm{MODE} \mid QP) \qquad (1)$$
where $QP$ is the quantization parameter, $\lambda_{\mathrm{MODE}}$ is the Lagrange multiplier for mode decision, SSD is the sum of the squared differences between the original block luminance $s$ and its reconstruction $c$, and $R(s, c, \mathrm{MODE} \mid QP)$ is the number of bits associated with the chosen MODE, including the bits required to code the selected prediction mode and the DCT coefficients of the block.
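The minimization in (1) can be illustrated with a short sketch. It assumes $\lambda_{\mathrm{MODE}} = 0.85 \cdot 2^{(QP-12)/3}$, the setting commonly used in the JM reference encoder for P slices; the per-mode SSD and rate values are hypothetical placeholders, not measurements, and the helper names are illustrative.

```python
# Sketch of Lagrangian mode selection, J = SSD + lambda_MODE * R (eq. (1)).
# Assumption: lambda_MODE = 0.85 * 2**((QP - 12) / 3), as commonly used in JM.
from typing import Dict, Tuple


def lambda_mode(qp: int) -> float:
    """Lagrange multiplier for mode decision as a function of QP."""
    return 0.85 * 2 ** ((qp - 12) / 3)


def rd_cost(ssd: float, rate_bits: float, qp: int) -> float:
    """Lagrangian cost of one candidate mode."""
    return ssd + lambda_mode(qp) * rate_bits


def best_mode(candidates: Dict[str, Tuple[float, float]], qp: int) -> str:
    """Return the mode with the lowest RD cost.

    candidates maps a mode name to (SSD, rate in bits)."""
    return min(candidates, key=lambda m: rd_cost(*candidates[m], qp))


if __name__ == "__main__":
    # Hypothetical (SSD, bits) values for one macroblock.
    modes = {
        "SKIP": (5200.0, 1.0),
        "INTER16x16": (3100.0, 38.0),
        "INTER16x8": (2950.0, 61.0),
        "INTER8x8": (2500.0, 140.0),
    }
    print(best_mode(modes, qp=28))  # prints INTER16x16
```

With these numbers, the lower distortion of the smaller partitions does not pay for their extra bits at QP 28, so INTER16x16 wins; an RDO encoder must nevertheless compute every entry of the table, which is the cost the proposed algorithm targets.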
Because the RDO procedure above is used for both intra and inter mode decisions, the computational cost is high. The RDO procedure for inter modes is especially costly compared with that for intra modes, because it relies on computationally intensive full search motion estimation. When full search motion estimation is performed for all seven block sizes, only the motion vectors of the single best block size are used and all others are discarded, so much of the computation spent testing the seven block sizes is wasted.
2.2. Intramode Decision
Intra prediction in the H.264/AVC standard exploits the spatial correlation within 16 × 16 and 4 × 4 blocks: the current block is predicted from adjacent, previously coded pixels in the upper and left blocks. H.264 offers a rich set of prediction patterns, called modes, along various texture directions. For intra prediction, 9, 4, and 4 prediction modes must be tried for 4 × 4 luma, 16 × 16 luma, and 8 × 8 chroma blocks, respectively. Each mode has its own prediction direction, and the predicted samples are obtained from a weighted average of decoded values of neighboring pixels. Generally, Intra_4 × 4 is well suited to MBs with detailed content, while Intra_16 × 16 is appropriate for smooth MBs. Figure 2 shows the eight directional prediction modes of Intra_4 × 4 prediction, Modes 0, 1, 3, 4, 5, 6, 7, and 8, together with Mode 2, the DC mode. For Intra_16 × 16 luma blocks, the prediction consists of the vertical, horizontal, DC, and plane modes, represented by Modes 0, 1, 2, and 3, respectively. Chroma prediction uses the same four prediction types on 8 × 8 blocks, although the standard numbers them differently (DC, horizontal, vertical, and plane as Modes 0, 1, 2, and 3).
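To make the Intra_16 × 16 modes concrete, the sketch below builds the vertical, horizontal, and DC predictors from the reconstructed row above and column to the left of an MB. It assumes both neighbors are available, omits the plane mode, and, as a simplification, picks the mode by sum of absolute differences (SAD) rather than by the full RD cost in (1). Function names are illustrative.

```python
# Sketch of three Intra_16x16 luma predictors (Modes 0 to 2).
# Assumptions: top and left neighbours are available; plane mode (3) omitted;
# mode chosen by SAD instead of RD cost.
from typing import Dict, List

N = 16
Block = List[List[int]]


def predict_vertical(top: List[int]) -> Block:
    """Mode 0: every row repeats the 16 reconstructed pixels above the MB."""
    return [list(top) for _ in range(N)]


def predict_horizontal(left: List[int]) -> Block:
    """Mode 1: every row is filled with the pixel to its left."""
    return [[left[y]] * N for y in range(N)]


def predict_dc(top: List[int], left: List[int]) -> Block:
    """Mode 2: rounded mean of the 32 neighbours, (sum + 16) >> 5."""
    dc = (sum(top) + sum(left) + 16) >> 5
    return [[dc] * N for _ in range(N)]


def sad(block: Block, pred: Block) -> int:
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p) for rb, rp in zip(block, pred) for b, p in zip(rb, rp))


def best_intra16(block: Block, top: List[int], left: List[int]) -> int:
    """Return the mode number (0, 1, or 2) with the smallest SAD."""
    preds: Dict[int, Block] = {
        0: predict_vertical(top),
        1: predict_horizontal(left),
        2: predict_dc(top, left),
    }
    return min(preds, key=lambda m: sad(block, preds[m]))
```

A block whose columns simply continue the row above it is predicted exactly by Mode 0, which matches the intuition that Intra_16 × 16 suits smooth, structured content.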
3. Proposed Classified Region Algorithm
H.264/AVC employs seven block sizes, ranging from 16 × 16 to 4 × 4. The 16 × 16 MB partitions are used for homogeneous areas, that is, areas with similar motion, and P8 × 8 sub-MB partitions are used for non-homogeneous areas. The conventional inter mode decision of H.264/AVC is depicted in Figure 2: the RD cost is computed for all seven inter modes, and the mode with the lowest RD cost is selected. This brute-force search wastes computational resources. The problem can be addressed by adding a pre-processing stage, as shown in Figure 3, that examines the homogeneity of the macroblock (MB) and thereby reduces the number of inter modes for which the RD cost must be calculated. The conventional full mode search is thus avoided, saving computational resources.
Before implementing our algorithm, we conducted a set of experiments on standard benchmark video sequences to determine the best inter mode selection in homogeneous and non-homogeneous regions of the video sequences.
3.1. Statistical Analysis of Block Sizes in Video Sequences
To analyze how often each inter mode occurs, we used a set of video sequences with different motion properties. Figures 4, 5, 6, 7, and 8 show the statistical results for these sequences (300 frames, , number of reference frames = 1).
These results show that SKIP, 16 × 16, and, among the sub-MB partitions, 8 × 8 consistently occur more often than the other block sizes. Theoretically, a 16 × 16 block size should be chosen for a homogeneous region; however, some homogeneous blocks use a rectangular block size (16 × 8 or 8 × 16) instead. Therefore, if a region is detected as homogeneous, the proposed algorithm includes not only 16 × 16 but also 16 × 8 or 8 × 16 in the RDO calculation. Although intra modes are allowed in inter mode prediction, their overall occurrence is less than 3%, which is very low. Because intra prediction significantly increases complexity, it could be avoided with a suitable criterion in the inter mode decision.
3.2. Proposed Classified Region Algorithm (CRA)
The texture in a block plays a vital role in determining whether a region is homogeneous [22, 23]. Many effective methods for detecting homogeneous regions have been proposed in the literature, but their added complexity limits their use in a fast algorithm. Our low-complexity algorithm instead uses edge information to determine whether a region is homogeneous and to choose an appropriate block size for it. The proposed classified region algorithm (CRA) is based on computing a gradient function of the current block. To implement the algorithm, the 16 × 16 MB is divided into sub-blocks, and the average of each sub-block is computed using (2), as shown in Figure 9(a). The prediction algorithm is then applied to the resulting block, as shown in Figure 10(b). In (2), each position of the block is indexed by row and column, and the offset terms indicate the starting position of the luma block.
The implementation procedure for the proposed CRA algorithm is as follows.
We use an intensity gradient filter, whose output represents the intensity gradient at a pixel, to estimate the edge orientation of the image. A filter output close or equal to zero indicates an orientation along the filter's direction.
The intensity gradient filter for the vertical orientation is applied to four pixels, as shown in Figure 10(a), and is calculated using (4).
Figure 10(b) shows the corresponding filter for the horizontal orientation, applied to four pixels and calculated using (5).
These filters give the intensity differences in the vertical and horizontal directions. Using these values and appropriate thresholds, we then determine the homogeneity of the block, as explained in the following subsections.
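The exact form of the averaging in (2) and of the filters in (4) and (5) is not given here, so the following is only a hypothetical stand-in. It assumes that the 16 × 16 MB is averaged over 4 × 4 sub-blocks into a 4 × 4 grid of means and that the vertical and horizontal filters sum the absolute differences between vertically and horizontally adjacent means ("cross differences"). The function names are illustrative, not the authors'.

```python
# Hypothetical stand-in for CRA's averaging step (2) and the vertical /
# horizontal intensity gradient filters (4) and (5). Assumption: cross
# differences are sums of |differences| between adjacent sub-block means.
from typing import List, Tuple

Block = List[List[int]]
Grid = List[List[float]]


def sub_block_means(block: Block, sub: int = 4) -> Grid:
    """Average each sub x sub region, giving an (n/sub) x (n/sub) grid."""
    n = len(block)
    grid = []
    for by in range(0, n, sub):
        row = []
        for bx in range(0, n, sub):
            total = sum(block[y][x] for y in range(by, by + sub)
                        for x in range(bx, bx + sub))
            row.append(total / (sub * sub))
        grid.append(row)
    return grid


def cross_differences(grid: Grid) -> Tuple[float, float]:
    """Return (d_vert, d_horz): summed |differences| between vertically
    adjacent and between horizontally adjacent means, respectively."""
    rows, cols = len(grid), len(grid[0])
    d_vert = sum(abs(grid[y][x] - grid[y + 1][x])
                 for y in range(rows - 1) for x in range(cols))
    d_horz = sum(abs(grid[y][x] - grid[y][x + 1])
                 for y in range(rows) for x in range(cols - 1))
    return d_vert, d_horz
```

For an MB whose left half is 0 and right half is 200 (a vertical edge), this gives d_vert = 0 and d_horz = 800: the near-zero vertical response signals an orientation along the vertical direction, in line with the text. Working on sub-block means rather than on all 256 pixels keeps the cost of the measure small.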
3.3. Spatial Homogeneity
To decide whether a block is spatially homogeneous, we use the sum of the edge amplitudes of the block, as shown in (7). If this sum is less than a predefined threshold, the block is homogeneous; otherwise, it is non-homogeneous.
The above condition is the same for 8 × 8 block sizes except for the value of the pre-defined threshold. The threshold values for 16 × 16 and 8 × 8 block sizes are 295 and 8, respectively.
3.4. Temporal Homogeneity
Spatial homogeneity conditions apply within a frame, that is, in the spatial domain. To exploit homogeneity between frames, in the temporal domain, we use the difference between the current macroblock (MB) and the corresponding MB in the previous frame, as shown in (8). If this difference is less than a predefined threshold, the block is temporally homogeneous; otherwise, it is non-homogeneous.
In (8), the two intensity terms are the pixel intensities of the previous MB and the present MB, respectively. We adopt this common measure because our main aim is a pre-processing algorithm suitable for VLSI realization: it keeps the structure regular without increasing the complexity of the proposed algorithm. The same condition is applied to 8 × 8 blocks with a different threshold; the thresholds are 420 and 115 for 16 × 16 and 8 × 8 blocks, respectively.
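The two homogeneity tests of Sections 3.3 and 3.4 can be sketched together. The threshold values below are the ones reported in the text (295 and 8 for the spatial test, 420 and 115 for the temporal test). Everything else is an assumption: the temporal measure in (8) is taken to be the sum of absolute differences (SAD) between the current block and the corresponding block of the previous frame, and the spatial edge-amplitude sum of (7) is passed in precomputed, since its exact scale cannot be checked here.

```python
# Hypothetical sketch of CRA's spatial and temporal homogeneity tests.
# Thresholds are from the text; the SAD form of (8) is an assumption.
from typing import List

Block = List[List[int]]

SPATIAL_T = {16: 295, 8: 8}      # edge-amplitude thresholds, by block size
TEMPORAL_T = {16: 420, 8: 115}   # temporal-difference thresholds, by block size


def temporal_difference(cur: Block, prev: Block) -> int:
    """Assumed form of (8): SAD between corresponding blocks of two frames."""
    return sum(abs(c - p) for rc, rp in zip(cur, prev) for c, p in zip(rc, rp))


def is_spatially_homogeneous(edge_sum: float, size: int) -> bool:
    """Test (7): edge-amplitude sum below the size-specific threshold."""
    return edge_sum < SPATIAL_T[size]


def is_temporally_homogeneous(cur: Block, prev: Block) -> bool:
    """Test (8): temporal difference below the size-specific threshold."""
    return temporal_difference(cur, prev) < TEMPORAL_T[len(cur)]
```

For example, a 16 × 16 MB that differs from its predecessor by one intensity level at every pixel has a SAD of 256, below 420, and so would be classified as temporally homogeneous under this assumed measure.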
3.5. Overall Flow of the Classified Region Algorithm (CRA)
To maintain a regular structure, the algorithm proceeds hierarchically: the 16 × 16 conditions are evaluated first and the 8 × 8 conditions afterward (see Figure 12).
In our classified region pre-processing algorithm, a homogeneous block is tested not only with the 16 × 16 block size but also with either 8 × 16 or 16 × 8. The choice between 8 × 16 and 16 × 8 follows the edge direction of the block: if a block has a vertical edge, as shown in Figure 11(a), we choose 8 × 16; if it has a horizontal edge, as shown in Figure 11(b), we choose 16 × 8. To detect the edge direction, we reuse the results precomputed with (4) and (5) and compare the two to decide the block size.
Although the rectangular partition is selected according to the edge direction, RDO still selects 16 × 16 for some blocks with edges. To handle such cases, our algorithm always includes 16 × 16 together with either 8 × 16 or 16 × 8 in the RDO calculation, which maintains the performance of the proposed CRA. Deciding between 8 × 16 and 16 × 8 adds no overhead, because the edge direction values already computed with (4) and (5) are reused. If a block is temporally homogeneous, intra modes are excluded from the RDO calculation, because intra modes exploit spatial rather than temporal correlation. The mode decision procedure for 8 × 8 blocks is the same as that for the 16 × 16 MB.
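The candidate-pruning logic of Section 3.5 can be sketched as follows. This is one possible reading, not the authors' flowchart: it assumes that SKIP is always tested, that the 16 × 16 candidates apply when the spatial test passes, that a non-homogeneous MB descends to the 8 × 8 level (the per-8 × 8 repetition is not shown), and that a vertical edge is declared when the vertical-direction cross difference is the smaller of the two. Function names and mode labels are hypothetical.

```python
# Hypothetical sketch of CRA's RDO candidate selection for one 16x16 MB.
# Assumptions: SKIP always tested; spatial test gates 16x16 candidates;
# vertical edge <=> vertical-direction difference is the smaller one.
from typing import List


def rect_partition(d_vert: float, d_horz: float, big: bool = True) -> str:
    """Pick the rectangular partition matching the dominant edge direction.

    big=True for the 16x16 level, False for the 8x8 sub-MB level."""
    vertical_edge = d_vert < d_horz
    if big:
        return "8x16" if vertical_edge else "16x8"
    return "4x8" if vertical_edge else "8x4"


def mb_candidates(spatial_h: bool, temporal_h: bool,
                  d_vert: float, d_horz: float) -> List[str]:
    """Return the reduced list of modes passed to RDO."""
    modes = ["SKIP"]
    if spatial_h:
        # Homogeneous MB: 16x16 is always kept next to the rectangular size.
        modes += ["16x16", rect_partition(d_vert, d_horz)]
    else:
        # Non-homogeneous MB: repeat the same tests per 8x8 block (not shown).
        modes += ["P8x8"]
    if not temporal_h:
        # Intra modes are dropped only for temporally homogeneous blocks.
        modes += ["I16x16", "I4x4"]
    return modes
```

Keeping 16 × 16 alongside the rectangular partition costs one extra RDO evaluation but guards against the cases described above in which RDO prefers 16 × 16 even though an edge was detected.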
3.6. Complexity Analysis
In this section, we analyze the complexity of the proposed method, which reduces complexity by about 67% compared with the conventional method. A conventional encoder employs RDO with a full mode search to choose the best block size, yet only one block size and its associated motion vectors (MVs) are used to encode a macroblock; all other block sizes and their MVs are discarded. The conventional JM encoder therefore spends much of its computation on candidates that are never used, whereas our proposed method reduces the number of inter modes without increasing the overhead of the inter mode decision. The proposed method avoids the exhaustive full mode search, which evaluates all seven inter modes plus two intra modes in the RDO calculation.
4. Experimental Results and Discussion
The proposed CRA fast mode decision algorithm was implemented in JM 11.0, the reference software provided by the JVT, under the following test conditions: 200 frames, one reference frame, RD optimization enabled, the Main profile with an IPPP sequence structure, a search range of ±16, and CAVLC entropy coding. Tables 1 and 2 summarize the performance, averaged over QP values of 24, 28, and 32. We use four measures to evaluate encoding performance: the average PSNR change, the average bit-rate (BR) change, the average mode number saving factor, and the encoding-time saving factor.
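The defining equations of the four measures are not reproduced in this text. The sketch below assumes the conventions usual in fast mode decision studies, which agree with the sign rules stated in the next paragraph: PSNR change in dB, bit-rate change in percent (positive means more bits), and saving factors in percent relative to the JM reference (larger means faster). Names are illustrative.

```python
# Hypothetical definitions of the evaluation measures ("ref" = JM 11.0,
# "prop" = CRA). These follow common conventions, not the paper's equations.


def delta_psnr(psnr_prop: float, psnr_ref: float) -> float:
    """PSNR change in dB; negative values mean a quality loss."""
    return psnr_prop - psnr_ref


def delta_bitrate(br_prop: float, br_ref: float) -> float:
    """Bit-rate change in percent; positive values mean more bits."""
    return (br_prop - br_ref) / br_ref * 100.0


def saving(ref: float, prop: float) -> float:
    """Saving factor in percent, for mode counts or encoding times."""
    return (ref - prop) / ref * 100.0
```

As a purely illustrative example, if the average number of RDO candidates per MB fell from 9 to 3, `saving(9, 3)` would report a mode number saving of about 66.7%.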
Larger mode number saving and time saving factors indicate faster encoding. Positive values of the average PSNR and bit-rate changes indicate increments, and negative values indicate decrements. Table 3 shows the average PSNR, bit rate, mode saving factor, and encoding time reduction. The results show that the proposed algorithm achieves a larger mode number saving and encoding time reduction with minimal loss of image quality and a minimal bit-rate increment; both savings are higher than those of the existing methods [16, 17]. The proposed algorithm reduces the complexity of the inter mode decision with a low-complexity pre-processing stage and a negligible bit-rate increment, and provides a higher mode saving factor and encoding time reduction in most sequences. Table 4 compares the proposed method with recent work.
5. Conclusion
In this paper, we proposed a classified region fast inter mode decision algorithm for H.264/AVC. Simulation results showed that the proposed method reduces encoding complexity by up to 67% with negligible degradation in objective and subjective video quality. The algorithm can therefore be used as a pre-processing unit for inter prediction to decrease RDO complexity and encoding time.
References
[1] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), March 2003.
[2] Wiegand T, Sullivan GJ, Bjøntegaard G, Luthra A: Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 2003, 13(7):560-576.
[3] Sullivan G, Wiegand T, Lim K-P: Joint model reference encoding methods and decoding concealment methods. Proceedings of the 9th Joint Video Team Meeting (JVT-I049d0), September 2003, San Diego, Calif, USA.
[4] Yang E-H, Yu X: Rate distortion optimization for H.264 interframe coding: a general framework and algorithms. IEEE Transactions on Image Processing 2007, 16(7):1774-1784.
[5] Yang L, Keman Y, Li J, Li S: An effective variable block-size early termination algorithm for H.264 video coding. IEEE Transactions on Circuits and Systems for Video Technology 2005, 15(6):784-788.
[6] Al Qaralleh EA, Chang T-S: Fast variable block size motion estimation by adaptive early termination. IEEE Transactions on Circuits and Systems for Video Technology 2006, 16(8):1021-1026.
[7] Zhou Z, Sun M-T: Fast macroblock inter mode decision and motion estimation for H.264/MPEG-4 AVC. Proceedings of the International Conference on Image Processing (ICIP '04), October 2004, 243-263.
[8] Moon YH, Kim GY, Kim JH: An improved early detection algorithm for all-zero blocks in H.264 video encoding. IEEE Transactions on Circuits and Systems for Video Technology 2005, 15(8):1053-1057.
[9] Wang H, Kwong S, Kok C-W: An efficient mode decision algorithm for H.264/AVC encoding optimization. IEEE Transactions on Multimedia 2007, 9(4):882-888.
[10] Feng B, Zhu G-X, Liu W-Y: Fast adaptive inter-prediction mode decision method for H.264 based on spatial correlation. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '06), May 2006, 1804-1807.
[11] Kuo C-H, Shen M, Kuo C-CJ: Fast inter-prediction mode decision and motion search for H.264. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), June 2004, 1:663-666.
[12] You J, Kim W, Jeong J: 16×16 macroblock partition size prediction for H.264 P slices. IEEE Transactions on Consumer Electronics 2006, 52(4):1377-1383.
[13] Grecos C, Yang MY: Fast inter mode prediction for P slices in the H264 video coding standard. IEEE Transactions on Broadcasting 2005, 51(2):256-263. 10.1109/TBC.2005.846192
[14] Kim Y-H, Yoo J-W, Lee S-W, Shin J, Paik J, Jung H-K: Adaptive mode decision for H.264 encoder. Electronics Letters 2004, 40(19):1172-1173. 10.1049/el:20046155
[15] Wu D, Pan F, Lim KP, Wu S, Li ZG, Lin X, Rahardja S, Ko CC: Fast intermode decision in H.264/AVC video coding. IEEE Transactions on Circuits and Systems for Video Technology 2005, 15(7):953-958.
[16] Jing X, Chau L-P: Fast approach for H.264 inter mode decision. Electronics Letters 2004, 40(17):1050-1052. 10.1049/el:20045243
[17] Choi I, Lee J, Jeon B: Fast coding mode selection with rate-distortion optimization for MPEG-4 Part-10 AVC/H.264. IEEE Transactions on Circuits and Systems for Video Technology 2006, 16(12):1557-1561.
[18] Turaga DS, Chen T: Classification based mode decisions for video over networks. IEEE Transactions on Multimedia 2001, 3(1):41-52. 10.1109/6046.909593
[19] Salgado L, Nieto M: Sequence independent very fast mode decision algorithm on H.264/AVC baseline profile. Proceedings of the IEEE International Conference on Image Processing (ICIP '06), October 2006, 41-44.
[20] Ri S-H, Vatis Y, Ostermann J: Fast inter-mode decision in an H.264/AVC encoder using mode and Lagrangian cost correlation. IEEE Transactions on Circuits and Systems for Video Technology 2009, 19(2):302-306.
[21] Liu D, Sun X, Wu F, Zhang Y-Q: Edge-oriented uniform intra prediction. IEEE Transactions on Image Processing 2008, 17(10):1827-1836.
[22] Girod B: Efficiency analysis of multihypothesis motion-compensated prediction for video coding. IEEE Transactions on Image Processing 2000, 9(2):173-183. 10.1109/83.821595
[23] Liu X, Wang DLL, Srivastava A: Image segmentation using local spectral histograms. Proceedings of the IEEE International Conference on Image Processing (ICIP '01), October 2001, 70-73.
[24] Joint Video Team (JVT): Reference software JM 11.0. http://iphome.hhi.de/suehring/tml
This work was supported in part by a Korea University research grant.