 Research Article
 Open Access
Classified Region Algorithm for Fast Intermode Decision in H.264/AVC Encoder
 K. Bharanitharan^{1},
 BinDa Liu^{2} and
 JarFerr Yang^{2}
https://doi.org/10.1155/2010/150809
© K. Bharanitharan et al. 2010
 Received: 4 February 2010
 Accepted: 2 September 2010
 Published: 13 September 2010
Abstract
H.264 (MPEG-4 Part 10) is the latest digital video coding standard, and it achieves very high data compression through several new coding features. One of these features is variable block sizes for interframe coding, which increases compression efficiency. To exploit it, however, the H.264 encoder employs a mode decision technique based on rate-distortion optimization (RDO), which significantly increases encoder complexity. In this paper, we propose a classified region algorithm (CRA) that analyzes the spatial and temporal homogeneity of a block by using cross differences, thereby reducing the number of modes that require RDO calculation in the inter mode decision. The proposed low-complexity algorithm significantly reduces the number of inter modes without affecting the video quality. The experimental results show that the proposed method reduces complexity by up to 67% with negligible degradation in both objective and subjective quality.
Keywords
 Block Size
 Discrete Cosine Transform
 Motion Estimation
 Video Quality
 Mode Decision
1. Introduction
Compression technology plays a vital role in multimedia devices: it must reduce the size of a file without significantly degrading the video quality. The Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG developed a new video coding standard, known as H.264 or MPEG-4 Part 10 Advanced Video Coding (MPEG-4 AVC) [1], as the next generation of video compression standards. Compared to the previous MPEG-2 and MPEG-4 video standards, H.264 with its various new coding tools can improve coding efficiency by up to 50% [2]. One of the novel features of H.264 is the use of different coding modes for a macroblock (MB) in a P slice, such as SKIP, INTER 16 × 16, INTER 16 × 8, INTER 8 × 16, INTER 8 × 8, INTER 8 × 4, INTER 4 × 8, INTER 4 × 4, INTRA 16 × 16, and INTRA 4 × 4, to best represent the temporal and spatial details in an MB. To select the best mode, rate-distortion optimization (RDO) is employed: all the MB modes are tried, and the one with the lowest RD cost is selected to achieve the best tradeoff between rate and distortion [3, 4]. However, the RDO technique dramatically increases the computational complexity of the H.264 encoder. Therefore, a more efficient algorithm that reduces the computation of inter prediction is highly desirable.
Several approaches have been proposed to achieve fast inter prediction. Early detection of the SKIP mode is widely used, since all the other MB sizes and their sub-MB sizes can then be skipped [5, 6]. Fast motion estimation algorithms reduce the number of search points. Zhou and Sun [7] introduced a new fast adaptive search strategy (ADSS), which combines different search strategies to reduce computation with negligible degradation in the rate-distortion (RD) performance. Moon et al. [8] proposed an early termination algorithm based on all-zero 4 × 4 blocks that reduces the computational complexity of the motion estimation process as well as the computations in the discrete cosine transform (DCT) and the quantization process. Wang et al. [9] reduced the complexity of the mode decision by optimizing the ME and mode decision algorithms. Feng et al. [10] used a rate-distortion cost comparison to significantly reduce the complexity, but with a significant loss of video quality.
Some studies have used motion estimation information for inter mode decisions. Kuo et al. [11] proposed a multiresolution motion estimation scheme and an adaptive rate-distortion model with early termination rules to accelerate the search process. You et al. [12] suggested a method that analyzes the results of 16 × 16 motion estimation (ME) of a macroblock according to a decision model to estimate whether the macroblock partition should be divided further. Grecos and Yang [13] used neighborhood information with a set of skip mode conditions for enhanced skip mode decisions, and then performed the inter mode decision for the remaining macroblocks by using a gentle set of smoothness constraints. Kim et al. [14] proposed an algorithm that uses the property of all-zero coefficient blocks, produced by quantization and coefficient thresholding, to effectively eliminate unnecessary inter modes. However, it must compute the transform coefficients before it can decide on the inter mode. Wu et al. [15] used the Sobel operator together with intra mode decision results to reduce the total number of inter modes. Therefore, in this approach, the inter mode decision partially depends on the intra mode results.
Feature-based intra/inter coding mode selection schemes have also been reported in [18]. Choi et al. [17] used early SKIP mode conditions and selective intra mode decisions to decrease the complexity of inter mode decisions. Salgado and Nieto [19] proposed a sequence-independent fast inter mode decision algorithm that decreases the encoding time. However, the bitrate increment is extremely high, with significant video quality loss. Unlike the above methods, Jing and Chau [16] classify the homogeneity of a block using the mean absolute difference (MAD) of an MB and the mean absolute frame difference (MAFD). Although the method is quite useful, the average number of modes used is still high. In this paper, we propose a classified region algorithm that analyzes the spatial and temporal homogeneity of a block using 16 × 16 MB and 8 × 8 block patterns, respectively. Since the proposed method hierarchically reduces the number of modes required for inter mode prediction, its regular structure leads to easy hardware implementation. The proposed fast algorithm reduces the total number of modes required for inter mode decisions with negligible degradation of video quality.
The rest of the paper is organized as follows. Section 2 gives an overview of the inter and intra predictions in H.264. Section 3 explains the proposed algorithm in detail. Section 4 uses experimental results to evaluate the proposed algorithm. Finally, Section 5 contains the conclusion.
2. Overview of Inter/Intraprediction
H.264/AVC employs two important techniques, inter and intra prediction, to exploit the temporal and spatial correlation of frames, respectively. The H.264 standard also allows intra modes within inter mode prediction. The mode that produces the lowest RD cost is chosen. The following subsections briefly describe the inter and intra mode decision processes of the H.264 standard.
2.1. Intermode Decision
Because the above RDO procedure is applied to both intra and inter mode decisions in H.264, the computational cost is high. The RDO procedure for inter modes is more complex than that for intra modes because it employs computationally intensive full search motion estimation. When full search motion estimation is performed for all seven block sizes, only the motion vectors of the best block size are used; those of all the other block sizes are discarded. Therefore, a lot of computational resources are wasted by testing all seven block sizes.
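The exhaustive decision described above can be illustrated with a short Python sketch. It assumes the usual Lagrangian cost form J = D + λ·R; the function names, the λ value, and the distortion and rate figures are hypothetical placeholders, not values taken from the JM reference software.

```python
from typing import Dict, Tuple

# The seven inter block sizes tested for every macroblock in a P slice.
INTER_MODES = ["16x16", "16x8", "8x16", "8x8", "8x4", "4x8", "4x4"]


def rd_cost(distortion: float, rate_bits: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def exhaustive_mode_decision(
    measurements: Dict[str, Tuple[float, float]], lam: float
) -> Tuple[str, float]:
    """Evaluate every candidate mode and keep the one with the lowest cost.

    `measurements` maps a mode name to a hypothetical (distortion, rate)
    pair that a real encoder would obtain from motion estimation and
    entropy coding of that mode.
    """
    best_mode = min(measurements, key=lambda m: rd_cost(*measurements[m], lam))
    return best_mode, rd_cost(*measurements[best_mode], lam)


# Made-up (distortion, rate) numbers, for illustration only.
example = {
    "16x16": (1200.0, 40.0),
    "16x8": (1100.0, 65.0),
    "8x16": (1150.0, 60.0),
    "8x8": (900.0, 130.0),
    "8x4": (880.0, 170.0),
    "4x8": (870.0, 175.0),
    "4x4": (850.0, 240.0),
}
best_mode, best_cost = exhaustive_mode_decision(example, lam=5.0)
```

Every entry of `example` stands for a full motion search, yet only the winning entry is used, which is the waste that a fast mode decision tries to avoid.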
2.2. Intramode Decision
3. Proposed Classified Region Algorithm
Before implementing our algorithm, we conducted a set of experiments on standard benchmark video sequences to determine the best inter mode selection in homogeneous and nonhomogeneous regions of the video sequences.
3.1. Statistical Analysis of Block Sizes in Video Sequences
From the above analysis, it can be seen that SKIP and the 16 × 16 MB size occur more often than the rectangular block sizes, and that the 8 × 8 block size also occurs considerably often among the sub-MB partitions. Theoretically, a 16 × 16 block size should be chosen for a homogeneous region. However, some homogeneous blocks use rectangular block sizes (16 × 8 or 8 × 16). Therefore, if the detected region is homogeneous, the proposed algorithm selects not only 16 × 16 but also 16 × 8 or 8 × 16 for RDO calculation. Although intra modes are allowed in inter mode prediction, the overall occurrence of intra mode is less than 3%, which is very low. Intra prediction significantly increases complexity, which can be avoided with a suitable criterion in the inter mode decision.
3.2. Proposed Classified Region Algorithm (CRA)
The implementation procedure for the proposed CRA algorithm is as follows.
where represents the intensity gradient of pixel . When the value of is close or equal to zero, it means that there is an orientation.
The intensity differences in the vertical and horizontal directions are computed. Using these values, we determine the homogeneity of the block using an appropriate threshold value. By using all the above conditions, we then check the homogeneity of the block, which is explained in the following subsections.
3.3. Spatial Homogeneity
The above condition is the same for 8 × 8 block sizes except for the value of the predefined threshold. The threshold values for 16 × 16 and 8 × 8 block sizes are 295 and 8, respectively.
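The spatial test can be illustrated with a short Python sketch. It is not the authors' implementation: the neighbour-difference measure below is an assumed stand-in for the cross-difference equations, and the names `directional_differences` and `is_spatially_homogeneous` are hypothetical. Only the two threshold values (295 and 8) are taken from the text.

```python
from typing import List, Tuple

Block = List[List[int]]  # square block of luma samples, one list per row

# Threshold values quoted in the text for 16 x 16 and 8 x 8 blocks.
SPATIAL_THRESHOLD = {16: 295, 8: 8}


def directional_differences(block: Block) -> Tuple[int, int]:
    """Return (horizontal, vertical) sums of absolute neighbour differences.

    Assumption: this simple measure stands in for the paper's
    cross-difference equations, which are not reproduced here.
    """
    size = len(block)
    horizontal = sum(
        abs(block[y][x + 1] - block[y][x])
        for y in range(size)
        for x in range(size - 1)
    )
    vertical = sum(
        abs(block[y + 1][x] - block[y][x])
        for y in range(size - 1)
        for x in range(size)
    )
    return horizontal, vertical


def is_spatially_homogeneous(block: Block) -> bool:
    """A block is treated as homogeneous if both sums stay below the threshold."""
    threshold = SPATIAL_THRESHOLD[len(block)]
    horizontal, vertical = directional_differences(block)
    return horizontal < threshold and vertical < threshold


flat = [[128] * 16 for _ in range(16)]            # uniform macroblock
edge = [[0] * 8 + [255] * 8 for _ in range(16)]   # sharp vertical edge
```

In this sketch the uniform block passes the test, while the block with a sharp vertical edge fails it because its horizontal difference sum far exceeds the 16 × 16 threshold.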
3.4. Temporal Homogeneity
where , and , denote the pixel intensities in the previous MB and the present MB, respectively. We adopt this common method since our main aim is to develop a preprocessing algorithm suitable for VLSI realization. Therefore, it helps to maintain a regular structure without increasing the complexity of the proposed algorithm. The above condition is applicable for 8 × 8 block sizes except for the threshold value. The threshold values are 420 and 115 for 16 × 16 and 8 × 8, respectively.
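A minimal Python sketch of the temporal test follows. It assumes that the "common method" is a sum of absolute differences between the co-located blocks of the previous and present frames; that choice and the function names are assumptions made for illustration. The threshold values (420 and 115) are the ones quoted above.

```python
from typing import List

Block = List[List[int]]  # square block of luma samples, one list per row

# Threshold values quoted in the text for 16 x 16 and 8 x 8 blocks.
TEMPORAL_THRESHOLD = {16: 420, 8: 115}


def temporal_difference(previous: Block, present: Block) -> int:
    """Sum of absolute differences between co-located samples (assumed measure)."""
    return sum(
        abs(cur - prev)
        for prev_row, cur_row in zip(previous, present)
        for prev, cur in zip(prev_row, cur_row)
    )


def is_temporally_homogeneous(previous: Block, present: Block) -> bool:
    """The block is treated as temporally homogeneous below the threshold."""
    return temporal_difference(previous, present) < TEMPORAL_THRESHOLD[len(present)]


previous_mb = [[100] * 16 for _ in range(16)]
small_change = [[101] * 16 for _ in range(16)]  # every sample changes by 1
large_change = [[102] * 16 for _ in range(16)]  # every sample changes by 2
```

Because the measure needs only subtractions, absolute values, and one accumulator, it keeps the regular structure that the text asks for in a VLSI-oriented preprocessing stage.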
3.5. Overall Flow of the Classified Region Algorithm (CRA)
In order to maintain a regular structure in the proposed algorithm, the hierarchical order is maintained by choosing 16 × 16 block conditions first and 8 × 8 blocks later on (see Figure 12).
Although we choose the rectangular block using edge direction, 16 × 16 is chosen for some blocks with edges. In order to avoid this situation, our algorithm always chooses 16 × 16 with 8 × 16 or 16 × 8 for RDO calculations. The performance of the proposed CRA algorithm is thus maintained. It should be noted that we do not increase the overhead to decide between 8 × 16 or 16 × 8 since we use the precomputed edge direction values from CRA that are shown in (4) and (5). If a block is temporally homogeneous, we do not include intra mode for RDO calculations because intra mode is used to exploit spatial correlations. The mode decision procedure for 8 × 8 blocks is the same as that for 16 × 16 MB.
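The selection rules stated in this subsection can be sketched in Python as follows. This is an illustration under stated assumptions rather than the complete flow of the algorithm: the mapping from the larger directional difference to a partition shape and the fallback for non-homogeneous blocks are assumptions, and all names are hypothetical.

```python
from typing import List

INTRA_MODES = ["INTRA16x16", "INTRA4x4"]

# Partition names for the macroblock level and the 8 x 8 sub-block level.
LEVELS = {
    16: {"square": "16x16", "wide": "16x8", "tall": "8x16"},
    8: {"square": "8x8", "wide": "8x4", "tall": "4x8"},
}


def candidate_modes(
    block_size: int,
    homogeneous: bool,
    temporally_homogeneous: bool,
    horizontal_diff: int,
    vertical_diff: int,
) -> List[str]:
    """Return the modes passed on to the RDO calculation for one block."""
    names = LEVELS[block_size]
    if homogeneous:
        # The square size is always kept, together with one rectangular size
        # chosen from the precomputed directional differences.
        # Assumption: a larger horizontal difference indicates a vertical
        # edge, so the tall partition is preferred in that case.
        rect = names["tall"] if horizontal_diff > vertical_diff else names["wide"]
        modes = [names["square"], rect]
    else:
        # Assumption: a conservative fallback that keeps every partition
        # of this level; the paper's flow chart is not reproduced here.
        modes = [names["square"], names["wide"], names["tall"]]
    if block_size == 16 and not temporally_homogeneous:
        # Intra modes are left out for temporally homogeneous blocks.
        modes += INTRA_MODES
    return modes


mb_modes = candidate_modes(16, True, True, horizontal_diff=50, vertical_diff=10)
sub_modes = candidate_modes(8, True, True, horizontal_diff=0, vertical_diff=3)
```

Reusing the directional differences that were already computed for the homogeneity test means that the choice between the two rectangular sizes adds no extra pixel-level work.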
3.6. Complexity Analysis
In this section, we analyze the complexity of the proposed method. Our method reduces complexity by about 67% compared to the conventional method. A conventional encoder employs the RDO method with a full mode search to choose the best block size. Only one block size and its associated motion vector (MV) are ultimately used to encode a macroblock; all other block sizes and their MVs are discarded. Hence, the conventional JM encoder wastes computation, whereas the proposed method reduces the number of inter modes without increasing the overhead of the inter mode decision. The proposed method avoids the exhaustive full mode search, which evaluates all seven inter modes plus two intra modes in the RDO calculation.
4. Experimental Results and Discussion
Experimental results of the proposed algorithm.
Sequence                         Method     QP 24                   QP 28                   QP 32
                                            ΔPSNR   ΔBR     ΔTime   ΔPSNR   ΔBR     ΔTime   ΔPSNR   ΔBR     ΔTime
Salesman (176 × 144)             Jing [16]  −0.010  0.317   8.82    −0.019  −0.239  16.19   −0.030  −0.468  17.40
Salesman (176 × 144)             Choi [17]  −0.010  0.104   60.10   −0.010  −0.656  59.69   −0.030  −0.777  59.26
Salesman (176 × 144)             Proposed   −0.018  0.116   65.04   −0.024  −0.782  65.57   −0.001  −0.854  66.57
Mobile (352 × 288)               Jing [16]  −0.041  0.386   7.11    −0.045  0.485   8.23    −0.050  0.337   11.91
Mobile (352 × 288)               Choi [17]  −0.010  0.110   34.59   −0.006  0.002   23.16   −0.013  −0.273  29.49
Mobile (352 × 288)               Proposed   −0.010  0.091   54.46   −0.007  0.096   55.58   −0.033  0.141   59.46
Paris (352 × 288)                Jing [16]  −0.043  0.399   8.80    −0.045  0.264   9.24    −0.037  0.399   10.22
Paris (352 × 288)                Choi [17]  −0.496  0.040   49.09   −0.257  3.425   42.24   −0.028  8.316   39.36
Paris (352 × 288)                Proposed   −0.025  0.281   66.66   −0.054  0.274   64.88   −0.076  1.110   65.48
Foreman (352 × 288)              Jing [16]  −0.040  0.275   7.60    −0.049  0.252   8.88    −0.008  0.273   14.80
Foreman (352 × 288)              Choi [17]  −0.009  −0.130  54.68   −0.031  −0.024  53.17   −0.005  −0.217  54.69
Foreman (352 × 288)              Proposed   −0.031  0.211   57.08   −0.062  0.236   64.02   −0.154  0.258   68.03
Hall Monitor (352 × 288)         Jing [16]  −0.014  0.027   5.79    −0.019  −0.266  9.46    −0.004  −0.318  12.93
Hall Monitor (352 × 288)         Choi [17]  −0.044  −0.413  40.87   −0.042  −0.880  51.72   −0.012  −1.545  56.97
Hall Monitor (352 × 288)         Proposed   −0.068  0.015   64.47   −0.075  −1.145  66.51   −0.102  −1.956  68.04
Akiyo (352 × 288)                Jing [16]  −0.006  0.041   10.73   −0.004  −0.264  10.88   −0.010  0.084   10.81
Akiyo (352 × 288)                Choi [17]  −0.013  −0.726  58.41   −0.020  −1.901  60.96   −0.020  −0.585  62.10
Akiyo (352 × 288)                Proposed   −0.004  −0.802  65.34   −0.061  −1.678  66.86   −0.059  −0.129  69.91
Mother and Daughter (352 × 288)  Jing [16]  −0.016  0.065   8.02    −0.004  −0.286  13.74   −0.002  −0.271  14.84
Mother and Daughter (352 × 288)  Choi [17]  −0.047  −0.307  51.19   −0.015  −0.431  53.44   −0.024  −0.403  54.57
Mother and Daughter (352 × 288)  Proposed   −0.066  0.030   68.52   −0.066  −1.784  64.83   −0.096  −1.102  67.46
Experimental results of mode saving factor and encoding time reduction.
Sequence                         Method     QP 24          QP 28          QP 32
                                            ΔM     ΔTime   ΔM     ΔTime   ΔM     ΔTime
Salesman (176 × 144)             Jing [16]  24.82  8.82    32.43  16.19   38.22  17.40
Salesman (176 × 144)             Choi [17]  53.09  60.10   68.21  59.69   73.75  59.26
Salesman (176 × 144)             Proposed   69.67  65.04   66.60  65.57   69.58  66.57
Mobile (352 × 288)               Jing [16]  17.15  7.11    21.11  8.23    27.01  11.91
Mobile (352 × 288)               Choi [17]  12.41  34.59   13.84  23.16   18.03  29.49
Mobile (352 × 288)               Proposed   56.54  54.46   57.18  55.58   61.45  59.46
Paris (352 × 288)                Jing [16]  35.00  8.80    35.95  9.24    37.15  10.22
Paris (352 × 288)                Choi [17]  55.43  49.09   47.83  42.24   46.43  39.36
Paris (352 × 288)                Proposed   68.79  66.66   67.65  64.88   67.49  65.48
Foreman (352 × 288)              Jing [16]  35.00  7.60    35.95  8.88    37.15  14.80
Foreman (352 × 288)              Choi [17]  39.57  49.09   48.97  53.17   51.47  54.69
Foreman (352 × 288)              Proposed   59.76  57.08   65.89  64.02   69.78  68.03
Hall Monitor (352 × 288)         Jing [16]  14.70  5.79    27.02  9.46    36.62  12.93
Hall Monitor (352 × 288)         Choi [17]  18.33  40.87   44.98  51.72   60.55  56.97
Hall Monitor (352 × 288)         Proposed   66.89  64.47   68.67  66.51   70.23  68.04
Akiyo (352 × 288)                Jing [16]  30.30  10.73   31.25  10.88   33.81  10.81
Akiyo (352 × 288)                Choi [17]  53.18  58.41   66.88  60.96   73.51  62.10
Akiyo (352 × 288)                Proposed   68.89  65.34   67.86  66.86   69.87  68.91
Mother and Daughter (352 × 288)  Jing [16]  26.66  8.02    31.19  13.74   35.77  14.84
Mother and Daughter (352 × 288)  Choi [17]  43.47  51.19   51.77  53.44   58.33  54.57
Mother and Daughter (352 × 288)  Proposed   69.93  68.52   66.78  64.83   70.56  67.46
Average results of proposed and compared algorithms.
Experimental Results of the proposed algorithm.
Sequence (CIF)     Method 1 [20]        Method 2 [20]        Method 3 [20]        Proposed method
                   ΔPSNR  ΔBR   ΔTime   ΔPSNR  ΔBR   ΔTime   ΔPSNR  ΔBR   ΔTime   ΔPSNR  ΔBR   ΔTime
Mobile             −0.27  4.1   45.1    −0.01  2.5   34.1    −0.01  2.9   38.6    −0.01  0.09  54.46
Foreman            −0.28  4.7   51.8    −0.04  2.0   37.3    −0.05  2.1   40.9    −0.03  0.21  57.08
Mother & daughter  −0.31  4.1   61.8    −0.03  1.5   47.6    −0.03  1.6   53.2    −0.06  0.03  68.52
Average            −0.29  4.3   52.9    −0.03  2.0   39.66   −0.03  2.2   44.23   −0.03  0.11  60.02
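The Average row for the proposed method can be recomputed from the three per-sequence rows with a few lines of Python. The snippet only restates the figures listed in the table; the names `proposed_rows` and `column_means` are hypothetical.

```python
from typing import Dict, Tuple

# (dPSNR, dBR, dTime) of the proposed method, copied from the table above.
proposed_rows: Dict[str, Tuple[float, float, float]] = {
    "Mobile": (-0.01, 0.09, 54.46),
    "Foreman": (-0.03, 0.21, 57.08),
    "Mother & daughter": (-0.06, 0.03, 68.52),
}


def column_means(rows: Dict[str, Tuple[float, ...]]) -> Tuple[float, ...]:
    """Average each column over all sequences, rounded to two decimals."""
    columns = zip(*rows.values())
    return tuple(round(sum(column) / len(column), 2) for column in columns)


avg_psnr, avg_br, avg_time = column_means(proposed_rows)
```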
5. Conclusion
In this paper, we proposed a classified region algorithm for fast inter mode decision in H.264/AVC. Simulation results showed that the proposed method reduces encoding time by up to 67% with negligible degradation in objective and subjective video quality. Therefore, this algorithm can be used as a preprocessing unit for inter prediction units to decrease RDO complexity and encoding time.
Declarations
Acknowledgment
This paper is supported in part by a Korea University research grant.
References
[1] Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC), March 2003.
[2] Wiegand T, Sullivan GJ, Bjøntegaard G, Luthra A: Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 2003, 13(7):560–576.
[3] Sullivan G, Wiegand T, Lim KP: Joint model reference encoding methods and decoding concealment methods. Proceedings of the 9th Joint Video Team Meeting (JVT-I049d0), September 2003, San Diego, Calif, USA.
[4] Yang EH, Yu X: Rate distortion optimization for H.264 interframe coding: a general framework and algorithms. IEEE Transactions on Image Processing 2007, 16(7):1774–1784.
[5] Yang L, Keman Y, Li J, Li S: An effective variable block-size early termination algorithm for H.264 video coding. IEEE Transactions on Circuits and Systems for Video Technology 2005, 15(6):784–788.
[6] Al Qaralleh EA, Chang TS: Fast variable block size motion estimation by adaptive early termination. IEEE Transactions on Circuits and Systems for Video Technology 2006, 16(8):1021–1026.
[7] Zhou Z, Sun MT: Fast macroblock inter mode decision and motion estimation for H.264/MPEG-4 AVC. Proceedings of the International Conference on Image Processing (ICIP '04), October 2004, 243–263.
[8] Moon YH, Kim GY, Kim JH: An improved early detection algorithm for all-zero blocks in H.264 video encoding. IEEE Transactions on Circuits and Systems for Video Technology 2005, 15(8):1053–1057.
[9] Wang H, Kwong S, Kok CW: An efficient mode decision algorithm for H.264/AVC encoding optimization. IEEE Transactions on Multimedia 2007, 9(4):882–888.
[10] Feng B, Zhu GX, Liu WY: Fast adaptive inter-prediction mode decision method for H.264 based on spatial correlation. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS '06), May 2006, 1804–1807.
[11] Kuo CH, Shen M, Kuo CCJ: Fast inter-prediction mode decision and motion search for H.264. Proceedings of the IEEE International Conference on Multimedia and Expo (ICME '04), June 2004, 1:663–666.
[12] You J, Kim W, Jeong J: 16×16 macroblock partition size prediction for H.264 P slices. IEEE Transactions on Consumer Electronics 2006, 52(4):1377–1383.
[13] Grecos C, Yang MY: Fast inter mode prediction for P slices in the H264 video coding standard. IEEE Transactions on Broadcasting 2005, 51(2):256–263. 10.1109/TBC.2005.846192
[14] Kim YH, Yoo JW, Lee SW, Shin J, Paik J, Jung HK: Adaptive mode decision for H.264 encoder. Electronics Letters 2004, 40(19):1172–1173. 10.1049/el:20046155
[15] Wu D, Pan F, Lim KP, Wu S, Li ZG, Lin X, Rahardja S, Ko CC: Fast intermode decision in H.264/AVC video coding. IEEE Transactions on Circuits and Systems for Video Technology 2005, 15(7):953–958.
[16] Jing X, Chau LP: Fast approach for H.264 inter mode decision. Electronics Letters 2004, 40(17):1050–1052. 10.1049/el:20045243
[17] Choi I, Lee J, Jeon B: Fast coding mode selection with rate-distortion optimization for MPEG-4 Part-10 AVC/H.264. IEEE Transactions on Circuits and Systems for Video Technology 2006, 16(12):1557–1561.
[18] Turaga DS, Chen T: Classification based mode decisions for video over networks. IEEE Transactions on Multimedia 2001, 3(1):41–52. 10.1109/6046.909593
[19] Salgado L, Nieto M: Sequence independent very fast mode decision algorithm on H.264/AVC baseline profile. Proceedings of the IEEE International Conference on Image Processing (ICIP '06), October 2006, 41–44.
[20] Ri SH, Vatis Y, Ostermann J: Fast inter-mode decision in an H.264/AVC encoder using mode and Lagrangian cost correlation. IEEE Transactions on Circuits and Systems for Video Technology 2009, 19(2):302–306.
[21] Liu D, Sun X, Wu F, Zhang YQ: Edge-oriented uniform intra prediction. IEEE Transactions on Image Processing 2008, 17(10):1827–1836.
[22] Girod B: Efficiency analysis of multihypothesis motion-compensated prediction for video coding. IEEE Transactions on Image Processing 2000, 9(2):173–183. 10.1109/83.821595
[23] Liu X, Wang DLL, Srivastava A: Image segmentation using local spectral histograms. Proceedings of the IEEE International Conference on Image Processing (ICIP '01), October 2001, 70–73.
[24] Joint Video Team (JVT): reference software JM 11.0. http://iphome.hhi.de/suehring/tml
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.