Classified Region Algorithm for Fast Intermode Decision in H.264/AVC Encoder

H.264, MPEG-4 Part 10, is the latest digital video coding standard that achieves very high data compression by using several new coding features. One of the new features is variable block sizes for interframe coding to increase compression efficiency. However, to achieve this, the H.264 encoder employs a complex mode decision technique based on rate-distortion optimization (RDO) that is computationally intensive and significantly increases the encoder complexity. In this paper, we propose a classified region algorithm (CRA) that analyzes the spatial and temporal homogeneity of the block by using cross differences to reduce the number of modes that are required for RDO calculation in inter mode decision. The proposed low computational complexity algorithm significantly reduces the number of inter modes without affecting the video quality. The experimental results show that the proposed method is able to reduce complexity by up to 67% with negligible degradation in both objective and subjective quality.


Introduction
Compression technology plays a vital role in multimedia devices; it should compress a file without significantly degrading the video quality. A new video coding standard was developed by the Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG to be the next generation of video compression standards, known as H.264 or MPEG-4 Part 10 Advanced Video Coding (MPEG-4 AVC) [1]. Compared to the previous MPEG-2 and MPEG-4 video standards, H.264 with its various new coding tools can improve coding efficiency by up to 50% [2]. One of the novel features in H.264 is the use of variable block sizes, from 16 × 16 down to 4 × 4, to best present the temporal and spatial details in a macroblock (MB). To select the best mode, rate-distortion optimization (RDO) is employed to achieve coding efficiency. To do this, all the MB modes are tried, and the one that leads to the lowest RD cost is selected to achieve the best tradeoff between rate and distortion performance [3,4]. However, the RDO technique dramatically increases the computational complexity of the H.264 encoder. Therefore, a more efficient algorithm that reduces the computation of inter prediction is highly desirable.
Several approaches have been proposed to achieve fast inter prediction. Early detection of the SKIP mode is a well-known approach, since it allows all the other MB sizes and the respective sub-MB sizes to be skipped [5,6]. Search point reduction is achieved by introducing fast motion estimation algorithms. Zhou and Sun [7] introduced a new fast adaptive search strategy (ADSS), which combines different search strategies to reduce computation with negligible degradation in the rate-distortion (RD) performance. Moon et al. [8] proposed an early termination algorithm based on all-zero 4 × 4 blocks that reduces the computational complexity of the motion estimation process and the computations in the discrete cosine transform (DCT) and the quantization process. Wang et al. [9] reduced the complexity of the mode decision by optimizing the ME and mode decision algorithms. Feng et al. [10] used a rate-distortion cost comparison to significantly reduce the complexity, but with a significant loss of video quality.
Some studies have used motion estimation information for inter mode decisions. Kuo et al. [11] proposed a multiresolution motion estimation scheme and an adaptive rate-distortion model with early termination rules to accelerate the search process. You et al. [12] suggested a method that analyzes the results of P16 × 16 motion estimation (ME) of a macroblock according to the proposed decision model, to estimate whether the macroblock partition size should be further divided. Crecos and Yang [13] used neighborhood information with a set of skip mode conditions for enhanced skip mode decisions, and subsequently performed inter mode decision for the remaining macroblocks by using a gentle set of smoothness constraints. Kim et al. [14] proposed an algorithm using the property of an all-zero coefficients block that is produced by quantization and coefficient thresholding to effectively eliminate unnecessary inter modes. However, it needs the transform coefficients to decide on the inter mode. Wu et al. [15] used the Sobel operator to reduce the total number of inter modes by using intra mode decision results. Therefore, in this approach, the inter mode decision partially depends on intra mode results. Feature-based intra-/intercoding mode selection schemes have also been reported in [18]. Choi et al. [17] used early SKIP mode conditions and selective intra mode decisions to decrease the complexity of inter mode decisions. Saldago proposed a sequence-independent fast inter mode decision algorithm that decreases the encoding time [19]. However, the bitrate increment is extremely high, with significant video quality loss. Unlike the above methods, the method in [16] classifies the homogeneity of a block using the mean absolute difference (MAD) of an MB and the mean absolute frame difference (MAFD). Although the method is quite useful, the average number of modes used is still high.
In this paper, we propose a classified region algorithm that analyzes the spatial and temporal homogeneity of a block using 16 × 16 MB and 8 × 8 block patterns, respectively. Since the proposed method hierarchically reduces the number of modes required for inter mode prediction, the regular structure leads to easy hardware implementation. The proposed fast algorithm reduces the total number of required modes for inter mode decisions with negligible degradation of video quality. The rest of the paper is organized as follows. Section 2 gives an overview of the inter- and intrapredictions in H.264. Section 3 explains the proposed algorithm in detail. Section 4 uses experimental results to evaluate the proposed algorithm. Finally, Section 5 contains the conclusion.

Overview of Inter-/Intraprediction
In H.264, an MB in a P slice can be inter-coded with any of seven block sizes. In addition, the P slice has a SKIP mode that is likely to be adopted in regions with stationary or global motion features. Generally, in video sequences, large areas with similar motion are likely to be coded using a large block size, whereas areas containing complex motion are likely to be coded using a smaller block size. Even if the current macroblock belongs to an inter slice, the H.264 encoder still examines all intra prediction directions of intra 4 × 4 and intra 16 × 16 and finally chooses the mode that produces the lowest RD cost. The residuals of inter prediction modes are calculated by performing motion estimation. Directional prediction is applied for intra prediction modes. The best prediction mode is selected by minimizing the following Lagrange function:

$$J(s, c, \mathrm{MODE} \mid \mathrm{QP}, \lambda_{\mathrm{MODE}}) = \mathrm{SSD}(s, c, \mathrm{MODE} \mid \mathrm{QP}) + \lambda_{\mathrm{MODE}} \cdot R(s, c, \mathrm{MODE} \mid \mathrm{QP}), \tag{1}$$

where QP is the quantization parameter, λ_MODE is the Lagrange multiplier for mode decision, SSD is the sum of the squared differences between the original block luminance (denoted by s) and its reconstruction c, and R(s, c, MODE | QP) represents the number of bits associated with the chosen MODE, which includes the bits required for the coding of the selected prediction mode and the DCT coefficients for the given block.
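The Lagrangian mode selection described above can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers are hypothetical, and the formula used for λ_MODE is the one commonly associated with the JM reference software for P slices, which is an assumption here because the text does not state how the multiplier is chosen.

```python
def lambda_mode(qp):
    # Assumption: multiplier commonly used in the JM reference software
    # for P slices; the paper text does not give this formula.
    return 0.85 * 2 ** ((qp - 12) / 3.0)


def rd_cost(ssd, rate_bits, lam):
    """Lagrangian cost J = SSD + lambda_MODE * R for one candidate mode."""
    return ssd + lam * rate_bits


def best_mode(candidates, qp):
    """Return the mode name with the lowest RD cost.

    candidates maps a mode name to a (ssd, rate_bits) pair; in a real
    encoder these values come from coding the block in that mode.
    """
    lam = lambda_mode(qp)

    def cost(mode):
        ssd, rate_bits = candidates[mode]
        return rd_cost(ssd, rate_bits, lam)

    return min(candidates, key=cost)


# Hypothetical measurements for one macroblock at QP = 28.
example = {"SKIP": (5000, 1), "16x16": (3000, 40), "8x8": (2500, 200)}
chosen = best_mode(example, qp=28)
```

Every entry in `candidates` requires a full encode of the block, which is why reducing the number of candidates before this step reduces the encoder complexity.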
Because the above RDO procedure is employed for intra-/inter mode decisions in H.264, the computational cost is high. The RDO procedure for inter modes is more complex than that for intra modes because it employs computationally intensive full search motion estimation. When full search motion estimation is employed for the seven block sizes, only the motion vectors of the best block size are used; the motion vectors of all the other block sizes are discarded. Therefore, a lot of computational resources are wasted by testing all seven block sizes. In the conventional mode decision, the RD cost is determined for all seven inter modes, and the mode that gives the lowest RD cost is selected. This brute force search wastes computational resources. The problem can be solved by adopting an appropriate preprocessing stage, as shown in Figure 3. The preprocessing stage reduces the number of inter modes that are required for RD cost calculation by examining the homogeneity of the macroblock (MB). Thus, a conventional full mode search can be avoided, saving computational resources. Before implementing our algorithm, we conducted a set of experiments on standard benchmark video sequences to determine the best inter mode selection in homogeneous and non-homogeneous regions of the video sequences.

Statistical Analysis of Block Sizes in Video Sequences.
In order to analyze the inter mode occurrence percentage, we used a set of video sequences that had different motion properties. Figures 4, 5, 6, 7, and 8 show the statistical results of various sequences (300 frames, QP = 28, number of reference frames = 1).
From the above analysis, it can be seen that the SKIP and 16 × 16 MB size occurrences are greater than those of the rectangular blocks. The 8 × 8 block occurrence is also considerably high among the sub-MB partitions. Theoretically, a 16 × 16 block size should be chosen for a homogeneous region. However, some homogeneous blocks used rectangular block sizes (16 × 8 or 8 × 16). Therefore, in the proposed algorithm, we choose not only 16 × 16 but also 16 × 8 or 8 × 16 when the region is homogeneous. The results show that the SKIP, 16 × 16, and 8 × 8 block size occurrences are always greater than those of other block sizes and that homogeneous blocks do not always choose the 16 × 16 block size but sometimes choose 8 × 16 or 16 × 8. Although intra modes are allowed in inter mode prediction, the overall occurrence of intra mode is less than 3%, which is very low. Intra prediction significantly increases complexity, which could be avoided by having a suitable criterion in the inter mode decision.

Proposed Classified Region Algorithm (CRA).
The texture in a block plays a vital role in determining whether a region is homogeneous or not [22,23]. A number of effective methods have been proposed in the literature to detect homogeneous regions, but their implementation increases the complexity, which limits their use in a fast algorithm. Our proposed low computational complexity algorithm uses edge information to determine whether a region is homogeneous and chooses the appropriate block size for the region. The proposed classified region algorithm (CRA) is based on a computation of the gradient function of the current block.
In order to implement our prediction algorithm, a 16 × 16 MB is formed by using blocks, and the average of each block is found using (2), as shown in Figure 9(a). The prediction algorithm is then applied to the block as shown in Figure 10(b). Each position of the block is marked by the i and j indices as (0, 0), (0, 1), ..., (3, 2), and (3, 3), and "a" and "b" in (2) indicate the starting positions of the luma blocks.
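The block averaging step can be illustrated with a minimal Python sketch. It assumes that the 16 × 16 MB is divided into a 4 × 4 grid of 4 × 4 luma blocks, which is inferred from the (0, 0) to (3, 3) index range; the function name `block_averages` and the `block` parameter are hypothetical.

```python
def block_averages(mb, block=4):
    """Return a grid of per-block mean luma values for a square macroblock.

    mb is a list of rows of luma samples (16 rows of 16 values for an MB).
    Entry [i][j] of the result is the mean of the block whose top-left
    sample sits at row a = i * block and column b = j * block.
    Assumption: 4 x 4 blocks, giving indices (0, 0) .. (3, 3) for an MB.
    """
    n = len(mb) // block
    grid = []
    for i in range(n):
        a = i * block  # starting row of the block
        row = []
        for j in range(n):
            b = j * block  # starting column of the block
            total = sum(mb[a + y][b + x]
                        for y in range(block)
                        for x in range(block))
            row.append(total / (block * block))
        grid.append(row)
    return grid


# A flat macroblock yields the same average in every position.
flat = [[100] * 16 for _ in range(16)]
flat_grid = block_averages(flat)
```

Working on the 4 × 4 grid of averages rather than on all 256 samples keeps the later gradient computation small and regular.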
The implementation procedure for the proposed CRA algorithm is as follows.
We use an intensity gradient filter to explore the edge orientation of an image. The intensity gradient filter is given in (3), where G(x) represents the intensity gradient of pixel x. When the value of G(x) is close to or equal to zero, it means that there is an orientation. The intensity gradient filter for vertical orientation is applied to four pixels (e, i, g, and k), as shown in Figure 10(a), and is calculated using (4).
Figure 10(b) shows the intensity gradient filter for horizontal orientation, which is applied to four pixels (b, c, j, and k) using (5). The edge amplitude is then obtained as

$$\text{Edge amplitude} = |G(\mathrm{ver}) + G(\mathrm{hor})|. \tag{6}$$

The intensity differences in the vertical and horizontal directions are computed. Using these values, we determine the homogeneity of the block using an appropriate threshold value. By using all the above conditions, we then check the homogeneity of the block, which is explained in the following subsections.

Spatial Homogeneity.
In order to decide whether a block is homogeneous, we use the sum of the edge amplitudes of the block, as shown in (7). If the edge amplitude of the block is less than the predefined threshold, then the block is homogeneous; otherwise, it is non-homogeneous:

$$\text{Edge amplitude} < Th. \tag{7}$$

The same condition applies to 8 × 8 block sizes, except for the value of the predefined threshold. The threshold values for 16 × 16 and 8 × 8 block sizes are 295 and 8, respectively.
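The spatial homogeneity test can be sketched as follows. The gradient values are taken as inputs, since this sketch does not reproduce the gradient filters themselves. It assumes that the block-level edge amplitude is the sum of |G(ver) + G(hor)| over the evaluated positions. The names `SPATIAL_TH` and `is_spatially_homogeneous` are hypothetical; the threshold values are the ones quoted above.

```python
# Thresholds quoted in the text for 16 x 16 and 8 x 8 block sizes.
SPATIAL_TH = {16: 295, 8: 8}


def edge_amplitude(g_ver, g_hor):
    """Edge amplitude |G(ver) + G(hor)| for one position."""
    return abs(g_ver + g_hor)


def is_spatially_homogeneous(gradients, size):
    """Apply the threshold test to the summed edge amplitude.

    gradients is a list of (g_ver, g_hor) pairs, one per evaluated
    position. Assumption: the amplitudes are simply summed over positions.
    """
    total = sum(edge_amplitude(gv, gh) for gv, gh in gradients)
    return total < SPATIAL_TH[size]


# Hypothetical gradient values: 6 + 5 = 11, below the 16 x 16 threshold.
smooth = is_spatially_homogeneous([(10, -4), (3, 2)], 16)
```

The same gradient pairs can give a different verdict at the 8 × 8 level, because only the threshold changes between the two block sizes.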
Temporal Homogeneity.
Spatial homogeneity conditions are applicable inside the frame, namely, in the spatial domain. In order to exploit the homogeneous blocks between frames, in the temporal domain, we use the difference between the current macroblock (MB) and the corresponding MB in the previous frame, as shown in (8). If the difference between the two MBs is less than the predefined threshold, then the block is temporally homogeneous; otherwise, it is non-homogeneous:

$$\text{Temp homogeneity} < Th,$$

where I(i, j) and J(i, j) denote the pixel intensities in the previous MB and the present MB, respectively. We adopt this common method since our main aim is to develop a preprocessing algorithm suitable for VLSI realization. It therefore helps to maintain a regular structure without increasing the complexity of the proposed algorithm. The same condition applies to 8 × 8 block sizes, except for the threshold value. The threshold values are 420 and 115 for 16 × 16 and 8 × 8, respectively.
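A minimal sketch of the temporal test is given below. The exact form of the difference measure is not reproduced here, so the sketch assumes a sum of absolute differences between co-located samples of the previous block I and the present block J. The names `TEMPORAL_TH`, `temporal_difference`, and `is_temporally_homogeneous` are hypothetical; the thresholds are the ones quoted above.

```python
# Thresholds quoted in the text for 16 x 16 and 8 x 8 block sizes.
TEMPORAL_TH = {16: 420, 8: 115}


def temporal_difference(prev_block, cur_block):
    """Assumed measure: sum of |I(i, j) - J(i, j)| over the block."""
    return sum(abs(p - c)
               for prev_row, cur_row in zip(prev_block, cur_block)
               for p, c in zip(prev_row, cur_row))


def is_temporally_homogeneous(prev_block, cur_block, size):
    """True when the inter-frame difference is below the threshold."""
    return temporal_difference(prev_block, cur_block) < TEMPORAL_TH[size]


# Hypothetical data: a static macroblock with a single changed sample.
prev_mb = [[100] * 16 for _ in range(16)]
cur_mb = [row[:] for row in prev_mb]
cur_mb[5][7] = 110
static = is_temporally_homogeneous(prev_mb, cur_mb, 16)
```

The measure needs only subtractions, absolute values, and an accumulator, which fits the stated goal of a regular structure suitable for VLSI realization.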

Overall Flow of the Classified Region Algorithm (CRA).
In order to maintain a regular structure in the proposed algorithm, the hierarchical order is maintained by checking the 16 × 16 block conditions first and the 8 × 8 block conditions later (see Figure 12). In our classified region preprocessing algorithm, the homogeneous blocks choose not only 16 × 16 but also 8 × 16 or 16 × 8 block sizes. The block sizes 8 × 16 and 16 × 8 are chosen according to the edge direction of the block. For example, as shown in Figure 11(a), if a block has a vertical edge, we choose the 8 × 16 block size. If a block has a horizontal edge, as shown in Figure 11(b), we choose the 16 × 8 block size. In order to detect the edge direction, we use the precomputed edge direction results from (4) and (5) and compare the two results to decide the block size. Although we choose the rectangular block using the edge direction, 16 × 16 may still be the best choice for some blocks with edges. To handle this case, our algorithm always includes 16 × 16 together with 8 × 16 or 16 × 8 in the RDO calculations. The performance of the proposed CRA algorithm is thus maintained. It should be noted that deciding between 8 × 16 and 16 × 8 adds no overhead, since we use the precomputed edge direction values from (4) and (5). If a block is temporally homogeneous, we do not include intra mode in the RDO calculations because intra mode is used to exploit spatial correlations. The mode decision procedure for 8 × 8 blocks is the same as that for the 16 × 16 MB.
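The candidate selection at the 16 × 16 level can be sketched as below. This is a simplified illustration rather than the flow of Figure 12: the edge direction is passed in as a label instead of being derived from (4) and (5), the non-homogeneous branch simply signals a descent to the 8 × 8 level with the hypothetical label "P8x8", and SKIP handling is left out. All names are assumptions made for the example.

```python
def candidate_modes(spatially_homogeneous, temporally_homogeneous,
                    edge_direction):
    """Return the modes to evaluate with RDO for one macroblock.

    edge_direction is "vertical" or "horizontal" and is only used when
    the block is spatially homogeneous.
    """
    if spatially_homogeneous:
        # 16x16 is always kept; the rectangle follows the edge direction.
        rect = "8x16" if edge_direction == "vertical" else "16x8"
        modes = ["16x16", rect]
    else:
        # Assumption: a non-homogeneous MB is handed to the 8x8 level,
        # where the same procedure is repeated per 8x8 block.
        modes = ["P8x8"]
    if not temporally_homogeneous:
        # Intra modes are tested only when the block is not temporally
        # homogeneous.
        modes += ["intra4x4", "intra16x16"]
    return modes


# A homogeneous, static block with a vertical edge needs two RDO passes.
reduced = candidate_modes(True, True, "vertical")
```

Compared with testing seven inter modes plus two intra modes, the reduced list for this example contains two entries.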

Complexity Analysis.
In this section, we analyze the complexity of the proposed method. Our method greatly reduces complexity, by about 67%, compared to the conventional method. A conventional encoder employs the RDO method, which includes a full mode search to choose the best block size. Only one block size and the associated motion vector (MV) can be chosen to encode a macroblock; all other block sizes and associated MVs are discarded. Hence, the conventional JM encoder incurs high complexity, whereas our proposed method reduces the number of inter modes without increasing the overhead of the inter mode decision. The proposed method avoids the exhaustive full mode search, which includes all seven inter modes plus two intra modes for RDO calculation.

Experimental Results and Discussion
The proposed CRA fast mode decision algorithm was implemented on JM11.0, provided by JVT [24], with the following test conditions. In our simulation, the total number of frames was 200, the number of reference frames was one, RD optimization was enabled, the main profile was used with the IPPP sequence type, the search range was ±16, and CAVLC was enabled. Tables 1 and 2 show the summary of performance, which is calculated as the numerical average over different QP values (24, 28, and 32). We defined four measures for evaluating the encoding performance: average PSNR, average BR, the average mode number saving factor M, and the encoding-time saving factor T. The larger the mode number saving factor (M) and the time saving factor (T), the faster the encoder. It must be noted that positive values for the average PSNR and bit-rate indicate increments and negative values indicate decrements. Table 3 shows the average values of PSNR, bitrate, mode saving factor, and encoding time reduction. From the results, we can see that the proposed algorithm achieves a better mode number saving factor and encoding time reduction with a minimal loss of image quality and a minimal bit-rate increment. Here, the mode number saving factor and encoding time reduction are higher than those of the existing methods [16,17]. The proposed algorithm reduces the complexity of the inter mode decision without increasing the complexity of the preprocessing stage and with only a negligible bit-rate increment, and it provides a higher mode saving factor and encoding time reduction in most sequences. Our proposed method is compared with recent work in Table 4.
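The four measures can be illustrated with the following Python sketch. The formulas are the forms commonly used in fast mode decision studies and are an assumption here, not the paper's own definitions; the function names and the example numbers are hypothetical.

```python
def delta_psnr(psnr_proposed, psnr_reference):
    """PSNR change in dB; positive means the proposed encoder is better."""
    return psnr_proposed - psnr_reference


def delta_bitrate(br_proposed, br_reference):
    """Bit-rate change in percent; positive means a bit-rate increment."""
    return 100.0 * (br_proposed - br_reference) / br_reference


def mode_saving(modes_reference, modes_proposed):
    """Assumed form of M: percentage of RDO mode evaluations avoided."""
    return 100.0 * (modes_reference - modes_proposed) / modes_reference


def time_saving(time_reference, time_proposed):
    """Assumed form of T: percentage of encoding time saved."""
    return 100.0 * (time_reference - time_proposed) / time_reference


# Hypothetical numbers: 9 modes per MB in the full search versus 3 on
# average after preprocessing, and 100 s versus 33 s of encoding time.
m_example = mode_saving(9, 3)
t_example = time_saving(100.0, 33.0)
```

With these sign conventions, larger M and T values mean a faster encoder, while PSNR and bit-rate changes close to zero mean that quality and rate are preserved.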

Conclusion
In this paper, we proposed a classified region fast inter mode decision algorithm for H.264/AVC. The simulation results showed that the proposed method reduced encoding complexity by up to 67% while maintaining negligible degradation in objective and subjective video quality. Therefore, this algorithm can be used as a preprocessing unit for inter prediction units to decrease RDO complexity and encoding time.

Table 1: Experimental results of the proposed algorithm.

Table 2: Experimental results of mode saving factor and encoding time reduction.

Table 3: Average results of proposed and compared algorithms.

Table 4: Experimental results of the proposed algorithm.