Improved Intra-coding Methods for H.264/AVC

The H.264/AVC design adopts a multidirectional spatial prediction model to reduce spatial redundancy, where neighboring pixels are used as a prediction for the samples in a data block to be encoded. In this paper, a recursive prediction scheme and an enhanced block-matching algorithm (BMA) prediction scheme are designed and integrated into the state-of-the-art H.264/AVC framework to provide a new intra coding model. Extensive experiments demonstrate that the coding efficiency is increased by 0.27 dB on average compared with the conventional H.264 coding model.


Introduction
H.264/AVC [1] is the newest international video coding standard of ITU-T (as Recommendation H.264) and ISO/IEC (as International Standard 14496-10, also known as MPEG-4 Part 10, Advanced Video Coding (AVC)). It reduces the bit rate by approximately 30 to 70 percent compared with previous video coding standards such as MPEG-4 Part 2, H.263, and H.262/MPEG-2 Part 2, while providing the same or better image quality.
The intracoding algorithm of H.264 exploits the spatial and spectral correlation present in an image. Intraprediction removes spatial redundancy between adjacent blocks by predicting one block from its spatially adjacent causal neighbors. A choice of coarse and fine intraprediction is allowed on a block-by-block basis. There are two types of prediction for the luminance samples: the so-called Intra 4 × 4 mode, which predicts each 4 × 4 block independently within a macroblock, and the Intra 16 × 16 mode, which predicts a 16 × 16 macroblock as a whole unit. For the Intra 4 × 4 mode, nine prediction modes are available to the encoder, of which one is a plain DC prediction and the remaining eight are directional predictors distributed along eight different angles, as shown in Figure 1. The Intra 16 × 16 mode is suitable for smooth image areas and provides four prediction modes; a separate intraprediction mode is provided for the chrominance samples of a macroblock.
H.264 achieves excellent compression performance and complexity characteristics in intra mode, even when compared against the standard image codecs (JPEG and JPEG2000) [2,3]. In recent years, further work has been carried out to improve the performance of intraprediction. Gang et al. proposed a subblock-based intraprediction method that alters the encoding order of the predictive subblocks so as to make the intraprediction adaptive to various textures [4]. However, this method requires new syntax elements and incurs nonnegligible complexity. Some authors introduced intra motion-compensated prediction of macroblocks [5]. Block size and accuracy adaptation can be brought into the intra block-matching scheme to further improve the prediction results. In that case, the position of the reference block must be coded into the bit stream, and the resulting side information can affect the performance significantly. To reduce this overhead, special processing techniques have been developed, but they result in a major change to the intracoding structure of the H.264/AVC standard [6]. In [7], the block-matching algorithm (BMA) is used to replace the H.264 DC intraprediction mode with no need to code side information. However, the prediction performance degrades if previously reconstructed pixels are used directly for the matching procedure. In addition, improved lossless intracoding methods have been proposed to replace the horizontal, vertical, diagonal-down-left (mode 3), and diagonal-down-right (mode 4) modes of H.264/AVC [8,9]. They employ a samplewise differential pulse code modulation (DPCM) method to predict the pixels in a target block. However, methods of this kind can only be used in lossless mode.
EURASIP Journal on Advances in Signal Processing
From the above analysis, current enhanced intracoding methods still have problems: they either change the coding structure considerably (e.g., [5,6]), have limited usage (e.g., [9,10]), or yield small gains (e.g., [4]). In this paper, we focus on how to improve the performance of intracoding without incurring high complexity or major changes to the design structure of H.264/AVC. Two prediction schemes are proposed. In the first scheme, more neighboring pixels contribute to recursively predicting the current pixel inside a block in a samplewise manner. Consequently, this scheme adapts well to the texture characteristics of the input source at minor extra complexity. The second scheme is motivated by the fact that the loop filter can significantly enhance the performance of inter prediction. We propose to extend the classical BMA method [7] by applying loop filtering to previously reconstructed macroblocks before the BMA operation. Specifically, we change the order of the standard deblocking loop filter of H.264/AVC to achieve extra gains without incurring extra complexity. Extensive experiments show that the proposed work further improves the intracoding of H.264 in both the lossy and the lossless case.
The remaining parts of this paper are structured as follows. Section 2 describes the proposed recursive prediction scheme and the enhanced BMA prediction scheme. Codec-related issues are discussed in Section 3. Comparison experiments of the proposed intracoding model and the standard one in H.264/AVC are shown in Section 4. Finally, Section 5 concludes the paper.

Two Prediction Schemes for Intracoding of H.264/AVC
In this section, we will explain the improvement mechanism behind the recursive prediction scheme and the enhanced BMA prediction scheme. Both schemes join in the prediction modes of H.264/AVC with good compatibility and complementary merits. The resultant intracoding model can well improve the overall performance of H.264/AVC.

Mechanism of Recursive Prediction Scheme.
It is generally accepted that a Gaussian-like distribution can approximate the local intensity variations in smooth image regions. The correlation between neighboring pixels attenuates as their distance increases and becomes negligible when the pixels are far enough apart. Furthermore, the Gaussian assumption weakens around irregular textures and edge structures. The current prediction methods of H.264/AVC assume that the intensity is uniform within the block to be predicted, so the predicted block is over-smoothed and the original intensity distribution is more or less destroyed. For natural images with abundant textures in particular, the perceptual distortion is noticeable. In all cases, high correlation can be expected among nearest neighbors spaced one pixel apart, except within image structures thinner than one pixel.
Given a 4 × 4 luma block to be coded as shown in Figure 2(a), namely, the pixels a to p, the mechanisms of the standard prediction mode and the recursive prediction mode are illustrated in Figures 2(b) and 2(c), respectively. The reference pixels, marked in gray, form the set S = {A, B, C, D, Q, I, J, K, L}, and pixels a to p are predicted from them. Consider the prediction of pixels a, f, k, and p in Figure 2. In the standard prediction mode, these four pixels take the same value, which is derived from reference pixels A, Q, and I. The residuals may be large if the assumption of uniform intensity is violated. In the recursive mode, we instead select different reference pixels for a, f, k, and p: only the left, top, and left-top neighbors of each pixel are involved in computing its value. The contribution of the reference pixels therefore decays gradually with distance during the recursive prediction, the textures within the block are retained, and the residual deviations become smaller.
In block-based H.264/AVC, the reconstruction of pixels inside the current coding block is not available except in the lossless case, where the reconstructed frame is identical to the original frame. Our method therefore uses only the predicted values of neighboring pixels obtained in previous steps to predict the current pixel. That is, it recursively predicts each pixel inside the block in raster-scan order.
Furthermore, we emphasize two aspects of the implementation of the proposed recursive prediction method. On the one hand, no modification should be imposed on the design structure of H.264/AVC other than part of the intraprediction module. Specifically, we change only five modes of the H.264/AVC intraprediction module: the DDR mode (mode 4), HD mode (mode 5), and VR mode (mode 6) for 4 × 4 luma blocks, the plane mode (mode 3) for 16 × 16 luma blocks, and the plane mode (mode 3) for chroma blocks. These five modes can easily support the prediction neighborhood of our method. On the other hand, we seek a tradeoff between the complexity and the efficiency of the whole intracoding procedure.
For convenience, we denote the current pixel value as p(x, y), where (x, y) is the spatial position within the block; for example, (0, 0) indicates the left-top pixel. As shown in Figure 3, the value of the predicted pixel is computed from (1) as a weighted combination of its left, top, and left-top neighbors, where Round(·) is the numerical operation that returns the integer closest to its argument and Clip(·) clamps the predicted value to the range [0, 255]. The tap filter coefficients for the five modified prediction modes, which were obtained experimentally, are listed in Table 1.
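The recursion can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a three-tap form (left, top, and left-top neighbors, followed by rounding and clipping) and uses the integer weights (1, 1, 2)/4, which correspond to the DDR expression (I + 2Q + A + 2)/4 quoted in the complexity analysis. The function names and argument layout are hypothetical, and the actual per-mode coefficients are those of Table 1.

```python
def clip(value, lo=0, hi=255):
    """Clamp a predicted sample to the valid 8-bit range."""
    return max(lo, min(hi, value))


def recursive_predict_4x4(top, left, top_left, taps=(1, 1, 2), shift=2):
    """Predict a 4x4 block sample by sample in raster-scan order.

    top      -- the four reference pixels above the block (A, B, C, D)
    left     -- the four reference pixels left of the block (I, J, K, L)
    top_left -- the corner reference pixel (Q)
    taps     -- assumed integer weights for (left, top, left-top)
    shift    -- right shift that divides by the sum of the taps
    """
    w_left, w_top, w_diag = taps
    # Row 0 and column 0 of the grid hold the reference pixels.
    grid = [[0] * 5 for _ in range(5)]
    grid[0][0] = top_left
    for x in range(4):
        grid[0][x + 1] = top[x]
    for y in range(4):
        grid[y + 1][0] = left[y]

    rounding = 1 << (shift - 1)  # adding this before the shift rounds to nearest
    for y in range(1, 5):
        for x in range(1, 5):
            acc = (w_left * grid[y][x - 1]
                   + w_top * grid[y - 1][x]
                   + w_diag * grid[y - 1][x - 1])
            # Each sample is built from already predicted neighbors.
            grid[y][x] = clip((acc + rounding) >> shift)
    return [row[1:] for row in grid[1:]]
```

Because every sample depends only on values predicted earlier in the scan, a decoder can repeat the same loop from the same reference pixels without any side information.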

Mechanism of Enhanced BMA Intraprediction Scheme.
Block matching was originally used in image restoration to recover missing blocks [11]. The main assumption behind this application is that a block always has similar counterparts in the same frame. Yang et al. [7] integrated the block-matching algorithm into the DC mode of the standard H.264/AVC prediction methods, yielding a BMA mode for intraprediction. Because coding is sequential, only the upper side, the left side, and the left-up corner of the block boundary can be used for block matching, that is, the pixel set p_1 to p_9 around block "X," as depicted in Figure 4(a). The green block "M" in Figure 4(b) is the candidate block, while the blue block "X" is the block to be predicted. The black pixels along the boundary are selected as the matching primitives. The valid search range is marked as the gray region. The matching process is formulated as the minimization of the cost function (2) over the candidate blocks, where p_i and p'_i (i = 1, ..., 9) represent the boundary pixel values of block "X" and block "M," respectively. Note that the original DC mode is still used when the upper or left side is not available for the block to be predicted. Like the encoder, the decoder also needs to perform block matching.
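The template matching described above can be sketched as follows. The sketch assumes a sum of absolute differences over the nine boundary pixels as the cost; the exact form of cost function (2) is not given in this text, so that choice, together with all function names and the list-of-rows frame layout, is an assumption made for illustration only.

```python
def template(frame, x, y, size=4):
    """Return the 9 boundary pixels around the size x size block at (x, y):
    the left-up corner, the row above, and the column to the left."""
    corner = [frame[y - 1][x - 1]]
    above = [frame[y - 1][x + i] for i in range(size)]
    beside = [frame[y + j][x - 1] for j in range(size)]
    return corner + above + beside


def template_cost(frame, target_xy, candidate_xy):
    """Assumed cost: sum of absolute differences between two templates."""
    t = template(frame, *target_xy)
    c = template(frame, *candidate_xy)
    return sum(abs(a - b) for a, b in zip(t, c))


def best_match(frame, target_xy, candidates):
    """Return the candidate position with the lowest template cost."""
    return min(candidates, key=lambda xy: template_cost(frame, target_xy, xy))


def predict_block(frame, target_xy, candidates, size=4):
    """Use the block under the best-matching template as the prediction."""
    mx, my = best_match(frame, target_xy, candidates)
    return [row[mx:mx + size] for row in frame[my:my + size]]
```

Since the cost depends only on pixels outside the block being predicted, the decoder can run the same search and arrive at the same match, which is why no position needs to be transmitted.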
The BMA prediction method has been shown to achieve gains on some video sequences [7], but two problems remain open. First, BMA is accurate at high bitrates but performs poorly at low bitrates. The main reason is that the candidate macroblocks have not yet passed through the loop filter, so the best matches and the residuals are strongly affected by conspicuous blocking artifacts. In particular, when the best match spans two or more coded macroblocks, it may turn out to be a false match, or the prediction residuals may increase sharply because of the blocking artifacts. Second, only the upper-side, left-side, and left-up pixels along the boundary of the block contribute to the block-matching result. This limited number of primitives leads to high ambiguity in the matching process, so it is important to reduce the solution space to a more restricted one in a principled way.
To alleviate the ill effects of blocking artifacts, we apply loop filtering immediately after the BMA intracoding step of each macroblock rather than after the whole slice has been coded. All previously coded macroblocks are thus deblocked and provide more accurate details for subsequent blocks to find a good match. The propagation of prediction errors across macroblocks is then well controlled. Good compatibility with standard H.264/AVC can be expected, since we change only the position of the loop filtering step in the overall functional structure and not the loop filter itself. No extra complexity is introduced by this change.
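The reordering can be illustrated with a schematic that records only the order of the two stages. It is a sketch under simplifying assumptions: the stage labels "code" and "deblock" are placeholders rather than functions of any real codec, and the sketch ignores the fact that a deblocking filter also touches pixels across macroblock edges.

```python
def code_slice(macroblocks, per_macroblock_filtering):
    """Return the sequence of (stage, macroblock) events for one slice.

    per_macroblock_filtering=False: code every macroblock, then deblock
    the whole slice. True: deblock each macroblock right after coding it.
    """
    events = []
    for mb in macroblocks:
        events.append(("code", mb))
        if per_macroblock_filtering:
            events.append(("deblock", mb))
    if not per_macroblock_filtering:
        events.extend(("deblock", mb) for mb in macroblocks)
    return events


def filtered_before_coding(events, mb):
    """List the macroblocks already deblocked when `mb` starts coding."""
    start = events.index(("code", mb))
    return [name for stage, name in events[:start] if stage == "deblock"]
```

With per-macroblock filtering, `filtered_before_coding(code_slice([0, 1, 2], True), 2)` lists macroblocks 0 and 1, whereas with slice-level filtering the list is empty, so the match search would see unfiltered neighbors.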
To further reduce the ambiguity involved in matching, we constrain the search space to a more restricted one than in the original BMA method. As shown in Figure 5, only the left macroblock "M1," the left-top macroblock "M3," the top macroblock "M0," the right-top macroblock "M2," and the previously predicted blocks numbered 0 to 11 are considered as candidate matches for the current 4 × 4 luma block 12. Our extensive experiments showed that in most cases the globally optimal match is captured by the neighboring candidates M0 to M3. Note that macroblocks M0 to M3 have been loop filtered, whereas luma blocks 0 to 11 are not deblocked until the current macroblock has been wholly predicted. For compatibility with standard H.264/AVC, we therefore restrict the search space to M0 to M3.

Codec-Related Issues
We integrate the two proposed schemes into the H.264/AVC functional structure as new modes for intraprediction. For ease of implementation and to save bits, we replace modes 4, 5, and 6 of 4 × 4 luma prediction, mode 3 of 16 × 16 luma prediction, and mode 3 of 8 × 8 chroma prediction with the corresponding recursive prediction modes. In addition, we replace mode 2 (the DC mode of intraprediction for 4 × 4 blocks) with the enhanced BMA prediction mode, except for blocks on the upper or left frame boundary, which keep the DC mode. This combination relies on the complementary properties of the two proposed schemes, which are discussed in Section 4.
The encoder uses the new modes along with the other preserved modes to perform prediction for 4 × 4, 16 × 16, and 8 × 8 blocks. Among these prediction modes, the one with the lowest rate-distortion cost is selected as the optimal mode. Since no extra mode is introduced, the syntax of the original H.264/AVC standard remains unchanged. Only the semantics and the decoding process need to be modified correspondingly.
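Mode decision of this kind is commonly written as the minimization of a Lagrangian cost J = D + λR. The sketch below assumes that form; the distortion and rate values, the multiplier, and the function name are hypothetical and serve only to show that the replaced modes compete with the preserved ones on equal terms.

```python
def select_mode(candidates, lam):
    """Pick the mode with the lowest cost J = D + lam * R.

    candidates -- dict mapping a mode id to (distortion, rate_in_bits)
    lam        -- Lagrange multiplier trading distortion against rate
    """
    def cost(mode):
        distortion, rate = candidates[mode]
        return distortion + lam * rate

    return min(candidates, key=cost)


# Hypothetical measurements for three 4x4 luma modes.
modes = {0: (120, 10), 2: (90, 14), 4: (100, 11)}
chosen = select_mode(modes, lam=5)
```

With these illustrative numbers the costs are 170, 160, and 155, so mode 4 is chosen; with `lam=0` the decision reduces to minimum distortion and mode 2 wins.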
On the decoder side, recursive prediction is performed with the same operations as at the encoder. For mode 2, we first check whether the block is located at the upper or left boundary of the frame. If so, we decode it using the normal DC mode; otherwise, we decode it using the enhanced BMA intraprediction mode. Before a block is decoded in enhanced BMA mode, the loop filter is applied to the nearest neighboring macroblocks to alleviate blocking effects, as shown in Figure 5. The decoder then runs a block search in the current frame, and the best match is used for prediction.
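The mode 2 dispatch at the decoder amounts to a boundary test. The following sketch assumes that block positions are given as the frame coordinates of the block's top-left sample and that the two predictors are supplied as callables; both conventions and all names are hypothetical.

```python
def decode_mode2(block_x, block_y, dc_predict, ebma_predict):
    """Choose the predictor for a 4x4 block signalled with mode 2.

    block_x, block_y         -- top-left sample position of the block
    dc_predict, ebma_predict -- callables returning a predicted block
    """
    on_frame_boundary = block_x == 0 or block_y == 0
    if on_frame_boundary:
        # No upper or left neighbors to match against: fall back to DC.
        return dc_predict(block_x, block_y)
    return ebma_predict(block_x, block_y)
```

Because the test uses only the block position, encoder and decoder reach the same choice without any additional signalling.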

Experimental Results
To characterize the performance of the two proposed prediction schemes, we run intracoding tests on a variety of video sequences. We compare five intracoding prediction schemes: the proposed recursive intraprediction scheme (R scheme), the proposed enhanced BMA intraprediction scheme (E-BMA scheme), the standard intraprediction scheme of H.264/AVC (S scheme) [1], the original BMA intracoding scheme (BMA scheme) [7], and the hybrid intraprediction scheme (H scheme) that combines the two proposed methods. They are tested in terms of computational complexity, lossless compression, and variable bitrate. The baseline is the open H.264/AVC codec rev602 [12].
We first give the common configuration parameters of the tests. The frame rate is set to 30 Hz, and 100 frames are encoded for each test sequence. The Hadamard transform is enabled, and the 8 × 8 transform is disabled. For entropy coding, CAVLC (context-based adaptive variable length coding) is used; RDO is enabled, and all frames of each sequence are encoded as intra (I) frames with different QP values (QP = 0 for lossless). With other typical settings, such as CABAC entropy coding, RDO disabled, or rate control enabled, experiments consistently show similar gains for the proposed schemes. In the following experiments, we regard the S scheme as the anchor and analyze the relative performance of the other four schemes.
Experiment I: Performance Evaluation with Respect to Computational Complexity.
[...] [13]. In the case of the DDR (diagonal down right) mode of Intra 4 × 4 prediction, the pixels a to p in Figure 1 are predicted with the uniform formulation (I + 2Q + A + 2)/4, following formula (1) and the tap filters given in Table 1. Computing one prediction sample takes 3 additions, 1 multiplication (a bitwise left shift), and 1 division (a bitwise right shift). The multiplication can be replaced by an addition, for example, by using Q + Q instead of 2 × Q, so that only four additions and one division are needed per pixel. The other modes can be computed in a similar way. Table 2 presents the computational complexity of recursive prediction relative to the normal modes of H.264/AVC (S scheme), obtained by counting the additions/subtractions and multiplications/divisions for the corresponding 4 × 4, 16 × 16, or chroma block. The difference in computational complexity between the BMA scheme and the E-BMA scheme depends mainly on the search range selected in each, since the two computational structures are equal except for the loop filtering order. E-BMA can thus be expected to have lower computational complexity than the BMA scheme because of its narrower search range. Compared with the S scheme, the increase in complexity is high because of the additional block-matching step. The computational complexity in the encoder is similar to that of motion estimation with a 9-pixel template inside the search region, so the order of complexity in the encoder is similar to that of a P slice. For the search range in Figure 5, our current implementation with full-pixel block matching needs 748 evaluations of the cost function (2) and comparisons for every 4 × 4 block, which makes both the encoder and the decoder 5 to 8 times slower than standard H.264/AVC intraprediction.
Such high computational complexity may seem to offset the benefits of the E-BMA scheme. However, fast search techniques similar to fast motion estimation in inter prediction, as well as parallel algorithms, can be employed in block matching to greatly speed up our current full-pixel procedure. Such acceleration methods are beyond the scope of this paper, but we conjecture that the complexity of the H scheme, which integrates the R and E-BMA schemes, lies between that of intra prediction (I slice) and inter prediction (P slice). Therefore, the complexity of the proposed hybrid intra mode is not a serious issue when it is used together with inter prediction (P or B slices).
In the decoder, the increase in computational complexity depends on the number of blocks that use this mode. In sequences where the E-BMA mode contributes to coding efficiency, it is selected for roughly 15% to 35% of the blocks. In sequences where the E-BMA mode is not selected, the additional computational complexity is negligible.

Experiment II: Performance Evaluation with Respect to Lossless Compression.
As analyzed in Experiment I, the difference between the BMA scheme and the E-BMA scheme lies in two aspects: the position of the loop filtering step in the overall functional structure and the search range used to find the best match. Since the search range is a matter of parameter setting, the main difference in the underlying mechanism is the loop filtering order. However, the H.264/AVC video coding standard applies no loop filtering in the lossless case. The BMA scheme and the E-BMA scheme are therefore expected to perform similarly in lossless compression.
According to (3), we list in Table 3 the bitrate saved by the E-BMA scheme, R scheme, and H scheme relative to the S scheme on a varied corpus of YUV video sequences recorded at QCIF, CIF, 4CIF, and HD resolutions:

$$\text{Bitrate saving} = \frac{B_s - B_x}{B_s} \times 100\%, \qquad (3)$$

where B_x denotes the bitrate required by the given scheme and B_s the anchor bitrate required by the S scheme. Table 3 shows that the E-BMA scheme reduces the bitrate for lossless compression of all the test video sequences. For the R scheme, the results fluctuate, with a negative reduction in a few sequences that contain more locally directional smooth structures (e.g., the background of "Foreman"); pixelwise recursive prediction is not effective in these areas. As a hybrid combination of the E-BMA scheme and the R scheme, the H scheme achieves the highest bitrate savings, with an average reduction of 1%. In general, the test sequences are coded at a slightly lower bitrate by the E-BMA, R, and H schemes than by the S scheme at lossless quality.

Experiment III: Performance Evaluation with Respect to Variable Bitrate.
To cover a wide range of bitrates, we choose the QP values 16, 20, 24, 28, 32, and 36, so that the prediction schemes are evaluated from high bitrate to low bitrate. PSNR is used to measure video quality under the different prediction schemes. Taking the PSNR of the S scheme as the reference, we define the PSNR gain of the other schemes as

$$\Delta\mathrm{PSNR} = \mathrm{PSNR}_x - \mathrm{PSNR}_s, \qquad (4)$$

where PSNR_x denotes the peak signal-to-noise ratio obtained with the given scheme and PSNR_s the reference value obtained with the S scheme. Similar to the calculation in [14], the values of ΔPSNR are averaged over all the QP options (16 to 36) and listed in Table 4. The BMA scheme shows its advantages in a few video sequences, such as "Foreman," "City," and "Highway." However, no distinct improvement is observed in most of the test sequences, and some sequences are even degraded because blocking artifacts increase the cost of (2). In contrast, the proposed E-BMA scheme improves the video quality by 0.2 dB on average. With the proposed R scheme, half of the sequences improve by over 0.1 dB in quality, while a few sequences, such as "Foreman" and "Highway," are somewhat degraded (the possible reason is explained in Section 4.2). As the hybrid of E-BMA and R, the H scheme performs well in all cases. An improvement of 0.35 dB is even obtained for some sequences, for example, "Carphone," "Foreman," "City," and "Harbor." The main reason for this improvement lies in the complementary properties of the E-BMA scheme and the R scheme. Our experiments show that E-BMA performs better at low bitrates, because block matching is known to perform well in smooth regions. In contrast, the R scheme is designed to preserve the textures within the block and performs better at high bitrates. The video content also affects the performance of the two schemes through the distribution of smooth and nonsmooth regions. For example, the R scheme achieves the higher prediction accuracy on sequences such as "Carphone," "Crew," "Ice," "Foreman," and "Paris," whereas the opposite holds for sequences such as "Bus," "Coastguard," and "Waterfall." Furthermore, we use three rate-distortion (RD) curves to demonstrate the improvement brought by the hybrid combination of the E-BMA scheme and the R scheme for three sequences recorded at different resolutions, namely "Carphone," "City," and "Harbor," as illustrated in Figures 6, 7, and 8, respectively.
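The two evaluation measures can be computed as in the sketch below. It assumes that the bitrate saving is the relative reduction with respect to the anchor, expressed in percent, and that the PSNR gain is a plain arithmetic mean of PSNR_x − PSNR_s over the tested QP values; the calculation in [14] may differ in detail. The function names and the numbers in the example are hypothetical and are not taken from Tables 3 or 4.

```python
def bitrate_saving_percent(bitrate_scheme, bitrate_anchor):
    """Relative bitrate reduction versus the anchor, in percent.
    Positive values mean the scheme needs fewer bits than the anchor."""
    return 100.0 * (bitrate_anchor - bitrate_scheme) / bitrate_anchor


def average_psnr_gain(psnr_scheme, psnr_anchor):
    """Mean of PSNR_x - PSNR_s over the QP values of the anchor.

    Both arguments map a QP value to the PSNR (in dB) measured at that QP.
    """
    gains = [psnr_scheme[qp] - psnr_anchor[qp] for qp in sorted(psnr_anchor)]
    return sum(gains) / len(gains)


# Made-up measurements for two QP values, for illustration only.
anchor = {16: 45.0, 20: 42.0}
scheme = {16: 45.25, 20: 42.25}
gain_db = average_psnr_gain(scheme, anchor)
saving = bitrate_saving_percent(99.0, 100.0)
```

A negative return value from either function indicates that the scheme performs worse than the anchor, which is how the degraded sequences show up in the tables.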

Conclusion
In this paper, we propose two schemes to further improve the performance of intraprediction in H.264/AVC. The new modes developed from these schemes replace classical directional prediction modes of H.264. The experimental results demonstrate that our schemes improve the overall performance of compressed I frames by 0.1 to 0.47 dB compared with the H.264/AVC standard. In addition, our schemes are highly compatible with many existing prediction methods. However, for video sequences with directional structures, the performance of recursive prediction degrades slightly. In future research, we will explore more complex contexts to improve its prediction performance. For E-BMA, further gains can also be expected from an adaptive template and from extending block matching to subpixel accuracy.