
Fast mode decision based on human noticeable luminance difference and rate distortion cost for H.264/AVC

Abstract

This article proposes a fast mode decision algorithm based on the correlation of the just-noticeable-difference (JND) and the rate distortion cost (RD cost) to reduce the computational complexity of H.264/AVC. First, the relationship between the average RD cost and the number of JND pixels is established by Gaussian distributions. Thus, the RD cost of the Inter 16 × 16 mode is compared with the predicted thresholds from these models for fast mode selection. In addition, we use the image content, the residual data, and JND visual model for horizontal/vertical detection, and then utilize the result to predict the partition in a macroblock. From the experimental results, a greater time saving can be achieved while the proposed algorithm also maintains performance and quality effectively.

1. Introduction

As multimedia technology advances, multimedia communication has become an important part of daily life. In addition to general telecommunications, widespread reliance on the Internet has made video communication essential. However, the quality of video communication depends heavily on the efficiency and quality of video transmission, and many international standards have therefore been developed in recent years. H.264/AVC is one of the popular video coding standards [1]. It is widely applied in video transmission and compression products, e.g., mobile phones, video surveillance, and digital TV. Although H.264/AVC has a high coding efficiency, it requires enormous computational complexity. In particular, the mode decision procedure accounts for the majority of the computational complexity because it evaluates several inter modes, as shown in Figure 1, and nine intra predictive directions for Intra 4 × 4, as shown in Figure 2. Many studies have been proposed to reduce the computational complexity of the mode decision.

Figure 1

Several inter modes in mode decision for H.264/AVC.

Figure 2

Nine intra predictive directions for Intra 4×4 and four directions (0–3) for Intra 16×16.

Bharanitharan et al. [2] proposed a classified region algorithm to reduce the inter mode candidates. The analyses of spatial/temporal homogeneity and edge direction were used to choose the inter modes needed for the rate distortion optimization (RDO) calculation. Choi et al. [3] considered those macroblocks (MBs) with the same motion vectors in the same object. Therefore, they utilized the characteristics of each 4 × 4 block after utilizing a Haar wavelet transform to test the homogeneity in an MB in order to select the candidate modes. Pan et al. [4] reordered the modes according to their probabilities and utilized the mean value and the standard deviation of rate distortion costs (RD costs) to be the early termination criterion of RDO. Lee and Lin [5] utilized the probabilities of several modes to calculate the average computation time in each mode. Yeh et al. [6] predicted the best mode based on Bayesian theory, and refined the prediction with the Markov process. The computational complexity was efficiently reduced. The SKIP mode condition was presented to make a consideration of the neighborhood and co-located information to achieve the reduction in the coding time in [7]. The relation between depth value and mode distribution was analyzed, and the mode candidates were chosen according to different levels of depth in an MB in [8]. Statistics were gathered for both of the RD cost and occurrence probability of each mode in [9]. The normal distribution of RD cost was adopted to calculate the thresholds for early termination. A 2D map was generated according to the neighboring motion vectors in [10]. Inter modes were reordered or removed via this 2D map. Ri et al. [11] defined a spatial-temporal mode prediction. The calculated RD cost and the co-located mode were utilized to produce the threshold for mode selection. 
Visual characteristics of tunnel surveillance videos were considered to analyze the structure of neighborhood inter/intra blocks for adapting the characteristics of static and fixed backgrounds in the observation systems in [12]. Codes in compliance with a coding order of previously neighboring blocks were assigned to increase the opportunity for an early termination in [13]. The relations between the discrete cosine transform (DCT), the sum of absolute difference, and the sum of square difference were established as the conditions for an early termination in [14]. Not only DCT but also the magnitude order of RD cost was discussed in [15]. The connection between the quantization parameter (QP) and the RD cost was studied experimentally to serve as the threshold in [16]. The activity was calculated by utilizing the motion vectors of the neighboring and co-located blocks of the current block.

In addition to methods that exploit rate distortion and motion information, the human visual system (HVS) is also useful for improving video coding. Just-noticeable-difference (JND) is one of the important characteristics of the HVS. A model of the luminance difference perception of human vision was developed in [17]. This model provides knowledge about human visual luminance distortion. The human visual characteristics of JND were employed to analyze the content of video for the purpose of reducing computational complexity in [18]. In [19], JND was utilized to re-measure image distortion, and a perceptual rate distortion model was used to judge mode candidates. In [20], the information needed for considering the HVS was provided by the gradient, variance, average contrast, and edge data in an MB.

This article proposes a fast mode decision algorithm which utilizes the correlation of JND pixels and RD cost to reduce the number of mode candidates. The rest of this article is organized as follows. In Section 2, the JND visual model and the total number of non-JND pixels are discussed. The proposed fast mode decision algorithm is described in Section 3. The experimental results are presented in Section 4. In Section 5, concluding remarks are provided.

2. JND visual model

2.1. Human visual luminance difference

The JND model from [17] is applied as the human visual model. JND refers to the visibility threshold determined by the background luminance. When the luminance difference between the foreground and the background is smaller than this threshold, the human eye cannot detect it. That is, human eyes tolerate a certain amount of luminance distortion. This feature can be incorporated into a fast mode decision if the human eye's perception of a block can be characterized by it, and redundant computation can then be avoided. The JND visual model describes the visibility threshold of luminance distortion and is shown in Figure 3.

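$$
\mathrm{JND}(Y(i,j)) =
\begin{cases}
T_0 \times \left(1 - \left(\dfrac{Y(i,j)}{127}\right)^{1/2}\right) + 3, & \text{for } Y(i,j) \le 127 \\
\gamma \times \left(Y(i,j) - 127\right) + 3, & \text{for } Y(i,j) > 127
\end{cases}
\tag{1}
$$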

where Y(i,j) is the background luminance, and T_0 and γ are constants (T_0 = 17, γ = 3/128). In Figure 3, the horizontal axis represents the gray level of the background, and each gray level corresponds to a JND value on the vertical axis. If the gray level difference between the background and an object is smaller than the JND value, the object cannot be detected visually. This concept can be extended to the temporal domain. In a video stream, the gray level at every pixel location varies from frame to frame. Some of these variations are easy to detect, while others appear unchanged to the viewer. The JND model can thus determine whether the magnitude of a luminance variation is perceptible to the human eye. This is the purpose of applying the noticeable difference to the temporal domain, and the pixel-level usage can also be extended to an MB. The human visual distortion criterion of an MB is therefore determined by this characteristic.

Figure 3

Visibility thresholds due to background luminance [17].

In the proposed algorithm, the residual value of every pixel in an MB is compared with its JND value. That is, the intensity values of the original pixels in the current MB are treated as the background luminance in the JND model, so that the JND value (visual threshold) can be obtained from the model. If the residual value is less than the JND value, the variation of the pixel cannot be perceived by human eyes.
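As a minimal sketch of Equation (1) and of the per-pixel comparison described above, the threshold can be computed as follows in Python. The function names `jnd_threshold` and `is_non_jnd` are illustrative only, and comparing the absolute value of the residual against the threshold is an assumption, since the text does not state how negative residuals are handled.

```python
import math

T0 = 17.0            # constant T_0 from Equation (1)
GAMMA = 3.0 / 128.0  # constant gamma from Equation (1)


def jnd_threshold(y):
    """Visibility threshold for a background luminance y in [0, 255]."""
    if y <= 127:
        return T0 * (1.0 - math.sqrt(y / 127.0)) + 3.0
    return GAMMA * (y - 127.0) + 3.0


def is_non_jnd(original, residual):
    """True if the residual is below the threshold of the original pixel.

    Assumption: the magnitude of the residual is what is compared.
    """
    return abs(residual) < jnd_threshold(original)
```

With these constants the threshold is 20 gray levels for a black background, falls to 3 at a luminance of 127, and rises to 6 at 255.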

2.2. Human visual characteristics in an MB

After describing the JND visual model, we discuss how to utilize this visual characteristic in an MB. First, an MB is divided into four 8 × 8 blocks. Each 8 × 8 block is subtracted from the block of the reference frame at the predicted location given by the predictive motion vector, as illustrated in Figure 4. The median of the motion vectors of the nearest neighboring coded MBs is taken as the predictive motion vector. This differs slightly from the motion vector predictor (MVP) in H.264/AVC, and an example is shown in Figure 5. In this step, motion estimation does not need to be executed for each 8 × 8 block. The current MB is separated into four 8 × 8 blocks (C0, C1, C2, C3), and the coded neighboring MBs are the left (L), top (T), and upper right (UR) MBs. The left MB is coded in the 8 × 8 mode with blocks labeled L0, L1, L2, and L3. The top MB is coded in the 8 × 16 mode with blocks labeled T0 and T1. The upper right MB is coded in the 16 × 8 mode with blocks labeled UR0 and UR1. In this example, the predictive motion vector of block C0 is calculated from the motion vectors of L1, T0, and UR1, while that of block C1 is obtained from the motion vectors of L1, T1, and UR1. The predictive motion vector of block C2 is obtained from the motion vectors of L3, T0, and UR1, while that of block C3 is calculated from the motion vectors of L3, T1, and UR1.

Figure 4

Subtraction from an 8×8 block of the reference frame at the predicted location.

Figure 5

Example of the calculation of predictive motion vectors for the residual values of four 8×8 blocks in an MB for JND measurement.
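The median prediction in the example of Figure 5 could be sketched as below. Taking the median separately for the horizontal and vertical components is an assumption borrowed from the usual H.264/AVC practice; the text only says that the median of the neighboring motion vectors is used. The name `median_mv` and the tuple representation are illustrative.

```python
def median_mv(mv_a, mv_b, mv_c):
    """Component-wise median of three motion vectors given as (dx, dy) tuples."""
    xs = sorted([mv_a[0], mv_b[0], mv_c[0]])
    ys = sorted([mv_a[1], mv_b[1], mv_c[1]])
    return (xs[1], ys[1])


# Hypothetical motion vectors for the neighbors named in the example.
mv_l1, mv_t0, mv_ur1 = (1, 5), (3, -2), (2, 0)
predicted_c0 = median_mv(mv_l1, mv_t0, mv_ur1)  # block C0 uses L1, T0, and UR1
```

Because only three neighboring vectors are combined, no motion search is needed to obtain the predicted location of each 8 × 8 block.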

An 8 × 8 block is chosen for the JND measurement instead of a 16 × 16 or 4 × 4 block because of the mode structure of H.264/AVC. This mode structure can be imagined as a pyramid in which both coarse and detailed block partitions are considered, so there is a trade-off when selecting the block size. The block should not be as coarse as 16 × 16, nor as detailed as 4 × 4. The mode structure of H.264/AVC only supports blocks with sides of 16, 8, and 4 pixels: no matter what the image resolution is, the largest block is 16 × 16 and the smallest is 4 × 4. Therefore, 8 × 8 is the block size closest to both extremes. If 16 × 16 is selected for the JND measurement, the measurement cannot capture the different motions of smaller partitions, while if 4 × 4 is adopted, the predictive motion vectors could be too diverse. An 8 × 8 block compensates for the drawbacks of blocks that are too large or too small and is therefore a suitable basic block for measuring the visual distortion.

After the residual values of the four 8 × 8 blocks in an MB are obtained, the intensity values of the original pixels in these blocks are treated as the background luminance in the JND model, which gives a JND value for every pixel location. If the displaced residual value is smaller than the JND value, the change of the gray level cannot be detected by human eyes, because the difference between the current pixel and the one at the predicted location is smaller than the human noticeable luminance distortion. Such pixels are counted for every 8 × 8 block, which gives the number of non-JND pixels (NNJND) in each 8 × 8 block. The sum over the four 8 × 8 blocks is the total number of non-JND pixels (TNNJND) in an MB. NNJND and TNNJND thus provide a criterion of visual perception derived from the original JND visual model. A larger TNNJND means that more pixels in the MB have an unnoticeable difference. If TNNJND is large, the MB has relatively low complexity of movement or image content, since most of the temporal difference at the predicted location cannot be detected by human eyes. Conversely, if TNNJND is small, the temporal difference can be detected easily, and the MB has relatively high complexity of movement or image content.
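The counting step can be sketched as follows. This is only an illustration under stated assumptions: the MB is given as 16 × 16 lists of rows, the threshold function is passed in by the caller (any implementation of the JND model), the residual magnitude is compared against the threshold, and the four 8 × 8 blocks are numbered in raster order (0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right). The name `count_non_jnd` is hypothetical.

```python
def count_non_jnd(original, residual, threshold_fn):
    """Count non-JND pixels per 8x8 block and in the whole 16x16 MB.

    original, residual: 16x16 lists of rows (row index first).
    threshold_fn: maps a background luminance to its visibility threshold.
    Returns ([n0, n1, n2, n3], total).
    """
    counts = [0, 0, 0, 0]
    for row in range(16):
        for col in range(16):
            block = (row // 8) * 2 + (col // 8)  # raster order, an assumption
            if abs(residual[row][col]) < threshold_fn(original[row][col]):
                counts[block] += 1
    return counts, sum(counts)
```

A total of 256 means that no pixel of the MB shows a noticeable difference, while a total of 0 means that every pixel does.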

Examples of NNJND and TNNJND under different conditions of image content and mode partition are shown in Figure 6. NNJND0, NNJND1, NNJND2, and NNJND3 are the numbers of non-JND pixels in the four 8 × 8 blocks, and TNNJND is their sum. Figure 6a depicts an example of low variability. The current MB, the motion compensated MB, and the residual data show no obvious difference, so each 8 × 8 block has a large NNJND. The final mode partition is 8 × 16, which is a relatively large block type. Figure 6b gives an example of high variability, taken from part of an image in the Foreman sequence. Here the content varies visibly, and each 8 × 8 block has a small NNJND because of the high temporal variability. It is therefore reasonable that a relatively detailed mode partition is selected for this MB.

Figure 6

Examples of low/high variation in an MB.

The above description establishes the relationship between the temporal difference, the residual value, and NNJND in each 8 × 8 block. The larger the temporal discontinuity, the more likely the MB is to be coded with a comparatively detailed mode partition. In that case, NNJND and TNNJND are small, because the large temporal variation exceeds the visually unnoticeable distortion.

3. Proposed fast mode decision algorithm

The flowchart of the proposed algorithm is shown in Figure 7. The SKIP and 16 × 16 modes are evaluated first, as in the original flow of the coding standard. We observe that if TNNJND is equal to 256 in an MB, the MB is almost always coded in SKIP or 16 × 16 mode, because the temporal difference of this MB is negligible. Therefore, we keep only the SKIP and 16 × 16 modes as mode candidates and end the mode decision. Table 1 gives statistical examples of the accuracy of this rule; the high accuracy shows that this early termination is practical. In the remaining cases, we also try the intra 16 × 16 mode. When TNNJND is equal to zero, we add the intra 4 × 4 mode to the mode candidates, because the temporal difference of this MB could be very large. The high accuracy in Table 2 indicates the probability of intra 4 × 4 when TNNJND is equal to zero. Since such an MB has none of the characteristics of unnoticeable visual distortion, it cannot obtain good coding efficiency from motion compensation or from the spatial prediction of a large block (intra 16 × 16). In this case, we take the intra 4 × 4 mode into consideration for intra prediction.

Figure 7

The flowchart of the proposed mode decision algorithm.

Table 1 The accuracy of SKIP and 16 × 16 modes when TNNJND is 256 (QP24, 300 frames)
Table 2 The accuracy of intra 4 × 4 mode when TNNJND is 0 (QP24, 100 frames)

If the required inter mode candidates in H.264/AVC can be predicted accurately, the computational complexity can be reduced. We gather the average RD cost of the 16 × 16 mode (RDcost16×16, which the original coding standard must calculate for every MB anyway) for each final mode of the MB mode decision in inter frames, as a function of TNNJND. A trend emerges: when the best mode has a relatively large block size, for instance, SKIP or 16 × 16 mode, its RDcost16×16 is generally lower than when the best mode is a detailed partition. We demonstrate this phenomenon in Figure 8 with six QCIF (Foreman, Grandma, Mother & Daughter (M/D), News, Salesman, Football), three CIF (Bike, Bridge, Highway), three 4CIF (Ice, Soccer, City), and two HD (Stockholm, Parkrun) sequences. Therefore, a distribution model is built between the average RDcost16×16 and TNNJND, from which the requisite mode candidates can be determined. According to the relation between the average RDcost16×16 and TNNJND, the mode candidates are restricted to either relatively large or detailed mode partitions.

Figure 8

Average RDcost16×16 versus TNNJND in each mode with 300 frames (QP 24).

The statistics of the average RDcost16×16 are gathered for each TNNJND value when the best mode is 16 × 16, using six QCIF sequences, and are denoted Th1(TNNJND). The threshold used to judge the tendency of the mode partition is modeled as a sum of Gaussian functions:

$$
Th_j(TN_{NJND}) = \sum_{i=1}^{3} a_{i,j} \times \exp\left(-\left(\frac{TN_{NJND} - b_{i,j}}{c_{i,j}}\right)^{2}\right), \quad j = 1, 2
\tag{2}
$$

where j is the index of the threshold, and a_{i,j}, b_{i,j}, and c_{i,j} are the coefficients for calculating Thj(TNNJND). Th1(TNNJND) models the tendency judgement of the mode partition, as shown in Figure 9. On the other hand, some MBs whose best mode is 16 × 16 still have a current RDcost16×16 larger than Th1(TNNJND). After excluding the MBs that satisfy Th1(TNNJND), the average RDcost16×16 of the remaining MBs is gathered for each TNNJND and modeled by Equation (2) to produce Th2(TNNJND). Coefficients for Th1(TNNJND) and Th2(TNNJND) are listed in Tables 3 and 4, respectively. In Figure 9, the crossing of the two curves is an artifact of curve fitting: because far fewer samples have TNNJND close to 256, the fitted curve falls in that range. In our experiments, the higher of the two thresholds is selected under this condition.
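A threshold of this form could be evaluated as in the sketch below, which reads Equation (2) as a sum of three Gaussian terms with a negative squared exponent. The coefficient values shown are hypothetical placeholders and are not the entries of Tables 3 and 4; the function name `gaussian_sum_threshold` is likewise illustrative.

```python
import math


def gaussian_sum_threshold(tn_njnd, coeffs):
    """Evaluate a sum of Gaussian terms a * exp(-((x - b) / c) ** 2).

    coeffs: iterable of (a, b, c) triples, one per Gaussian term.
    """
    return sum(a * math.exp(-((tn_njnd - b) / c) ** 2) for a, b, c in coeffs)


# Hypothetical coefficients for illustration only (NOT from Tables 3 and 4).
EXAMPLE_COEFFS = [(4000.0, 0.0, 80.0), (2500.0, 120.0, 60.0), (1000.0, 256.0, 50.0)]
example_threshold = gaussian_sum_threshold(128, EXAMPLE_COEFFS)
```

Since TNNJND only takes integer values from 0 to 256, an encoder could also precompute both thresholds once per QP and store them in a lookup table.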

Figure 9

The fitting curves of TNNJND versus average RDcost16×16. (a) QP24; (b) QP28; (c) QP32; (d) QP36; (e) QP40.

Table 3 Coefficients of Th1(TNNJND)
Table 4 Coefficients of Th2(TNNJND)

In Table 5, the distributions of various modes are compared. It can be observed that most modes belong to relatively large mode partitions. The total distribution is 60.47% for the SKIP and 16 × 16 modes. A distribution of 15.16% for both the 16 × 8 and 8 × 16 modes is obtained. The 8 × 8 mode and subMB occupy a total distribution of 20.09% in the various mode distributions.

Table 5 Distribution of various modes

Table 6 lists statistics for the same six QCIF sequences (300 frames, QP 24) that analyze the accuracy of the proposed Thj(TNNJND). Mode X is the best mode for an MB after conducting the RDO. Pa(mode X), Pb(mode X), and Pc(mode X) are the probabilities, among all MBs whose best mode is mode X in inter frames, of three conditions. Pa(mode X) denotes that the current RDcost16×16 is equal to or lower than Th1(TNNJND). Pb(mode X) means that the current RDcost16×16 is larger than Th1(TNNJND) and equal to or lower than Th2(TNNJND). Pc(mode X) indicates that the current RDcost16×16 is larger than Th2(TNNJND). The regions indicated in Table 6 are shown in Figure 10. The total probability distribution is 92.44% for SKIP mode and 85.65% for 16 × 16 mode in the entire inter frames. Even if the current RDcost16×16 is equal to or lower than Thj(TNNJND), some smaller modes still need to be tested. In addition, most of the other, smaller modes (16 × 8, 8 × 16, 8 × 8, and subMB) occur with TNNJND under 127, as the Pa(mode X), Pb(mode X), and Pc(mode X) entries of Table 6 show.

Table 6 The probability distribution of the proposed Thj(TNNJND) for various modes
Figure 10

Mode distribution by proposed algorithm according to TNNJND and Thj(TNNJND).

In the proposed scheme, neither the 8 × 8 mode nor subMB is calculated when the current RDcost16×16 is equal to or lower than Th1(TNNJND) with TNNJND under 127, as shown by region 3 in Figure 10. The reason is that the occurrence probabilities of the 8 × 8 mode and subMB are relatively low, as demonstrated by the mode distribution in Table 5 and by the distribution of the proposed Thj(TNNJND) for various modes in Table 6. In Table 5, these two modes together account for only 20.09% of all modes. According to Table 6, few MBs whose best mode is the 8 × 8 mode or subMB fall in region 3. Therefore, the occurrence probabilities of the 8 × 8 mode and subMB in region 3 are small enough to be neglected.

Tables 7 and 8 show statistics in which MBs using subMB are classified into four classes subMB_n, where n is 1, 2, 3, or 4 and indicates the number of subMBs in an MB. The importance of subMB is analyzed according to its utilization rate. For instance, if exactly one 8 × 8 block in an MB is further partitioned in 8 × 4, 4 × 8, or 4 × 4 mode and the other three remain 8 × 8 blocks, the MB belongs to subMB_1, and so on. Because the probability of an MB using four subMBs is low, the performance degradation is small when the mode decision terminates early and skips the computation of subMBs.

Table 7 Distribution of subMBs
Table 8 The probability distribution of the proposed Thj(TNNJND) for subMBs

Based on the analyses of Tables 5, 6, 7, and 8, the relation between TNNJND, the current RDcost16×16, and Thj(TNNJND) is shown in Figure 10. In regions 1, 2, and 4, all inter modes are checked. In region 3, the SKIP, 16 × 16, 16 × 8, and 8 × 16 modes are checked. In regions 5 and 6, the procedure terminates early after conducting the RDOs of the SKIP and 16 × 16 modes. The proposed algorithm first tests TNNJND, because TNNJND is the key factor in deciding whether to check modes other than SKIP and 16 × 16. This decision follows the analyses of the mode distributions in Table 6. If TNNJND is equal to or lower than 127, the comparatively strict Th1(TNNJND) is chosen as the threshold. If the current RDcost16×16 is equal to or lower than Th1(TNNJND) (region 3 in Figure 10), the 16 × 8 and 8 × 16 modes are added to the final mode candidates. If the current RDcost16×16 is larger than Th1(TNNJND) (regions 1 and 2 in Figure 10), all inter modes are mode candidates. If TNNJND is larger than 127, the relatively loose Th2(TNNJND) is selected as the threshold. If the RDcost16×16 of the current MB is equal to or lower than Th2(TNNJND) (regions 5 and 6 in Figure 10), the procedure terminates early and no other mode candidate is added. Otherwise, as shown in region 4 in Figure 10, the 16 × 8, 8 × 16, and 8 × 8 modes and subMBs are added to the final mode candidates.

Figure 10 also illustrates the relation between TNNJND and Thj(TNNJND). If TNNJND is large in an MB, more pixels have an unnoticeable difference. If an MB has a very low current RDcost16×16, it tends to choose relatively large blocks, as illustrated in Figure 8. The proposed algorithm shares this characteristic: if an MB has a large TNNJND or a very low current RDcost16×16, the procedure is more likely to choose a relatively large block and to terminate earlier. The unnecessary computation spent on choosing the best mode among the mode candidates is therefore reduced by using the total number of non-JND pixels, the current RDcost16×16, and the thresholds Thj(TNNJND) given by the fitting curves produced from statistics.

3.1. Characteristics of image direction

Following our flowchart shown in Figure 7, the direction of image texture should be considered after the previous steps. The directional characteristic in an MB will be discussed. When the directional characteristic in an MB is strong enough, only one of the two directional modes, 16 × 8 and 8 × 16 modes, is needed because these two modes cannot coexist. Therefore, if only one of the two directional modes in the final mode candidates is chosen, the computational cost can be further reduced.

The edge information of an MB is calculated by Sobel edge detector to decide whether both 16 × 8 and 8 × 16 modes are included as mode candidates or not. If the edge magnitude of any pixel in an MB is larger than 180, the horizontal or vertical decision is made according to Equations (3), (4), (5). Part of the chin image in Foreman sequence is shown in Figure 11a, which is the real mode structure after encoding. The white pixels in Figure 11b indicate those pixels whose Sobel edge magnitudes are larger than 180.
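The edge test could look like the sketch below. Several details are assumptions, because the text does not specify them: the edge magnitude is taken as the Euclidean norm of the two 3 × 3 Sobel responses, only interior pixels of the block are examined (no border padding), and the function name `has_strong_edge` is hypothetical. Only the threshold of 180 comes from the text.

```python
import math


def has_strong_edge(block, threshold=180.0):
    """Return True if any interior pixel has a Sobel magnitude above threshold.

    block: list of rows of gray values (for an MB, 16 rows of 16 values).
    Assumption: magnitude = sqrt(gx^2 + gy^2), borders are skipped.
    """
    rows, cols = len(block), len(block[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Horizontal gradient: right column minus left column.
            gx = (block[r - 1][c + 1] + 2 * block[r][c + 1] + block[r + 1][c + 1]) \
                - (block[r - 1][c - 1] + 2 * block[r][c - 1] + block[r + 1][c - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (block[r + 1][c - 1] + 2 * block[r + 1][c] + block[r + 1][c + 1]) \
                - (block[r - 1][c - 1] + 2 * block[r - 1][c] + block[r - 1][c + 1])
            if math.hypot(gx, gy) > threshold:
                return True
    return False
```

The function returns as soon as one qualifying pixel is found, which matches the "any pixel" wording of the condition.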

Figure 11

The examples of edge characteristics in an MB.

Three groups of edge directions are calculated in the proposed algorithm: the horizontal and vertical calculations of the original gray values (H1/V1), of the residual compensated by the MVP in an MB (H2/V2), and of the NNJND distribution over the 8 × 8 blocks (H3/V3). Figure 12a shows an example of the NNJND distribution at the brim of the hat in the Foreman sequence. In Figure 12b,c, the NNJND values follow a horizontal structure: H3 is much larger than V3, and the best mode is 16 × 8.

$$
H_i = \sum_{x=0}^{15} \sum_{y=0}^{7} \left| D_{i,16\times16}(x, 2y+1) - D_{i,16\times16}(x, 2y) \right|
\tag{3a}
$$

$$
V_i = \sum_{x=0}^{7} \sum_{y=0}^{15} \left| D_{i,16\times16}(2x+1, y) - D_{i,16\times16}(2x, y) \right|
\tag{3b}
$$

where i is equal to 1 or 2 for the original or residual blocks, respectively, x and y are the pixel coordinates in an MB, and D_{i,16×16} is the input information in an MB.

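$$
H_3 = \left| \left(N_{NJND0} + N_{NJND1}\right) - \left(N_{NJND2} + N_{NJND3}\right) \right|
\tag{4a}
$$

$$
V_3 = \left| \left(N_{NJND0} + N_{NJND2}\right) - \left(N_{NJND1} + N_{NJND3}\right) \right|
\tag{4b}
$$

$$
\begin{cases}
16 \times 8 \text{ mode}, & \text{if } H_i > V_i,\ i = 1, 2, 3 \\
8 \times 16 \text{ mode}, & \text{if } H_i < V_i,\ i = 1, 2, 3 \\
16 \times 8,\ 8 \times 16 \text{ modes}, & \text{otherwise}
\end{cases}
\tag{5}
$$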
Figure 12

The examples of NNJND distribution for part of Foreman sequence.

Afterwards, the horizontal and vertical magnitudes are compared to decide the directional mode of an MB, as given in Equation (5). If the horizontal characteristics are larger than the vertical ones, the MB has a strong horizontal feature and only the 16 × 8 mode is included in the final mode candidates. Conversely, if the horizontal characteristics are smaller than the vertical ones, the vertical feature is strong and only the 8 × 16 mode is included. Otherwise, both the 16 × 8 and 8 × 16 modes are included in the final mode candidates.
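The directional test of Equations (3) to (5) can be sketched as follows. The sketch rests on several assumptions: x is taken as the column index and y as the row index, H3 and V3 are read as absolute differences between the top/bottom and left/right pairs of 8 × 8 blocks numbered in raster order, and a single directional mode is chosen only when all three groups agree. All function names are illustrative.

```python
def pixel_direction_sums(block):
    """H_i and V_i for a 16x16 block given as a list of rows (i = 1 or 2)."""
    # Differences between vertically adjacent rows 2y and 2y+1.
    h = sum(abs(block[2 * y + 1][x] - block[2 * y][x])
            for x in range(16) for y in range(8))
    # Differences between horizontally adjacent columns 2x and 2x+1.
    v = sum(abs(block[y][2 * x + 1] - block[y][2 * x])
            for x in range(8) for y in range(16))
    return h, v


def njnd_direction_sums(n):
    """H_3 and V_3 from the four per-block non-JND counts [n0, n1, n2, n3]."""
    h3 = abs((n[0] + n[1]) - (n[2] + n[3]))  # top half versus bottom half
    v3 = abs((n[0] + n[2]) - (n[1] + n[3]))  # left half versus right half
    return h3, v3


def directional_modes(pairs):
    """Apply the three-way rule to a list of (H_i, V_i) pairs."""
    if all(h > v for h, v in pairs):
        return ["16x8"]
    if all(h < v for h, v in pairs):
        return ["8x16"]
    return ["16x8", "8x16"]
```

For a block made of alternating dark and bright rows, `pixel_direction_sums` gives a large H and a zero V, so the horizontal 16 × 8 partition is favored whenever the residual and non-JND groups agree.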

3.2. Complete algorithm

The following steps describe the complete algorithm:

  1) Calculate the RD costs of the SKIP and 16 × 16 modes first, and get the TNNJND. If TNNJND is equal to 256, go to step 8. Otherwise, add the intra 16 × 16 mode to the mode candidates, and go to step 2.

  2) If TNNJND is equal to zero, add the intra 4 × 4 mode to the mode candidates, and go to step 3. Otherwise, go to step 3 directly.

  3) If TNNJND is equal to or lower than 127, go to step 4. Otherwise, go to step 5.

  4) If RDcost16×16 is equal to or lower than Th1(TNNJND), go to step 6 directly. Otherwise, add the 8 × 8 mode and subMB to the mode candidates, and go to step 6.

  5) If RDcost16×16 is equal to or lower than Th2(TNNJND), go to step 8. Otherwise, add the 8 × 8 mode and subMB to the mode candidates, and go to step 6.

  6) If the edge magnitude of any pixel in the MB is larger than 180, go to step 7. Otherwise, add the 16 × 8 and 8 × 16 modes, and go to step 8.

  7) Check the horizontal/vertical decision to add the 16 × 8 or 8 × 16 mode, and go to step 8.

  8) Calculate the best mode from the final mode candidates.
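The eight steps above can be sketched as a candidate-selection function. This is an illustration rather than the authors' implementation: the mode labels, the function name `select_mode_candidates`, and its parameters are hypothetical, the two thresholds are assumed to be already evaluated for the current TNNJND, and the result of the horizontal/vertical decision is passed in as a list.

```python
def select_mode_candidates(tn_njnd, rd_cost_16x16, th1, th2,
                           strong_edge, directional_modes):
    """Return the list of modes to evaluate by RDO for one MB.

    th1, th2: threshold values for the current TNNJND.
    strong_edge: True if some pixel has an edge magnitude above 180.
    directional_modes: modes chosen by the horizontal/vertical decision,
        used only when strong_edge is True.
    """
    candidates = ["SKIP", "INTER_16x16"]          # step 1
    if tn_njnd == 256:
        return candidates                         # straight to step 8
    candidates.append("INTRA_16x16")
    if tn_njnd == 0:                              # step 2
        candidates.append("INTRA_4x4")
    if tn_njnd <= 127:                            # steps 3 and 4
        if rd_cost_16x16 > th1:
            candidates += ["INTER_8x8", "SUB_MB"]
    else:                                         # step 5
        if rd_cost_16x16 <= th2:
            return candidates                     # early termination
        candidates += ["INTER_8x8", "SUB_MB"]
    if strong_edge:                               # steps 6 and 7
        candidates += list(directional_modes)
    else:
        candidates += ["INTER_16x8", "INTER_8x16"]
    return candidates                             # step 8 runs RDO on these
```

For example, an MB with TNNJND of 200 and an RD cost below the second threshold is evaluated with only three candidates, whereas an MB with TNNJND of 0 and a high RD cost keeps every mode.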

4. Experimental results

In order to evaluate the performance, the proposed algorithm is compared with Eduardo et al.’s [9], Zhao et al.’s [10], and the previous study [18]. The encoding is tested on the PC with Quad CPU Q9400 2.66 GHz and 1.96 GB of memory. The time saving TS is defined as

$$
TS = \frac{T_o - T_p}{T_o} \times 100\%,
\tag{6}
$$

where T_o is the total encoding time of the original H.264/AVC software JM16.2 [21], and T_p is that of the compared algorithm. The peak-signal-to-noise-ratio reduction ΔPSNR is defined as

$$
\Delta \mathrm{PSNR} = \mathrm{PSNR}_p - \mathrm{PSNR}_o,
\tag{7}
$$

where PSNR_o is the PSNR of the original JM16.2, and PSNR_p is that of the compared algorithm. The bit-rate increase ΔBR is defined as

$$
\Delta BR = \frac{\text{bit-rate}_p - \text{bit-rate}_o}{\text{bit-rate}_o} \times 100\%,
\tag{8}
$$

where bit-rate_o is the total bit-rate encoded by the original JM16.2, and bit-rate_p is that of the compared algorithm.
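For reference, the three measures of Equations (6) to (8) can be computed as in the short sketch below. The function names are illustrative, and the numbers in the usage lines are made-up inputs, not measured results.

```python
def time_saving(t_o, t_p):
    """Time saving in percent, from original and compared encoding times."""
    return (t_o - t_p) / t_o * 100.0


def delta_psnr(psnr_p, psnr_o):
    """PSNR change in dB (negative values mean a quality loss)."""
    return psnr_p - psnr_o


def delta_bitrate(bitrate_p, bitrate_o):
    """Bit-rate increase in percent relative to the original encoder."""
    return (bitrate_p - bitrate_o) / bitrate_o * 100.0


# Made-up inputs for illustration only.
ts = time_saving(100.0, 40.0)          # 60% of the encoding time is saved
dbr = delta_bitrate(1004.0, 1000.0)    # 0.4% more bits than the reference
```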

Tables 9, 10, 11, and 12 compare the performance and coding efficiency of the proposed algorithm with JM16.2 for the IPPP and IBBP frame structures. The BDPSNR and BDBR [22] are listed in Table 13. The performances of the proposed algorithm, Zhao et al.’s [10], and JNDMD [18] are compared in Table 14. The 12 tested benchmark video sequences are Foreman (QCIF), Grandma (QCIF), Mother & Daughter (QCIF), News (QCIF), Salesman (QCIF), Coastguard (CIF), Mobile (CIF), Silent (CIF), Stefan (CIF), Table (CIF), Stockholm (HD), and Parkrun (HD). The coding frame structures are IPPP and IBBP with 300 frames at QPs of 24, 28, 32, 36, and 40 in the H.264/AVC software JM16.2 [21]. The QPs of 24, 28, 32, and 36 are used for BDPSNR and BDBR. Other parameter settings are as follows: IntraPeriod is 10; ReferenceFrame is 5; SearchMode uses UMHexagonS; SymbolMode uses CABAC. The search range is ±32 for QCIF and CIF videos, ±64 for SD sequences. According to the experimental results, the proposed algorithm achieves better coding efficiency and rate distortion performance than Zhao et al.’s [10] and [18]. A time saving of 63.844% is achieved with a 0.409% increase in the total bit-rate and an average PSNR loss of 0.031 dB. The time savings for the Coastguard, Mobile, Stockholm, and Parkrun sequences, which share the characteristic of a moving camera, are smaller. This follows from the criterion used in the mode decision: the temporal difference between the current block and the reference block is used to measure the activity of each MB. If TNNJND is small in an MB, there are many visually noticeable differences, and the RD cost is likely to be higher than average because of the larger temporal difference. Video sequences with a moving camera have more MBs with a large temporal difference than common video content does, because the displacement of the camera during filming produces variable movement, so fewer MBs terminate early.

Table 9 Performance and coding efficiency comparisons of proposed algorithm with JM16.2 for IPPP frame structures (QP: 24, 28 and 32)
Table 10 Performance and coding efficiency comparisons of proposed algorithm with JM16.2 for IPPP frame structures (QP: 36 and 40)
Table 11 Performance and coding efficiency comparisons of proposed algorithm with JM16.2 for IBBP frame structures (QP: 24, 28 and 32)
Table 12 Performance and coding efficiency comparisons of proposed algorithm with JM16.2 for IBBP frame structures (QP: 36 and 40)
Table 13 BDBR and BDPSNR in coding structures of IPPP and IBBP of the proposed algorithm compared to JM16.2
Table 14 Performance and coding efficiency comparisons of proposed algorithm with [10, 18] for IPPP and IBBP frame structures

Tables 15 and 16 compare the proposed scheme with the method of Eduardo et al. [9] and JNDMD [18] in terms of BDPSNR and BDBR, using QPs of 28, 32, 36, and 40 with 100 frames. The other coding parameter settings and the simulation environment are the same as described above. The ten benchmark video sequences tested are Akiyo (CIF), Container (CIF), Mobile (CIF), Paris (CIF), Carphone (QCIF), Claire (QCIF), Coastguard (QCIF), Highway (QCIF), Miss-Amer. (QCIF), and News (QCIF). The proposed scheme achieves outstanding coding efficiency: time savings of 71.784% on average for IPPP and 65.456% for IBBP are obtained. The proposed algorithm provides better coding efficiency than the method of Eduardo et al. [9] and JNDMD [18].
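
As an illustration of how BDPSNR and BDBR are obtained from four rate-distortion points per encoder, the following is a minimal sketch of the Bjontegaard calculation [22]: a cubic is fitted through the four points of each curve (PSNR against log10 of the bit-rate, or the inverse), and the average gap between the two fits is taken over the overlapping interval. The function names and the sample numbers are hypothetical, and this sketch is not the tool used to produce the tables.

```python
import math


def fit_cubic(xs, ys):
    """Coefficients [c0, c1, c2, c3] of the cubic through four (x, y) points."""
    if len(xs) != 4 or len(ys) != 4:
        raise ValueError("exactly four rate-distortion points are expected")
    n = 4
    # Augmented Vandermonde system, solved by Gauss-Jordan elimination.
    a = [[x ** k for k in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def integrate(coeffs, lo, hi):
    """Definite integral of the polynomial sum(c_k * x**k) over [lo, hi]."""
    def antiderivative(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)


def average_gap(x_ref, y_ref, x_test, y_test):
    """Mean of (test fit - reference fit) over the overlapping x interval."""
    lo = max(min(x_ref), min(x_test))
    hi = min(max(x_ref), max(x_test))
    if hi <= lo:
        raise ValueError("the two curves do not overlap")
    # Shift x so the overlap starts at 0, which keeps the fit well conditioned.
    p_ref = fit_cubic([x - lo for x in x_ref], y_ref)
    p_test = fit_cubic([x - lo for x in x_test], y_test)
    width = hi - lo
    return (integrate(p_test, 0.0, width) - integrate(p_ref, 0.0, width)) / width


def bd_psnr(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average PSNR difference in dB at equal bit-rate (negative means a loss)."""
    log_ref = [math.log10(r) for r in rate_ref]
    log_test = [math.log10(r) for r in rate_test]
    return average_gap(log_ref, psnr_ref, log_test, psnr_test)


def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average bit-rate difference in percent at equal PSNR (positive means more bits)."""
    log_ref = [math.log10(r) for r in rate_ref]
    log_test = [math.log10(r) for r in rate_test]
    gap = average_gap(psnr_ref, log_ref, psnr_test, log_test)
    return (10.0 ** gap - 1.0) * 100.0


# Made-up rate-distortion points for four QPs (illustration only).
ref_rate = [100.0, 200.0, 400.0, 800.0]   # kbit/s, reference encoder
ref_psnr = [30.0, 33.0, 36.0, 38.5]       # dB, reference encoder
fast_rate = [101.0, 202.5, 404.0, 806.0]  # kbit/s, fast encoder
fast_psnr = [29.97, 32.96, 35.97, 38.47]  # dB, fast encoder
print(f"BDPSNR = {bd_psnr(ref_rate, ref_psnr, fast_rate, fast_psnr):.3f} dB")
print(f"BDBR   = {bd_rate(ref_rate, ref_psnr, fast_rate, fast_psnr):.3f} %")
```

With exactly four points per curve the cubic fit is an interpolation, which is why four QPs are used for each BDPSNR/BDBR figure. As a sanity check on the sketch, scaling every bit-rate of a curve by 1.05 at unchanged PSNR gives a BDBR of +5%.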

Table 15 Performance and coding efficiency comparisons of proposed algorithm with [9, 18] for IPPP frame structures
Table 16 Performance and coding efficiency comparisons of proposed algorithm with [9, 18] for IBBP frame structures

The subjective quality comparisons are shown in Figures 13 and 14. Subjective detail and important information in the still frames are not sacrificed, so good subjective quality would also be presented in continuous video sequences. Furthermore, the required coding time is substantially decreased, so a high coding efficiency is achieved. The objective quality, measured by PSNR/BDPSNR and bit-rate/BD bit-rate, and the subjective quality are thus maintained. The experimental results demonstrate that the proposed method, built on the correlation between the HVS and the RD cost, is both practical and efficient.

Figure 13

The subjective comparison for Foreman (QCIF) sequence (the 44th frame, IPPP): (a) JM16.2, (b) JNDMD [18], (c) proposed algorithm.

Figure 14

The subjective comparison for Salesman (QCIF) sequence (the 47th frame, IBBP): (a) JM16.2, (b) JNDMD [18], (c) proposed algorithm.

5. Conclusion

In this article, a fast mode decision algorithm is proposed for the H.264/AVC video coding standard. Human visual characteristics are used to analyze each MB: the residual data are analyzed with a JND model that simulates the human eye, and the resulting statistics establish the correlation between the RD cost and the JND. With the proposed algorithm, the number of mode candidates is reduced and the computational efficiency of H.264/AVC is improved. The experimental results show that the proposed algorithm performs better than the previous studies.

References

  1. Wiegand T, Sullivan GJ, Bjontegaard G, Luthra A: Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol 2003, 13(7):560-576.

  2. Bharanitharan K, Liu BD, Yang JF: Classified region algorithm for fast inter mode decision in H.264/AVC encoder. EURASIP J. Adv. Signal Process 2010, 2010: 1-10.

  3. Choi BD, Nam JH, Hwang MC, Ko SJ: Fast motion estimation and intermode selection for H.264. EURASIP J. Adv. Signal Process 2006, 2006: 1-8.

  4. Pan F, Yu H, Lin Z: Scalable fast rate-distortion optimization for H.264/AVC. EURASIP J. Adv. Signal Process 2006, 2006: 1-10.

  5. Lee YM, Lin Y: Asymptotic computation in mode decision for H.264/AVC inter frame coding. J. Signal Process. Syst 2012, 66(2):121-127. 10.1007/s11265-011-0585-y

  6. Yeh CH, Fan KJ, Chen MJ, Li GL: Fast mode decision algorithm for scalable video coding using Bayesian theorem detection and Markov process. IEEE Trans. Circuits Syst. Video Technol 2010, 20(4):563-574.

  7. Grecos C, Yang MY: Fast inter mode prediction for P slices in the H.264 video coding standard. IEEE Trans. Broadcast 2005, 51(2):256-263. 10.1109/TBC.2005.846192

  8. Lin YH, Wu JL: A depth information based fast mode decision algorithm for color plus depth-map 3D videos. IEEE Trans. Broadcast 2011, 57(2):542-550.

  9. Eduardo ME, Amaya JM, Fernando DM: An adaptive algorithm for fast inter mode decision in the H.264/AVC video coding standard. IEEE Trans. Consum. Electron 2010, 56(2):826-834.

  10. Zhao T, Wang H, Kwong S, Kuo C-CJ: Fast mode decision based on mode adaptation. IEEE Trans. Circuits Syst. Video Technol 2010, 20(5):697-705.

  11. Ri SH, Vatis Y, Ostermann J: Fast inter-mode decision in an H.264/AVC encoder using mode and lagrangian cost correlation. IEEE Trans. Circuits Syst. Video Technol 2009, 19(2):302-306.

  12. Gan T, Alface PR: Fast mode decision for H.264/AVC encoding of tunnel surveillance video. Proceedings of the Second International Conferences on Advances in Multimedia 2010, 7-12.

  13. Chen PH, Chen HM, Shie MC, Su CH, Mao WL, Huang CK: Adaptive fast block mode decision algorithm for H.264/AVC. Proceedings of the 5th IEEE Conference on Industrial Electronics and Applications 2010, 2002-2007.

  14. Tang H, Shi HS: Fast mode decision algorithm for H.264/AVC based on all-zero blocks predetermination. Proceedings of the International Conference on Measuring Technology and Mechatronics Automation, 2 2009, 780-783.

  15. Wang H, Kwong S, Kok CW: An efficient mode decision algorithm for H.264/AVC encoding optimization. IEEE Trans. Multimed 2007, 9(4):882-888.

  16. Zeng H, Cai CH, Ma KK: Fast mode decision for H.264/AVC based on macroblock motion activity. IEEE Trans. Circuits Syst. Video Technol 2009, 19(4):491-499.

  17. Chou CH, Li YC: A perceptually tuned subband image coder based on the measure of just-noticeable-distortion profile. IEEE Trans. Circuits Syst. Video Technol 1995, 5(6):467-476. 10.1109/76.475889

  18. Li MS, Chen MJ: Fast HVS-based mode decision for H.264/AVC using just-noticeable-difference. Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition 2011.

  19. Wang H, Qian X, Liu G: Inter mode decision based on just noticeable difference profile. Proceedings of the 17th IEEE International Conference on Image Processing 2010, 297-300.

  20. Shafique M, Molkenthin B, Henkel J: An HVS-based adaptive computational complexity reduction scheme for H.264/AVC video encoder using prognostic early mode exclusion. Proceedings of the Europe Conference & Exhibition Design, Automation & Test 2010, 1713-1718.

  21. H.264/AVC Reference Software. http://iphome.hhi.de/suehring/tml/

  22. Bjontegaard G: Calculation of average PSNR differences between RD-curves, ITU-T SG16 Doc. VCEG-M33. 2001.

Author information

Corresponding author

Correspondence to Mei-Juan Chen.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Li, MS., Chen, MJ., Tai, KH. et al. Fast mode decision based on human noticeable luminance difference and rate distortion cost for H.264/AVC. EURASIP J. Adv. Signal Process. 2013, 60 (2013). https://doi.org/10.1186/1687-6180-2013-60

  • DOI: https://doi.org/10.1186/1687-6180-2013-60

Keywords