Fast mode decision based on human noticeable luminance difference and rate distortion cost for H.264/AVC
EURASIP Journal on Advances in Signal Processing volume 2013, Article number: 60 (2013)
Abstract
This article proposes a fast mode decision algorithm based on the correlation between the just-noticeable-difference (JND) and the rate distortion cost (RD cost) to reduce the computational complexity of H.264/AVC. First, the relationship between the average RD cost and the number of JND pixels is modeled by Gaussian distributions. The RD cost of the Inter 16 × 16 mode is then compared with the thresholds predicted by these models for fast mode selection. In addition, we use the image content, the residual data, and the JND visual model for horizontal/vertical detection, and utilize the result to predict the partition of a macroblock. The experimental results show that a substantial time saving is achieved while the proposed algorithm effectively maintains rate distortion performance and visual quality.
1. Introduction
As technology advances, multimedia communication has become an important part of daily life. In addition to general telecommunications, widespread reliance on the Internet has made video communication essential. However, the quality of video communication depends highly on the efficiency and quality of video transmission. Therefore, many international standards have been developed in recent years. H.264/AVC is one of the most popular video coding standards [1]. It is widely applied in video transmission and compression products, e.g., mobile phones, video surveillance, and digital TV. Although H.264/AVC has high coding efficiency, it requires enormous computational complexity. In particular, the mode decision procedure accounts for the majority of the computational load, because several inter modes and nine intra predictive directions for Intra 4 × 4 must be evaluated, as shown in Figures 1 and 2, respectively. Many studies on reducing the computational complexity of mode decision have been proposed.
Bharanitharan et al. [2] proposed a classified region algorithm to reduce the inter mode candidates. Analyses of spatial/temporal homogeneity and edge direction were used to choose the inter modes needed for the rate distortion optimization (RDO) calculation. Choi et al. [3] considered that macroblocks (MBs) within the same object share the same motion vectors; they therefore tested the homogeneity of an MB through the characteristics of each 4 × 4 block after a Haar wavelet transform in order to select the candidate modes. Pan et al. [4] reordered the modes according to their probabilities and utilized the mean value and the standard deviation of the rate distortion costs (RD costs) as the early termination criterion of RDO. Lee and Lin [5] utilized the probabilities of several modes to calculate the average computation time of each mode. Yeh et al. [6] predicted the best mode based on Bayesian theory and refined the prediction with a Markov process; the computational complexity was efficiently reduced. A SKIP mode condition that considers the neighboring and co-located information was presented in [7] to reduce the coding time. The relation between the depth value and the mode distribution was analyzed in [8], and the mode candidates were chosen according to the depth level of an MB. Statistics were gathered for both the RD cost and the occurrence probability of each mode in [9], where the normal distribution of the RD cost was adopted to calculate the thresholds for early termination. A 2D map was generated from the neighboring motion vectors in [10], and inter modes were reordered or removed via this map. Ri et al. [11] defined a spatial-temporal mode prediction in which the calculated RD cost and the co-located mode were utilized to produce the threshold for mode selection.
Visual characteristics of tunnel surveillance videos were considered in [12], where the structure of neighboring inter/intra blocks was analyzed to adapt to the static, fixed backgrounds of such observation systems. In [13], codes compliant with the coding order of previously coded neighboring blocks were assigned to increase the opportunity for early termination. The relations between the discrete cosine transform (DCT), the sum of absolute differences, and the sum of squared differences were established as early termination conditions in [14]. Not only the DCT but also the magnitude order of the RD cost was discussed in [15]. The connection between the quantization parameter (QP) and the RD cost was investigated in [16] to serve as the threshold, where the activity was calculated from the motion vectors of the neighboring and co-located blocks of the current block.
In addition to methods that exploit rate distortion and motion information, the human visual system (HVS) is also useful for improving video coding. Just-noticeable-difference (JND) is one of the important characteristics of the HVS. A model of the luminance difference perception of human vision was developed in [17], providing knowledge about visually perceptible luminance distortion. The JND characteristics were employed in [18] to analyze video content in order to reduce computational complexity. In [19], JND was utilized to re-measure image distortion, and a perceptual rate distortion model was used to judge mode candidates. In [20], the gradient, variance, average contrast, and edge data of an MB provided the information needed to account for the HVS.
This article proposes a fast mode decision algorithm which utilizes the correlation between JND pixels and the RD cost to reduce the number of mode candidates. The rest of this article is organized as follows. In Section 2, the JND visual model and the total number of non-JND pixels are discussed. The proposed fast mode decision algorithm is described in Section 3. The extensive experimental results are presented in Section 4. In Section 5, concluding remarks are provided.
2. JND visual model
2.1. Human visual luminance difference
A JND model was applied as the human visual model, as mentioned in [17]. JND refers to the visual threshold determined by the background luminance: if the luminance difference between the foreground and the background is smaller than a certain threshold, the human eye is unable to detect it. That is, human eyes tolerate a certain amount of luminance distortion. This feature can be incorporated into a fast mode decision if the perception of a block by the human eye can be characterized by it, thereby reducing redundant computational complexity. The JND visual model, which indicates the visible luminance distortion, is shown in Figure 3.
JND(Y(i,j)) = T_{0} · (1 − (Y(i,j)/127)^{1/2}) + 3, if Y(i,j) ≤ 127
JND(Y(i,j)) = γ · (Y(i,j) − 127) + 3, if Y(i,j) > 127 (1)

where Y(i,j) is the background luminance, and T_{0} and γ are constants (T_{0} = 17, γ = 3/128). The horizontal axis of Figure 3 represents the gray level of the background; each value corresponds to a JND value on the vertical axis. If the gray level difference between the background and an object is smaller than the human visual distortion threshold, denoted as the JND value, the object cannot be detected visually. This concept of the visually perceptible gray level difference can be extended to the temporal domain, where the noticeable difference characteristic is used to observe the variation of the gray level over time. In a video stream, the gray level varies at every pixel location from frame to frame; some of these variations are easy to detect, while others appear imperceptible. The JND model can thus determine which magnitudes of luminance variation are noticeable to the human eye. This is the purpose of applying the noticeable difference in the temporal domain, and this pixel-level usage can also be extended to an MB. Therefore, the human visual distortion criterion of an MB is determined by this characteristic.
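The piecewise luminance model can be sketched as follows. This is a minimal illustration assuming the widely used background-luminance JND form with the constants T_{0} = 17 and γ = 3/128 stated above; the function and variable names are ours, not from the paper.

```python
import math

T0 = 17              # visibility threshold at zero background luminance
GAMMA = 3.0 / 128.0  # slope of the linear segment for bright backgrounds

def jnd_threshold(y):
    """Luminance JND for a background gray level y in [0, 255].

    Dark backgrounds (y <= 127) follow a square-root law; bright
    backgrounds follow a linear law with slope GAMMA.
    """
    if y <= 127:
        return T0 * (1.0 - math.sqrt(y / 127.0)) + 3.0
    return GAMMA * (y - 127) + 3.0
```

For example, jnd_threshold(0) gives 20.0, the minimum jnd_threshold(127) gives 3.0, and jnd_threshold(255) gives 6.0, reproducing the U-shaped curve of Figure 3.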
In the proposed algorithm, the residual value of every pixel in an MB is compared with its JND value. That is, the intensity values of the original pixels in the current MB are treated as the background luminance in the JND model, so that the JND value (visual threshold) can be obtained from the model. If the residual value is less than the JND value, the variation of the pixel cannot be perceived by human eyes.
2.2. Human visual characteristics in an MB
After describing the JND visual model, the details of how to utilize this visual characteristic in an MB are discussed. First, an MB is divided into four 8 × 8 blocks. Each 8 × 8 block is subtracted from the block of the reference frame at the predicted location, which is produced by the predictive motion vector, as illustrated in Figure 4. The component-wise medians of the motion vectors of the nearest neighboring coded MBs are taken as the predicted motion vectors. This is slightly different from the motion vector predictor (MVP) in H.264/AVC; an example is exhibited in Figure 5. In this step, the motion estimation of each 8 × 8 block does not need to be executed. The current MB is separated into four 8 × 8 blocks (C_{0}, C_{1}, C_{2}, C_{3}), while the coded neighboring MBs are, respectively, the left (L), top (T), and upper right (UR) MBs. The left MB uses the 8 × 8 mode, labeled as L_{0}, L_{1}, L_{2}, and L_{3}. The top MB uses the 8 × 16 mode, labeled as T_{0} and T_{1}. The upper right MB uses the 16 × 8 mode, labeled as UR_{0} and UR_{1}. In this example, the predictive motion vector of block C_{0} is calculated from the motion vectors of L_{1}, T_{0}, and UR_{1}, while that of block C_{1} is obtained from the motion vectors of L_{1}, T_{1}, and UR_{1}. The predictive motion vector of block C_{2} is obtained from the motion vectors of L_{3}, T_{0}, and UR_{1}, while that of block C_{3} is calculated from the motion vectors of L_{3}, T_{1}, and UR_{1}.
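The median predictor described above can be sketched as a component-wise median of the three neighboring motion vectors. The vector values below are illustrative, not taken from Figure 5:

```python
def median_mv(mv_a, mv_b, mv_c):
    """Component-wise median of three (x, y) motion vector tuples."""
    xs = sorted([mv_a[0], mv_b[0], mv_c[0]])
    ys = sorted([mv_a[1], mv_b[1], mv_c[1]])
    return (xs[1], ys[1])

# Example for block C0 of Figure 5: its neighbors are L1, T0, and UR1
# (the motion vector values here are made up for illustration).
mv_L1, mv_T0, mv_UR1 = (1, 2), (3, 4), (5, 0)
pred_mv_C0 = median_mv(mv_L1, mv_T0, mv_UR1)
```

With these illustrative inputs the predictor returns (3, 2), the median of each component taken independently.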
An 8 × 8 block is chosen as the unit of JND measurement instead of a 16 × 16 or 4 × 4 block in consideration of the mode structure of H.264/AVC. This mode structure can be imagined as a pyramid in which both coarse and detailed block partitions are considered; hence, there is a tradeoff in selecting the block size. It should be neither as coarse as a 16 × 16 block nor as detailed as a 4 × 4 block. The choice of layer is confined to the original mode structure of H.264/AVC, which only supports block sizes of 16, 8, and 4. Regardless of the image resolution, the largest block size is 16 × 16 and the smallest is 4 × 4; 8 × 8 is therefore intermediate between the two. If 16 × 16 were selected for JND measurement, the MB could not capture the different motions of smaller partitions, while if 4 × 4 were adopted, the predictive motion vectors could be too diverse. Selecting 8 × 8 compensates for the drawbacks of blocks that are too large or too small, so an 8 × 8 block is suitable as the basic unit for measuring visual distortion.
After the residual values of the four 8 × 8 blocks in an MB are obtained, the intensity values of the original pixels in these four blocks are treated as the background luminance in the JND model, which yields a JND value for every pixel location. If the displaced residual value is smaller than the JND value, the change of the gray level cannot be detected by human eyes, because the difference between the current pixel and the one at the predicted location is smaller than the noticeable luminance distortion. Pixels are counted in this manner for every 8 × 8 block, giving the number of non-JND pixels (N_{NJND}) in each 8 × 8 block. The summation over the four 8 × 8 blocks is the total number of non-JND pixels (TN_{NJND}) in an MB. N_{NJND} and TN_{NJND}, which derive from the original JND visual model, provide a criterion of visual perception. If an MB has a larger TN_{NJND}, it possesses more unnoticeable visual luminance distortion, because this count is the number of pixels with an unnoticeable difference. An MB with a large TN_{NJND} belongs to relatively low-complexity movement or image content, since most of the temporal differences at the predicted location cannot be detected by human eyes. On the contrary, if TN_{NJND} is small, the temporal difference is easily detected, and thus the MB has relatively high-complexity movement or image content.
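The counting procedure can be sketched as follows. The piecewise JND threshold is repeated here under the same assumption as before (the standard background-luminance form with T_{0} = 17, γ = 3/128); the function names are ours.

```python
import math

def jnd_threshold(y):
    # Piecewise background-luminance JND (T0 = 17, gamma = 3/128).
    if y <= 127:
        return 17 * (1.0 - math.sqrt(y / 127.0)) + 3.0
    return (3.0 / 128.0) * (y - 127) + 3.0

def count_non_jnd(orig_block, residual_block):
    """N_NJND: pixels whose |residual| is below the JND of the original pixel."""
    return sum(
        1
        for orig_row, res_row in zip(orig_block, residual_block)
        for o, r in zip(orig_row, res_row)
        if abs(r) < jnd_threshold(o)
    )

def total_non_jnd(block_pairs):
    """TN_NJND: sum of N_NJND over the four 8x8 (orig, residual) block pairs of an MB."""
    return sum(count_non_jnd(o, r) for o, r in block_pairs)
```

For instance, an 8 × 8 block of mid-gray pixels (value 128, JND about 3.02) with residuals of 1 everywhere contributes N_{NJND} = 64, whereas residuals of 10 contribute 0; four imperceptible blocks give TN_{NJND} = 256.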
Examples of N_{NJND} and TN_{NJND} under different conditions of image content and mode partition are exhibited in Figure 6. Following the process described above, the elements of visual judgment in an MB are obtained: N_{NJND0}, N_{NJND1}, N_{NJND2}, and N_{NJND3} are the N_{NJND} counts of the four 8 × 8 blocks, and TN_{NJND} is their sum. Figure 6a depicts an example of low variability. The difference is not obvious among the current MB, the motion compensated MB, and the residual data; consequently, each 8 × 8 block possesses many N_{NJND} pixels. The final mode partition is 8 × 16, which is a relatively large block type. Figure 6b gives an example of high variability, taken from the Foreman sequence, in which the content varies obviously. Correspondingly, each 8 × 8 block possesses few N_{NJND} pixels due to the high temporal variability, so it is conceivable that a relatively detailed mode partition should be selected as its block type.
According to the above description, a relationship is obtained between the temporal difference, the residual value, and N_{NJND} in each 8 × 8 block. The larger the temporal discontinuity, the more likely the block type is a comparatively detailed mode partition: massive temporal variability exceeds the unnoticeable visual distortion, so few N_{NJND} and TN_{NJND} pixels are obtained.
3. Proposed fast mode decision algorithm
The flowchart of the proposed algorithm is exhibited in Figure 7. The SKIP and 16 × 16 modes are evaluated first, as in the original flow of the coding standard. We observe that if TN_{NJND} is equal to 256 in an MB, the MB almost always tends to be selected as the SKIP or 16 × 16 mode, because the difference of this MB in the temporal domain is negligible. Therefore, we keep only the SKIP and 16 × 16 modes as candidates and terminate the mode decision. Statistical examples of the accuracy of this early termination are provided in Table 1; the high accuracy confirms that it is practical. In the usual cases, we also try the Intra 16 × 16 mode. When TN_{NJND} is equal to zero, we add the Intra 4 × 4 mode to the candidates, because the temporal difference of this MB could be very large. The high accuracy of this choice is exhibited in Table 2, which indicates the probability of Intra 4 × 4 when TN_{NJND} is equal to zero. Since such an MB has no characteristics of unnoticeable visual distortion, it cannot obtain good coding efficiency from motion compensation or from the spatial prediction of a large block (Intra 16 × 16). In this case, we take the Intra 4 × 4 mode into consideration for intra prediction.
If the required inter mode candidates in H.264/AVC can be predicted accurately, the computational complexity can be reduced. We observe the average RD cost of the 16 × 16 mode (RDcost_{16×16}; the 16 × 16 mode is always calculated for every MB in the original coding standard) for each final mode of the MBs in inter frames as a function of TN_{NJND}. A trend emerges: when the best mode is a relatively large block size, e.g., the SKIP or 16 × 16 mode, its RDcost_{16×16} is generally lower than that of an MB whose best mode is a more detailed partition. We demonstrate this phenomenon in Figure 8 with six QCIF (Foreman, Grandma, Mother & Daughter (M/D), News, Salesman, Football), three CIF (Bike, Bridge, Highway), three 4CIF (Ice, Soccer, City), and two HD (Stockholm, Parkrun) sequences. Therefore, a distribution model is built between the average RDcost_{16×16} and TN_{NJND}, from which the requisite mode candidates can be determined accurately: according to the relation between the average RDcost_{16×16} and TN_{NJND}, the mode candidates are decided to be relatively large or detailed mode partitions.
The statistics of the average RDcost_{16×16} are gathered when the best mode is 16 × 16, based on each value of TN_{NJND}, using the six QCIF sequences. The tendency judgment of the mode partition is modeled by a summation of Gaussian functions, defined as

Th_{j}(TN_{NJND}) = Σ_{i} a_{i,j} · exp(−((TN_{NJND} − b_{i,j}) / c_{i,j})^{2}) (2)

where j is the threshold index, and a_{i,j}, b_{i,j}, and c_{i,j} are the coefficients for calculating Th_{j}(TN_{NJND}). Th_{1}(TN_{NJND}) models the tendency judgment of the mode partition as shown in Figure 9. On the other hand, for the MBs whose current RDcost_{16×16} is larger than Th_{1}(TN_{NJND}) but whose best mode is still 16 × 16, further statistics of the average RDcost_{16×16} are gathered for each TN_{NJND}, excluding the MB samples already covered by Th_{1}(TN_{NJND}); these are also modeled by Equation (2) to produce Th_{2}(TN_{NJND}). The coefficients for Th_{1}(TN_{NJND}) and Th_{2}(TN_{NJND}) are listed in Tables 3 and 4, respectively. In Figure 9, the crossing of the curves is caused by the curve fitting: samples with TN_{NJND} close to 256 are much fewer, so the fitted curve falls where the number of samples is small. In our experiments, the higher threshold is selected in this condition.
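The Gaussian-sum threshold of Equation (2) can be evaluated as follows. The coefficient values below are illustrative placeholders, not the fitted values of Tables 3 and 4:

```python
import math

def th(tn_njnd, coeffs):
    """Th_j(TN_NJND) = sum_i a_i * exp(-((TN_NJND - b_i) / c_i)**2).

    `coeffs` is a list of (a_i, b_i, c_i) triples; the actual values come
    from the curve fitting reported in Tables 3 and 4 of the paper.
    """
    return sum(a * math.exp(-((tn_njnd - b) / c) ** 2) for a, b, c in coeffs)

# Illustrative two-term fit (NOT the paper's coefficients):
coeffs_demo = [(5000.0, 0.0, 80.0), (2000.0, 128.0, 60.0)]
threshold_at_64 = th(64, coeffs_demo)
```

Each Gaussian term peaks at its center b_{i} with height a_{i} and width controlled by c_{i}; summing a few such terms reproduces the fitted curves of Figure 9.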
In Table 5, the distributions of the various modes are compared. It can be observed that most modes belong to relatively large mode partitions: the SKIP and 16 × 16 modes account for 60.47% in total, the 16 × 8 and 8 × 16 modes together account for 15.16%, and the 8 × 8 mode and sub-MBs occupy a total of 20.09% of the mode distribution.
Statistics over the same six QCIF sequences, each with 300 frames at QP 24, analyzing the accuracy of the proposed Th_{j}(TN_{NJND}) are listed in Table 6. Mode X is the best mode for an MB after conducting the RDO. P_{a}(mode X), P_{b}(mode X), and P_{c}(mode X) are the probabilities of mode X, given all MBs with mode X in inter frames, under different conditions. P_{a}(mode X) denotes that the current RDcost_{16×16} is equal to or lower than Th_{1}(TN_{NJND}). P_{b}(mode X) means that the current RDcost_{16×16} is larger than Th_{1}(TN_{NJND}) and equal to or lower than Th_{2}(TN_{NJND}). P_{c}(mode X) indicates that the current RDcost_{16×16} is larger than Th_{2}(TN_{NJND}). The resulting probability distribution over Th_{1} and Th_{2} is listed in Table 6, and the corresponding regions are shown in Figure 10. The total probability is 92.44% for the SKIP mode and 85.65% for the 16 × 16 mode over the entire inter frames. Even if the current RDcost_{16×16} is equal to or lower than Th_{j}(TN_{NJND}), some smaller modes still need to be tested. In addition, as exhibited by P_{a}(mode X), P_{b}(mode X), P_{c}(mode X), and the TN_{NJND} conditions in Table 6, most of the other smaller modes (16 × 8, 8 × 16, 8 × 8, and sub-MB) are distributed where TN_{NJND} is under 127.
In the proposed scheme, neither the 8 × 8 mode nor the sub-MB modes are calculated when the current RDcost_{16×16} is equal to or lower than Th_{1}(TN_{NJND}) and TN_{NJND} is under 127, as shown by region 3 in Figure 10. The reason is that the occurrence probabilities of the 8 × 8 mode and sub-MBs are relatively low, as demonstrated in Table 5, and so is their distribution under the proposed Th_{j}(TN_{NJND}) conditions, as demonstrated in Table 6. In Table 5, these modes occupy only 20.09% of the total among the various modes, and according to the Th_{j}(TN_{NJND}) conditions in Table 6, most occurrences of the 8 × 8 mode and sub-MBs do not fall in region 3. Therefore, the occurrence probabilities of the 8 × 8 mode and sub-MBs in region 3 are small enough to be neglected.
Tables 7 and 8 show statistics in which sub-MBs are classified into four classes, sub-MB_n, where n (equal to 1, 2, 3, or 4) indicates the number of sub-MB partitions in an MB. The importance of sub-MBs is analyzed according to the utilization rate. For instance, if only one sub-MB in an MB uses the 8 × 4, 4 × 8, or 4 × 4 mode while the other three are 8 × 8 blocks, the MB belongs to the class sub-MB_1, and so on. Because the probability of an MB with four sub-MB partitions is obviously low, terminating early and ignoring the computation of sub-MBs does not increase the performance degradation much.
After the analyses of Tables 5, 6, 7, and 8, the relation between TN_{NJND}, the current RDcost_{16×16}, and Th_{j}(TN_{NJND}) is exhibited in Figure 10. In Figure 10, regions 1, 2, and 4 require checking all inter modes, while region 3 includes the SKIP, 16 × 16, 16 × 8, and 8 × 16 modes. For regions 5 and 6, the procedure can be terminated early after conducting the RDO of the SKIP and 16 × 16 modes. The condition on TN_{NJND} is determined first in the proposed algorithm, because TN_{NJND} is the key factor in deciding whether to check modes other than SKIP and 16 × 16; the decision follows the above analyses of the mode distributions in Table 6. If TN_{NJND} is equal to or lower than 127, the comparatively strict Th_{1}(TN_{NJND}) is chosen as the threshold. If the current RDcost_{16×16} is equal to or lower than Th_{1}(TN_{NJND}) (region 3 in Figure 10), the 16 × 8 and 8 × 16 modes are added to the final mode candidates; if it is larger than Th_{1}(TN_{NJND}) (regions 1 and 2 in Figure 10), all inter modes are candidates. If TN_{NJND} is larger than 127, the relatively loose Th_{2}(TN_{NJND}) is selected as the threshold. If the RDcost_{16×16} of the current MB is equal to or lower than Th_{2}(TN_{NJND}) (regions 5 and 6 in Figure 10), the procedure is terminated early and no other mode candidate is added; otherwise (region 4 in Figure 10), the 16 × 8, 8 × 16, and 8 × 8 modes and sub-MBs become final mode candidates.
According to Figure 10, the relation between TN_{NJND} and Th_{j}(TN_{NJND}) can be discussed. If TN_{NJND} is large in an MB, the MB contains more visually unnoticeable (non-JND) pixels. If an MB possesses a very low current RDcost_{16×16}, it tends to choose relatively large blocks, as illustrated in Figure 8; there are more opportunities to choose large blocks after the mode decision procedure. The proposed algorithm exploits this characteristic: if an MB has a larger TN_{NJND} or a very low current RDcost_{16×16}, the procedure has more opportunities to choose a relatively large block and to terminate earlier. Therefore, the unnecessary computational cost of choosing the best mode among the candidates can be reduced according to the total number of non-JND pixels, the current RDcost_{16×16}, and the fitted curves that give the appropriate Th_{j}(TN_{NJND}) produced from statistics.
3.1. Characteristics of image direction
Following the flowchart shown in Figure 7, the direction of the image texture is considered after the previous steps. When the directional characteristic in an MB is strong enough, only one of the two directional modes, 16 × 8 or 8 × 16, is needed, because these two partitions cannot coexist in an MB. Therefore, if only one of the two directional modes is kept in the final mode candidates, the computational cost can be further reduced.
The edge information of an MB is calculated by the Sobel edge detector to decide whether both the 16 × 8 and 8 × 16 modes are included as mode candidates. If the edge magnitude of any pixel in an MB is larger than 180, the horizontal or vertical decision is made according to Equations (3), (4), and (5). Part of the chin image in the Foreman sequence is shown in Figure 11a together with the real mode structure after encoding. The white pixels in Figure 11b indicate the pixels whose Sobel edge magnitudes are larger than 180.
Three groups of edge directions are calculated in the proposed algorithm: the horizontal and vertical calculations of the original gray values (H_{1}/V_{1}), of the residual compensated by the MVP in an MB (H_{2}/V_{2}), and of the N_{NJND} distribution of each 8 × 8 block (H_{3}/V_{3}). Figure 12a shows an example of the N_{NJND} distribution at the brim of the hat in the Foreman sequence. The N_{NJND} counts in Figure 12b,c are distributed in a horizontal structure; the best mode is 16 × 8, and H_{3} is much larger than V_{3}.
where i is equal to 1 or 2 for the original or residual blocks, respectively, x and y are the pixel coordinates in an MB, and D_{i,16×16} is the input information of an MB.
Afterwards, the horizontal and vertical magnitudes are compared to decide the directional mode of an MB, as exhibited in Equation (5). For instance, if the horizontal characteristics are larger than the vertical ones, the MB has a strong horizontal feature and the 16 × 8 mode is included in the final mode selection. On the contrary, if the horizontal characteristics are smaller than the vertical ones, the vertical feature is strong and only the 8 × 16 mode is included in the final mode candidates. Otherwise, both the 16 × 8 and 8 × 16 modes are included in the final mode candidates.
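The directional decision can be sketched as follows. Since Equations (3)-(5) are not reproduced here, this sketch assumes the horizontal characteristic H accumulates the vertical-gradient (horizontal-structure) Sobel responses and V accumulates the horizontal-gradient responses; the kernels and names are the standard Sobel forms, used as an assumption.

```python
def sobel_hv(block):
    """Accumulate Sobel responses over the interior of a 2D gray block.

    H grows with horizontal structure (vertical gradient Gy);
    V grows with vertical structure (horizontal gradient Gx).
    """
    h = v = 0
    rows, cols = len(block), len(block[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = (block[y-1][x+1] + 2*block[y][x+1] + block[y+1][x+1]
                  - block[y-1][x-1] - 2*block[y][x-1] - block[y+1][x-1])
            gy = (block[y+1][x-1] + 2*block[y+1][x] + block[y+1][x+1]
                  - block[y-1][x-1] - 2*block[y-1][x] - block[y-1][x+1])
            h += abs(gy)
            v += abs(gx)
    return h, v

def directional_modes(block):
    h, v = sobel_hv(block)
    if h > v:
        return ["16x8"]           # strong horizontal feature
    if v > h:
        return ["8x16"]           # strong vertical feature
    return ["16x8", "8x16"]       # no dominant direction
```

For a block whose top half is dark and bottom half bright, the vertical gradient dominates and only the 16 × 8 partition is kept; transposing the block yields 8 × 16.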
3.2. Complete algorithm
The following steps describe the complete algorithm:

1) Calculate the RD costs of the SKIP and 16 × 16 modes first, and obtain TN_{NJND}. If TN_{NJND} is equal to 256, go to step 8. Otherwise, add the Intra 16 × 16 mode to the mode candidates and go to step 2.

2) If TN_{NJND} is equal to zero, add the Intra 4 × 4 mode to the mode candidates and go to step 3. Otherwise, go to step 3 directly.

3) If TN_{NJND} is equal to or lower than 127, go to step 4. Otherwise, go to step 5.

4) If RDcost_{16×16} is equal to or lower than Th_{1}(TN_{NJND}), go to step 6 directly. Otherwise, add the 8 × 8 mode and sub-MBs to the mode candidates and go to step 6.

5) If RDcost_{16×16} is equal to or lower than Th_{2}(TN_{NJND}), go to step 8. Otherwise, add the 8 × 8 mode and sub-MBs to the mode candidates and go to step 6.

6) If the edge magnitude of any pixel in the MB is larger than 180, go to step 7. Otherwise, add the 16 × 8 and 8 × 16 modes and go to step 8.

7) Check the horizontal/vertical decision to add the 16 × 8 or 8 × 16 mode, and go to step 8.

8) Calculate the best mode from the final mode candidates.
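The eight steps above can be sketched as a single candidate-selection function. The threshold models and edge tests are passed in as callables, since their fitted coefficients live in Tables 3 and 4; all names are ours, and the mode labels are illustrative strings.

```python
def mode_candidates(tn_njnd, rdcost_16x16, th1, th2, has_strong_edge, hv_decision):
    """Return the candidate mode list following steps 1-8.

    th1/th2: callables implementing Th_1/Th_2(TN_NJND);
    has_strong_edge: True if any pixel's Sobel magnitude exceeds 180;
    hv_decision: callable returning the directional mode list,
    e.g., ["16x8"], ["8x16"], or both.
    """
    candidates = ["SKIP", "16x16"]                 # step 1: always evaluated
    if tn_njnd == 256:                             # step 1: early termination
        return candidates
    candidates.append("Intra16x16")
    if tn_njnd == 0:                               # step 2
        candidates.append("Intra4x4")
    if tn_njnd <= 127:                             # steps 3-4
        if rdcost_16x16 > th1(tn_njnd):
            candidates += ["8x8", "subMB"]
    else:                                          # step 5
        if rdcost_16x16 <= th2(tn_njnd):
            return candidates                      # early termination
        candidates += ["8x8", "subMB"]
    if has_strong_edge:                            # steps 6-7
        candidates += hv_decision()
    else:
        candidates += ["16x8", "8x16"]
    return candidates
```

For example, an MB with TN_{NJND} = 256 keeps only SKIP and 16 × 16, while one with TN_{NJND} > 127 and RDcost_{16×16} below Th_{2} terminates after SKIP, 16 × 16, and Intra 16 × 16.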
4. Experimental results
In order to evaluate the performance, the proposed algorithm is compared with those of Eduardo et al. [9], Zhao et al. [10], and the previous study [18]. Encoding is tested on a PC with an Intel Core 2 Quad Q9400 2.66 GHz CPU and 1.96 GB of memory. The time saving TS is defined as

TS = (T_{o} − T_{p}) / T_{o} × 100%
where T_{o} is the total encoding time of the original H.264/AVC software JM16.2 [21], and T_{p} is that of the compared algorithm. The peak-signal-to-noise-ratio reduction ΔPSNR is defined as

ΔPSNR = PSNR_{p} − PSNR_{o}
where PSNR_{o} is the original PSNR for JM16.2, and PSNR_{p} is that of the compared algorithm. The bitrate increase ΔBR is defined as

ΔBR = (bitrate_{p} − bitrate_{o}) / bitrate_{o} × 100%
where bitrate_{o} is the total bitrate encoded by the original JM16.2, and bitrate_{p} is that of the compared algorithm.
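The three comparison metrics can be computed as in the following sketch. The sign conventions are assumed to follow common practice (TS and ΔBR as percentages relative to the JM16.2 anchor, ΔPSNR as proposed minus original), since the original equation images are not reproduced here.

```python
def time_saving(t_o, t_p):
    """TS (%): relative reduction in total encoding time vs. the anchor."""
    return (t_o - t_p) / t_o * 100.0

def delta_psnr(psnr_o, psnr_p):
    """PSNR change in dB (negative means quality loss)."""
    return psnr_p - psnr_o

def delta_bitrate(br_o, br_p):
    """Bitrate change (%) relative to the anchor."""
    return (br_p - br_o) / br_o * 100.0
```

For instance, an encoder that needs 40 s where the anchor needs 100 s yields TS = 60%, and a bitrate of 100.409 kbps against an anchor of 100 kbps yields ΔBR = 0.409%.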
In Tables 9, 10, 11, and 12, the performance and coding efficiency comparisons between the proposed algorithm and JM16.2 are exhibited for the IPPP and IBBP frame structures, respectively. The BD-PSNR and BD-BR [22] are listed in Table 13. The performances of the proposed algorithm, Zhao et al.'s [10], and JND-MD [18] are compared in Table 14. The 12 tested benchmark video sequences are Foreman (QCIF), Grandma (QCIF), Mother & Daughter (QCIF), News (QCIF), Salesman (QCIF), Coastguard (CIF), Mobile (CIF), Silent (CIF), Stefan (CIF), Table (CIF), Stockholm (HD), and Parkrun (HD). The coding frame structures are IPPP and IBBP with 300 frames at QPs of 24, 28, 32, 36, and 40 in the H.264/AVC software JM16.2 [21]. The QPs of 24, 28, 32, and 36 are used for BD-PSNR and BD-BR. The other parameter settings are as follows: IntraPeriod is 10; ReferenceFrame is 5; SearchMode uses UMHexagonS; SymbolMode uses CABAC. The search range is ±32 for QCIF and CIF videos and ±64 for SD sequences. According to the experimental results, the coding efficiency and rate distortion performance of the proposed algorithm are much better than those of Zhao et al.'s [10] and of [18]. A time saving of 63.844% is achieved with a 0.409% increment of the total bitrate and an average 0.031 dB loss of PSNR. The time savings for the Coastguard, Mobile, Stockholm, and Parkrun sequences, which share the characteristic of camera motion, are smaller. This is the key factor affecting the performance under the criterion used for the mode decision: the temporal difference between the current block and the reference block is considered for measuring the activity of each MB. Therefore, if an MB has a smaller TN_{NJND}, it contains many visually noticeable differences; consequently, the RD cost is possibly higher than the average one because of the larger temporal difference.
Video sequences with camera motion have more MBs with large temporal differences than those with common video content, because of the variable movement produced by the displacement of the camera during filming.
In Tables 15 and 16, the proposed scheme is compared with Eduardo et al.'s [9] and JND-MD [18] in BD-PSNR and BD-BR using QPs of 28, 32, 36, and 40 with 100 frames. The other coding parameter settings and simulation environments are as previously mentioned. The ten tested benchmark video sequences are Akiyo (CIF), Container (CIF), Mobile (CIF), Paris (CIF), Carphone (QCIF), Claire (QCIF), Coastguard (QCIF), Highway (QCIF), Miss America (QCIF), and News (QCIF). The proposed scheme achieves outstanding coding efficiency: time savings of 71.784% on average for IPPP and 65.456% for IBBP are obtained. The proposed algorithm provides better coding efficiency than those of Eduardo et al. [9] and JND-MD [18].
The subjective quality comparisons are shown in Figures 13 and 14. It can be observed that subjective detail and important information in still content are not sacrificed; consequently, good subjective quality is also presented in continuous video sequences. Furthermore, the required coding time is substantially decreased, so high coding efficiency is achieved while the objective quality (PSNR/BD-PSNR and bitrate/BD-bitrate) and the subjective quality are maintained. The experimental results demonstrate that the proposed method, built on the correlation of the HVS and the RD cost, is both practical and efficient.
5. Conclusion
In this article, a fast mode decision algorithm is proposed for the H.264/AVC video coding standard. Human visual characteristics are employed to analyze an MB: the human eye is simulated by analyzing the residual data with a JND model, and statistics are gathered to establish the correlation between the RD cost and the JND. With the proposed algorithm, the number of mode candidates is reduced and the computational efficiency of H.264/AVC is improved. The performance of the proposed algorithm is shown to be better than those of previous studies.
References
Wiegand T, Sullivan GJ, Bjontegaard G, Luthra A: Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol 2003, 13(7):560-576.
Bharanitharan K, Liu BD, Yang JF: Classified region algorithm for fast inter mode decision in H.264/AVC encoder. EURASIP J. Adv. Signal Process 2010, 2010: 1-10.
Choi BD, Nam JH, Hwang MC, Ko SJ: Fast motion estimation and intermode selection for H.264. EURASIP J. Adv. Signal Process 2006, 2006: 1-8.
Pan F, Yu H, Lin Z: Scalable fast rate-distortion optimization for H.264/AVC. EURASIP J. Adv. Signal Process 2006, 2006: 1-10.
Lee YM, Lin Y: Asymptotic computation in mode decision for H.264/AVC inter frame coding. J. Signal Process. Syst 2012, 66(2):121-127. 10.1007/s11265-011-0585-y
Yeh CH, Fan KJ, Chen MJ, Li GL: Fast mode decision algorithm for scalable video coding using Bayesian theorem detection and Markov process. IEEE Trans. Circuits Syst. Video Technol 2010, 20(4):563-574.
Grecos C, Yang MY: Fast inter mode prediction for P slices in the H.264 video coding standard. IEEE Trans. Broadcast 2005, 51(2):256-263. 10.1109/TBC.2005.846192
Lin YH, Wu JL: A depth information based fast mode decision algorithm for color plus depth-map 3D videos. IEEE Trans. Broadcast 2011, 57(2):542-550.
Eduardo ME, Amaya JM, Fernando DM: An adaptive algorithm for fast inter mode decision in the H.264/AVC video coding standard. IEEE Trans. Consum. Electron 2010, 56(2):826-834.
Zhao T, Wang H, Kwong S, Kuo CCJ: Fast mode decision based on mode adaptation. IEEE Trans. Circuits Syst. Video Technol 2010, 20(5):697-705.
Ri SH, Vatis Y, Ostermann J: Fast inter-mode decision in an H.264/AVC encoder using mode and Lagrangian cost correlation. IEEE Trans. Circuits Syst. Video Technol 2009, 19(2):302-306.
Gan T, Alface PR: Fast mode decision for H.264/AVC encoding of tunnel surveillance video. Proceedings of the Second International Conferences on Advances in Multimedia 2010, 7-12.
Chen PH, Chen HM, Shie MC, Su CH, Mao WL, Huang CK: Adaptive fast block mode decision algorithm for H.264/AVC. Proceedings of the 5th IEEE Conference on Industrial Electronics and Applications 2010, 2002-2007.
Tang H, Shi HS: Fast mode decision algorithm for H.264/AVC based on all-zero blocks predetermination. Proceedings of the International Conference on Measuring Technology and Mechatronics Automation, 2 2009, 780-783.
Wang H, Kwong S, Kok CW: An efficient mode decision algorithm for H.264/AVC encoding optimization. IEEE Trans. Multimed 2007, 9(4):882-888.
Zeng H, Cai CH, Ma KK: Fast mode decision for H.264/AVC based on macroblock motion activity. IEEE Trans. Circuits Syst. Video Technol 2009, 19(4):491-499.
Chou CH, Li YC: A perceptually tuned subband image coder based on the measure of just-noticeable-distortion profile. IEEE Trans. Circuits Syst. Video Technol 1995, 5(6):467-476. 10.1109/76.475889
Li MS, Chen MJ: Fast HVS-based mode decision for H.264/AVC using just-noticeable-difference. Proceedings of the International Conference on Image Processing, Computer Vision, and Pattern Recognition 2011.
Wang H, Qian X, Liu G: Inter mode decision based on just noticeable difference profile. Proceedings of the 17th IEEE International Conference on Image Processing 2010, 297-300.
Shafique M, Molkenthin B, Henkel J: An HVS-based adaptive computational complexity reduction scheme for H.264/AVC video encoder using prognostic early mode exclusion. Proceedings of the Design, Automation & Test in Europe Conference & Exhibition 2010, 1713-1718.
H.264/AVC Reference Software. http://iphome.hhi.de/suehring/tml/
Bjontegaard G: Calculation of average PSNR differences between RD-curves, ITU-T SG16 Doc. VCEG-M33. 2001.
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Li, M.-S., Chen, M.-J., Tai, K.-H. et al. Fast mode decision based on human noticeable luminance difference and rate distortion cost for H.264/AVC. EURASIP J. Adv. Signal Process. 2013, 60 (2013). https://doi.org/10.1186/1687-6180-2013-60
Keywords
 Mode decision
 H.264/AVC
 Rate distortion cost
 Human visual system