Mode decision acceleration for H.264/AVC to SVC temporal video transcoding

This study presents a fast video transcoding architecture that overcomes the complexity caused by the different coding structures of H.264/AVC and SVC. Because mode decision in SVC is computationally heavy, the proposed algorithms simplify this process. Two scenarios are considered: transcoding with the same quantization parameter (QP) and transcoding with bitrate reduction. In the first scenario, SVC modes are determined by probability models, namely conditional probability, Bayes' theorem, and a Markov chain. In the second scenario, MB activity is measured to determine SVC modes. Experimental results indicate that the proposed algorithm saves significant coding time with negligible PSNR loss compared with a cascaded pixel-domain transcoder.

The Joint Video Team (JVT) of ITU-T VCEG and ISO/IEC MPEG has standardized SVC as an extension of H.264/AVC. SVC provides scalability by parsing and extracting a partial bitstream to satisfy various terminal requirements and network conditions. However, most existing video content is stored in a non-scalable format such as H.264/AVC. Transcoding from non-scalable H.264/AVC to SVC is therefore desirable, and it should require as little computation as possible without sacrificing R-D performance. Because H.264/AVC and SVC bitstreams are incompatible, format conversion must decode the original video into an intermediate format and re-encode it into SVC. While the decoding overhead is negligible, the high complexity of the encoding process still slows transcoding, even on a modern multicore processor. This delay limits the applications of such transcoders.
The cascaded pixel-domain transcoder (CPDT) is a straightforward method for converting one format into another [5]. CPDT achieves optimal visual quality because it fully decodes the incoming bitstream and re-encodes it into a new one, but this results in high computational complexity. Reusing as much information from the incoming bitstream as possible can significantly reduce the computations of transcoding. However, H.264/AVC and SVC differ in coding structure, as illustrated in Figure 2, which makes it impossible to reuse H.264/AVC modes directly in SVC.
This study presents a fast algorithm for transcoding from H.264/AVC to SVC. The first part addresses H.264/AVC to SVC transcoding with the same QP. The proposed algorithm develops a mode probability model for this conversion based on conditional probability, Bayes' theorem, and a Markov chain. Experimental results show that it saves an average of 76.65% of coding time with a 0.1 dB PSNR loss compared with CPDT. The second part discusses transcoding from H.264/AVC to SVC with bitrate reduction. The residual DCT-domain MB energy obtained from the H.264/AVC decoding process is used to compute MB activity for mode decision in the SVC encoder [16]. In this scenario, the proposed algorithm saves an average of 59.4% of coding time with a 6.24% bitrate increase compared with CPDT.
The rest of this article is organized as follows. The "Related study" section reviews previous work on video transcoding. The "Proposed video transcoding from H.264/AVC to SVC" section then introduces the proposed algorithm. Next, the "Experimental results" section evaluates the performance of the proposed method. Finally, conclusions are drawn in the "Conclusions" section.

Related study
Visual quality and coding time are the primary concerns when designing a video transcoder. Previous work can be categorized into frequency-domain and pixel-domain transcoding. De Cock et al. [17] proposed a frequency-domain transcoding scheme from H.264/AVC to SVC to reduce coding time and complexity. Although frequency-domain transcoding saves computations by avoiding the inverse transform, it degrades video quality because of the drift problem. The scheme in [18] transcodes a single-layer H.264/AVC bitstream into an SNR-scalable SVC bitstream with CGS layers. To avoid drift, it uses a re-quantization error compensation method that prevents error propagation. However, its visual quality still falls noticeably short of that of CPDT.
In pixel-domain transcoding, Garrido-Cantos et al. [19] developed a method for transcoding from H.264/AVC to SVC with temporal scalability to reduce computational complexity. The decoded H.264/AVC motion vectors define a reduced search area that accelerates motion estimation in SVC. However, their scheme does not address mode decision, which is computationally intensive.
Al-Muscati and Labeau [20] … To avoid the drift problem and achieve optimal rate-distortion (RD) performance, the proposed transcoding scheme operates in the pixel domain and focuses on accelerating the mode decision process. As a result, the proposed architecture achieves low computational complexity with satisfactory coding performance.

Proposed video transcoding from H.264/AVC to SVC
This section introduces the proposed H.264/AVC to SVC transcoding scheme, which maintains the visual quality of transcoded videos while reducing transcoding time.

Transcoding with the same QP
First, candidate SVC modes are selected from the modes of the incoming H.264/AVC bitstream by using conditional probability, i.e., the statistical distribution of SVC modes given the H.264/AVC modes. Next, Bayes' theorem is used to determine whether the candidate is the best mode. Finally, if it is not, a Markov chain uses transition probabilities to predict another candidate mode. The training set for the conditional probability, Bayesian theorem detection, and Markov chain models consists of the CIF sequences Football, Flower, Foreman, Carphone, and Mobile, each containing 200 frames. QPs of 25 and 35 are used to cover both high and low bitrates.

Candidate mode selection through conditional probability
Reducing computational complexity depends on efficiently reusing information from the incoming bitstream. Although the modes of H.264/AVC cannot be applied to SVC directly, they provide hints for predicting SVC modes. Therefore, based on an analysis of the mode distributions of the two standards, this study develops a conditional probability model to select candidate modes. Useful candidate modes are those with the highest conditional probability. Analysis results verify the mode relations between H.264/AVC and SVC. In addition, the mode extracted from the input bitstream is regarded as a candidate mode of SVC. Moreover, a linear mapping is found between the modes of H.264/AVC and SVC, which can be represented as Equation (1).
Next, the current mode in SVC is determined based on the highest conditional probability. The mode can be shown mathematically as follows.
In Equation (2), … In Equation (3), $\mathrm{mode}^{i}_{\mathrm{H.264/AVC}}$ is defined as 0 if mode i is not used in the current MB of H.264/AVC and as 1 otherwise, where $i \in \{1, 2, \ldots, 6\}$. Finally, the maximum conditional probability determines the candidate mode of SVC.
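The candidate-selection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes training data are available as pairs of (H.264/AVC mode pattern, SVC mode), where the pattern is any hashable key such as the 0/1 indicator tuple of Equation (3), and SVC modes are the integers 0 to 6 for Mode0 to Mode6. The helper names `train_conditional` and `select_candidate` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Optional, Tuple


def train_conditional(
    pairs: Iterable[Tuple[Hashable, int]],
) -> Dict[Hashable, Dict[int, float]]:
    """Estimate P(SVC mode | H.264/AVC mode pattern) from training pairs."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for avc_pattern, svc_mode in pairs:
        counts[avc_pattern][svc_mode] += 1
    table = {}
    for avc_pattern, row in counts.items():
        total = sum(row.values())
        # Normalise each row so the conditional probabilities sum to 1.
        table[avc_pattern] = {mode: n / total for mode, n in row.items()}
    return table


def select_candidate(
    table: Dict[Hashable, Dict[int, float]], avc_pattern: Hashable
) -> Optional[int]:
    """Return the SVC mode with the highest conditional probability, or None
    if the pattern never appeared in training (hypothetical fallback signal)."""
    row = table.get(avc_pattern)
    if not row:
        return None
    return max(row, key=row.get)


# Usage with made-up training pairs: indicator tuple (Equation (3)) -> SVC mode.
table = train_conditional([((1, 0, 0, 0, 0, 0), 0), ((1, 0, 0, 0, 0, 0), 0),
                           ((1, 0, 0, 0, 0, 0), 1), ((0, 1, 0, 0, 0, 0), 2)])
print(select_candidate(table, (1, 0, 0, 0, 0, 0)))  # 0
```

Returning `None` for an unseen pattern is a design assumption of this sketch; a transcoder could fall back to a full mode search in that case.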

Mode testing by Bayesian theorem detection
Neighboring MBs are correlated in terms of motion vectors, RD costs, and modes, and many works have developed algorithms that exploit this property. Bayes' theorem is a simple formula for conditional probability that can be used to compute posterior probabilities from observations. Here, the modes of the neighboring MBs (top and left) are observed with respect to the current MB. The mode of the current MB selected by the conditional probability model is assumed to be highly correlated with the modes of its neighbors. A Bayesian theorem detection scheme is therefore constructed from this spatial correlation and used to test the accuracy of the candidate mode. Table 3 shows the probability distribution of the current MB's mode in SVC. Tables 4 and 5 show the conditional probability of the current MB's mode given the top MB's mode and the left MB's mode, respectively. The Bayesian theorem detection scheme in Equation (5), which combines the conditional and posterior probabilities, determines whether the mode prediction for the current MB is erroneous.
where X denotes the top (or left) MB mode, Y denotes the current MB mode, and $X, Y \in \{\mathrm{Mode0}, \ldots, \mathrm{Mode6}\}$. Tables 6 and 7 show the posterior probability of the top MB's mode given the current MB's mode. If the candidate mode in SVC does not have the highest posterior probability, it is probably not the optimal mode, and a new mode must be predicted by using the Markov chain.
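A sketch of the Bayesian test, under the assumption that Equation (5) is the standard Bayes relation $P(X \mid Y) = P(Y \mid X)\,P(X)/P(Y)$ and that the candidate is rejected when some other current-MB mode gives the observed neighbor mode a higher posterior. The dictionaries stand in for Tables 3 to 7; the function names and the tie rule (a tie counts as a pass) are hypothetical choices, not taken from the text.

```python
from typing import Dict

Dist = Dict[int, float]


def neighbor_posterior(
    p_cur_given_nb: Dict[int, Dist], p_nb: Dist, p_cur: Dist
) -> Dict[int, Dist]:
    """Bayes' rule: post[y][x] = P(X=x | Y=y) = P(Y=y | X=x) * P(X=x) / P(Y=y).

    p_cur_given_nb[x][y] = P(Y=y | X=x)  (role of Tables 4 and 5)
    p_nb[x]              = P(X=x)        (neighbour-mode prior)
    p_cur[y]             = P(Y=y)        (role of Table 3)
    """
    post: Dict[int, Dist] = {}
    for y, py in p_cur.items():
        if py <= 0.0:
            continue  # a mode that never occurs has no defined posterior
        post[y] = {x: p_cur_given_nb.get(x, {}).get(y, 0.0) * px / py
                   for x, px in p_nb.items()}
    return post


def passes_bayes_test(post: Dict[int, Dist], candidate: int, observed_nb: int) -> bool:
    """Pass if no other current-MB mode gives the observed neighbour mode a
    strictly higher posterior (ties count as a pass)."""
    def score(y: int) -> float:
        return post.get(y, {}).get(observed_nb, 0.0)
    return all(score(candidate) >= score(y) for y in post)


# Toy two-mode example (made-up numbers): a mode matches its neighbour 80% of the time.
p_cur_given_nb = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.2, 1: 0.8}}
post = neighbor_posterior(p_cur_given_nb, p_nb={0: 0.5, 1: 0.5}, p_cur={0: 0.5, 1: 0.5})
print(passes_bayes_test(post, candidate=1, observed_nb=0))  # False
```

A `False` result corresponds to the case in which the candidate is handed to the Markov-chain refinement step.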

Mode refinement by Markov chain
A Markov chain is used to obtain the mode when Bayes' theorem indicates that the candidate mode may not code the MB efficiently. A Markov chain is a discrete-time process that predicts future states from the present state. By the Markov property, the future state is conditionally independent of past states given the present state; because the transition probabilities are also stationary (time-invariant), future states can be predicted from the current state alone. A Markov chain is defined over a finite set of states $S = \{s_0, s_1, \ldots, s_{k-1}\}$, where k denotes the number of states. Changes of state are called transitions, and the probabilities associated with them are called transition probabilities, $p_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$, where $q_t$ is the state at step t and $i, j \in \{0, 1, \ldots, k-1\}$. The set of states and transition probabilities completely characterizes a Markov chain. The transition probabilities form a transition probability matrix, and each row of this matrix sums to 1, i.e., $\sum_{j=0}^{k-1} p_{ij} = 1$ for every i.
Here, each SVC mode is regarded as a state of the Markov chain. The transition matrices capture both vertical and horizontal correlations. The vertical transition probability is the mode distribution of the current MB given the top MB's mode, and the horizontal transition probability is the mode distribution of the current MB given the left MB's mode. Tables 8 and 9 show the vertical and horizontal transition matrices, respectively.
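The transition matrices can be estimated from co-occurring (neighbor mode, current mode) pairs, and a replacement mode can then be read off the matrix rows. In this NumPy sketch, states are numbered from 0 (Mode0) rather than from 1, rows never seen in training fall back to a uniform distribution so that every row still sums to 1, and the vertical and horizontal predictions are combined by adding the two rows. The combination rule, the exclusion of the rejected candidate, and all function names are assumptions for illustration; this is not Equation (12) itself.

```python
import numpy as np


def transition_matrix(prev_modes, cur_modes, k: int = 7) -> np.ndarray:
    """Row-normalised k x k matrix with T[i, j] = P(current mode j | neighbour mode i).

    Modes are 0-based here (Mode0 -> 0), whereas the text indexes states from 1.
    """
    counts = np.zeros((k, k))
    np.add.at(counts, (np.asarray(prev_modes), np.asarray(cur_modes)), 1.0)
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows never observed fall back to a uniform distribution (sketch assumption),
    # so every row of T still sums to 1.
    safe = np.where(row_sums > 0, row_sums, 1.0)
    return np.where(row_sums > 0, counts / safe, 1.0 / k)


def refine_mode(t_vert: np.ndarray, t_horiz: np.ndarray,
                top_mode: int, left_mode: int, rejected: int) -> int:
    """Hypothetical refinement: add the vertical row for the top MB's mode and the
    horizontal row for the left MB's mode, then return the most probable mode
    other than the candidate rejected by the Bayesian test."""
    scores = t_vert[top_mode] + t_horiz[left_mode]  # new array; T is not modified
    scores[rejected] = -np.inf
    return int(np.argmax(scores))


# Usage with made-up (neighbour, current) pairs over three modes.
t = transition_matrix([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 2, 2], k=3)
print(refine_mode(t, t, top_mode=0, left_mode=0, rejected=0))  # 1
```

Adding the two rows treats vertical and horizontal evidence as equally weighted; other rules, such as multiplying them, would be equally plausible under the information given.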
If the candidate mode cannot pass the Bayesian test, the mode selected by the conditional probability model may not code the MB efficiently.
where t − 1 and t denote the neighboring MB and the current MB, respectively. The transition probability can be expressed as follows.
where $i, j \in \{1, 2, \ldots, 7\}$. A Markov chain uses the current state to predict the next state; here, it is used to predict the current MB's mode from the neighboring MB's mode. The predicted mode is given by Equation (12).
where T denotes a transition matrix obtained from the statistical mode distribution between the current MB and its left (or top) neighbor in SVC, and $t_{ij}$ denotes the element of T, i.e., the transition probability from mode i to mode j. Table 10 defines the correlation between X and $\mathrm{Mode}_{t,\mathrm{SVC}}$ used in Equation (12). In summary, a candidate mode is first selected by the conditional probability model. Next, Bayes' theorem verifies whether this candidate mode is satisfactory. If it is not, another mode is predicted by the Markov chain. The proposed algorithm thus avoids an exhaustive mode search without significantly sacrificing RD performance.

Transcoding with bitrate reduction
Table 11 shows the statistical mode distribution between H.264/AVC and SVC when transcoding from high to low bitrates. Mode0 and Mode1 are the most likely to be chosen in SVC. In bitrate reduction transcoding, a larger QP is used to meet the target bitrate, which makes the residual smaller, so the SVC encoder usually selects large block sizes such as Mode0 and Mode1 as the optimal mode. According to MB activity, each MB in H.264/AVC can be classified into a complex or a homogeneous region [16]. The mode of an MB in SVC can therefore be predicted from the MB activity obtained during H.264/AVC decoding. We jointly consider MB activity and the mode relation between H.264/AVC and SVC to decide the SVC modes without exhaustively searching all modes during transcoding.
To obtain MB activity, the energy of each MB must first be calculated. The MB energy ($E_{MB}$) is defined as the sum of the absolute values of the DCT coefficients in the MB, as shown in Equation (14).
where coeff represents a DCT coefficient within each 4 × 4 sub-block, α is the index of the 4 × 4 sub-block, and β is the position within the sub-block. Then, the average energy of the frame is calculated for MB activity, as shown in Equation (15).
where N and M are the frame width and height, respectively, and u and v represent the horizontal and vertical positions of a DCT coefficient. MB activity is then defined from Equations (14) and (15), as shown in Equation (16).
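A NumPy sketch of the MB-energy computation, assuming the residual coefficients of one frame are stored as an array of shape (MB rows, MB columns, 16, 4, 4). The frame average is taken as the mean of $E_{MB}$ over all MBs, which equals 256/(N·M) times the frame's total absolute coefficient sum. Equation (16) is assumed here to be the ratio of $E_{MB}$ to that average compared with a threshold of 1, and the mapping of homogeneous MBs to Mode0 and Mode1 is a hypothetical rule inspired by the Table 11 observation; none of these choices is confirmed by the text, and all names are illustrative.

```python
import numpy as np


def mb_energies(frame_coeffs: np.ndarray) -> np.ndarray:
    """E_MB for every MB: sum of |coeff| over the 16 4x4 sub-blocks (alpha) and
    the 16 positions inside each sub-block (beta).

    frame_coeffs has the assumed shape (mb_rows, mb_cols, 16, 4, 4).
    """
    return np.abs(frame_coeffs).sum(axis=(2, 3, 4))


def mb_activity(frame_coeffs: np.ndarray, threshold: float = 1.0):
    """Return (E_MB map, frame-average MB energy, boolean 'complex' map).

    Activity is assumed to be E_MB divided by the frame average; MBs above the
    hypothetical threshold are labelled complex, the rest homogeneous.
    """
    energies = mb_energies(frame_coeffs)
    e_avg = float(energies.mean())
    if e_avg == 0.0:
        # An all-zero residual frame: treat every MB as homogeneous.
        return energies, e_avg, np.zeros(energies.shape, dtype=bool)
    return energies, e_avg, (energies / e_avg) > threshold


def candidate_modes(is_complex: bool) -> list:
    """Hypothetical mapping: homogeneous MBs test only Mode0 and Mode1;
    complex MBs test all seven modes."""
    return list(range(7)) if is_complex else [0, 1]


# Usage: one textured MB and one MB with an all-zero residual.
coeffs = np.zeros((1, 2, 16, 4, 4))
coeffs[0, 0] = 1.0
energies, e_avg, complex_map = mb_activity(coeffs)
print(candidate_modes(bool(complex_map[0, 1])))  # [0, 1]
```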

Experimental results
The proposed algorithm is implemented on JM 13.2 and JSVM 9.12. Each test sequence contains 200 frames, with one I frame followed by 199 B or P frames. The GOP size is set to 16, the maximum search range is ±16 pixels, and the number of reference frames is set to 1. Two QPs, 25 and 35, are used in the experiments. The proposed method is compared with two methods: fully decode and fully encode (FDFE) and mode reusing (MR). Tables 12, 13, and 14 compare PSNR, bitrate, and coding time, respectively, for same-QP transcoding. Tables 15, 16, and 17 compare PSNR, bitrate, and coding time, respectively, for bitrate reduction transcoding. These tables show that, compared with FDFE, the proposed algorithm significantly reduces transcoding computations with negligible PSNR loss and bitrate increase. According to Table 12, the PSNR of the proposed algorithm is only 0.1 dB lower than that of FDFE, while the algorithm saves up to 76.65% of coding time. In …
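The percentages above are presumably measured relative to the reference transcoder (for example FDFE). Assuming the usual definitions, time saving = (T_ref - T_prop) / T_ref × 100%, bitrate increase = (R_prop - R_ref) / R_ref × 100%, and PSNR change = PSNR_prop - PSNR_ref, they can be computed as below; the function names are hypothetical.

```python
def time_saving(t_ref: float, t_prop: float) -> float:
    """Coding-time saving (%) of the proposed transcoder relative to the reference."""
    return (t_ref - t_prop) / t_ref * 100.0


def bitrate_increase(r_ref: float, r_prop: float) -> float:
    """Bitrate change (%) relative to the reference; positive means more bits."""
    return (r_prop - r_ref) / r_ref * 100.0


def delta_psnr(psnr_ref: float, psnr_prop: float) -> float:
    """PSNR difference in dB; negative means quality loss."""
    return psnr_prop - psnr_ref


# Hypothetical timings, not results from this study.
print(time_saving(40.0, 10.0))  # 75.0
```

For example, a hypothetical run that takes 10 s instead of 40 s corresponds to a 75% time saving.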

Conclusions
This study presents a video transcoder from H.264/AVC to SVC with temporal scalability that maintains visual quality while reducing coding time. The proposed algorithm addresses an important problem, mode prediction, when transcoding a single-layer video into a multi-layer one under two scenarios: transcoding with the same QP and transcoding with bitrate reduction. In the first scenario, three probability models are developed to screen, test, and refine the modes of an incoming video transcoded with the same QP. In the second scenario, MB activity is efficiently measured to determine SVC modes for bitrate reduction transcoding. Experimental results demonstrate that the proposed algorithm significantly reduces transcoding computations with only negligible visual quality degradation. Therefore, video content encoded in single-layer H.264/AVC can benefit from the new scalability features with little transcoding effort.