Mode decision acceleration for H.264/AVC to SVC temporal video transcoding

Abstract

This study presents a fast video transcoding architecture that overcomes the complexity arising from the different coding structures of H.264/AVC and SVC. The proposed algorithms simplify the computationally heavy mode decision process in SVC. Two scenarios are considered: transcoding with the same quantization parameter (QP) and transcoding with bitrate reduction. In the first scenario, SVC modes are determined by probability models, including conditional probability, Bayes’ theorem, and a Markov chain. In the second scenario, macroblock (MB) activity is measured to determine SVC modes. Experimental results indicate that the proposed algorithm saves significant coding time with negligible PSNR loss compared with a cascaded pixel-domain transcoder.

Introduction

Multimedia services that offer universal multimedia access over heterogeneous networks include videoconferencing, distance learning, and video on demand [1–4]. Such applications involve a variety of devices, access links, and resources. In particular, video transcoding enables pre-coded video to satisfy the constraints of transmission networks or specific applications [5–15], as shown in Figure 1.

Figure 1 Architecture of video transcoding.

The Joint Video Team (JVT) of ITU-T VCEG and ISO/IEC MPEG has standardized SVC as an extension of H.264/AVC. SVC provides scalability by parsing and extracting a partial bitstream to satisfy various terminal requirements and network conditions. However, most existing video content is stored in a non-scalable format such as H.264/AVC. Transcoding from non-scalable H.264/AVC to SVC is therefore needed, and its computation should be reduced without sacrificing R-D performance. Because of codec incompatibilities between H.264/AVC and SVC, the transcoder must decode the original video into an intermediate format and re-encode it in SVC. While the decoding overhead is negligible, the high complexity of the encoding process slows down transcoding even on a modern multicore processor, which limits its applications.

The cascaded pixel-domain transcoder (CPDT) is a straightforward method for transcoding one format into another [5]. CPDT offers the best visual quality because it fully decodes the incoming bitstream and re-encodes it as a new one, but this results in a large computational complexity. Reusing as much information from the incoming bitstream as possible can significantly reduce the computation of transcoding. However, H.264/AVC and SVC have different coding structures, as illustrated in Figure 2, which prevents the H.264/AVC modes from being reused directly as SVC modes.

Figure 2 Different coding structures: (a) H.264/AVC; (b) SVC.

This study presents a fast algorithm for transcoding from H.264/AVC to SVC. The first part addresses H.264/AVC to SVC transcoding with the same QP: the proposed algorithm develops a mode probability model based on conditional probability, Bayes’ theorem, and a Markov chain. Experimental results show that it saves an average of 76.65% coding time with a 0.1 dB PSNR loss compared with CPDT. The second part addresses H.264/AVC to SVC transcoding with bitrate reduction. The residual DCT-domain MB energy obtained from the H.264/AVC decoding process is used to compute MB activity for mode decision in the SVC encoder [16]. This algorithm saves an average of 59.4% coding time with a 6.24% bitrate increase compared with CPDT.

The rest of this article is organized as follows. The “Related study” section reviews previous work on video transcoding. The “Proposed video transcoding from H.264/AVC to SVC” section introduces the proposed transcoding algorithms. The “Experimental results” section evaluates the performance of the proposed method, and the “Conclusions” section concludes the article.

Related study

Visual quality and coding time are the primary concerns in designing a video transcoder. Previous work can be categorized into frequency-domain and pixel-domain transcoding. De Cock et al. [17] proposed a frequency-domain transcoding scheme from H.264/AVC to SVC to reduce coding time and complexity. Although frequency-domain transcoding saves computation by avoiding the inverse transform, it degrades video quality because of the drift problem. The study in [18] developed a scheme to transcode a single-layer H.264/AVC bitstream into an SNR-scalable SVC bitstream with coarse-grain scalability (CGS) layers. To avoid drift, that study uses a re-quantization error compensation method that prevents error propagation. However, the visual quality of this method remains noticeably lower than that of CPDT.

For pixel-domain transcoding, Garrido-Cantos et al. [19] developed a method for transcoding from H.264/AVC to SVC with temporal scalability to reduce computational complexity. The decoded H.264/AVC motion vectors define a reduced search area that accelerates motion estimation in SVC. However, their scheme does not address mode decision, which is computationally intensive.

Al-Muscati and Labeau [20] also developed an approach for transcoding H.264/AVC to SVC with temporal scalability. Motion vectors extracted from the H.264/AVC bitstream are mapped onto either the hierarchical B-frame or the zero-delay referencing structure in SVC, and the H.264/AVC modes are reused directly. Because H.264/AVC and SVC have different coding structures, directly reusing coding modes is inefficient and degrades coding performance significantly.

To avoid the drift problem and approach optimal rate-distortion (RD) performance, the proposed transcoding scheme operates in the pixel domain and focuses on the mode decision process. The proposed architecture thus achieves low computational complexity with satisfactory coding performance.

Proposed video transcoding from H.264/AVC to SVC

This section introduces the proposed H.264/AVC to SVC transcoding scheme, which maintains the visual quality of the transcoded video while reducing the transcoding time.

Transcoding with the same QP

First, candidate modes are selected from the modes of the incoming H.264/AVC bitstream using conditional probability, that is, the statistical distribution of SVC modes given the H.264/AVC mode. Next, Bayes’ theorem is used to determine whether the candidate mode is the best one. Finally, if it is not, a Markov chain uses transition probabilities to predict another mode. The training set for the conditional probability, the Bayesian detection, and the Markov chain consists of the CIF sequences Football, Flower, Foreman, Carphone, and Mobile, each containing 200 frames. The quantization parameter is set to 25 and 35 to cover both high and low bitrates.

Candidate mode selection through conditional probability

Reducing computational complexity depends on efficiently reusing the information in the incoming bitstream. Although H.264/AVC modes cannot be applied to SVC directly, they provide hints for predicting SVC modes. Therefore, based on an analysis of the mode distributions of the two standards, this study develops a conditional probability model to select candidate modes: the candidate is the SVC mode with the highest conditional probability given the H.264/AVC mode. Here, Mode0 to Mode6 represent Skip, Inter16 × 16, Inter16 × 8, Inter8 × 16, Inter8 × 8, Intra16 × 16, and Intra4 × 4, respectively. Table 1 summarizes the statistics of the mode distribution between H.264/AVC and SVC.

Table 1 Conditional probability of the SVC’s mode distribution

The analysis results confirm the relation between the modes of H.264/AVC and SVC. The mode extracted from the input bitstream is therefore used to derive the SVC candidate mode through a linear mapping between the modes of the two standards, represented by Equation (1).

$$f:\ \mathrm{Mode}_{\mathrm{H.264/AVC}} \rightarrow \mathrm{Mode}_{\mathrm{SVC}} \tag{1}$$

Next, the current mode in SVC is determined by the highest conditional probability, expressed mathematically as follows.

$$V = \mathrm{Mode}_{\mathrm{H.264/AVC}}^{\top}\,T = \begin{bmatrix} v_0 & v_1 & \cdots & v_6 \end{bmatrix} \tag{2}$$

In Equation (2), T denotes a 7 × 7 transition matrix built from the statistics of the mode distributions in H.264/AVC and SVC. $\mathrm{Mode}_{\mathrm{H.264/AVC}}$ is represented as

$$\mathrm{Mode}_{\mathrm{H.264/AVC}} = \begin{bmatrix} mode_0^{\mathrm{H.264/AVC}} & mode_1^{\mathrm{H.264/AVC}} & \cdots & mode_6^{\mathrm{H.264/AVC}} \end{bmatrix}^{\top} \tag{3}$$

In Equation (3), $mode_i^{\mathrm{H.264/AVC}}$ is 1 if Mode i is used in the current H.264/AVC MB and 0 otherwise, where $i \in \{0, 1, \ldots, 6\}$. Finally, the index of the maximum conditional probability determines the SVC candidate mode.

$$X = \arg\max_{i \in \{0, 1, \ldots, 6\}} v_i \tag{4}$$

Table 2 gives the correspondence between X in Equation (4) and $\mathrm{Mode}_{\mathrm{SVC}}$. Together, Equations (1)–(4) select the SVC mode with the highest conditional probability given the H.264/AVC mode as the SVC candidate mode.

Table 2 The correlation between X and Mode SVC
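The candidate selection of Equations (2)–(4) can be sketched as below. This is a minimal Python illustration, not the authors’ implementation: the function names, the toy training pairs, and the uniform fallback for an H.264/AVC mode never seen in training are assumptions. It estimates T row by row as P(SVC mode j | H.264/AVC mode i) from co-located mode pairs, multiplies the one-hot mode vector of Equation (3) by T, and takes the argmax of Equation (4).

```python
from collections import Counter

# Mode indices as defined in the text: 0=Skip, 1=Inter16x16, 2=Inter16x8,
# 3=Inter8x16, 4=Inter8x8, 5=Intra16x16, 6=Intra4x4.
NUM_MODES = 7


def estimate_conditional_table(pairs):
    """Estimate T[i][j] = P(SVC mode j | H.264/AVC mode i) from a list of
    (h264_mode, svc_mode) pairs for co-located MBs (hypothetical training data)."""
    counts = Counter(pairs)
    table = []
    for i in range(NUM_MODES):
        row_total = sum(counts[(i, j)] for j in range(NUM_MODES))
        if row_total == 0:
            # Assumption: an H.264/AVC mode absent from training gets a uniform row.
            table.append([1.0 / NUM_MODES] * NUM_MODES)
        else:
            table.append([counts[(i, j)] / row_total for j in range(NUM_MODES)])
    return table


def mode_vector(h264_mode):
    """Mode vector of Equation (3): 1 for the mode used by the MB, 0 elsewhere."""
    return [1 if i == h264_mode else 0 for i in range(NUM_MODES)]


def select_candidate(h264_mode, table):
    """Equations (2) and (4): V = Mode^T T, then X = argmax_i v_i."""
    mode = mode_vector(h264_mode)
    v = [sum(mode[i] * table[i][j] for i in range(NUM_MODES))
         for j in range(NUM_MODES)]
    # max() returns the first index on ties, i.e. the lower-numbered mode.
    return max(range(NUM_MODES), key=lambda j: v[j])


if __name__ == "__main__":
    T = estimate_conditional_table([(4, 1), (4, 1), (4, 4), (0, 0)])
    print(select_candidate(4, T))  # Inter8x8 in H.264/AVC -> candidate 1 (Inter16x16)
```

With a one-hot mode vector the product simply selects row i of T, so the argmax is the most frequent SVC mode observed for that H.264/AVC mode in training.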

Mode testing by Bayesian theorem detection

Neighboring MBs are correlated in terms of motion vectors, RD cost, and modes, and many algorithms exploit this property. Bayes’ theorem relates conditional probabilities and can be used to compute posterior probabilities from observations. Here, the modes of the neighboring MBs (top and left) are observed with respect to the current MB. The mode of the current MB selected by the conditional probability model is assumed to be highly correlated with the modes of its neighbors. A Bayesian detection scheme is therefore built on this spatial correlation and used to test the accuracy of the candidate mode. Table 3 shows the probability distribution of the current MB’s mode in SVC. Tables 4 and 5 show the conditional probability of the current MB’s mode given the top MB’s mode and the left MB’s mode, respectively. The Bayesian detection scheme decides whether a misprediction occurs in the current MB by combining these conditional probabilities into posterior probabilities, as shown in Equation (5).

$$P(X = x \mid Y = y) = \frac{P(Y = y \mid X = x)\, P(X = x)}{P(Y = y)} \tag{5}$$

where X denotes the top (or left) MB mode, Y denotes the current MB mode, and X, Y ∈ {Mode0, …, Mode6}. Tables 6 and 7 show the posterior probabilities of the top and left MB’s modes, respectively, given the current MB’s mode. If the SVC candidate mode does not have the highest posterior probability, it is probably not the optimal mode, and another mode is predicted using the Markov chain.

Table 3 Mode distribution in SVC
Table 4 Conditional probability of the current MB’s mode given the top MB’s mode
Table 5 Conditional probability of the current MB’s mode given the left MB’s mode
Table 6 Posterior probability of the top MB’s mode
Table 7 Posterior probability of the left MB’s mode
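One way to implement the test of Equation (5) is sketched below in Python. The tables are passed in as nested lists shaped like Tables 3–5, and the numbers in the usage are invented. The text does not spell out how “lacks the highest posterior probability” is evaluated, so the sketch assumes one reading: for the observed neighbor mode x, the candidate y passes if no other current mode gives a larger P(X = x | Y = y). The helper `marginal_current` derives P(Y = y) by total probability for the toy example; with measured statistics one would read it from Table 3. How the top and left tests are combined is not specified, so the usage checks a single neighbor.

```python
def marginal_current(cond, nbr_prior):
    """Total probability: P(Y=y) = sum_x P(Y=y | X=x) P(X=x).
    cond[x][y] = P(Y=y | X=x); nbr_prior[x] = P(X=x)."""
    n = len(nbr_prior)
    return [sum(cond[x][y] * nbr_prior[x] for x in range(n)) for y in range(n)]


def posterior(cond, nbr_prior, current_prior, x, y):
    """Equation (5): P(X=x | Y=y) = P(Y=y | X=x) P(X=x) / P(Y=y)."""
    if current_prior[y] == 0:
        return 0.0
    return cond[x][y] * nbr_prior[x] / current_prior[y]


def passes_bayes_test(candidate, nbr_mode, cond, nbr_prior, current_prior):
    """Assumed reading of the test: the candidate passes if no other current
    mode y gives the observed neighbor mode a higher posterior."""
    scores = [posterior(cond, nbr_prior, current_prior, nbr_mode, y)
              for y in range(len(current_prior))]
    return scores[candidate] >= max(scores)


if __name__ == "__main__":
    # Invented two-mode example (Skip vs. Inter16x16) to keep the numbers small.
    cond = [[0.9, 0.1], [0.3, 0.7]]
    prior = [0.6, 0.4]
    cur = marginal_current(cond, prior)
    print(passes_bayes_test(0, 0, cond, prior, cur))
    print(passes_bayes_test(1, 0, cond, prior, cur))
```

In the toy example the top neighbor is Skip, so a Skip candidate passes while an Inter16 × 16 candidate fails and would be handed to the Markov-chain step.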

Mode refinement by Markov chain

The Markov chain is used to obtain a mode when the Bayesian test indicates that the candidate mode may not encode the MB efficiently. A Markov chain is a discrete-time process with the Markov property: given the present state, the future state is conditionally independent of the past states. The chain is also assumed to be stationary, so its transition probabilities do not change over time. The future state can therefore be predicted from the current state alone:

$$P(S_{n+1} = s_{n+1} \mid S_1 = s_1, S_2 = s_2, \ldots, S_n = s_n) = P(S_{n+1} = s_{n+1} \mid S_n = s_n) \tag{6}$$

where the states belong to a finite set

$$S \in \{s_0, s_1, \ldots, s_{k-1}\} \tag{7}$$

where k denotes the number of states. The Markov property states that the conditional probability distribution of the future state depends only on the current state of the system, not on the states that preceded it. Changes of state are called transitions, and the probabilities associated with them are called transition probabilities:

$$t_{i,j} = P(S_{n+1} = s_j \mid S_n = s_i) \tag{8}$$

where i, j ∈ {0, 1, …, k − 1}. The set of states and the transition probabilities completely characterize a Markov chain, and the transition probabilities can be arranged in a transition probability matrix. Each row of the transition matrix sums to 1:

$$\sum_{j=0}^{k-1} t_{i,j} = 1 \tag{9}$$

Here, each SVC mode is regarded as a state of the Markov chain. Two transition matrices capture the vertical and horizontal correlations: the vertical transition probability is the mode distribution of the current MB given the top MB’s mode, and the horizontal transition probability is the mode distribution of the current MB given the left MB’s mode. Tables 8 and 9 show the vertical and horizontal transition matrices, respectively.

Table 8 Vertical transition probability matrix
Table 9 Horizontal transition probability matrix
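Tables 8 and 9 are statistics gathered from training sequences. The minimal Python sketch below shows how such vertical and horizontal transition matrices could be estimated, assuming the training data is available as 2-D maps of SVC MB modes (one list per MB row); the function names and the map layout are hypothetical. Each matrix is row-normalized so that it satisfies Equation (9).

```python
def normalize_rows(counts):
    """Turn count rows into probability rows that sum to 1 (Equation (9)).
    Assumption: a row with no observations is made uniform."""
    probs = []
    for row in counts:
        total = sum(row)
        n = len(row)
        probs.append([c / total for c in row] if total else [1.0 / n] * n)
    return probs


def estimate_transition_matrices(mode_map, num_modes=7):
    """Return (vertical, horizontal) transition matrices where
    vertical[i][j] = P(current = j | top = i) and
    horizontal[i][j] = P(current = j | left = i)."""
    vert = [[0] * num_modes for _ in range(num_modes)]
    horiz = [[0] * num_modes for _ in range(num_modes)]
    for r, row in enumerate(mode_map):
        for c, current in enumerate(row):
            if r > 0:  # top neighbor exists
                vert[mode_map[r - 1][c]][current] += 1
            if c > 0:  # left neighbor exists
                horiz[row[c - 1]][current] += 1
    return normalize_rows(vert), normalize_rows(horiz)


if __name__ == "__main__":
    # Toy 2 x 3 map of SVC modes (invented): 0 = Skip, 1 = Inter16x16.
    vertical, horizontal = estimate_transition_matrices(
        [[0, 0, 1], [0, 1, 1]], num_modes=2)
    print(vertical)
    print(horizontal)
```

MBs in the first row have no top neighbor and MBs in the first column have no left neighbor, so they contribute only to the other matrix.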

If the candidate mode fails the Bayesian test, the mode selected by conditional probability may not code the MB efficiently, and another mode is predicted using the Markov chain to improve coding performance. The mode of the current MB is predicted from the left (or top) MB through the transition matrix, as represented by Equation (10).

$$f:\ \mathrm{Mode}_{t-1,\mathrm{SVC}} \rightarrow \mathrm{Mode}_{t,\mathrm{SVC}} \tag{10}$$

where t − 1 and t denote the neighboring MB and the current MB, respectively. The transition probability is given by

$$t_{i,j} = P(S_n = s_j \mid S_{n-1} = s_i) \tag{11}$$

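where i, j ∈ {0, 1, …, 6}. The Markov chain uses the current state to predict the next one; here, it predicts the current MB’s mode from the neighboring MB’s mode, as shown in Equation (12).

$$V = \mathrm{Mode}_{t-1,\mathrm{SVC}}^{\top}\,T = \begin{bmatrix} mode_0^{t-1,\mathrm{SVC}} & mode_1^{t-1,\mathrm{SVC}} & \cdots & mode_6^{t-1,\mathrm{SVC}} \end{bmatrix} \begin{bmatrix} t_{00} & t_{01} & \cdots & t_{06} \\ t_{10} & t_{11} & \cdots & t_{16} \\ \vdots & \vdots & \ddots & \vdots \\ t_{60} & t_{61} & \cdots & t_{66} \end{bmatrix} = \begin{bmatrix} v_0 & v_1 & \cdots & v_6 \end{bmatrix} \tag{12}$$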

where T denotes a transition matrix containing the statistics of the mode distribution between the current MB and its left (or top) neighbor in SVC, and $t_{ij}$ is its (i, j) element, the transition probability. In Equation (12), $mode_i^{t-1,\mathrm{SVC}}$ is 1 if Mode i is used by the left (or top) neighbor of the current MB in SVC and 0 otherwise, where $i \in \{0, 1, \ldots, 6\}$. The index of the highest value is then obtained:

$$X = \arg\max_{i \in \{0, 1, \ldots, 6\}} v_i \tag{13}$$

Table 10 gives the correspondence between X and $\mathrm{Mode}_{t,\mathrm{SVC}}$. In summary, the candidate mode is first selected by the conditional probability model. Next, Bayes’ theorem is used to verify whether this candidate mode is satisfactory. If it is not, another mode is predicted using the Markov chain. The proposed algorithm thus avoids an exhaustive mode search with little loss in RD performance.

Table 10 The correlation between X and Mode t,SVC
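The refinement step of Equations (12) and (13), and the way the three stages chain together, can be sketched as follows. As before, this is an illustrative Python sketch rather than the authors’ code: `bayes_ok` is a hypothetical callback standing in for the Bayesian test, and the text does not say whether the Markov-predicted mode is coded directly or RD-checked together with the candidate, so the sketch simply returns it.

```python
def predict_from_neighbor(nbr_mode, transition):
    """Equations (12) and (13): multiply the one-hot mode vector of the
    neighboring MB by the transition matrix and return the argmax index.
    transition[i][j] = P(current mode j | neighbor mode i)."""
    n = len(transition)
    mode = [1 if i == nbr_mode else 0 for i in range(n)]
    v = [sum(mode[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return max(range(n), key=lambda j: v[j])


def decide_mode(candidate, nbr_mode, bayes_ok, transition):
    """Keep the conditional-probability candidate if it passes the Bayesian
    test; otherwise fall back to the Markov-chain prediction."""
    if bayes_ok(candidate, nbr_mode):
        return candidate
    return predict_from_neighbor(nbr_mode, transition)


if __name__ == "__main__":
    # Invented two-mode transition matrix (rows sum to 1).
    T = [[0.1, 0.9], [0.8, 0.2]]
    print(decide_mode(0, 0, lambda cand, nbr: True, T))   # candidate kept: 0
    print(decide_mode(0, 0, lambda cand, nbr: False, T))  # Markov prediction: 1
```

Passing the test in as a callback keeps this step independent of how the top and left posterior checks are combined.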

Transcoding with bitrate reduction

Table 11 shows the statistics of the mode distribution between H.264/AVC and SVC when transcoding from high to low bitrates. Mode0 and Mode1 are chosen with high probability in SVC. In bitrate reduction transcoding, a larger QP is used to meet the transcoding requirement, which makes the quantized residual smaller, so the SVC encoder usually selects large block sizes such as Mode0 and Mode1 as the optimal mode.

Table 11 Statistics of mode distribution

According to MB activity, each H.264/AVC MB can be classified into a complex or a homogeneous region [16]. The MB’s mode in SVC can therefore be predicted from the MB activity obtained during H.264/AVC decoding. We jointly consider MB activity and the mode relation between H.264/AVC and SVC to decide the SVC modes, avoiding an exhaustive search over all modes during transcoding.

To obtain MB activity, the average energy of each MB is calculated first. The MB energy $E_{MB}$ is defined as the sum of the absolute DCT coefficient values of the MB, normalized by its 256 coefficients, as shown in Equation (14).

$$E_{MB} = \frac{1}{256}\sum_{\alpha=0}^{15}\sum_{\beta=0}^{15}\left|coeff_{\alpha\beta}\right| \tag{14}$$

where $coeff_{\alpha\beta}$ is the DCT coefficient at position β of the α-th 4 × 4 sub-block of the MB; α indexes the 16 sub-blocks, and β indexes the 16 coefficient positions within a sub-block. Next, the average energy of the frame is calculated, as shown in Equation (15).

$$E_{frame} = \frac{1}{N \times M}\sum_{u=0}^{N-1}\sum_{v=0}^{M-1}\left|coeff_{uv}\right| \tag{15}$$

where N and M are the frame width and height, and u and v are the horizontal and vertical positions of a DCT coefficient in the frame. Equations (14) and (15) then define MB activity, as shown in Equation (16).

$$MB_{Activity} = \begin{cases} 1, & \text{if } E_{MB}/E_{frame} \geq 1 \\ 0, & \text{otherwise} \end{cases} \tag{16}$$
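Equations (14)–(16) translate directly into code. The Python sketch below assumes the residual DCT coefficients from the H.264/AVC decoder are available as plain nested lists (16 sub-blocks of 16 coefficients per MB, and a coefficient plane of N × M values per frame); these data layouts, the function names, and the guard for an all-zero frame are assumptions, and the ≥ 1 threshold on E_MB / E_frame is an assumed reading of the comparison in Equation (16).

```python
def mb_energy(sub_blocks):
    """Equation (14): mean absolute DCT coefficient of one MB.
    sub_blocks[alpha][beta] is coefficient beta of 4x4 sub-block alpha."""
    if len(sub_blocks) != 16 or any(len(b) != 16 for b in sub_blocks):
        raise ValueError("expected 16 sub-blocks of 16 coefficients")
    return sum(abs(c) for block in sub_blocks for c in block) / 256.0


def frame_energy(coeff_plane):
    """Equation (15): mean absolute DCT coefficient over an N x M frame,
    given as a list of rows of coefficients."""
    count = sum(len(row) for row in coeff_plane)
    return sum(abs(c) for row in coeff_plane for c in row) / count


def mb_activity(e_mb, e_frame):
    """Equation (16): 1 (complex region) if E_MB / E_frame >= 1,
    0 (homogeneous region) otherwise."""
    if e_frame == 0:
        return 0  # assumption: an all-zero residual frame is treated as homogeneous
    return 1 if e_mb / e_frame >= 1 else 0


if __name__ == "__main__":
    quiet_mb = [[0] * 16 for _ in range(16)]
    busy_mb = [[-3] * 16 for _ in range(16)]
    e_frame = frame_energy([[0, 2], [2, 0]])  # toy coefficient plane
    print(mb_activity(mb_energy(quiet_mb), e_frame),
          mb_activity(mb_energy(busy_mb), e_frame))
```

Because both energies are averages, an MB is flagged as complex exactly when its mean absolute coefficient is at least the frame’s mean.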

After obtaining the MB activity, the proposed H.264/AVC to SVC bitrate reduction transcoding algorithm proceeds as follows. When the H.264/AVC MB is coded in Mode0 or Mode1, Mode0 and Mode1 are selected as the candidate modes in SVC. If MB activity equals 0 and the H.264/AVC MB is coded in Mode2, Mode3, or Mode4, the current MB belongs to a homogeneous region, and Mode0 and Mode1 are selected as candidate modes. If MB activity equals 1, the current MB belongs to a complex region, so Mode2, Mode3, and Mode4 are added as candidate modes in SVC. When MB activity equals 1 and the H.264/AVC MB is coded in Mode5 or Mode6, Mode0, Mode1, and Mode5 are chosen as the candidate modes. If MB activity is large, the small block size Mode6 is selected as a candidate mode.
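The candidate rules for bitrate reduction transcoding can be collected into a single function. In this Python sketch, the constant and function names are hypothetical, and two cases rest on interpretation and are marked as assumptions in the code: because Equation (16) makes MB activity binary, the sentence about large activity is read as adding Mode6 when an intra-coded MB has activity 1, and an intra-coded MB with activity 0, which the text does not cover, falls back to testing all seven modes.

```python
# Mode indices as defined in the text (Mode0 to Mode6).
SKIP, INTER16X16, INTER16X8, INTER8X16, INTER8X8, INTRA16X16, INTRA4X4 = range(7)


def candidate_modes_bitrate_reduction(h264_mode, activity):
    """Candidate SVC modes for bitrate-reduction transcoding, given the
    H.264/AVC mode of the MB and its binary MB activity (Equation (16))."""
    if h264_mode in (SKIP, INTER16X16):
        return [SKIP, INTER16X16]
    if h264_mode in (INTER16X8, INTER8X16, INTER8X8):
        if activity == 0:  # homogeneous region
            return [SKIP, INTER16X16]
        # complex region: also test the smaller inter partitions
        return [SKIP, INTER16X16, INTER16X8, INTER8X16, INTER8X8]
    # Remaining case: intra-coded H.264/AVC MB (Mode5 or Mode6).
    if activity == 1:
        # Assumption: "large" activity is read as activity == 1, so Mode6 is
        # added to the Mode0, Mode1, and Mode5 candidates named in the text.
        return [SKIP, INTER16X16, INTRA16X16, INTRA4X4]
    # Assumption: no rule is given for this case, so fall back to a full search.
    return list(range(7))


if __name__ == "__main__":
    print(candidate_modes_bitrate_reduction(INTER8X8, 1))
```

Only the modes returned here would be RD-evaluated by the SVC encoder, which is where the time saving over an exhaustive search comes from.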

Experimental results

The proposed algorithm is implemented on JM 13.2 and JSVM 9.12. Each test sequence contains 200 frames: one I frame followed by 199 B or P frames. The group of pictures (GOP) size is set to 16, the maximum search range is ±16 pixels, and the number of reference frames is set to 1. Two QPs, 25 and 35, are used in the experiments. The proposed method is compared with two methods: fully decode and fully encode (FDFE) and mode reusing (MR).

Tables 12, 13, and 14 compare PSNR, bitrate, and coding time, respectively, for same-QP transcoding. Tables 15, 16, and 17 compare PSNR, bitrate, and coding time, respectively, for bitrate reduction transcoding. These tables show that, compared with FDFE, the proposed algorithm significantly reduces transcoding computation with negligible PSNR loss and bitrate increase. As Table 12 shows, the PSNR of the proposed algorithm is only 0.1 dB lower than that of FDFE, while the algorithm saves up to 76.65% coding time. In bitrate reduction transcoding, Tables 16 and 17 indicate that the proposed algorithm saves 59.4% coding time with only a 6.24% bitrate increase and outperforms MR by 13.42% at a similar PSNR. The experimental results demonstrate that the proposed scheme achieves satisfactory RD performance with low computational complexity for both low- and high-motion sequences.

Table 12 PSNR (dB) comparisons of CIF format benchmarks with the same QP transcoding
Table 13 Bitrate (kbps) comparisons of CIF format benchmarks with the same QP transcoding
Table 14 Coding time (s) comparisons of CIF format benchmarks with the same QP transcoding
Table 15 PSNR (dB) comparisons of CIF format benchmarks with bitrate reduction transcoding
Table 16 Bitrate (kbps) comparisons of CIF format benchmarks with bitrate reduction transcoding
Table 17 Coding time (s) comparisons of CIF format benchmarks with bitrate reduction transcoding

Conclusions

This study presents an H.264/AVC to SVC video transcoder with temporal scalability that maintains visual quality while saving coding time. The proposed algorithm addresses an important problem, mode prediction, when transcoding a single-layer video into a multi-layer one under two scenarios: transcoding with the same QP and transcoding with bitrate reduction. In the first scenario, three probability models are developed to screen, test, and refine the modes of an incoming video transcoded with the same QP. In the second scenario, MB activity is measured efficiently to determine SVC modes for bitrate reduction transcoding. Experimental results demonstrate that the proposed algorithm significantly reduces transcoding computation with only negligible visual quality degradation. Therefore, video content encoded in single-layer H.264/AVC can benefit from the scalability features of SVC without excessive transcoding effort.

References

  1. Tekalp AM: Digital Video Processing. Prentice Hall PTR, New Jersey; 1995.

  2. Sun MT, Reibman AR: Compressed Video over Networks. Marcel Dekker, New York; 2001.

  3. Wang Y, Ostermann J, Zhang YQ: Video Processing and Communications. Prentice Hall, New Jersey; 2002.

  4. Ngan KN, Yap CW, Tan KT: Video Coding for Wireless Communications. Prentice Hall, New Jersey; 2002.

  5. Xin J, Lin CW, Sun MT: Digital video transcoding. Proc IEEE 2005, 93(1):84-97.

  6. Chang SF, Vetro A: Video adaptation: concepts, technologies, and open issues. Proc IEEE 2005, 93(1):148-158.

  7. Vetro A, Christopoulos C, Sun H: Video transcoding architectures and techniques: an overview. IEEE Signal Process Mag 2003, 20(2):18-29. 10.1109/MSP.2003.1184336

  8. Ahmad I, Wei X, Sun Y, Zhang Y-Q: Video transcoding: an overview of various techniques and research issues. IEEE Trans Multimed 2005, 7(5):793-804.

  9. Nguyen VA, Tan YP: Efficient video transcoding from H.263 to H.264/AVC standard with enhanced rate control. EURASIP J Appl Signal Process 2006, 2006(83563):1-15.

  10. Xin J, Vetro A, Sun H, Su Y: Efficient MPEG-2 to H.264/AVC transcoding of intra-coded video. EURASIP J Appl Signal Process 2007, 2007(75310):1-12.

  11. Eminsoy S, Dogan S, Kondoz AM: Transcoding-based error-resilient video adaptation for 3G wireless networks. EURASIP J Appl Signal Process 2007, 2007(39586):1-13.

  12. Lin CW, Tan YP, Vetro A, Kot A, Sun MT: Video adaptation for heterogeneous environments. EURASIP J Appl Signal Process 2007, 2007(18578):1-4.

  13. Li H, Wang Y, Chen CW: An attention-information-based spatial adaptation framework for browsing videos via mobile devices. EURASIP J Appl Signal Process 2007, 2007(25415):1-12.

  14. Tsai TH, Lin YF, Lin HY: Video transcoder in DCT-domain spatial resolution reduction using low-complexity motion vector refinement algorithm. EURASIP J Appl Signal Process 2007, 2007(467290):1-15.

  15. Corrales-Garcia A, Martinez JL, Fernandez-Escribano G, Villalon JM, Kalva H, Cuenca P: Wyner-Ziv to baseline H.264 video transcoder. EURASIP J Appl Signal Process 2012, 2012:135.

  16. Liu X, Zhu W, Yoo K-Y: Fast inter mode decision algorithm based on the MB activity for MPEG-2 to H.264/AVC transcoding. 2nd edition. Proceedings of International Conference on Computational Science and Engineering, Vancouver, BC; 2009:25-30.

  17. De Cock J, Notebaert S, Lambert P, Van de Walle R: Architectures for fast transcoding of H.264/AVC to quality-scalable SVC streams. IEEE Trans Multimed 2009, 11(7):1209-1224.

  18. De Cock J, Notebaert S, Van de Walle R: Transcoding from H.264/AVC to SVC with CGS layers. 4th edition. Proceedings of IEEE International Conference on Image Processing, San Antonio, TX; 2007:73-76.

  19. Garrido-Cantos R, De Cock J, Martinez JL, Van Leuven S, Cuenca P: Motion-based temporal transcoding from H.264/AVC-to-SVC in baseline profile. IEEE Trans Consum Electron 2011, 57(1):239-246.

  20. Al-Muscati H, Labeau F: Temporal transcoding of H.264/AVC video to the scalable format. Proceedings of 2nd International Conference on Image Processing Theory Tools and Applications, Paris; 2010:138-143.

Acknowledgment

This work was supported by the National Science Council, R.O.C., under Grants NSC 99-2628-E-110-008-MY3 and NSC101-2221-E-110-093-MY2.

Author information

Corresponding author

Correspondence to Chia-Hung Yeh.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Yeh, CH., Tseng, WY. & Wu, ST. Mode decision acceleration for H.264/AVC to SVC temporal video transcoding. EURASIP J. Adv. Signal Process. 2012, 204 (2012). https://doi.org/10.1186/1687-6180-2012-204

  • DOI: https://doi.org/10.1186/1687-6180-2012-204

Keywords