- Open Access
Region-of-interest determination and bit-rate conversion for H.264 video transcoding
© Huang et al.; licensee Springer. 2013
- Received: 30 October 2012
- Accepted: 3 May 2013
- Published: 29 May 2013
This paper presents a video bit-rate transcoder for the baseline profile of the H.264/AVC standard that fits the transcoded bit-stream to the channel bandwidth available to the client. To maintain visual quality efficiently at low bit-rates, this study analyzes the information decoded in the transcoder and proposes a Bayesian theorem-based region-of-interest (ROI) determination algorithm. In addition, a curve fitting scheme is employed to derive models of video bit-rate conversion. The transcoded video conforms to the target bit-rate through re-quantization according to the proposed models. By integrating the ROI detection method with the bit-rate transcoding models, the ROI-based transcoder allocates more coding bits to ROI regions and reduces the complexity of the re-encoding procedure for non-ROI regions. Hence, it not only preserves coding quality but also improves the efficiency of video transcoding at low target bit-rates, making real-time transcoding more practical. Experimental results show that the proposed framework achieves significantly better visual quality.
- Video transcoding
- Motion vector
In video processing applications, perceptual video coding has attracted increasing attention because it exploits the characteristics of human visual perception. Especially at low bit-rates, region-of-interest (ROI)-based video coding can maintain better visual quality than non-ROI video coding. According to the properties of the human visual system (HVS), ROI areas are usually defined by skin color, contrast, video content, objects, and motion information. Liu et al. extract moving objects in the compressed domain by considering global motion compensation, a correspondence matrix, and temporal tracking. Mak and Cham identify the background motion and then analyze the decoded motion data to complete the video object segmentation. Chi et al. determine ROI regions through a visual rhythm feature used as a user attention model. Liu et al. investigate automatic event detection in the compressed domain. After motion trajectories are extracted, they use prediction residuals to improve the robustness of the proposed video activity detection strategy. De Bruyne et al. estimate the reliability of motion information to enhance ROI detection based on the forward/backward projection of motion vectors (MVs) and similarity correlations. In addition, Poppe et al. present a fast moving object detection algorithm that relies on the structure of the encoded bit-stream at the syntax level. Because the complicated entropy decoding needed to obtain motion information is unnecessary, their method achieves high execution speeds for video surveillance applications.
Combining a video transcoder with the idea of ROI, Xie and Yu propose a compressed-domain transcoder from MPEG-2 to H.264/AVC to reduce the computational complexity. The ROI areas are re-encoded by a closed-loop transcoder for high video quality. The regions without motion vectors are defined as non-ROI and re-encoded by an open-loop transcoder to speed up the transcoding process. For multipoint video conferencing, Li et al. present a spatial adaptation ROI system to cope with limited display sizes. Kwon and Lee propose an ROI transcoder that assigns different quantization parameters (QPs) to the coded macroblocks (MBs) according to the subjective importance of different slice distributions. The center of the frame is the most important, and importance decreases with distance from the center; this relationship is described by a quadratic function. Although this research topic is similar to our goal, the ROI region in the method of Kwon and Lee is not precise enough, since only the center region is defined as the ROI area. It is therefore worthwhile to pay more attention to H.264/AVC ROI transcoding in order to reduce the complexity of the transcoder, especially at low bit-rates.
The rest of this paper is organized as follows. The section "Proposed ROI determination method based on Bayesian theorem" describes the details of our proposed ROI detection algorithm. The section "Proposed re-quantization models" analyzes the features of bit-rate conversion for the baseline profile in H.264/AVC and presents the re-quantization models for I frames and P frames, respectively. The framework of our proposed ROI-based bit-rate transcoder is described in the section "The frameworks of the proposed H.264/AVC bit-rate transcoder". The section "Simulation results" reports the experimental results, and conclusions are drawn in the final section.
where P(ROI) means the prior probability of ROI before considering the auxiliary information; P(MI) and P(skin) indicate the marginal probabilities of motion intensity (MI) and skin color data, respectively. P(MI|ROI) and P(skin|ROI) represent the conditional probabilities of MI and skin color data based on ROI distribution, respectively; P(ROI|MI) and P(ROI|skin) represent the posterior probabilities of ROI given the auxiliary MI and skin color information, respectively.
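The definitions above match the standard form of Bayes' theorem, P(ROI|MI) = P(MI|ROI) · P(ROI) / P(MI) and P(ROI|skin) = P(skin|ROI) · P(ROI) / P(skin). The following is a minimal sketch of that computation for one MB, not the authors' implementation. The function name `posterior` and all numeric values are hypothetical and are not taken from the paper's probability tables. How the two posteriors are combined into a final ROI decision is not specified here, so the sketch only evaluates each posterior separately.

```python
def posterior(prior: float, likelihood: float, marginal: float) -> float:
    """Bayes' rule: P(ROI | x) = P(x | ROI) * P(ROI) / P(x)."""
    if not 0.0 < marginal <= 1.0:
        raise ValueError("marginal probability must be in (0, 1]")
    if not (0.0 <= prior <= 1.0 and 0.0 <= likelihood <= 1.0):
        raise ValueError("prior and likelihood must be in [0, 1]")
    return likelihood * prior / marginal


# Hypothetical values for one macroblock (illustration only).
p_roi = 0.30             # P(ROI): prior probability of ROI
p_mi_given_roi = 0.80    # P(MI | ROI): likelihood of the observed motion intensity
p_mi = 0.40              # P(MI): marginal probability of that motion intensity
p_skin_given_roi = 0.50  # P(skin | ROI): likelihood of the observed skin-pixel count
p_skin = 0.25            # P(skin): marginal probability of that skin-pixel count

p_roi_given_mi = posterior(p_roi, p_mi_given_roi, p_mi)
p_roi_given_skin = posterior(p_roi, p_skin_given_roi, p_skin)
print(f"P(ROI|MI)   = {p_roi_given_mi:.2f}")
print(f"P(ROI|skin) = {p_roi_given_skin:.2f}")
```

In a transcoder, the likelihoods and marginals would presumably be looked up from tables indexed by the MB's motion intensity and skin-pixel count; that lookup is left out of this sketch.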
Probability of motion intensity
Probability of skin color pixels for one MB
Number of skin color pixels (%): 0 to 51, 52 to 102, 103 to 153, 154 to 204, 205 to 256
3.1. Analysis of video bit-rate conversion
3.2. Proposed re-quantization models for I and P frames
The H.264/AVC encoder performs intra/inter prediction to obtain the residual and then transforms this data into the frequency domain as transform coefficients. The QP controls the quantization operation and therefore also influences the residual data. Accordingly, we consider the relationship among the bit-rate, the number of non-zero transform coefficients (NZTC), and the QP to find the bit-rate conversion models.
We test six sequences in CIF (352 × 288) resolution with various characteristics: Akiyo, Hall Monitor, News, Paris, Sign Irene, and Soccer. The total number of coded frames is 100, and the group of pictures (GOP) size is 15, with 1 I picture followed by 14 P pictures. Curve fitting is used to model the relationship among bit-rate, NZTC, and QP for I frames and P frames, respectively. The accuracy of each fit is evaluated via R², the coefficient of determination in regression analysis.
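The fitting and evaluation step can be sketched as follows. This is only an illustration: the paper's fitted equations are not reproduced in this text, so the linear form `bits ≈ slope · NZTC + intercept` is an assumption, and the sample values are hypothetical rather than measurements from the test sequences. The R² computation itself follows the standard definition, 1 − SS_res / SS_tot.

```python
import numpy as np


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


# Hypothetical per-frame samples (illustration only).
nztc = np.array([1000.0, 2000.0, 3000.0, 4000.0, 5000.0])
bits = np.array([8200.0, 15900.0, 24100.0, 32050.0, 39800.0])

# Assumed model form: a first-order polynomial fitted by least squares.
slope, intercept = np.polyfit(nztc, bits, deg=1)
fitted = slope * nztc + intercept

print(f"bits ~ {slope:.3f} * NZTC + {intercept:.1f}")
print(f"R^2 = {r_squared(bits, fitted):.4f}")
```

An R² close to 1 indicates that the chosen model form explains most of the variance in the samples; a different model form can be evaluated by replacing only the fitting lines.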
3.2.1. I frame models
Coefficients of Equation 6 for the simulated sequences
Coefficients of Equation 7 for the simulated sequences
3.2.2. P frame models
Coefficients of Equation 8 for the simulated sequences
Coefficients of Equation 9 for the simulated sequences
4.1. Bit-rate transcoder
4.2. Training procedure for model parameters
In practice, the characteristics of video sequences differ considerably, especially across applications. Accordingly, a training procedure that derives the model parameters for the input sequence is necessary before these bit-rate conversion models are used. We code the first frame as an I frame and the second frame as a P frame. The QP is set to 25, 30, 35, and 40 for this pre-processing coding, and a curve fitting scheme is employed to find the model parameters. Although the training procedure adds a small overhead, it yields accurate bit-rate conversion models.
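The training step can be sketched as below, under clearly stated assumptions. The four training QPs come from the text, but the model form `bits = a · exp(b · QP)` is an assumption chosen for illustration (the paper's own I-frame and P-frame models are not reproduced here), and the measured bit counts are hypothetical. The helper names `fit_rate_model` and `estimate_qp` are likewise invented for this example. The QP range 0 to 51 is the range defined by H.264/AVC.

```python
import numpy as np

TRAINING_QPS = np.array([25.0, 30.0, 35.0, 40.0])  # QPs used for pre-processing coding


def fit_rate_model(qps, bits):
    """Fit the assumed model bits = a * exp(b * QP) via least squares on log(bits)."""
    qps = np.asarray(qps, dtype=float)
    log_bits = np.log(np.asarray(bits, dtype=float))
    b, log_a = np.polyfit(qps, log_bits, deg=1)
    return float(np.exp(log_a)), float(b)


def estimate_qp(target_bits, a, b, qp_min=0, qp_max=51):
    """Invert the assumed model for the target bits and clamp to the valid QP range."""
    qp = np.log(target_bits / a) / b
    return int(np.clip(np.rint(qp), qp_min, qp_max))


# Hypothetical bits produced by one frame at each training QP (illustration only).
measured_bits = np.array([60000.0, 34000.0, 19000.0, 11000.0])

a, b = fit_rate_model(TRAINING_QPS, measured_bits)
target_qp = estimate_qp(25000.0, a, b)
print(f"a = {a:.1f}, b = {b:.4f}, estimated QP for 25000 bits = {target_qp}")
```

Fitting in the log domain keeps the example to a single `np.polyfit` call; a nonlinear least-squares fit would be the alternative if the model form did not linearize.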
4.3. ROI-based video bit-rate transcoder
For low bit-rate transmission of video bit-streams, the general re-quantization method significantly degrades the visual quality of the entire frame. To solve this problem, more coding bits should be allocated to the areas of greater interest to users. Hence, an automatic ROI segmentation algorithm is needed. Building on the ROI segmentation described in Section 2, we propose an ROI-based bit-rate transcoder that maintains video quality in ROI MBs.
Figure 9 illustrates the application of the proposed ROI determination method. It can be used in the ROI-based video transcoder to extract ROI maps automatically. Because the encoder in Figure 9 sets different coding parameters for ROI and non-ROI regions, the video quality in the ROI region can be better than that in the non-ROI regions.
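One simple way to express "different coding parameters for ROI and non-ROI regions" is a per-MB QP map, sketched below. This is an illustration only and not the authors' bit-allocation rule: the function name `assign_mb_qp` and the QP offsets of −3 and +3 are hypothetical. The grid size follows from the CIF resolution used in the paper (352 × 288 pixels gives 22 × 18 MBs of 16 × 16 pixels), and QPs are clamped to the H.264/AVC range of 0 to 51.

```python
import numpy as np


def assign_mb_qp(roi_map, base_qp, roi_offset=-3, non_roi_offset=3):
    """Return a per-MB QP map: lower QP (more bits) in ROI MBs, higher QP elsewhere.

    The offsets are hypothetical defaults, not values from the paper.
    """
    roi_map = np.asarray(roi_map, dtype=bool)
    qp = np.where(roi_map, base_qp + roi_offset, base_qp + non_roi_offset)
    return np.clip(qp, 0, 51)  # valid H.264/AVC QP range


# CIF frame: 352 x 288 pixels -> 22 x 18 macroblocks (rows x columns = 18 x 22).
roi = np.zeros((18, 22), dtype=bool)
roi[5:12, 8:15] = True  # hypothetical ROI block, e.g. a detected face region

qp_map = assign_mb_qp(roi, base_qp=32)
print("QP inside ROI:", qp_map[6, 10])
print("QP outside ROI:", qp_map[0, 0])
```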
5.1. Setup of simulation
The proposed video bit-rate transcoder is implemented on the JM15.1 software. The computer used for our simulation has a 3 GHz CPU and 3 GB of RAM. The total number of coded frames is set to 100 for each sequence, and the GOP size is 15, with 1 I frame followed by 14 P frames. The pre-encoded bit-streams are encoded at 30 frames per second and 1 megabit per second (Mbps). The search range is set to ±32, and the number of reference frames for motion estimation is set to one. The 8 × 8 transform size is disabled. We adopt the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) for quality evaluation. SSIM measures structural distortion and takes values between −1 and 1; it is a popular metric owing to its high correlation with the HVS.
5.2. Simulation results of ROI detection
5.3. Simulation results of the proposed ROI-based bit-rate transcoding
Bit-rate (Kbps) comparison, including the proposed ROI-based transcoder
Total coding time (T) and time saving (TS) comparison, including the proposed ROI-based transcoder
PSNR (dB) comparison for ROI areas, including the proposed ROI-based transcoder
In this paper, an ROI-based video transcoder is proposed. First, the Bayesian theorem is applied to automatically determine the ROI region for video transcoding. The region with moving foreground objects is segmented by decoding the pre-encoded motion information from the input of the video transcoder, and the skin color information is used to further locate the region of human attention. Second, a model-based video bit-rate transcoder is presented for the baseline profile in H.264/AVC to meet the available communication bandwidth. We analyze the total coding bit-rate, NZTC, and QP data in the H.264/AVC encoder to obtain our proposed models and reallocate the bits of each macroblock in consideration of ROI for bit-rate conversion. With the target QP estimated by the proposed I frame and P frame models, the simulation results show that the transcoded bit-stream conforms to the target bit-rate via the re-quantization method. Furthermore, the ROI-based transcoder maintains visual quality in the region of the user's interest for low bit-rate transmission. More coding bits are allocated to ROI MBs, and full coding modes are employed there to ensure visual quality. For non-ROI MBs, we use the SKIP mode and the decoded mode for motion estimation to reduce coding complexity. Experimental results show that the proposed video bit-rate transcoder provides efficient video transcoding, accurate bit-rate conversion, and better video quality. In the future, the models of the total coding bit-rate, NZTC, and QP data for B frames will be investigated for applications of the high profile of the H.264/AVC video standard.
- Kwon SK, Tamhankar A, Rao KR: Overview of H.264/MPEG-4 part 10. J. Visual Commun. Image Rep. 2006, 17: 186-216. 10.1016/j.jvcir.2005.05.010
- Xin J, Lin CW, Sun MT: Digital video transcoding. Proceedings of the IEEE 2005, 93(1):84-97.
- Yeh CH, Fan Jiang SJ, Chen TC, Chen MJ: Motion vector composition through Lagrangian optimization for arbitrary frame-size video transcoding. Optical Engineering 2012, 51(4):047401. 10.1117/1.OE.51.4.047401
- Yeh CH, Tseng WY, Wu ST: Mode decision acceleration for H.264/AVC to SVC temporal video transcoding. EURASIP J. Adv. Signal Process. 2012, 204.
- Xin J, Vetro A, Sun H, Su Y: Efficient MPEG-2 to H.264/AVC transcoding of intra-coded video. EURASIP J. Adv. Signal Process. 2007, 2007: 075310. 10.1155/2007/75310
- Liu X, Yoo KY: Fast interframe mode decision algorithm based on mode mapping and MB activity for MPEG-2 to H.264/AVC transcoding. J. Visual Commun. Image Rep. 2010, 21: 155-166. 10.1016/j.jvcir.2009.05.002
- Tang Q, Mansour H, Nasiopoulos P, Ward R: Bit-rate estimation for bit-rate reduction H.264/AVC video transcoding in wireless networks. In Proceedings of International Symposium on Wireless Pervasive Computing. Santorini; 7–9 May 2008:464-467.
- Yeh CH, Chen YH, Chi MC, Chen MJ: Parabolic motion-vector re-estimation algorithm for compressed video downscaling. J. Signal Process. Syst. 2010, 61(3):375-386. 10.1007/s11265-010-0455-z
- Kapotas SK, Skodras AN: Bit rate transcoding of H.264 encoded movies by dropping frames in the compressed domain. IEEE Trans. Consum. Electron. 2010, 56(3):1593-1601.
- Hsu CT, Yeh CH, Chen CY, Chen MJ: Arbitrary frame rate transcoding through temporal and spatial complexity. IEEE Trans. Broadcast. 2009, 55(4):767-775.
- Yeh CH, Fan Jiang SJ, Lin CY, Chen MJ: Temporal video transcoding based on frame complexity analysis for mobile video communication. IEEE Trans. Broadcast. 2013, 59(1):38-46.
- Kwon SK, Kin SW, Lee JH, Lee JM: An adaptive transcoding method for H.264 video coding. IJCSNS 2008, 8(12):154-160.
- Chen Z, Lin W, Ngan KN: Perceptual video coding: challenges and approaches. In Proceedings of the IEEE International Conference on Multimedia and Expo. Suntec City; 19–23 Jul 2010:784-789.
- Liu Z, Lu Y, Zhang Z: Real-time spatiotemporal segmentation of video objects in the H.264 compressed domain. J. Visual Commun. Image Rep. 2007, 18: 275-290. 10.1016/j.jvcir.2007.02.002
- Mak CM, Cham WK: Real-time video object segmentation in H.264 compressed domain. IET Image Processing 2009, 3(5):272-285. 10.1049/iet-ipr.2008.0093
- Chi MC, Yeh CH, Chen MJ: Robust region-of-interest determination based on user attention model through visual rhythm analysis. IEEE Trans. Circuits Syst. Video Technol. 2009, 19(7):1025-1038.
- Liu H, Sun MT, Wu RC, Yu SS: Automatic video activity detection using compressed domain motion trajectories for H.264 videos. J. Visual Commun. Image Rep. 2011, 22: 432-439. 10.1016/j.jvcir.2011.03.010
- De Bruyne S, Poppe C, Verstockt S, Lambert P, Van de Walle R: Estimating motion reliability to improve moving object detection in the H.264/AVC domain. In Proceedings of IEEE International Conference on Multimedia and Expo. New York; 28 Jun–3 Jul 2009:330-333.
- Poppe C, De Bruyne S, Paridaens T, Lambert P, Van de Walle R: Moving object detection in the H.264/AVC compressed domain for video surveillance applications. J. Visual Commun. Image Rep. 2009, 20: 428-437. 10.1016/j.jvcir.2009.05.001
- Xie R, Yu S: Region-of-interest based video transcoding from MPEG-2 to H.264 in the compressed domain. Optical Engineering 2008, 47(9):097001-1-097001-7.
- Li H, Wang Y, Chen CW: An attention-information based spatial adaptation framework for browsing videos via mobile devices. EURASIP J. Adv. Signal Process. 2007, 2007: 1-12.
- Kwon SK, Lee HY: A transcoding method for improving the subjective picture quality. IJCSNS 2009, 9(1):135-138.
- Huang SF, Chen MJ, Li MS: Region-of-interest segmentation based on Bayesian theorem for H.264 video transcoding. In Proceedings of Visual Communications and Image Processing Conference. Tainan; 6–9 Nov 2011.
- Chi MC, Chen MJ, Yeh CH, Jhu JA: Region-of-interest video coding based on rate and distortion variations for H.263+. Signal Process: Image Commun. 2008, 23(2):127-142. 10.1016/j.image.2007.12.001
- Joint Video Team software JM http://iphome.hhi.de/suehring/tml/download/
- Wang Z, Bovik AC, Sheikh HR, Simoncelli EP: Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Processing 2004, 13(4):600-612. 10.1109/TIP.2003.819861
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.