
Content-Aware Scalability-Type Selection for Rate Adaptation of Scalable Video


Scalable video coders provide different scaling options, such as temporal, spatial, and SNR scalability. Reducing the rate by discarding enhancement layers of different scalability types results in different kinds and/or levels of visual distortion, depending on the content and bitrate. This dependency between scalability type, video content, and bitrate has not been well investigated in the literature. To address this, we first propose an objective function that quantifies the flatness, blockiness, blurriness, and temporal jerkiness artifacts caused by rate reduction through spatial size, frame rate, and quantization parameter scaling. Next, the weights of this objective function are determined for different content (shot) types and different bitrates using a training procedure with subjective evaluation. Finally, we propose a method that, given the content type of each temporal segment, chooses the scaling type that yields the minimum visual distortion for that segment according to this objective function. Two subjective tests on soccer videos have been performed to validate the proposed procedure for content-aware selection of the best scalability type. Soccer videos scaled from 600 kbps to 100 kbps by the proposed content-aware selection of scalability type were found to be visually superior to those scaled using a single scalability option over the whole sequence.
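The per-segment selection described above can be illustrated with a minimal sketch. It assumes the objective function is a weighted sum of the four artifact measures, which the abstract does not state explicitly; the shot-type labels, weight values, measure values, and function names below are all hypothetical and are not taken from the paper.

```python
from typing import Dict

# The four artifact measures named in the abstract.
MEASURES = ("flatness", "blockiness", "blurriness", "jerkiness")


def distortion(measures: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of artifact measures (assumed form of the objective)."""
    return sum(weights[m] * measures[m] for m in MEASURES)


def select_scaling(candidates: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> str:
    """Return the scaling option with the smallest objective value."""
    return min(candidates, key=lambda name: distortion(candidates[name], weights))


# Hypothetical weights per content (shot) type; in the paper these come
# from a training procedure with subjective evaluation.
WEIGHTS_BY_SHOT = {
    "shot_type_a": {"flatness": 0.1, "blockiness": 0.1, "blurriness": 0.1, "jerkiness": 0.7},
    "shot_type_b": {"flatness": 0.2, "blockiness": 0.2, "blurriness": 0.5, "jerkiness": 0.1},
}

# Hypothetical artifact measures for one segment scaled to the target
# bitrate with each scalability type.
CANDIDATES = {
    "temporal": {"flatness": 0.1, "blockiness": 0.1, "blurriness": 0.1, "jerkiness": 0.8},
    "spatial": {"flatness": 0.2, "blockiness": 0.1, "blurriness": 0.7, "jerkiness": 0.0},
    "snr": {"flatness": 0.5, "blockiness": 0.6, "blurriness": 0.2, "jerkiness": 0.0},
}

if __name__ == "__main__":
    for shot_type, weights in WEIGHTS_BY_SHOT.items():
        print(shot_type, "->", select_scaling(CANDIDATES, weights))
```

With these illustrative numbers, the segment type that weights jerkiness heavily avoids frame-rate reduction, while the type that weights blurriness heavily avoids spatial downscaling. This mirrors the idea that the best scalability type depends on content.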



Author information



Corresponding author

Correspondence to Emrah Akyol.

Rights and permissions

Open Access: This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Akyol, E., Tekalp, A.M. & Civanlar, M.R. Content-Aware Scalability-Type Selection for Rate Adaptation of Scalable Video. EURASIP J. Adv. Signal Process. 2007, 010236 (2007).

Keywords


  • Objective Function
  • Rate Reduction
  • Video Coder
  • Video Content
  • Quantization Parameter