
Lightweight Object Tracking in Compressed Video Streams Demonstrated in Region-of-Interest Coding


Video scalability is a recent video coding technology that allows content providers to offer multiple quality versions from a single encoded video file, in order to target different kinds of end-user devices and networks. One form of scalability uses the region-of-interest concept, that is, the possibility to mark objects or zones within the video as more important than the surrounding area. The scalable video coder ensures that these regions-of-interest are received by an end-user device before the surrounding area, and preferably in higher quality. In this paper, we present novel algorithms that automatically track the marked objects in the regions-of-interest. Our methods detect the overall motion of a designated object by retrieving the motion vectors calculated during the motion estimation step of the video encoder. Using this knowledge, the region-of-interest is translated so that it follows the objects within it. Furthermore, the proposed algorithms allow the region-of-interest to be resized appropriately. Because they reuse information already available in the video encoder, the algorithms perform object tracking in the compressed domain and are suitable for real-time and streaming applications. A time-complexity analysis shows that the algorithms have low complexity and are usable in real-time applications. The proposed object tracking methods are generic and can be applied to any codec that calculates a motion vector field. In this paper, the algorithms are implemented within an MPEG-4 fine-granularity scalability codec. Tests on several video sequences are performed to evaluate the accuracy of the methods. Our novel algorithms achieve a precision of up to 96.4%.
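The translation step described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm, whose details the abstract does not give. It assumes a rectangular region-of-interest, 16x16 macroblocks, and a motion vector field already expressed as the per-macroblock displacement of content from the previous frame to the current one (encoder vectors usually point toward the reference frame, so a sign flip may be needed). The median is an illustrative choice of aggregate, and the names `Roi`, `track_roi`, and `mv_field` are hypothetical.

```python
from dataclasses import dataclass
from statistics import median

MB = 16  # assumed macroblock size in pixels


@dataclass(frozen=True)
class Roi:
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def track_roi(roi, mv_field, frame_w, frame_h):
    """Translate roi by the median motion of the macroblocks it covers.

    mv_field[row][col] is an assumed (dx, dy) displacement in pixels of the
    content of that macroblock from the previous frame to the current one.
    """
    # Macroblock rows and columns overlapped by the region-of-interest.
    rows = range(roi.y // MB, min(len(mv_field), (roi.y + roi.h - 1) // MB + 1))
    cols = range(roi.x // MB, min(len(mv_field[0]), (roi.x + roi.w - 1) // MB + 1))
    vectors = [mv_field[r][c] for r in rows for c in cols]
    if not vectors:
        return roi  # nothing to estimate from; leave the region in place

    # The median is less sensitive than the mean to background macroblocks
    # that happen to fall inside the rectangle.
    dx = round(median(v[0] for v in vectors))
    dy = round(median(v[1] for v in vectors))

    # Shift the region and keep it inside the frame.
    new_x = min(max(roi.x + dx, 0), frame_w - roi.w)
    new_y = min(max(roi.y + dy, 0), frame_h - roi.h)
    return Roi(new_x, new_y, roi.w, roi.h)
```

The sketch visits only the macroblocks inside the region-of-interest and reuses vectors the encoder has already computed, so the per-frame cost grows with the size of the region rather than with the size of the frame. Resizing the region, which the abstract also mentions, is not covered here.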



Author information



Corresponding author

Correspondence to Robbie De Sutter.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

De Sutter, R., De Wolf, K., Lerouge, S. et al. Lightweight Object Tracking in Compressed Video Streams Demonstrated in Region-of-Interest Coding. EURASIP J. Adv. Signal Process. 2007, 097845 (2006).



  • Motion Vector
  • Object Tracking
  • Content Provider
  • Scalable Video Coder
  • Scalable Video