- Research Article
- Open Access
Video Frames Reconstruction Based on Time-Frequency Analysis and Hermite Projection Method
© Srdjan Stanković et al. 2010
- Received: 15 February 2010
- Accepted: 14 August 2010
- Published: 18 August 2010
A method for temporal analysis and reconstruction of video sequences based on time-frequency analysis and the Hermite projection method is proposed. The S-method-based time-frequency distribution is used to characterize stationarity within the sequence. Namely, a sequence of DCT coefficients along the time axis is used to create a frequency-modulated signal. The reconstruction of nonstationary sequences is done using the Hermite expansion coefficients. Here, a small number of Hermite coefficients can be used, which may provide significant savings for some video-based applications. The results are illustrated with video examples.
- Discrete Cosine Transform
- Discrete Cosine Transform Coefficient
- Hermite Function
- Short Time Fourier Transform
- Discrete Cosine Transform Block
Video signal exchange and storage are very important in multimedia applications. For this purpose, different kinds of video processing techniques are needed, such as video compression algorithms, video denoising methods, and scene analysis [1–4]. Depending on the video quality and bit-rate constraints, various compression algorithms have been developed [5–10]. These algorithms commonly employ motion-compensated differential coding (known as P and B frames), that is, interframe prediction based on reference frames (I-frames). I-frames are set at user-defined intervals (e.g., 1 key frame for every 5 frames, or 15 frames, etc.). Thus, the algorithm compares two images and sends only the parts of the following images (B- and P-frames) that differ from the reference image. Examples of such algorithms are MPEG-2 and its improved version MPEG-4. A good implementation of MPEG-4 can additionally reduce the bit rate by approximately 15%, but it requires high processing power. Furthermore, the H.264 standard improves compression in comparison to MPEG-4 [6–8]. It offers many additional but optional tools, so the compression ratio varies significantly between implementations. The most popular Baseline Profile provides a bit rate reduction of 10%–30% over MPEG-4, but it requires almost twice the CPU power. An overly simple H.264 implementation may produce worse results than an MPEG-4 implementation, while the Main Profile is computationally heavy. Finally, some applications use the Motion JPEG (MJPEG) multimedia format, where video frames are separately compressed as JPEG images. It does not include interframe prediction, which results in a lower compression ratio. However, it has been commonly used by digital still cameras for the unified treatment of still and video compression. It has also been used by IP-based video cameras via HTTP streams.
Here, we propose a method for video sequence reconstruction based on time-frequency analysis and Hermite projections. The main goal of this paper is not to provide a specific compression solution for video applications, but rather an auxiliary tool for other video processing tasks, such as video surveillance, motion tracking, and video compression. Combined with the existing compression algorithms, this approach can additionally reduce the amount of data required for high-quality video reconstruction. It does not use exhaustive search procedures for motion estimation, spatial or temporal prediction, or the computationally demanding advanced options included in other approaches. The proposed procedure can be applied to the coefficients of a raw video format, to the reference frames (I frames) of coded video, or to the coefficients within a sequence of JPEG images. Therefore, the possibility of merging it with the existing techniques could be interesting for researchers and could provide additional improvements in compression ratio.
The procedure consists of two parts. The first one employs time-frequency analysis to examine the temporal stationarity or nonstationarity of the coefficients. When observing a sequence of video frames, one may distinguish between stationary scene regions that do not change over time and dynamic scene regions containing moving objects (nonstationary regions). Video sequences usually contain noise, which causes the coefficients to vary even in the absence of moving objects. In order to reduce the influence of noise, we propose a time-frequency-based procedure for characterizing temporally stationary and nonstationary coefficients. Various time-frequency distributions have been used for the analysis of noisy nonstationary signals with different instantaneous frequency laws [11, 12]. Here, we focus on a computationally efficient quadratic distribution called the S-method [13, 14]. To characterize temporal behaviour, the sequence of coefficients at a given position is analyzed by using the S-method.
The second part of the proposed procedure deals with the high-quality reconstruction of the coefficients. The reconstruction of a stationary sequence is based on its first coefficient. On the other hand, the efficient reconstruction of nonstationary sequences of coefficients is obtained by using the Hermite projection method. Namely, a nonstationary sequence can be reconstructed from a certain number of Hermite coefficients. This number can be considerably smaller than the length of the original sequence. Although the quality of the reconstructed video depends on the number of Hermite functions, significant savings can be achieved even if a high video quality is required.
The paper is organized as follows. Section 2 describes the theory behind the time-frequency analysis and its application for characterizing the temporal stationarity. In Section 3, the reconstruction procedure based on the Hermite projection method is proposed. In Section 4, the proposed method is applied to the examples. Concluding remarks are given in Section 5.
A brief theoretical background on the S-method-based time-frequency analysis and the Hermite projection method is presented in this section. The time-frequency analysis is used to characterize the stationarity of video coefficients over time, while the Hermite projection method reduces the amount of data needed for high-quality video reconstruction.
2.1. Time-Frequency Analysis—the S-Method
The discrete form of the S-method is given by
$$SM(n,k)=\sum_{l=-L}^{L}P(l)\,STFT(n,k+l)\,STFT^{*}(n,k-l),$$
where n and k denote discrete time and frequency, respectively, STFT denotes the short-time Fourier transform, and a rectangular window P(l) is assumed. The parameter L determines the frequency window width, which is 2L + 1. By windowing the product in the convolution with the narrow window P(l), the cross-terms are reduced or even removed. Thus, by choosing an appropriate value of L, the sharpness of the Wigner distribution can be preserved while avoiding the cross-terms. Namely, a high autoterm concentration is obtained with only a few summation terms, due to the fast convergence within P(l). Hence, in many practical applications a small value of L is a suitable choice. Also, as discussed in the sequel, a lower L value requires fewer computations.
The S-method is computationally less demanding than other quadratic distributions. For a window of N samples, it requires fewer complex multiplications and complex additions than the Wigner distribution. Also, the S-method allows a simple and efficient hardware realization, which has already been developed.
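The S-method computation can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the Hanning lag window, the zero padding at the sequence ends, and the helper names `stft` and `s_method` are choices made for this sketch and are not taken from the paper. The rectangular P(l) and the parameter L follow the text.

```python
import numpy as np

def stft(x, win_len):
    """STFT with one column per time instant and zero frequency in the middle row.
    ASSUMPTION: Hanning lag window and zero padding at both ends."""
    x = np.asarray(x, dtype=complex)
    half = win_len // 2
    padded = np.pad(x, (half, half))
    w = np.hanning(win_len)
    frames = np.stack([padded[n:n + win_len] * w for n in range(len(x))], axis=1)
    return np.fft.fftshift(np.fft.fft(frames, axis=0), axes=0)

def s_method(x, win_len=64, L=3):
    """S-method with a rectangular frequency window P(l), l = -L..L."""
    S = stft(x, win_len)
    n_freq = S.shape[0]
    sm = np.abs(S) ** 2  # the l = 0 term is the spectrogram
    for l in range(1, L + 1):
        # The +l and -l terms are complex conjugates, so together they give 2*Re{.}.
        prod = S[2 * l:, :] * np.conj(S[:n_freq - 2 * l, :])
        sm[l:n_freq - l, :] += 2.0 * np.real(prod)
    return sm
```

With `L = 0` the function returns the spectrogram, and increasing L adds correction terms that sharpen the autoterms; each additional value of L costs one more elementwise product per time-frequency point, which is why a small L keeps the computation cheap.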
2.2. Fast Hermite Projection Method
3. Video Analysis and Reconstruction Using Time-Frequency Representations and Fast Hermite Projection Method
3.1. Analysis of Temporal Stationarity within the Video Sequence
where the block position is determined by the position of its first coefficient, while the frame index indicates the frame number.
The temporal sequence of coefficients may contain nonstationarities due to motion, noise, or luminance variations. Thus, a stationary sequence becomes slightly nonstationary even in the presence of a small amount of noise, and a comparison between consecutive coefficients may lead to an incorrect conclusion. Consequently, such a direct comparison cannot be used to indicate whether a sequence is stationary or not. In order to eliminate the influence of noise, time-frequency analysis is employed. The examination of stationarity is therefore performed by using time-frequency-based instantaneous frequency estimation. The instantaneous frequency is estimated as the position of the time-frequency distribution maxima, as explained below.
Therefore, if the estimated instantaneous frequency is constant over time, the block at the considered position is stationary and will remain unaltered within K consecutive frames. Otherwise, the observed block is nonstationary.
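The stationarity test can be illustrated with the following sketch. It rests on several assumptions that are not from the paper: the mapping from coefficients to a frequency-modulated signal (here the instantaneous frequency is made proportional to the coefficient value), the constants `full_scale`, `max_freq`, and `tol_bins`, and the use of a plain spectrogram in place of the S-method to keep the example short. All function names are hypothetical.

```python
import numpy as np

def fm_signal(coeffs, full_scale=2040.0, max_freq=0.4):
    """ASSUMED mapping: instantaneous frequency (cycles/sample) proportional
    to the coefficient value; full_scale is 8 * 255 for an 8x8 DC coefficient."""
    c = np.asarray(coeffs, dtype=float)
    inst_freq = max_freq * c / full_scale
    return np.exp(1j * 2.0 * np.pi * np.cumsum(inst_freq))

def if_estimate(signal, win_len=64, n_fft=256):
    """Instantaneous frequency (in FFT bins) as the position of the
    spectrogram maximum at each time instant with a full window."""
    w = np.hanning(win_len)
    starts = range(len(signal) - win_len + 1)
    frames = np.stack([signal[s:s + win_len] * w for s in starts])
    spec = np.abs(np.fft.fft(frames, n=n_fft, axis=1)) ** 2
    return spec.argmax(axis=1)

def is_stationary(coeffs, tol_bins=1):
    """Stationary if the estimated instantaneous frequency stays flat
    (within tol_bins) over the whole sequence."""
    bins = if_estimate(fm_signal(coeffs))
    return bool(bins.max() - bins.min() <= tol_bins)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flat = 800.0 + rng.normal(0.0, 2.0, 300)          # noisy but stationary
    step = np.concatenate([np.full(150, 800.0), np.full(150, 1200.0)])
    print(is_stationary(flat), is_stationary(step))
```

The windowed transform averages out small coefficient fluctuations, so noise alone does not move the maximum, whereas a lasting change in the coefficient level shifts it by many bins.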
The time-frequency representation of a stationary sequence should be robust to a certain amount of noise, meaning that it should be flat even in the presence of noise. Otherwise, the nonstationarities caused by the noise may be interpreted as nonstationarities due to motion. Note that additive noise within the sequence becomes multiplicative after the frequency-modulated signal is formed (according to (14)). The performance of time-frequency distributions in the presence of multiplicative noise has been studied in the literature [20–23], where various analyses and optimality conditions have been derived. Here, numerous experiments have been performed to verify the good characteristics of the proposed approach in a noisy environment.
In each case, one sample frame is illustrated (left), as well as the noisy sequence of DC coefficients and its time-frequency representation (right), which is flat even in the presence of noise.
where the added term corresponds to an AC component within the block. The S-method provides a cross-term free representation, but the components have to be spaced from each other by using the constants. Namely, these constants are used to shift the components up and down from the central frequency, so that they do not overlap. They are integers whose values depend on the window width and can be chosen experimentally.
3.2. Hermite Projection-Based Temporal Reconstruction of Nonstationary Pixels within the Sequence of Video Frames
The Hermite functions are used as the basis functions for the video sequence expansion method due to their favorable properties. They represent an independent set of orthogonal functions with good localization. Therefore, they can provide a unique representation of signals, while the coefficients of the expansion are easily computed. Hence, the Hermite functions-based transform has been used in many applications for different types of signals, especially for images [15, 16]. Besides the Hermite functions, some other possible basis functions with desirable properties are Legendre polynomials, Laguerre polynomials, Bessel functions, and so forth. For instance, the Legendre polynomials are defined on normalized intervals and their Fourier transform has infinite spread. Thus, it is difficult to determine the expansion coefficients when the original signal is not explicitly given. The uncertainty inequalities for Laguerre polynomials cannot be easily reduced to a form that involves only expansion coefficients. In the case of Bessel functions, the derivation of the coefficients from explicit or implicit information about the signal is very complicated.
Furthermore, by using the Hermite expansion, the signal energy is approximated by a numerical integral of the Gauss-Hermite type, which converges more rapidly than the rectangle rule used in the case of the DCT. Therefore, the Hermite functions allow for a higher concentration of signal energy at lower frequencies and lead to better compression.
where each element represents a pixel value in the kth frame. The sequence can be decomposed into N Hermite functions. A sequence of K elements can be reconstructed even from a small number of Hermite coefficients c_p, that is, for N < K. An error, depending on the value of N, is introduced by the reconstruction. Thus, with a suitable choice of N, a sequence with K pixels can be represented using a smaller number (N) of coefficients without significant quality degradation.
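The idea of representing K samples by N < K Hermite coefficients can be sketched as follows. Note the assumptions: this sketch uses an ordinary least-squares fit onto sampled Hermite functions rather than the Gauss-Hermite quadrature of the fast Hermite projection method, the baseline is taken as the straight line through the first and last samples, and the grid half-width `x_max` and all function names are hypothetical.

```python
import numpy as np

def hermite_basis(x, n_funcs):
    """Rows are the orthonormal Hermite functions psi_0 .. psi_{N-1} at x,
    computed with the standard three-term recurrence."""
    psi = np.zeros((n_funcs, len(x)))
    psi[0] = np.pi ** -0.25 * np.exp(-x ** 2 / 2.0)
    if n_funcs > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for p in range(2, n_funcs):
        psi[p] = (np.sqrt(2.0 / p) * x * psi[p - 1]
                  - np.sqrt((p - 1) / p) * psi[p - 2])
    return psi

def hermite_fit(seq, n_funcs, x_max):
    """Return N coefficients plus the two baseline values (first, last sample).
    ASSUMPTION: least-squares projection on a uniform grid [-x_max, x_max]."""
    seq = np.asarray(seq, dtype=float)
    K = len(seq)
    baseline = np.linspace(seq[0], seq[-1], K)  # line through the end points
    x = np.linspace(-x_max, x_max, K)
    basis = hermite_basis(x, n_funcs)
    coeffs, *_ = np.linalg.lstsq(basis.T, seq - baseline, rcond=None)
    return coeffs, (seq[0], seq[-1])

def hermite_reconstruct(coeffs, ends, K, x_max):
    """Rebuild K samples from the N coefficients and the two baseline values."""
    x = np.linspace(-x_max, x_max, K)
    baseline = np.linspace(ends[0], ends[1], K)
    return baseline + coeffs @ hermite_basis(x, len(coeffs))
```

Subtracting the baseline makes the residual vanish at both ends, which matches the decay of the Hermite functions. A reasonable `x_max` is somewhat larger than sqrt(2N + 1), beyond which the highest-order function used decays rapidly.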
Therefore, in the case with N = K/2, the sequence is reconstructed by using a number of Hermite coefficients that is half the number of original coefficients, that is, the saving rate is 50%. In the second case, the saving rate is .
The previously described procedure should be done for all AC components, as well.
A video sequence with 1200 frames (48 seconds) is considered. It was recorded by a video surveillance camera in a shopping center. It is split into three parts in order to illustrate different moving objects. Several frames from each part are merged in Figure 5.
First, the temporal stationarity of blocks is analyzed. For this purpose, the frames are divided into 8 × 8 blocks and the DCT is performed. Then, the DC sequences are obtained for each block over the considered frames.
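Extracting the temporal coefficient sequences can be sketched as below. The sketch assumes 8 × 8 blocks (consistent with the 64 components per block mentioned in the text), an orthonormal DCT-II, grayscale frames whose dimensions are multiples of the block size, and hypothetical function names.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C; the 2D DCT of a block B is C @ B @ C.T."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def block_dct(frames, block=8):
    """frames: array (K, H, W). Returns coefficients with shape
    (K, H // block, W // block, block, block)."""
    frames = np.asarray(frames, dtype=float)
    K, H, W = frames.shape
    blocks = frames.reshape(K, H // block, block, W // block, block)
    blocks = blocks.transpose(0, 1, 3, 2, 4)
    C = dct_matrix(block)
    return C @ blocks @ C.T  # matmul broadcasts over frames and block positions

def coefficient_sequence(coeffs, i, j, u=0, v=0):
    """Temporal sequence of coefficient (u, v) in block (i, j);
    u = v = 0 gives the DC sequence."""
    return coeffs[:, i, j, u, v]
```

With this normalization the DC coefficient equals eight times the block mean, so for 8-bit frames it lies between 0 and 2040.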
In time-frequency analysis, the window width influences the resolution in the time-frequency domain. A narrow window produces good time resolution, while a wide window produces good frequency resolution. In practical applications, the window width should be chosen to provide a good tradeoff between the resolutions along the two axes. Here, window widths of 32, 64, and 128 samples are analyzed, and it has been shown experimentally that a width of 64 samples is the most appropriate for the considered sequence length. Thus, the stationarity of a DC sequence is analyzed by using the S-method with a window width of 64 samples. An appropriate value of L is chosen to produce a smoothed representation of stationary coefficients, while keeping the variations of nonstationary (dynamic) coefficients pronounced.
Three types of blocks are distinguished: a stationary block (e.g., box 1 in Figure 5), a partly nonstationary block (e.g., box 2), and a nonstationary block (e.g., box 3).
The blocks with DC sequences producing constant value in the time-frequency domain (Figure 6(a)) are stationary over the considered time and could be reconstructed from the first frame. Therefore, a temporally stationary sequence of DC components is reconstructed over time by a single coefficient. The same holds for AC components from the stationary block.
- stationary part, frames 1:360: 1 coefficient,
- nonstationary part, frames 361:450: 60 Hermite coefficients,
- stationary part, frames 451:900: 1 coefficient,
- nonstationary part, frames 901:1200: 200 Hermite coefficients.
Thus, the total number of coefficients required for the reconstruction of the partly nonstationary sequence (Figure 6(b)) of length 1200 is 262. Note that two coefficients should be added for the baseline calculation of each nonstationary part. However, they do not have a significant influence on the total number of coefficients.
- nonstationary part, frames 1:360: 257 Hermite coefficients,
- stationary part, frames 361:460: 1 coefficient,
- nonstationary part, frames 461:520: 42 coefficients,
- stationary part, frames 521:690: 1 coefficient,
- nonstationary part, frames 691:1100: 230 coefficients,
- stationary part, frames 1101:1200: 1 coefficient.
The total number of coefficients is 532 (without the baseline ones). For the three observed sequences, the average number of coefficients required for the reconstruction is 265 per sequence, that is, about 22% of the 1200 original samples.
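The totals quoted above can be checked with a short calculation. The segment boundaries and coefficient counts are those listed in the text; the variable names are only illustrative, and the baseline coefficients are left out, as in the text.

```python
# (first_frame, last_frame, number_of_coefficients) for each segment
stationary = [(1, 1200, 1)]
partly_nonstationary = [(1, 360, 1), (361, 450, 60), (451, 900, 1), (901, 1200, 200)]
nonstationary = [(1, 360, 257), (361, 460, 1), (461, 520, 42),
                 (521, 690, 1), (691, 1100, 230), (1101, 1200, 1)]

def total_coefficients(segments):
    """Sum of the coefficients needed over all segments of one sequence."""
    return sum(count for _, _, count in segments)

def covered_frames(segments):
    """Number of frames covered by the segments (sanity check)."""
    return sum(last - first + 1 for first, last, _ in segments)

totals = [total_coefficients(s)
          for s in (stationary, partly_nonstationary, nonstationary)]
average = sum(totals) / len(totals)
fraction = average / 1200  # share of the original 1200 samples per sequence

print(totals, average, round(fraction, 3))
```

The frame counts of each segment list add up to 1200, which confirms that the segments cover the whole sequence without gaps.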
Note that, if the DC component is nonstationary, most of the AC components are also nonstationary. The S-method obtained for a few AC components within the nonstationary block is shown in Figure 7(a)–7(d). In the case of AC components reconstruction, a high quality is achieved with . Although the block is nonstationary, some coefficients (e.g., AC (4, 4) in Figure 7(d)) can be partly nonstationary and require just a partial reconstruction with Hermite coefficients.
The total number of stationary, partly nonstationary, and nonstationary blocks within the 1200 frames of the observed sequences is given in Table 1. For the sake of simplicity, it is assumed that all 64 components within the block have almost the same temporal behavior. Nevertheless, there could be slight variations for some of the AC components.
From the presented statistics, we can calculate the total number of coefficients for video reconstruction, which is approximately 20% of the number of original coefficients.
Some of the reconstructed and original nonstationary blocks are illustrated in Figure 8. Each row presents a reconstructed block (left) versus its original version (right). The blocks are chosen randomly from different frames to illustrate the quality of reconstruction. Note that the difference between the original and reconstructed blocks is imperceptible. Additionally, an original and corresponding reconstructed frame is shown in Figure 9. It can be seen that the reconstructed frame preserves the quality of the original one.
The peak signal-to-noise ratio (PSNR) is calculated and is approximately 47 dB, which is significantly higher than in other compression algorithms. As previously estimated, the proposed method requires approximately 20% of the original coefficients for such a high-quality reconstruction, which corresponds to a compression ratio of 5:1. Thus, if combined with the existing algorithms, it may significantly improve the total compression ratio without degrading the quality. The estimated compression ratio can be increased further, at the cost of a lower PSNR.
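For reference, the two figures of merit used here can be computed as in the following sketch. It assumes 8-bit frames (peak value 255) and the standard PSNR definition; the paper does not state which peak value or averaging was used, and the function names are hypothetical.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB. ASSUMPTION: 8-bit data, peak = 255."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(reconstructed, dtype=float)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))

def compression_ratio(kept_fraction):
    """Ratio of original to retained coefficients, e.g. 0.20 -> 5.0 (5:1)."""
    return 1.0 / kept_fraction
```

Under this definition, a PSNR of about 47 dB corresponds to a root-mean-square error of roughly one gray level on an 8-bit scale.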
Table 1: The number of stationary and nonstationary blocks within the considered video sequence (rows: total no. of frames observed, total no. of blocks, no. of stationary blocks, no. of partly nonstationary blocks, no. of nonstationary blocks).
This example aims to show that the proposed method can be applied even to a set of nonconsecutive frames, such as the I frames in an MPEG sequence. For this purpose, we made a new sequence of frames, called the I sequence, by selecting every 13th frame from the starting video sequence (we assumed that an I frame is set every 13 frames). However, without loss of generality, we can also use every 5th, 12th, or 15th frame, depending on the I frame refresh rate, which can be user defined. The total number of frames within the sequence is 126. Since the number of coefficients is smaller than in the previous case, a window width of 42 samples is used for the calculation of the S-method.
In order to optimize the processing time, the S-method is calculated for several components at once. The illustrations are given in Figure 10, where the multicomponent time-frequency representation is given for four DCT components from two image blocks. Note that the DCT components within the first block (Figure 10(a)) are mostly stationary, unlike the components from the second block.
Example 3 (Performance comparison with MJPEG).
In this example, we discuss one simple solution for combining the proposed approach with the Motion JPEG algorithm in order to improve the compression ratio. A part of a video sequence having 126 JPEG frames (as a basis of MJPEG format) of total size 1.38 MB is used. The frame size is while the average number of bits per block is .
The proposed approach classifies DCT blocks into stationary (S) and nonstationary (NS) ones. In the considered sequence, the number of S blocks is , while . All the coefficients from the S blocks are constant over time and can be reconstructed from the corresponding first frame's blocks. Thus, while the set of 126 JPEG frames requires No 126 bits, the proposed approach needs bits to represent the coefficients of S blocks.
for Motion JPEG: ,
for the combined (proposed + MJPEG) approach: .
In this example, the combined approach reduces the size of the video sequence by a factor of about 10.
The proposed method for video sequence reconstruction employs two different signal processing techniques: time-frequency analysis and the Hermite projection method. The time-frequency distribution provides an efficient analysis of the temporal variations of coefficients. In that sense, it is used to distinguish stationary and nonstationary coefficients. Temporally nonstationary coefficients are reconstructed using a smaller number of Hermite expansion coefficients. The results have shown that high-quality video reconstruction can be achieved by using a significantly reduced number of coefficients. An additional improvement can be obtained by using JPEG compression to reduce the number of AC components that should be reconstructed. Future work could include the time-frequency-based analysis of temporal stationarity in video surveillance applications to detect the appearance of moving objects. For instance, the surveillance system may ignore nonstationarities of short duration (e.g., a bird flyover), while attention should be paid when nonstationary segments last longer (meaning that significant movements appear). To make the proposed method faster for possible real-time applications, it would be necessary to develop a special-purpose hardware implementation.
The authors are thankful to the anonymous reviewers for their valuable comments and suggestions. The test video data used in the experiments come from the EC Funded CAVIAR Project/IST 2001 37540, found at URL: http://homepages.inf.ed.ac.uk/rbf/CAVIAR/.
- Sullivan GJ, Wiegand T: Video compression-from concepts to the H.264/AVC standard. Proceedings of the IEEE 2005, 93(1):18-31.
- Mitchell JL, Pennebaker WB, Fogg CE, LeGall DJ: MPEG Video Compression Standard. Chapman & Hall, Boca Raton, Fla, USA; 1997.
- Pižurica A, Zlokolica V, Philips W: Noise reduction in video sequences using wavelet-domain and temporal filtering. Wavelet Applications in Industrial Processing, October 2003, Proceedings of SPIE 5266: 48-59.
- Zlokolica V, Pižurica A, Philips W: Wavelet-domain video denoising based on reliability measures. IEEE Transactions on Circuits and Systems for Video Technology 2006, 16(8):993-1007.
- Sikora T: MPEG digital video coding standards. In Digital Electronics Consumer Handbook. McGraw Hill, New York, NY, USA; 1997.
- Richardson E: H.264 and MPEG-4 Video Compression Video Coding for Next-generation Multimedia. John Wiley & Sons, New York, NY, USA; 2003.
- Sullivan GJ, Topiwala P, Luthra A: The H.264/AVC advanced video coding standard, overview and introduction to the fidelity range extensions. Applications of Digital Image Processing XXVII, August 2004, Proceedings of SPIE 5558: 454-474.
- Wiegand T, Sullivan GJ, Bjøntegaard G, Luthra A: Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 2003, 13(7):560-576.
- Pearson G, Gill M: An evaluation of Motion JPEG 2000 for video archiving. Proceedings of the Archiving, April 2005, Washington, DC, USA 237-243.
- Hakeem A, Shafique K, Shah M: An object based video coding framework for video sequences obtained from static cameras. Proceedings of the 13th annual ACM International Conference on Multimedia (MULTIMEDIA '05), November 2005, Singapore 608-617.
- Cohen L: Time-Frequency Analysis. Prentice Hall, Upper Saddle River, NJ, USA; 1995.
- Boashash B: Estimating and interpreting the instantaneous frequency of a signal-Part 1: fundamentals. Proceedings of the IEEE 1992, 80(4):520-538. 10.1109/5.135376
- Stanković L: Method for time-frequency analysis. IEEE Transactions on Signal Processing 1994, 42(1):225-229. 10.1109/78.258146
- Stanković S, Stanković L, Ivanović V, Stojanović R: An architecture for the VLSI design of systems for time-frequency analysis and time-varying filtering. Annales des Telecommunications 2002, 57(9-10):974-995.
- Krylov A, Korchagin D: Fast Hermite projection method. Proceedings of the 3rd International Conference on Image Analysis and Recognition (ICIAR '06), September 2006, Povoa de Varzim, Portugal, Lecture Notes in Computer Science 4141: 329-338.
- Kortchagine DN, Krylov AS: Projection filtering in image processing. Proceedings of the International Conference on Computer Graphics and Vision (Graphicon '00) 42-45.
- Stanković S, Orović I, Žarić N: An application of multidimensional time-frequency analysis as a base for the unified watermarking approach. IEEE Transactions on Image Processing 2010, 19(3):736-745.
- Venkatesh YV: Hermite polynomials for signal reconstruction from zero-crossings. Part 1: one-dimensional signals. IEE Proceedings, Part I 1992, 139(6):587-596.
- Lazaridis P, Debarge G, Gallion P, et al.: Signal compression method for biomedical image using the discrete orthogonal Gauss-Hermite transform. Proceedings of the 6th WSEAS International Conference on Signal Processing, Computational Geometry & Artificial Vision, August 2006 34-38.
- Barkat B: Analysis of frequency modulated signals in multiplicative noise. Proceedings of the 6th International Symposium on Signal Processing and its Applications, 2001 2: 753-756.
- Barkat B: Instantaneous frequency estimation of nonlinear frequency-modulated signals in the presence of multiplicative and additive noise. IEEE Transactions on Signal Processing 2001, 49(10):2214-2222. 10.1109/78.950777
- Boashash B, Ristic B: Polynomial time-frequency distributions and time-varying higher order spectra: application to the analysis of multicomponent FM signals and to the treatment of multiplicative noise. Signal Processing 1998, 67(1):1-23. 10.1016/S0165-1684(98)00018-8
- Nguyen LT: Estimation and separation of linear frequency-modulated signals in wireless communications using time-frequency signal processing, Ph.D. thesis. Signal Processing Research Center, Queensland University of Technology, Brisbane, Australia; 2004.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.