 Review Article
 Open Access
Time-Frequency Analysis and Its Application in Digital Watermarking
EURASIP Journal on Advances in Signal Processing volume 2010, Article number: 579295 (2010)
Abstract
A review of time-frequency analysis and some aspects of its applications in digital watermarking are presented. The main advantages and drawbacks of various time-frequency distributions are discussed first. The aim of this theoretical overview is to facilitate the selection of an appropriate distribution for a specific application. Different aspects of time-frequency analysis applied to digital watermarking are then presented. In particular, the method that maps the time-frequency characteristics of a host signal to the pseudo-noise watermark sequence is thoroughly discussed. This approach is presented in the multidimensional form and then applied to digital audio, digital image, and digital video watermarking. Finally, the theoretical considerations are illustrated by various numerical and real-life examples.
1. Introduction
Theoretical aspects of time-frequency analysis have been intensively studied over the last two decades [1–27]. In parallel, their various applications have been explored as well. Namely, an efficient analysis of nonstationary signals, such as radar, sonar, biomedical, seismic, and multimedia signals, requires time-frequency representations. Time-frequency distributions are most commonly used for this purpose. Many researchers have made significant efforts to define a distribution that is optimal for a wide class of frequency-modulated signals [8–11]. As a result, a number of time-frequency distributions have been proposed. However, the efficiency of each of them is more or less limited to a specific class of signals and, consequently, to a specific application. One of the goals of this paper is to highlight the most important features of some popular time-frequency distributions and to give an idea of how to choose the most appropriate distribution depending on the signal form. The linear, quadratic, higher-order, and multiwindow time-frequency distributions are considered. The short-time Fourier transform, as the most commonly used linear transform, is discussed first. Next, the Wigner distribution, as the best known quadratic distribution, is presented. Also, the Cohen class and some specific distributions belonging to this class are considered [1, 5–7]. It is shown that the quadratic distributions are optimal for a linear frequency-modulated signal. However, if the instantaneous frequency variation within the analysis window is faster, multiwindow or higher-order distributions should be used [14–16, 19–27]. The Hermite functions-based multiwindow approach is also discussed. Finally, highly concentrated distributions with complex-lag argument are presented. To facilitate a better understanding of the presented theoretical considerations, numerous illustrative examples are provided.
The second part of the paper considers time-frequency-based watermarking techniques. The watermarking of digital audio, digital image, and digital video is discussed [28–37]. A short overview of some existing approaches related to digital audio and digital image is given first [28–50]. Watermarking using time-frequency techniques is usually employed in one of the following two ways. The first one uses the time/space domain of the host signal to embed a watermark with specific time-frequency characteristics. The time-frequency analysis is then used for detection. The second way uses time-frequency distributions to create or embed the watermark in the time-frequency domain. A flexible procedure that can be used for different kinds of signals is discussed more extensively. Therein, the watermark is shaped according to the time-frequency characteristics of the host signal. The detection is performed in the time-frequency domain. This particular approach is presented in the multidimensional form, and it is applied to digital audio, digital image, and digital video. It provides a high degree of robustness and imperceptibility. Hence, even when the watermark is very weak, a reliable detection can still be achieved. Also, the watermark is completely hidden by the time-frequency characteristics of the host signal.
2. Time-Frequency Analysis
The Fourier transform provides the spectral content of a signal. It has been a valuable tool in various applications. However, for nonstationary signals the Fourier transform cannot give satisfactory results, since information about how the frequency components vary in time is required.
Furthermore, two different signals can have the same spectral content, as illustrated in Figures 1(a) and 1(b). Based on Figure 1(c), however, we can conclude that the time-frequency representations of the two signals are quite different. This example is a simple illustration of the importance of time-frequency analysis for signals whose spectral content varies with time. Various time-frequency distributions are used for this purpose. The ideal time-frequency representation can be described as [19–21]
$$ITF(t,\omega) = 2\pi A^2(t)\,\delta\big(\omega - \phi'(t)\big), \quad (1)$$
where a signal of the form $x(t) = A(t)e^{j\phi(t)}$ is considered. This representation also provides the local energy distribution of the signal.
A question that naturally arises at this point is whether there exists a single representation that would be ideal for any signal at hand. The answer is no; hence, a number of time-frequency distributions have been introduced.
2.1. The Short-Time Fourier Transform
The simplest and most commonly used time-frequency representation is obtained by using the short-time Fourier transform (STFT) defined as [2]
$$STFT(t,\omega) = \int_{-\infty}^{\infty} x(t+\tau)\, w(\tau)\, e^{-j\omega\tau}\, d\tau. \quad (2)$$
Thus, it is a windowed version of the Fourier transform. The sliding window function is denoted by $w(\tau)$, where τ is the lag coordinate. An illustration of the STFT calculation is shown in Figure 2.
Note that the spectral content is calculated for each windowed part of the signal. The central point of the sliding window is the time instant for which the spectrum is calculated. The influence of the window size is critical, as will be discussed later.
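The windowed-transform idea can be illustrated with a minimal sketch that uses only the Python standard library. The function names (`stft_frame`, `spectrogram_frame`), the Hann window, and the test signal are illustrative assumptions, not part of the reviewed methods; a naive DFT is used for clarity rather than speed.

```python
import cmath
import math

def stft_frame(x, n, window):
    """DFT of the windowed segment of x centred at sample n (one STFT column)."""
    N = len(window)
    half = N // 2
    lags = range(-half, N - half)
    # Windowed segment x(n + m) w(m); samples outside the signal are taken as zero
    seg = []
    for m in lags:
        idx = n + m
        sample = x[idx] if 0 <= idx < len(x) else 0.0
        seg.append(sample * window[m + half])
    # Naive O(N^2) DFT over the lag index m
    frame = []
    for k in range(N):
        acc = 0j
        for i, m in enumerate(lags):
            acc += seg[i] * cmath.exp(-2j * math.pi * k * m / N)
        frame.append(acc)
    return frame

def spectrogram_frame(x, n, window):
    """Squared magnitude of the STFT column (the spectrogram at time n)."""
    return [abs(v) ** 2 for v in stft_frame(x, n, window)]

N = 32
# Periodic Hann window, centred on lag m = 0 (an assumed window choice)
hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / N) for i in range(N)]
# Test signal: a complex sinusoid lying exactly on frequency bin 5
x = [cmath.exp(2j * math.pi * 5 * t / N) for t in range(128)]
spec = spectrogram_frame(x, 64, hann)
peak_bin = max(range(N), key=lambda k: spec[k])
```

For this constant-frequency test signal the energy concentrates at the bin of the sinusoid, with the main-lobe width set by the window, which is the resolution trade-off discussed in this section.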
The energetic version of the STFT is known as the spectrogram. It can be written as
where is the Fourier transform of the time domain window , is the first derivative of the phase function, the frequency convolution is denoted by , while is the spread factor which depends on the second and higher order phase derivatives. Note that an ideal representation will be obtained if the signal is constant frequency modulated ( for ) and if , that is, for a large time domain window. However, if the signal is not constant frequency modulated, a large window size can produce a low time resolution and vice versa. In general, there is a tradeoff between the time and frequency resolution, and it is best described with the uncertainty principle, as
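$$\sigma_t\,\sigma_\omega \ge \frac{1}{2}, \quad (4)$$
where
$$\sigma_t^2 = \frac{\int t^2 |x(t)|^2\, dt}{\int |x(t)|^2\, dt}, \qquad \sigma_\omega^2 = \frac{\int \omega^2 |X(\omega)|^2\, d\omega}{\int |X(\omega)|^2\, d\omega} \quad (5)$$
define the measures of duration in time and frequency, respectively. The signal should satisfy $\sqrt{|t|}\,x(t) \to 0$ as $|t| \to \infty$.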
An illustration of the window size influence on the spectrogram resolution is given in Figure 3. A four-component signal is considered. In order to achieve a good time resolution, the two short sinusoidal components should be analyzed by using a narrow window. However, to obtain a good frequency resolution, the third component (a sinusoid with a long duration) should be analyzed by using a wide window. In Figure 3(a), a narrow window is used, and it results in a good time resolution, while the frequency resolution is low. However, when a large window size is used, a low time resolution is obtained. Hence, the two short sinusoids have almost merged into one (see Figure 3(b)). Due to the presence of a linear frequency-modulated component (chirp signal), the spread factors are present in both cases.
The spectrogram satisfies the marginal properties
The total energy is obtained as
An important property of the STFT is its linearity. Namely, the STFT of a multicomponent signal $x(t) = \sum_{m=1}^{M} x_m(t)$ is $STFT(t,\omega) = \sum_{m=1}^{M} STFT_{x_m}(t,\omega)$. Consequently, if the signal components do not intersect in the time-frequency plane, the spectrogram will be equal to the sum of the spectrograms of the individual signal components. This is evident from Figure 3.
2.2. Quadratic Time-Frequency Distributions
Quadratic time-frequency distributions have been introduced in order to improve the time-frequency resolution. Namely, they remove the spread factors for linear frequency-modulated signals. Among them, the Wigner distribution is the most commonly used. It has also been used as a base to define several interesting time-frequency distributions.
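The windowed version of the Wigner distribution is called the pseudo-Wigner distribution. It is defined as [1, 2]
$$WD(t,\omega) = \int_{-\infty}^{\infty} w\!\left(\tfrac{\tau}{2}\right) w^*\!\left(-\tfrac{\tau}{2}\right) x\!\left(t+\tfrac{\tau}{2}\right) x^*\!\left(t-\tfrac{\tau}{2}\right) e^{-j\omega\tau}\, d\tau. \quad (8)$$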
The spread factor in this case is .
The Wigner distribution satisfies the marginal conditions. Note that it is always real, while its numerical realization requires oversampling by a factor of 2. However, the Wigner distribution is not linear. Namely, for a multicomponent signal $x(t) = \sum_{m=1}^{M} x_m(t)$, the Wigner distribution is of the form
$$WD(t,\omega) = \sum_{m=1}^{M} WD_{x_m x_m}(t,\omega) + 2\,\mathrm{Re}\Big\{\sum_{m=1}^{M}\sum_{n>m} WD_{x_m x_n}(t,\omega)\Big\}. \quad (9)$$
Thus, besides the auto-terms, the interactions between different signal components ($m \neq n$), called cross-terms, appear. This is a major drawback of this distribution. The Wigner distribution of the four-component signal from the previous example is given in Figure 4.
In this case, the auto-terms are well concentrated. However, the presence of cross-terms is significant. Namely, the time-frequency representation contains frequency components that do not exist within the signal itself. This could lead to a wrong analysis result.
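The appearance of cross-terms can be reproduced with a small numerical sketch. It assumes a discrete Wigner distribution evaluated over a full period of lags for a hypothetical two-tone analytic signal; the function name `wigner_slice` and the chosen frequencies are illustrative only.

```python
import cmath
import math

def wigner_slice(signal, n, M):
    """Discrete Wigner distribution at time n for frequency bins k = 0..M-1.

    `signal` is a callable returning the (analytic) signal value at any
    integer sample index, so no boundary handling is needed in this toy case.
    """
    N = 2 * M
    out = []
    for k in range(M):
        acc = 0j
        for m in range(-M, M):
            # Local autocorrelation x(n+m) x*(n-m): the effective lag is 2m,
            # hence the factor 4*pi (and the need for oversampling by 2)
            r = signal(n + m) * signal(n - m).conjugate()
            acc += r * cmath.exp(-4j * math.pi * m * k / N)
        out.append(acc.real)
    return out

M = 32
N = 2 * M
k1, k2 = 4, 12  # assumed tone positions (in bins of the N-point grid)

def two_tones(n):
    return cmath.exp(2j * math.pi * k1 * n / N) + cmath.exp(2j * math.pi * k2 * n / N)

wd_t0 = wigner_slice(two_tones, 0, M)
wd_t4 = wigner_slice(two_tones, 4, M)
```

In this sketch the auto-terms sit at bins 4 and 12, while an oscillating term appears at bin 8, midway between them, although the signal has no component there. Its amplitude is twice that of an auto-term and its sign alternates with time.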
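The cross-terms can be reduced by using a filter function in the ambiguity domain. The ambiguity function is the two-dimensional Fourier transform of the Wigner distribution, that is,
$$A(\theta,\tau) = \int_{-\infty}^{\infty} x\!\left(t+\tfrac{\tau}{2}\right) x^*\!\left(t-\tfrac{\tau}{2}\right) e^{-j\theta t}\, dt. \quad (10)$$
In the ambiguity domain, the auto-terms are located around the origin. Thus, a two-dimensional filter, called the kernel function, is used to filter out the cross-terms (that are generally located away from the origin) [1, 5–7]:
$$A_c(\theta,\tau) = c(\theta,\tau)\, A(\theta,\tau), \quad (11)$$
where $A(\theta,\tau)$ is the ambiguity function, and $c(\theta,\tau)$ is the kernel function. The time-frequency distribution based on the filtered function is obtained as
$$CD(t,\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} c(\theta,\tau)\, A(\theta,\tau)\, e^{j\theta t - j\omega\tau}\, d\tau\, d\theta. \quad (12)$$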
This is the definition of the Cohen class of distributions. By choosing the corresponding kernel functions, some specific distributions belonging to the Cohen class can be obtained. They satisfy the marginal properties if $c(\theta,0) = 1$ and $c(0,\tau) = 1$ hold. For example, the Choi-Williams distribution [5] is obtained for the kernel function $c(\theta,\tau) = e^{-(\theta\tau)^2/\sigma^2}$, where $\sigma$ is a scaling factor that controls its attenuation rate. The kernels for some distributions are defined in Table 1 [7].
The fourcomponent signal analyzed by the ChoiWilliams and BornJordan distribution is shown in Figure 5. The ChoiWilliams distribution with two values of its kernel parameter is used. The parameter results in a wider width of the kernel assuring good autoterms concentration, but a significant amount of crossterms is still present (Figure 5(a)). The BornJordan distribution producing almost the same autoterms concentration is shown in Figure 5(b). Note that the sidelobes and crossterms are more emphasized than in the ChoiWilliams distribution with [7]. However, if the ChoiWilliams distribution with the attenuation parameter is used (Figure 5(c)), the crossterms get suppressed, while the autoterm concentration becomes reduced (for the chirp component).
Note that there is a trade-off between the cross-terms reduction and the auto-term concentration (it depends on the parameters of the kernel function). Obviously, distributions belonging to the Cohen class lie in between the two extreme cases: the spectrogram, which eliminates the cross-terms but has a low auto-term concentration, and the Wigner distribution, which provides high resolution but with pronounced cross-terms. It is possible to obtain an ideal time-frequency concentration only if signal-dependent kernels are used [8, 10].
Next, we can ask the following question. Is there a distribution that provides an auto-terms concentration as good as in the Wigner distribution, while eliminating the cross-terms as the spectrogram does? In order to define a distribution with these properties, let us start with the following relationship between the short-time Fourier transform and the Wigner distribution:
$$WD(t,\omega) = \frac{1}{\pi} \int_{-\infty}^{\infty} STFT(t,\omega+\theta)\, STFT^*(t,\omega-\theta)\, d\theta. \quad (13)$$
Clearly, the convolution along the frequency axis improves the auto-terms concentration, but it introduces the cross-terms. Thus, the convolution should be performed only within the same auto-term, so that different signal components are not convolved with each other. This can be achieved by introducing a frequency domain window (see Figure 6).
A distribution that allows for such convolution is called the S-method [17]. It is defined as
$$SM(t,\omega) = \frac{1}{\pi} \int_{-\infty}^{\infty} P(\theta)\, STFT(t,\omega+\theta)\, STFT^*(t,\omega-\theta)\, d\theta. \quad (14)$$
The frequency domain finite window is denoted by $P(\theta)$. Observe that for $P(\theta) = \pi\delta(\theta)$ and $P(\theta) = 1$ the spectrogram and the Wigner distribution are obtained, respectively.
The discrete version of the S-method is
$$SM(n,k) = \sum_{i=-L}^{L} P(i)\, STFT(n,k+i)\, STFT^*(n,k-i) = |STFT(n,k)|^2 + 2\,\mathrm{Re}\Big\{\sum_{i=1}^{L} STFT(n,k+i)\, STFT^*(n,k-i)\Big\}, \quad (15)$$
where the second form holds for a rectangular window $P(i)$. The discrete window width is $2L+1$. It determines the number of summation terms in (15). Note that these terms improve the spectrogram concentration toward that of the Wigner distribution. An illustration of the S-method calculation is shown in Figure 7.
The calculation of the S-method is illustrated for the central point of an auto-term as well as for a point located in between two auto-terms (Figures 7(a) and 7(b)). It is important to observe that the summation has to be performed only over the auto-terms. If other terms are included, the concentration will not be improved. In addition, noise could also be picked up. The window has to be narrower than the minimal distance between the auto-terms. If this is not the case, the interactions between the auto-terms will produce cross-terms. Namely, as illustrated in Figure 7(b), the cross-terms appear if the window includes the summation terms marked in red.
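The role of the window width can be sketched in a few lines of Python. The sketch assumes the common discrete form of the S-method with a rectangular frequency window of half-width `L`, that is, the spectrogram term plus twice the real part of the products STFT(n, k+i) STFT*(n, k-i) for i = 1..L. The STFT row below holds made-up values for two well-separated auto-terms and is not taken from the paper.

```python
def s_method(stft_row, L):
    """S-method for one time instant, given the STFT values over frequency bins."""
    K = len(stft_row)
    sm = []
    for k in range(K):
        value = abs(stft_row[k]) ** 2  # spectrogram term (i = 0)
        for i in range(1, L + 1):
            if k - i < 0 or k + i >= K:
                break  # stay inside the frequency grid
            value += 2 * (stft_row[k + i] * stft_row[k - i].conjugate()).real
        sm.append(value)
    return sm

# Hypothetical STFT row: two auto-terms, each spread over three bins
row = [0j] * 40
row[9], row[10], row[11] = 1 + 0j, 2 + 0j, 1 + 0j
row[29], row[30], row[31] = 1 + 0j, 2 + 0j, 1 + 0j

spectrogram = s_method(row, 0)   # L = 0 reduces to the spectrogram
narrow = s_method(row, 1)        # window narrower than the component distance
wide = s_method(row, 10)         # window reaching across both components
```

With `L = 1` the value at the auto-term centre grows while the midpoint between the components stays at zero; with `L = 10` the summation reaches across both components and a cross-term appears at the midpoint (bin 20).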
The adaptive S-method with a variable window adjusted to the auto-terms is introduced in [18]. However, in many applications the fixed window size of has been shown to provide very good results, since the convergence within the window is fast, and it is mostly achieved after a few summation terms.
The S-method of the four-component signal is given in Figure 8.
Note that all signal components (with constant and linear frequency modulations) are well concentrated even for . By increasing , for the crossterms start to appear (the minimal distance between the autoterms becomes less than ).
The spectrogram and the S-method of a real speech signal are shown in Figure 9.
The time-frequency resolution of the speech signal is improved by using the S-method.
Observe that the spread factor in the quadratic distributions will be present if the phase function has nonzero third and higher order derivatives. Hence, further concentration improvement can be obtained by using the multiwindow approach or by using higher-order time-frequency distributions, as discussed below.
2.3. Multiwindow Time-Frequency Distributions
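The concept of multiwindow time-frequency distributions has been developed by using optimally concentrated orthogonal windows [25–27]. The Hermite functions, which are localized in both the time and frequency domains, can be used as orthogonal windows. The multiwindow spectrogram is defined as a weighted sum of spectrograms:
$$MSPEC(t,\omega) = \sum_{k=0}^{K-1} d_k(t)\, \big|STFT_k(t,\omega)\big|^2, \quad (16)$$
where $STFT_k(t,\omega)$ is the STFT calculated with the $k$th-order Hermite function as the window. The total number of the spectrograms and Hermite functions is $K$, while $d_k(t)$ are the weighting coefficients. The $k$th-order Hermite function is defined as
$$\Psi_k(t) = \frac{(-1)^k e^{t^2/2}}{\sqrt{2^k k! \sqrt{\pi}}}\, \frac{d^k\big(e^{-t^2}\big)}{dt^k}. \quad (17)$$
This function can be obtained by using a recursive realization as follows:
$$\Psi_k(t) = t\sqrt{\frac{2}{k}}\,\Psi_{k-1}(t) - \sqrt{\frac{k-1}{k}}\,\Psi_{k-2}(t), \quad k \ge 2, \quad (18)$$
while
$$\Psi_0(t) = \frac{1}{\sqrt[4]{\pi}}\, e^{-t^2/2}, \qquad \Psi_1(t) = \frac{\sqrt{2}\,t}{\sqrt[4]{\pi}}\, e^{-t^2/2}. \quad (19)$$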
The weighting coefficients are calculated by
where is the signal amplitude. The weighting coefficients for a signal with a constant amplitude are given in Table 2.
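Returning to the Hermite functions used as orthogonal windows, their recursive computation can be sketched as follows. The sketch assumes the standard orthonormal Hermite functions and their usual three-term recursion; the function name `hermite_functions`, the integration grid, and the number of functions are illustrative assumptions.

```python
import math

def hermite_functions(t, K):
    """Return [psi_0(t), ..., psi_{K-1}(t)] via the three-term recursion."""
    psi = [math.pi ** -0.25 * math.exp(-t * t / 2)]
    if K > 1:
        psi.append(math.sqrt(2.0) * t * psi[0])
    for k in range(2, K):
        psi.append(t * math.sqrt(2.0 / k) * psi[k - 1]
                   - math.sqrt((k - 1) / k) * psi[k - 2])
    return psi[:K]

# Numerical check of orthonormality with a simple Riemann sum on [-10, 10]
K = 4
step = 0.01
gram = [[0.0] * K for _ in range(K)]
for s in range(-1000, 1001):
    values = hermite_functions(s * step, K)
    for i in range(K):
        for j in range(K):
            gram[i][j] += values[i] * values[j] * step
```

The resulting Gram matrix is numerically the identity, which is the property that allows these functions to serve as a set of orthogonal analysis windows.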
It is important to emphasize that the spread factor is reduced proportionally to the highest order of Hermite functions used in the multiwindow spectrogram:
Thus, the first term in the spread factor comes from the phase derivative. An additional concentration improvement can be achieved by introducing the multiwindow Smethod [51]. It can be written in the discrete domain as
In this case the spread factor is
An example where the multiwindow spectrogram and the multiwindow S-method are used is shown in Figure 10 (the components have constant amplitude and each of them is treated separately).
From Figure 10 it is obvious that the multiwindow versions of the distributions outperform their standard counterparts. The multiwindow approach reduces the noise influence as well [27].
This multiwindow approach can also be interpreted by using the Cohen class of distributions, where it can be written as a twodimensional convolution of the Wigner distribution and the kernel function: . The kernel function producing the multiwindow Wigner distribution is obtained as
where is the order Laguerre function, and it is the Wigner distribution of the order Hermite function.
According to the previous consideration, the spread factor can be gradually reduced by increasing the number of Hermite functions, that is, by increasing the number of spectrograms in (16) and (22). Similar resolution improvements can be obtained by using polynomial time-frequency distributions, where each additional order of the distribution results in the removal of one more term within the spread factor [3].
2.4. Distributions with Complex-Lag Argument
A significant improvement in the concentration of signal components can be achieved by introducing higher-order distributions with the complex-lag argument [19, 21]. A general form of these distributions is [22]
A special case follows for [19]
The fourthorder distribution is obtained for , that is, Thus, it has the form
The term with the complexlag argument is calculated by
For a multicomponent signal, it is calculated for each component separately, where the STFT is used to separate them [20]. This procedure can be generalized for an arbitrary distribution order [24].
For and we have . It defines the sixthorder distribution. Observe that, regarding the spread factor reduction, each distribution order is related to the previous one in the same way the Wigner distribution is related to the spectrogram. Namely, the spread factors for the fourth and the sixthorder distributions are
The complex-lag distributions are particularly useful when the instantaneous frequency variation within the window is very fast. Examples where the distribution order is increased in order to improve the time-frequency resolution are shown in Figure 11.
Note that the Wigner distribution produces poor results for both signals, since it cannot follow the instantaneous frequency variations.
3. Digital Watermarking
Digital watermarking has been used to protect multimedia data. Demands in this area increase proportionally with the number of internet applications. Namely, these applications are associated with a need for copyright protection of digital audio, digital image, and digital video. Note that cryptographic methods could be used for this purpose. However, once the data are decrypted they can be copied without limit. This has been one of the primary reasons for developing watermarking techniques. Watermarking, in general, consists of embedding secret information that can be reliably detected within the host signal. Obviously, this information should be imperceptible within the host data. Depending on the application type, the watermarking can be robust, fragile, or semifragile. A robust watermark should be resistant to various nonmalicious and malicious attacks. Nonmalicious attacks are commonly used signal processing techniques such as compression algorithms, filtering, and so forth, while malicious attacks are signal processing techniques that are intentionally used to remove the watermark. A fragile watermark is used to prove data authenticity. Thus, if the content of a signal has been changed, the watermark should no longer exist. A semifragile watermark should be robust to a slight modification, such as a certain degree of compression.
Depending on the type of host signal (speech/audio signals, image, video, etc.), various watermarking approaches have been developed. Also, different domains have been used: the time domain (or the space domain), spectral domains such as the DFT, DWT, and DCT domains, and a joint time/space-frequency domain. The existing watermarking techniques are mainly based on either the time or the frequency domain. However, in both cases, the time-frequency characteristics of the watermark do not correspond to the time-frequency characteristics of the host signal. This may result in the watermark being perceptible, because it is present in time-frequency regions where the signal components do not exist.
3.1. An Overview of Some Time-Frequency-Based Watermarking Techniques
The time-frequency domain can be very efficient regarding watermark imperceptibility and robustness. This section presents some key time-frequency-based watermarking procedures with the aim of inspiring more contributions on this topic.
Here, we classify the time-frequency-based watermarking techniques into two categories. The first comprises approaches based on watermarks with specific time-frequency characteristics, where the detection procedure is performed within the time-frequency domain. The second uses the time-frequency domain to embed or to shape the watermark.
(A) Image Watermark with Specific Time-Frequency Characteristics
(A.1) Among the first time-frequency-based image watermarking procedures is the approach introduced in [38]. Although the watermark is embedded in the space domain, it is chosen to have a specific space/spatial-frequency characteristic. Namely, a two-dimensional chirp signal is used as the watermark:
Observe that the Wigner distribution provides an ideal representation for this signal.
The watermark is embedded within the entire image: .
The watermark detection is performed by using
where
The variable parameters ,, and are used. Different values of those parameters (, and ) produce a set of projections. The additional term can be used to detect some geometrical transformations, as well. Note that the detector has the form of the Radon-Wigner distribution, which ensures that the energy of the watermark is distributed over the hyperplane defined by ( is the phase function of the watermark). In order to decide whether the watermark exists in the image or not, the maxima of the Radon-Wigner distribution
are compared with an assumed reference threshold. Also, multiple chirp watermarks with small and randomly chosen amplitudes are used to increase the flexibility of the proposed procedure. The parameters of the chirp signal as well as the random sequence that defines the amplitudes of the chirp signals serve as the watermark key. Since the watermark is embedded within the entire image in the space domain, a proper masking that provides imperceptibility should be applied. A performance analysis giving an estimate of the detectable watermark amplitude level is provided in [38]. The robustness is tested under various attacks, including a median filter, geometrical transformations (translation, rotation, and cropping applied simultaneously), a high-pass filter, a local notch filter, and Gaussian noise.
(A.2) Mobasseri et al. [44] have proposed a scheme for robust watermarking based on the polynomial phase. The algorithm combines the approach in [45] (where bits are embedded in the image) with the 2D chirpbased methods. Here, the image of size is partitioned into blocks. A 2D chirp of the form
is used, where are taken. The watermark is embedded in the block located at the pixel () according to
The constant that controls the watermark strength is , and the integer part is denoted by "", while are watermark bits taken from . The knowledge of the pair is required in order to recover . It can be obtained by using the chirp transform
where:
Finally, the total detection over all blocks can be obtained by
This provides a possibility to use the watermark that cannot be detected by considering a single block only. Thus, in such case it would be necessary to integrate all of them over the entire image. Note that it is also possible to generate different chirps for different blocks instead of using the same chirp for all blocks. It would make the detection even more difficult for unauthorized users.
The embedded bits are recovered by
The proposed method is adapted to be robust to the JPEG compression algorithm. The watermark is embedded within the blocks by using the quantization matrix . Namely, the DCT coefficients and the 2D chirp are quantized by this matrix. However, choosing an appropriate pair is necessary to ensure that the watermark survives this quantization. The watermark survival degree can be quantified by
where is the unmarked compressed block. The watermark is completely removed by compression if is obtained. The quality of the proposed technique is tested on the image Lena, and it is proven that, for this case, it outperforms the standard spread spectrum technique.
(A.3) Watermarking in the fractional Fourier domain belongs to the time-frequency-based algorithms as well. This approach is defined in [39], and it uses a combination of the space and spatial-frequency domains. Namely, the image is transformed in the fractional Fourier domain for the angles :
where FRFT denotes the one-dimensional fractional Fourier transform. The FRFT can be treated as a rotation in the time-frequency plane by an angle $\alpha$, while the inverse transform can be considered as a rotation by the angle $-\alpha$. Thus, the FRFT domain is a combination of the time and frequency domains (the Fourier transform is a special case for $\alpha = \pi/2$). Depending on the angle $\alpha$, either the time or the frequency domain is dominant: for $\alpha$ close to $\pi/2$ the frequency domain is dominant, while for small $\alpha$ the FRFT is dominantly in the time domain. The watermark is embedded in the FRFT coefficients reordered into a nonincreasing sequence . By analogy with the watermarking in the DCT domain, the first coefficients are omitted, while the next coefficients are used. The watermark is embedded as
A real valued watermark key composed of and is used. The detection is performed by
The performance analysis providing the detection threshold is done, the threshold being chosen as
where the watermark is a Gaussian white noise with the variance . The watermark key consists of the watermark sequence and the angles $(\alpha_1, \alpha_2)$. Thus, the algorithm provides two more degrees of freedom, and it offers more possibilities for generating watermarks. The watermarking procedure is tested on various images and attacks.
(A.4) Barkat and Sattar have proposed a fragile watermarking procedure for image authentication [43]. A watermark with a particular time-frequency signature is inserted in the image pixels. Although, in general, pixels (according to the image size) exist, a significantly lower number of them is used. The pixel locations can be chosen arbitrarily. The authors have used diagonal pixels, modulated by a pseudo-noise sequence as a secret key. Various frequency-modulated nonstationary signals can serve as a watermark. However, features that can be easily identified should be used. Consequently, different time-frequency distributions should be used for watermark detection. Barkat and Sattar have used a quadratic frequency-modulated signal. It is detected by using the Wigner distribution. The proposed scheme is tested on the following attacks: cropping, translation, JPEG compression, and scaling. Very weak and imperceptible attacks were applied (e.g., JPEG with 99% quality is used). It is shown that the watermark cannot be identified after these attacks, as desired for a fragile watermark.
(B) Watermark Created in the Time-Frequency Domain
(B.1) An image watermarking approach is proposed by Alkhassaweneh and Aviyente in [49]. The image rows are used to create a set of one-dimensional signals. Then, the Wigner distribution is calculated for each of them. Also, the watermark sequence is transformed to the time-frequency domain by using the Wigner distribution. Finally, the Wigner distribution of the watermark sequence is embedded in the Wigner distribution of each image row as follows:
where is a set of the time-frequency dependent weighting coefficients. The watermarked image is obtained by using the inverse transform. Having in mind that we deal with a real and positive signal, it is defined as
However, the previous equation holds only if the two-dimensional function (45) is a valid Wigner distribution. Namely, it is well known that not every two-dimensional function is a valid Wigner distribution. This introduces a very restrictive condition on the function . In the proposed method it is determined by using the time-frequency representation of the corresponding row and taking the middle frequency region.
Alkhassaweneh and Aviyente have suggested a nonblind detection procedure. Namely, the second part of the function in (46) that depends on the watermark is selected. The detection is performed by using the standard correlation detection. A threshold that provides a minimal probability of error is derived. The proposed method is tested on different images and under various attacks. The average probability of error was found to be 0.03.
(B.2) Foo et al. in [48] have defined a method for digital audio watermarking based on the time-frequency domain. Here, the lengths of selected audio frames are changed in order to encode logical values: if the original frame is lengthened or shortened, the logical value 1 is assigned, whereas the unaltered "normal frames" correspond to the logical value 0. The watermark is a binary sequence obtained from alphabet letters converted to the ASCII code (the example with the binary code 010001100101001101010111 for the letters FSW is used). The crucial part of this method is the selection of frames that will be lengthened or shortened (a frame size of 1024 samples is used). The frames with a signal energy level above the masking threshold are selected (the psychoacoustic model is used to determine the masking threshold in each subband). The frame length is changed by adding or removing samples with amplitudes that do not exceed the masking threshold. Four samples are added or removed within a frame of 1024 samples. This ensures that no perceptual distortion appears. In order to preserve the total length of the watermarked audio signal, the same number of lengthened and shortened frames is used. A pair of such frames, called Diamond frames, is used to represent the binary 1, while the logical value 0 is assigned to the unaltered frames.
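The letter-to-bit conversion mentioned above is easy to verify. The following minimal sketch (the function name `text_to_bits` is an illustrative choice) reproduces the 24-bit code quoted for the letters FSW by concatenating the 8-bit ASCII codes of the characters.

```python
def text_to_bits(text):
    """Concatenate the 8-bit ASCII codes of the characters in `text`."""
    return "".join(format(ord(ch), "08b") for ch in text)

bits = text_to_bits("FSW")
# Each logical 1 would be carried by one pair of altered (Diamond) frames
num_frame_pairs = bits.count("1")
```

For this message, 12 of the 24 bits are ones, so 12 pairs of lengthened and shortened frames would be needed under the scheme described above.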
The detection procedure is nonblind, that is, the original signal is required. A significant difference between the watermarked and the original signals will appear only if a pair of changed frames exists. Thus, it is used for logical values detection. The proposed watermarking scheme has been tested on various musical signals, as well as on a speech signal, and a set of different attacks has been applied (filtering, resampling, noise, cropping, and MP3 compression). Although the results vary for different signals and attacks, in general they are good. The worst results are obtained for the rock and pop music signals with MP3 compression. However, in all cases the owner can be identified.
(B.3) Esmaili et al. have proposed a spread spectrum based watermarking in the timefrequency domain [46]. This technique is used for watermarking of music signals. The watermark is created as
where is the watermark before spreading, is the spreading code or the pseudonoise sequence, while is the timevarying carrier frequency. The parameter controls the watermark strength. The masking properties of the human auditory system are used to shape an imperceptible watermark. The pseudonoise sequence is low pass filtered according to the signal characteristics (the Butterwort filter is used). Two different scenarios of masking have been considered. The tone or noiselike characteristic are determined by using the entropy
The probability of energy for each frequency (within a window used for the spectrogram calculation) is denoted by , while is the maximum frequency. A half of the maximum entropy is taken as a threshold between noiselike and tonelike characteristics. If the entropy is lower than it is considered as a tonelike, otherwise it is a noiselike characteristic.
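The tone-like versus noise-like decision can be sketched as below. Since the entropy expression is not reproduced here, the sketch assumes the Shannon entropy (in bits) of the normalized energy distribution over the frequency bins of one analysis window, with the maximum entropy taken as that of a flat distribution. The function names and test spectra are hypothetical.

```python
import math

def spectral_entropy(energies):
    """Shannon entropy (bits) of the normalized energy distribution."""
    total = sum(energies)
    probs = [e / total for e in energies if e > 0]
    return -sum(p * math.log2(p) for p in probs)

def classify(energies):
    """Label one analysis window using half of the maximum entropy as threshold."""
    threshold = 0.5 * math.log2(len(energies))  # assumed maximum: flat spectrum
    return "tone-like" if spectral_entropy(energies) < threshold else "noise-like"

flat_spectrum = [1.0] * 64                 # energy spread evenly over all bins
peaky_spectrum = [100.0] + [0.01] * 63     # energy concentrated in one bin
labels = (classify(flat_spectrum), classify(peaky_spectrum))
```

A flat spectrum reaches the maximum entropy and is labeled noise-like, whereas a spectrum dominated by one bin has an entropy close to zero and is labeled tone-like.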
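The time-varying carrier frequency $f_c(t)$ is obtained as the instantaneous mean frequency of the host signal, calculated by

$$f_c(t) = \frac{\int_0^{f_{\max}} f\, TFD(t,f)\, df}{\int_0^{f_{\max}} TFD(t,f)\, df},$$

where $TFD(t,f)$ is the time-frequency distribution of the host signal.
Finally, after the watermark is modulated and shaped, it is embedded in the time domain as $y(t) = x(t) + w(t)$, where $x(t)$ is the host signal, $w(t)$ is the watermark, and $y(t)$ is the watermarked signal.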
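The instantaneous mean frequency can be sketched as the first moment of the time-frequency distribution along the frequency axis. The array layout (frequency along rows, time along columns) and the function name are assumptions made for this illustration.

```python
import numpy as np


def instantaneous_mean_frequency(tfd, freqs):
    """First moment over frequency of each time column of a non-negative TFD."""
    tfd = np.asarray(tfd, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    num = (freqs[:, None] * tfd).sum(axis=0)
    den = tfd.sum(axis=0)
    # Columns without energy have no defined mean frequency and are returned as NaN.
    return np.divide(num, den, out=np.full_like(den, np.nan), where=den > 0)


freqs = [100.0, 200.0, 300.0, 400.0]            # Hz, one value per row
tfd = np.array([[1.0, 0.0, 0.0],
                [0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [0.0, 2.0, 0.0]])
imf = instantaneous_mean_frequency(tfd, freqs)
print(imf)
```

In the toy distribution the first column has equal energy at 100 Hz and 300 Hz, so its mean frequency is 200 Hz, and the second column has all of its energy at 400 Hz.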
A simple watermark detection procedure is applied. First, demodulation is performed by using the time-varying carrier, and then the watermark is detected by using the standard correlation procedure with the pseudonoise sequence.
The proposed method has been tested on several music files. It has been shown that, under various attacks, the bit error rates are mostly between 0.02 and 0.08.
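The embedding and detection chain of (B.3) can be sketched in a few lines. This is a simplified illustration, not the implementation of [46]: one bit is spread over the whole signal, the pseudonoise sequence is not low-pass filtered or perceptually shaped, the carrier phase is taken as the running integral of the carrier frequency, and the host and the carrier are synthetic stand-ins. All names and parameter values are hypothetical.

```python
import numpy as np


def carrier(fc_hz, fs):
    """Cosine whose phase is the running integral of the carrier frequency."""
    return np.cos(2 * np.pi * np.cumsum(fc_hz) / fs)


def embed(host, bit, pn, fc_hz, fs, strength):
    """Add one spread, modulated bit (+1 or -1) to the host in the time domain."""
    return host + strength * bit * pn * carrier(fc_hz, fs)


def detect(received, pn, fc_hz, fs):
    """Demodulate with the same carrier, then correlate with the PN sequence."""
    demodulated = 2.0 * received * carrier(fc_hz, fs)
    return float(np.mean(demodulated * pn))


rng = np.random.default_rng(1)
fs, n = 8000, 20000
t = np.arange(n) / fs
host = rng.standard_normal(n)                      # stand-in for a music signal
pn = rng.choice([-1.0, 1.0], size=n)               # secret spreading code
fc = 1000.0 + 200.0 * np.sin(2 * np.pi * 2 * t)    # stand-in for the estimated carrier

score_plus = detect(embed(host, +1, pn, fc, fs, 0.1), pn, fc, fs)
score_minus = detect(embed(host, -1, pn, fc, fs, 0.1), pn, fc, fs)
print(score_plus > 0, score_minus < 0)
```

The sign of the correlation recovers the embedded bit because the host contribution averages out over the length of the spreading code, while the watermark contribution adds coherently.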
(B.4) An interesting audio watermarking approach based on linear chirps has been proposed in [47]. The watermark is created as a chirp signal, which is perceptually shaped according to the host signal samples. Different chirp rates, each representing a unique watermark message, produce different slopes in the time-frequency domain. An efficient time-frequency representation is obtained by using the Wigner distribution. The extracted chirps are postprocessed in the time-frequency plane by an optimal line detection method based on the Hough-Radon transform, which can correctly estimate the slope of the watermark signal despite the broken lines caused by attacks. The simulation results show that the Hough-Radon transform applied to a time-frequency distribution can detect the watermark message correctly at bit error rates up to 20%.
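The idea of reading a message from the slope of a chirp can be illustrated with a simplified stand-in. Instead of computing the Wigner distribution and a Hough-Radon transform explicitly, the sketch below dechirps the signal with each candidate rate and takes the largest FFT magnitude; for a linear chirp the energy collapses into a single frequency bin when the trial rate matches, which plays the same role as accumulating the distribution along the matching line. The function name, the candidate set, and the signal parameters are assumptions for this example and are not taken from [47].

```python
import numpy as np


def estimate_chirp_rate(x, fs, candidate_rates):
    """Return the candidate rate (Hz/s) whose dechirped spectrum has the highest peak."""
    t = np.arange(x.size) / fs
    scores = []
    for rate in candidate_rates:
        dechirped = x * np.exp(-1j * np.pi * rate * t**2)
        scores.append(np.max(np.abs(np.fft.fft(dechirped))))
    return candidate_rates[int(np.argmax(scores))]


rng = np.random.default_rng(2)
fs, n = 1000, 1000
t = np.arange(n) / fs
true_rate = 200.0                                   # message encoded as a slope
chirp = np.exp(1j * 2 * np.pi * (50.0 * t + 0.5 * true_rate * t**2))
noisy = chirp + rng.standard_normal(n) + 1j * rng.standard_normal(n)

estimate = estimate_chirp_rate(noisy, fs, [0.0, 100.0, 200.0, 300.0, 400.0])
print(estimate)
```

Each candidate rate stands for one possible watermark message, so the estimate directly identifies the message even when the chirp is buried in noise of comparable power.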
3.2. Watermarking Approach Based on the Time-Frequency-Shaped Watermark
The approach presented here can be used for either audio signals or images [41, 42]. Thus, the embedding and detection procedures for both kinds of signals are defined and discussed simultaneously, by using the multidimensional notation.
In order to satisfy the imperceptibility constraints, the watermark should be modeled according to the time-frequency characteristics of the signal components. The concept of nonstationary multidimensional filtering [52] is adapted and used to create a watermark with time-frequency characteristics that correspond to the characteristics of the host signal. The corresponding algorithm consists of the following steps:
(1) selection of the nonstationary parts of the signal suitable for watermark embedding;
(2) watermark modeling according to the multidimensional time-frequency characteristics of the host signal;
(3) watermark embedding and watermark detection within the multidimensional time-frequency domain.
Multidimensional time-frequency distributions are employed in order to determine the nonstationary regions. As will be shown later, the S-method can be efficiently used to analyze the dynamics of regions of speech signals and images. Although cross-terms are usually undesirable in time-frequency analysis, they have been found to be useful in watermarking. Namely, they may improve the performance of a speech watermark detector and also increase the efficiency of dynamic region selection within an image.
The watermark is obtained at the output of a nonstationary filter as follows:
where is the short-time Fourier transform of a multidimensional random sequence . The function contains the information about the components within the region . It is used to create the watermark that will be adjusted to these components. Thus, we may start with an arbitrary random multidimensional sequence and, by using , its multidimensional time-frequency characteristic is modeled.
The region will be used for watermarking if a time-frequency distribution contains a sufficient number of components whose energy is above a floor value:
The function returns the number of components that satisfy the condition within the parentheses, while is the reference number of points used to decide whether the region is nonstationary. The parameter is an energy floor that can be determined as a portion of the TFD maximum:
A value of between 0 and 1 is taken.
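The region selection rule can be sketched as follows. The sketch assumes that the energy floor is a fixed fraction of the maximum of the distribution and that a region qualifies when the count reaches the reference number; the names `energy_floor`, `count_components`, `is_nonstationary`, `lam`, and `n_ref` are placeholders chosen for this example.

```python
import numpy as np


def energy_floor(tfd, lam):
    """Floor taken here as a fixed fraction `lam` (between 0 and 1) of the TFD maximum."""
    return lam * np.abs(tfd).max()


def count_components(tfd, floor):
    """Number of time-frequency points whose magnitude exceeds the floor."""
    return int(np.count_nonzero(np.abs(tfd) > floor))


def is_nonstationary(tfd, lam, n_ref):
    """Accept the region when at least `n_ref` points exceed the floor (assumed >=)."""
    return count_components(tfd, energy_floor(tfd, lam)) >= n_ref


region = np.zeros((8, 8))                     # toy distribution over one region
region[2, 3], region[4, 5], region[6, 1] = 10.0, 6.0, 1.0
n_above = count_components(region, energy_floor(region, 0.5))
print(n_above, is_nonstationary(region, 0.5, 2), is_nonstationary(region, 0.5, 3))
```

Because the functions operate on arrays of any shape, the same test applies to a one-dimensional time-frequency plane for audio and to a space/spatial-frequency array for images.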
The components' positions within are identified by using the support function:
An additional function is defined in order to consider the significant components only:
The energy threshold is denoted by . Thus, the resulting support function is defined as
The watermark embedding is done according to
where and are the short-time Fourier transforms of the multidimensional watermarked data, the host data, and the watermark, respectively.
Note that, when compared to the signal domain, the number of coefficients that contain information about the watermark is significantly increased in the multidimensional time-frequency domain. Consequently, the detector response is enhanced. The standard correlation detector in the multidimensional time-frequency domain is defined as
The multidimensional time-frequency domain-based detector provides a low probability of error, even when the number of watermarked samples in the signal domain is small.
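The three steps of the approach (support function, watermark shaping, and correlation detection) can be put together in a one-dimensional toy example. This is a sketch under several assumptions rather than the procedure of [41, 42]: the spectrogram replaces the S-method, the support function is a binary mask obtained by thresholding the spectrogram at a fixed fraction of its maximum, the short-time Fourier transform is evaluated at every sample, the embedding is additive, and the watermark strength is chosen for a clear demonstration rather than for imperceptibility. All function names and parameter values are hypothetical.

```python
import numpy as np


def stft(x, win):
    """STFT evaluated at every sample (hop 1); rows are time, columns are frequency."""
    m = win.size                                   # assumed even
    half = m // 2
    padded = np.pad(x, (half, half))
    frames = np.lib.stride_tricks.sliding_window_view(padded, m)[: x.size]
    # Move lag 0 to index 0 so that the phase reference is the current sample.
    return np.fft.fft(np.fft.ifftshift(frames * win, axes=1), axis=1)


def shape_watermark(pn, support, win, strength):
    """Pass the random sequence through the time-varying mask and sum over frequency."""
    shaped = np.real((stft(pn, win) * support).sum(axis=1)) / win.size
    return strength * shaped / shaped.std()


def detector(received, watermark, win):
    """Correlate the STFT of the candidate watermark with the STFT of the received data."""
    return float(np.real(np.sum(stft(watermark, win) * np.conj(stft(received, win)))))


n = 16384
k = np.arange(n)
host = np.cos(2 * np.pi * (0.05 * k + 0.5 * (0.2 / n) * k**2))   # toy chirp host
win = np.hanning(64)
spectrogram = np.abs(stft(host, win)) ** 2
support = spectrogram > 0.1 * spectrogram.max()    # binary support function


def keyed_watermark(key):
    """Watermark regenerated from an integer key used as the random seed."""
    pn = np.random.default_rng(key).standard_normal(n)
    return shape_watermark(pn, support, win, strength=0.3)


marked = host + keyed_watermark(7)
d_right = detector(marked, keyed_watermark(7), win)
d_wrong = detector(marked, keyed_watermark(8), win)
print(d_right > d_wrong)
```

The shaped watermark occupies only the time-frequency points where the host has significant energy, and the detector response for the right key exceeds the response for a wrong key because only the right key reproduces the embedded sequence.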
3.2.1. Digital Audio Signal
Let us consider the voiced part of a speech signal. The region
is determined by the start and the end instants and of the voiced speech, as well as by the interval that contains the strongest formants. The S-method is used to define the support function [41, 53]:
The region appropriate for watermarking is shown in Figure 12(a). The corresponding support function (Figure 12(b)) is created by using the value = with .
The watermark is embedded in the time domain: . The time-frequency representation of the watermark is shown in Figure 12(c).
As expected, the time-frequency characteristics of the watermark follow those of the speech signal components. Consequently, the watermark is inaudible within the speech signal.
Next, a music signal of the flute is considered, Figure 13(a).
In this case, the fourth-order complex-lag distribution is more appropriate for the region selection than the S-method, because it better follows the frequency variations in the signal (Figures 13(a) and 13(b)). Thus, by using this distribution an inaudible watermark is created.
Note that an important improvement in the watermark detection is obtained if the cross-terms are included [41]. Namely, the watermark is present within the cross-terms as well. A standard correlation detector in the time-frequency domain that includes the cross-terms can be written in the form
The second term in (60) is the result of the cross-terms.
Note that this form of detector can be used in other existing detector structures.
The following measure of the detection quality
is used. The mean value and standard deviation of the detector response are denoted by and . Indices and indicate the right and the wrong keys, respectively.
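A measure of this kind can be computed directly from detector responses collected for right and wrong keys. The sketch assumes the common form in which the difference of the two means is divided by the square root of the sum of the two variances, and it converts the measure into a probability of error under the additional assumption of independent Gaussian responses; the function names are illustrative.

```python
from math import erfc, sqrt

import numpy as np


def detection_quality(right, wrong):
    """(mean_right - mean_wrong) / sqrt(var_right + var_wrong), an assumed form."""
    right = np.asarray(right, dtype=float)
    wrong = np.asarray(wrong, dtype=float)
    return float((right.mean() - wrong.mean()) / sqrt(right.var() + wrong.var()))


def error_probability(quality):
    """Probability that a wrong-key response exceeds a right-key response (Gaussian case)."""
    return 0.5 * erfc(quality / sqrt(2.0))


quality = detection_quality([9.0, 10.0, 11.0], [-1.0, 0.0, 1.0])
print(quality, error_probability(quality))
```

Under these assumptions a larger separation between the two response distributions maps monotonically to a smaller probability of error.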
The efficiency of the proposed procedure is demonstrated on various examples. The results for speech signals with maximum frequencies of 4 kHz and 11.025 kHz are presented in [41]. This approach provides reliable detection for a high SNR (SNR = 32 dB has been used) and under various attacks. The watermark sequence was created by using a pseudorandom Gaussian sequence of 1000 samples.
The probability of error was of order 10^{−7} for: MP3 (constant bit rate 8 kbps and variable bit rate 75–120 kbps are considered), delay monolight echo (180 ms, mixing 20%), echo 200 ms, deep flutter (deep 10, sweeping rate 5 kHz), amplitude (normalize 100%), and additive Gaussian noise (SNR = −35 dB). The worst case is obtained for pitch scaling and it is of order 10^{−5}. The results for other attacks (time stretch wow delay 20%, wow delay 10% and bright flutter, MP3 variable bit rate 40–50 kbps) are of order 10^{−6}.
3.2.2. Digital Image
The space-spatial-frequency analysis (two-dimensional time-frequency analysis) is used to select pixels that belong to the nonstationary regions of the image [42]. The two-dimensional S-method is used as a space-spatial frequency distribution [54]:
By increasing the size of a two-dimensional window, the cross-terms start to appear. Thus, when compared to the spectrogram, the number of frequency components increases, hence making the region characterization easier. A pixel that belongs to the dynamic region can be selected by using the following procedure.
(1) The S-method is calculated for a window (windows of size up to are used). The middle frequency range is used.
(2) The energy floor is obtained by using the experimentally determined .
(3) The region is considered as nonstationary if:
where , while is the total number of the points within the region .
The examples where the pixels belong to the dynamic and stationary regions, respectively, are shown in Figures 14(a) and 14(b).
The procedure for watermark embedding is the two-dimensional case of the presented multidimensional approach. Namely, a two-dimensional support function is used:
The S-method as a two-dimensional time-frequency distribution is applied, while the energy threshold is . The watermark is shaped by the space-spatial frequency characteristic of the image components:
A two-dimensional pseudorandom sequence is used.
The watermark embedding and detection are performed in the space-spatial frequency domain:
This procedure is tested on several images (Lena, Peppers, Boat, F16, and Barbara), under various attacks (JPEG80JPEG40, Median , Median , Average , Impulse noise, Gaussian noise, Lightening, and Darkening). The PSNR was around 50 dB. The number of the selected pixels varied from 3304 for F16 to 7833 for Barbara. The probability of error was compared with the standard DCT-based procedure (with different detector forms), where 22050 coefficients are used. It was shown that the proposed procedure significantly outperforms the standard DCT procedures.
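The quoted PSNR can be related to the pixel error with the usual definition for 8-bit images. The sketch below assumes a peak value of 255 and uses a synthetic image pair; the function name is illustrative.

```python
import numpy as np


def psnr(original, watermarked, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal size."""
    diff = np.asarray(original, dtype=float) - np.asarray(watermarked, dtype=float)
    mse = float(np.mean(diff**2))
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(peak**2 / mse)


image = np.full((8, 8), 100, dtype=np.uint8)
changed = image + 1                      # every pixel altered by one gray level
print(psnr(image, changed))
```

With this definition a change of one gray level in every pixel gives about 48.1 dB, so a PSNR of 50 dB corresponds to a mean squared error of about 0.65.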
3.2.3. Digital Video
Observe that the proposed approach can also be used for video signal watermarking. The two-dimensional and one-dimensional time-frequency distributions are combined in this case. Namely, the stationary pixels and stationary regions around them are selected by using the two-dimensional analysis, as described in the previous subsection. Then, the time dependent sequence is produced by taking the stationary pixels at the position , along consecutive frames. Based on the frequency modulated signal is created as
where , while is a constant. The stationarity of the selected pixels, along the time axis, is examined by using the one-dimensional S-method. The experiments show that the minimal number of pixels for reliable watermark detection is about 600. This can be easily achieved, even for a very short video sequence (note that more than 2500 stationary pixels are obtained for a signal of duration of 2 s in the example provided in [42]). This approach was tested under the presence of MPEG4 compression. The obtained probabilities of errors were found to be within the range .
4. Conclusion
An overview of the most important time-frequency analysis techniques is presented. A procedure for selecting an appropriate distribution for a specific type of signal is discussed. Time-frequency-based watermarking algorithms for digital audio, digital image, and video are reviewed as well. The watermark is either a signal with specific time-frequency characteristics or a pseudonoise sequence shaped according to the time-frequency characteristics of the host signal. The main advantages of the time-frequency domain over the Fourier, DCT, and signal domains are emphasized. Finally, the presented theory could be used to generalize the existing watermarking approaches defined in either the Fourier or the DCT domain.
References
Cohen L: Time-frequency distributions: a review. Proceedings of the IEEE 1989, 77(7):941-981. 10.1109/5.30749
Hlawatsch F, Boudreaux-Bartels GF: Linear and quadratic time-frequency signal representations. IEEE Signal Processing Magazine 1992, 9(2):21-67. 10.1109/79.127284
Boashash B, Ristić B: Polynomial time-frequency distributions and time-varying higher order spectra: application to the analysis of multicomponent FM signals and to the treatment of multiplicative noise. Signal Processing 1998, 67(1):1-23. 10.1016/S0165-1684(98)00018-8
Boashash B: Time-Frequency Analysis and Processing. Elsevier, Amsterdam, The Netherlands; 2003.
Choi H, Williams WJ: Improved time-frequency representation of multicomponent signals using exponential kernels. IEEE Transactions on Acoustics, Speech, and Signal Processing 1989, 37(6):862-871. 10.1109/ASSP.1989.28057
Zhao Y, Atlas LE, Marks RJ: The use of cone-shaped kernels for generalized time-frequency representations of nonstationary signals. IEEE Transactions on Acoustics, Speech, and Signal Processing 1990, 38(7):1084-1091. 10.1109/29.57537
Stanković L: Auto-term representation by the reduced interference distributions: a procedure for kernel design. IEEE Transactions on Signal Processing 1996, 44(6):1557-1563. 10.1109/78.506622
Baraniuk RG, Jones DL: A signal-dependent time-frequency representation. Optimal kernel design. IEEE Transactions on Signal Processing 1993, 41(4):1589-1602. 10.1109/78.212733
Hlawatsch F, Urbanke RL: Bilinear time-frequency representations of signals: the shift-scale invariant class. IEEE Transactions on Signal Processing 1994, 42(2):357-366. 10.1109/78.275608
Baraniuk RG, Jones DL: Signal-dependent time-frequency analysis using a radially Gaussian kernel. Signal Processing 1993, 32(3):263-284. 10.1016/0165-1684(93)90001-Q
Amin MG, Williams WJ: High spectral resolution time-frequency distribution kernels. IEEE Transactions on Signal Processing 1998, 46(10):2796-2804. 10.1109/78.720381
Bastiaans MJ, Alieva T, Stanković L: On rotated time-frequency kernels. IEEE Signal Processing Letters 2002, 9(11):378-381. 10.1109/LSP.2002.805118
Boashash B: Estimating and interpreting the instantaneous frequency of a signal, part 1: fundamentals. Proceedings of the IEEE 1992, 80(4):520-538. 10.1109/5.135376
Barkat B, Boashash B: Design of higher order polynomial Wigner-Ville distributions. IEEE Transactions on Signal Processing 1999, 47(9):2608-2611. 10.1109/78.782225
Viswanath G, Sreenivas TV: IF estimation using higher order TFRs. Signal Processing 2002, 82(2):127-132. 10.1016/S0165-1684(01)00168-2
Stanković L: A multitime definition of the Wigner higher order distribution: L-Wigner distribution. IEEE Signal Processing Letters 1994, 1(7):106-109. 10.1109/97.311805
Stanković L: A method for time-frequency analysis. IEEE Transactions on Signal Processing 1994, 42(1):225-229. 10.1109/78.258146
Stanković S, Stanković L: An architecture for the realization of a system for time-frequency signal analysis. IEEE Transactions on Circuits and Systems II 1997, 44(7):600-604. 10.1109/82.598433
Stanković S, Stanković L: Introducing time-frequency distribution with a "complex-time" argument. Electronics Letters 1996, 32(14):1265-1267. 10.1049/el:19960849
Stanković L: Time-frequency distributions with complex argument. IEEE Transactions on Signal Processing 2002, 50(3):475-486. 10.1109/78.984717
Cornu C, Stanković S, Ioana C, Quinquis A, Stanković L: Generalized representation of phase derivatives for regular signals. IEEE Transactions on Signal Processing 2007, 55(10):4831-4838.
Stanković S, Orović I, Ioana C: Effects of Cauchy integral formula on the precision of the IF estimation. IEEE Signal Processing Letters 2009, 16(4):327-330.
Morelande M, Senadji B, Boashash B: Complex-lag polynomial Wigner-Ville distribution. Proceedings of IEEE Speech and Image Technologies for Computing and Telecommunications (TENCON '97), December 1997, Brisbane, Australia 1: 43-46.
Stanković S, Žarić N, Orović I, Ioana C: General form of time-frequency distribution with complex-lag argument. Electronics Letters 2008, 44(11):699-701. 10.1049/el:20080902
Frazer G, Boashash B: Multiple window spectrogram and time-frequency distributions. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '94), 1994 4: 193-296.
Çakrak F, Loughlin PJ: Multiwindow time-varying spectrum with instantaneous bandwidth and frequency constraints. IEEE Transactions on Signal Processing 2001, 49(8):1656-1666. 10.1109/78.934135
Bayram M, Baraniuk RG: Multiple window time-frequency analysis. Proceedings of the IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis, June 1996 173-176.
Cox IJ, Miller ML, Bloom JA: Digital Watermarking. Academic Press, New York, NY, USA; 2002.
Barni M, Bartolini F: Watermarking Systems Engineering. Marcel Dekker, New York, NY, USA; 2004.
Special issue on "Identification and protection of multimedia information". Proceedings of the IEEE 1999, 87(7).
Muharemagic E, Furht B: Survey of watermarking techniques and applications. In Multimedia Watermarking Techniques and Applications. Edited by: Furht B, Kirovski D. Auerbach Publication; 2006:91-130.
Kirovski D, Malvar HS: Spread-spectrum watermarking of audio signals. IEEE Transactions on Signal Processing 2003, 51(4):1020-1033. 10.1109/TSP.2003.809384
Steinebach M, Dittmann J: Watermarking-based digital audio data authentication. EURASIP Journal on Applied Signal Processing 2003, 2003(10):1001-1015. 10.1155/S1110865703304081
Nikolaidis A, Pitas I: Asymptotically optimal detection for additive watermarking in the DCT and DWT domains. IEEE Transactions on Image Processing 2003, 12(5):563-571. 10.1109/TIP.2003.810586
Hernández JR, Amado M, Pérez-González F: DCT-domain watermarking techniques for still images: detector performance analysis and a new structure. IEEE Transactions on Image Processing 2000, 9(1):55-68. 10.1109/83.817598
Briassouli A, Strintzis MG: Locally optimum nonlinearities for DCT watermark detection. IEEE Transactions on Image Processing 2004, 13(12):1604-1617. 10.1109/TIP.2004.837516
Cheng Q, Huang TS: An additive approach to transform-domain information hiding and optimum detection structure. IEEE Transactions on Multimedia 2001, 3(3):273-284. 10.1109/6046.944472
Stanković S, Djurović I, Pitas I: Watermarking in the space/spatial-frequency domain using two-dimensional Radon-Wigner distribution. IEEE Transactions on Image Processing 2001, 10(4):650-658. 10.1109/83.913599
Djurović I, Stanković S, Pitas I: Digital watermarking in the fractional Fourier transformation domain. Journal of Network and Computer Applications 2001, 24(2):167-173. 10.1006/jnca.2000.0128
Wickens T: Elementary Signal Detection Theory. Oxford University Press, Oxford, UK; 2002.
Stanković S, Orović I, Žarić N: Robust speech watermarking procedure in the time-frequency domain. EURASIP Journal on Advances in Signal Processing 2008, 2008:9.
Stanković S, Orović I, Žarić N: An application of multidimensional time-frequency analysis as a base for the unified watermarking approach. IEEE Transactions on Image Processing 2010, 19(3):736-745.
Barkat B, Sattar F: A new time-frequency based private fragile watermarking scheme for image authentication. Proceedings of the 7th International Symposium on Signal Processing and Its Applications, July 2003 2: 363-366.
Mobasseri BG, Zhang Y, Amin MG, Dogahe BM: Designing robust watermarks using polynomial phase exponentials. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), March 2005 833-836.
Kutter M, Winkler S: A vision-based masking model for spread-spectrum image watermarking. IEEE Transactions on Image Processing 2002, 11(1):16-25. 10.1109/83.977879
Esmaili S, Krishnan S, Raahemifar K: Audio watermarking using time-frequency characteristics. Canadian Journal of Electrical and Computer Engineering 2003, 28(2):57-61. 10.1109/CJECE.2003.1532509
Erküçük S, Krishnan S, Zeytinoǧlu M: A robust audio watermark representation based on linear chirps. IEEE Transactions on Multimedia 2006, 8(5):925-926.
Foo SW, Ho SM, Ng LM: Audio watermarking using time-frequency compression expansion. Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS '04), May 2004 201-204.
Alkhassaweneh M, Aviyente S: A time-frequency based perceptual and robust watermarking scheme. Proceedings of 13th European Signal Processing Conference (EUSIPCO '05), September 2005, Antalya, Turkey.
Erküçük S: Time-frequency analysis of spread spectrum based communication and audio watermarking systems, Ph.D. thesis. 2003.
Orović I, Stanković S, Thayaparan T, Stanković L: Multiwindow S-method for instantaneous frequency estimation and its application in radar signal analysis. IET Signal Processing 2010, 90(5):7.
Stanković L, Stanković S, Djurović I: Space/spatial-frequency analysis based filtering. IEEE Transactions on Signal Processing 2000, 48(8):2343-2352. 10.1109/78.852015
Stanković S: About time-variant filtering of speech signals with time-frequency distributions for hands-free telephone systems. Signal Processing 2000, 80(9):1777-1785. 10.1016/S0165-1684(00)00087-6
Stanković S, Stanković L, Uskoković Z: On the local frequency, group shift, and cross-terms in some multidimensional time-frequency distributions: a method for multidimensional time-frequency analysis. IEEE Transactions on Signal Processing 1995, 43(7):1719-1724. 10.1109/78.398736
Acknowledgments
The author is very thankful to Dr. Irena Orović and Professor Victor Sučić for their help and useful suggestions during the work on this paper.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Stanković, S. TimeFrequency Analysis and Its Application in Digital Watermarking. EURASIP J. Adv. Signal Process. 2010, 579295 (2010). https://doi.org/10.1155/2010/579295
DOI: https://doi.org/10.1155/2010/579295
Keywords
 Speech Signal
 Digital Watermark
 Wigner Distribution
 Watermark Embedding
 Chirp Signal