RST-Resilient Video Watermarking Using Scene-Based Feature Extraction

Watermarking for video sequences should consider attacks beyond those on still images, such as frame averaging, frame-rate change, frame shuffling, and collusion attacks. Also, since video is a sequence of similar images, video watermarking is subject to interframe collusion. In order to cope with these attacks, we propose a scene-based temporal watermarking algorithm. In each scene, segmented by scene-change detection schemes, a watermark is embedded temporally into one-dimensional projection vectors of the log-polar map, which is generated from the DFT of a two-dimensional feature matrix. Here, each column of the feature matrix represents one frame and consists of radial projections of the DFT of the frame. The inverse mapping from the one-dimensional watermarked vector to the feature matrix has a unique optimal solution, which can be derived by a constrained least-squares approach. Through intensive computer simulations, it is shown that the proposed scheme provides robustness against transcoding, including frame-rate change and frame averaging, as well as interframe collusion attacks.


INTRODUCTION
The widespread utilization of digital data leads to illegal use of copyrighted material, that is, unlimited duplication and dissemination via the Internet. As a result, this unrestricted piracy makes service providers hesitant to offer services in digital form, in spite of digital audio and video equipment replacing their analog counterparts. In order to overcome this reluctance and the related copyright issues, the intellectual property rights of digitally recorded material should be protected.
For the past few years, the copyright protection problems for digital multimedia data have drawn a significant interest with the increased utilization of the Internet.
In order to protect copyrighted multimedia data, many approaches, including authentication, encryption, and digital watermarking, have been proposed. Encryption methods may guarantee secure transmission to authenticated users over the insecure Internet. Once decrypted, however, the data is identical to the original, and its piracy cannot be restricted. Digital watermarking is an alternative way to deal with these unlawful acts. Watermarking approaches hide an invisible mark or copyright information in digital content, which can later be used to claim the copyright. The mark should be robust enough to survive legal or illegal attacks. It is also desirable that illegal attempts to erase the watermark should suffer from degradation in visual quality, without the watermark itself being erased.
For an effective watermarking scheme, two basic requirements should be satisfied: transparency and robustness. Transparency means that the watermark embedded in the image data is invisible, that is, watermarking does not degrade the perceptual quality. Robustness means that the watermark should not be removed or destroyed by attacks, that is, signal processing, compression, resampling, cropping, geometric distortion, and so forth. Many watermarking algorithms for images have been developed, which are generally categorized into spatial-domain [1,2,3] and frequency-domain techniques [4,5,6,7,8,9,10]. In most cases, image watermarking techniques in the frequency domain, such as the discrete cosine transform (DCT), discrete Fourier transform (DFT), and wavelet transform, are preferred because of their efficiency in both robustness and transparency. Specifically, from the viewpoint of geometric attacks, DFT-based or template-embedding watermarking algorithms generally yield better performance than the others [7,8,9].
In the case of video watermarking, new kinds of attacks become available to remove the marks. These attacks include frame averaging, frame-rate change, frame swapping, frame shuffling, and interframe collusion. Since video signals are highly correlated between frames, the mark in video is vulnerable to these attacks, which affect the mark adversely without severely degrading the video quality. Since frames in a scene are very similar, completely different watermarks in each frame may be estimated and removed easily by a simple collusion scheme. Also, in the case of applying an identical watermark to the whole video sequence, the mark can be easily estimated, since statistical invisibility is not satisfied. So, many video watermarking algorithms address these collusion issues [11,12,13]. Video sequences are composed of consecutive still images, which can be independently processed by various image watermarking algorithms. In this case, interframe collusion should be considered, as in [11]. Also, three-dimensional (3D) transforms are good approaches for video watermarking, since they can be easily generalized from two-dimensional (2D) techniques for images and are robust against collusion attacks [12,13,14]. Watermarking in the bit-stream structure can be another solution for video watermarking [15,16,17,18], but this approach may be vulnerable to re-encoding or transcoding.
In this paper, we present a novel video watermarking algorithm based on a feature-based temporal mark-embedding strategy. A video sequence consists of a number of scene segments, and each scene may be a good temporal watermarking unit, because the scene itself is always available after attacks such as frame-rate change, frame swapping, frame shuffling, and so forth. In many cases, illegal distributors transcode the original video sequence to other formats, for example, re-encoding MPEG-2 video to MPEG-4, and generally this process forces the original data to suffer from the aforementioned attacks. Thus, we employ features extracted from each video scene as the watermarking domain. The watermark embedding procedure is composed of three steps: feature extraction, watermarking in the feature domain, and inverse feature extraction. First, scene-change detection algorithms divide a video sequence into scenes using luminance projection vectors (LPVs) [19,20]. In each scene, one-dimensional (1D) frequency projection vectors (FPVs), which represent the characteristics of the frames, are extracted. An FPV is obtained from the radial-wise sum of the log-polar map generated from the DFT of the frame. This 1D FPV is known to be invariant to rotation, scaling, and translation (RST) [7,8,9]. All these vectors in a scene then compose a 2D matrix, which is interpolated on the temporal axis and becomes the projection vector time flow matrix (PVTM). Specifically, for an N × M PVTM, M is the length of the predefined time flow and N is the length of the FPV. Second, a watermark is embedded in a 1D watermarking feature vector (WFV), which is generated from the PVTM using the same process as that for obtaining the FPV. In the proposed algorithm, scaling in the image and temporal domains corresponds to aspect-ratio change of frames and frame-rate change, respectively. Thus, this temporal mark-embedding strategy is expected to be invariant to some video-oriented attacks.
Moreover, since the embedding approach is not a one-to-one mapping, inverse feature extraction should be considered. We find that a constrained linear least-squares method can achieve the global minimum of the optimization problem, and that the inverse mapping from the watermarked feature vector (WFV) to the PVTM has a unique optimal solution. This paper is organized as follows. In Section 2, we present the efficiency and rationale of temporal watermarking for video sequences. Then, the proposed algorithm is described in Section 3, where we present the watermark embedding and detection procedures. In Section 4, the inverse feature extraction, which inversely maps the watermark in the feature domain to the original video frame domain, is derived. Section 5 examines the performance of the proposed algorithm and shows that the proposed scheme yields satisfactory performance in terms of both transparency and robustness. In Section 6, we present the conclusion of this paper.

TEMPORAL WATERMARKING FOR VIDEO SEQUENCE
Since a typical video sequence is composed of many frames with temporal redundancy, statistical watermark estimation, which collects and analyzes the video frames, can be an effective attack against video watermarking. Frames within a scene are highly correlated. So, one can exploit the temporal redundancy, either of the frames or of the watermark, to estimate and remove the watermark signal. This collusion has become an important issue for video watermarking. Su et al. [11] have defined two types of linear collusion attacks. One is due to a fixed watermark pattern in large numbers of visually distinct video frames, and the other is due to independent watermark patterns in large numbers of visually similar frames. Based on the statistical analysis of linear collusions, they presented a spatially localized, image-dependent framework for collusion-resilient video watermarking. Frame-based watermarking algorithms are employed, and it is shown that the spatial-domain approach outperforms the DFT approach in the case of severe compression, while the DFT approach is more robust to general attacks. Alternative approaches, based on the idea of temporal watermarking, are available to cope with interframe collusion; they consider the frames in a sequence jointly. Most of these algorithms are generally based on extended versions of 2D transforms, that is, the 3D DFT or the 3D wavelet transform.
Figure 1: Framework of the proposed algorithm.
Swanson et al. [12] proposed a scene-based video watermarking algorithm using a temporal wavelet transform of the video scenes. A wavelet transform along the temporal axis of a video scene results in a multiresolution temporal representation of the scene: static (lowpass) and dynamic (highpass) video components. They also used perceptual models for an invisible and robust watermark. Deguillaume et al. [13] employed the 3D DFT, in which a watermark and a template are encoded in the 3D DFT magnitude of the video sequence and in the log-log-log map of the 3D DFT magnitude, respectively. These algorithms are also resilient to temporal modifications such as frame-rate change, frame swapping, and frame dropping, as well as frame-based degradation and distortion. A temporal watermarking strategy must be reliable against such attacks. A scene can be a good segment unit in the temporal domain, as in [12], since scenes are always maintained in spite of the aforementioned temporal attacks. So, the proposed algorithm is based on the idea of temporal mark embedding, but it is not just an extension of 2D transforms: it uses a new feature domain for watermarking. The feature-based watermarking simplifies the 3D problem of temporal mark embedding and enables real-time mark detection, while providing resilience against collusion and temporal attacks.

Feature space for video watermarking
In many watermarking systems, the watermarks are embedded in a transform domain, such as the DFT or DCT domain. That is, these systems use the transform domain as the watermark space, in which the watermarks are inserted and detected [4]. In these cases, the dimension of the watermark domain is the same as that of the media space. For video watermarking, simple extensions of these transforms have been applied to video sequences in [12,13,14]. These algorithms provide effective performance against interframe collusion and noise-like attacks. In this paper, however, we employ the feature domain as the watermark space. The feature has two meanings: a summarization of the video contents, and a 1D mark-embedding vector derived from the watermark signals. We modify the feature according to the watermark signals. The proposed algorithm has the following three advantages.
(1) Complexity: the dimension of a video sequence is often too large, so we use the feature, which has a reduced dimension, as the watermark domain.
(2) Robustness: the feature is RST-invariant.
(3) Transparency: we select the masking method minimizing the error to achieve good invisibility.
We have defined two types of feature spaces: one represents frame and video contents, and the other is the watermarking space. As shown in Figures 1 and 2, the PVTM and the FPVs represent the video contents in a scene and the corresponding frames, respectively, and the WFV is considered as the watermarking space. Here, the FPV and the WFV have a similar structure that is RST-invariant. In [7], Lin et al. proposed an RST-resilient algorithm for image watermarking. They defined a 1D projection of the magnitude of the Fourier spectrum, denoted by g(θ) and given by

g(θ) = Σ_j log |I(ρ_j, θ)|,

where I(ρ_j, θ) is the Fourier transform of an image i(x, y) in log-polar coordinates. g(θ) is invariant to both translation and scaling, and rotations result in a circular shift of the values of g(θ). This strategy is basically employed in embedding a watermark vector into the WFV space in the proposed algorithm, except for the inverse mark embedding to the original signals. We consider this inverse problem a linearly constrained problem, which will be discussed in Section 4.
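As a sketch of this projection, the following computes an approximation of g(θ) by sampling the DFT magnitude on a log-polar grid and summing log-magnitudes along each radial line. The grid sizes and the nearest-neighbour sampling (bilinear interpolation would be used in practice) are simplifications of this sketch, not the paper's exact implementation:

```python
import numpy as np

def g_theta(image, n_rho=64, n_theta=180):
    """Approximate g(theta): sum of log-magnitudes of the Fourier
    spectrum along each radial line of a log-polar grid.
    Grid sizes here are illustrative choices."""
    F = np.fft.fftshift(np.fft.fft2(image))
    mag = np.abs(F) + 1e-9                       # avoid log(0)
    cy, cx = mag.shape[0] / 2.0, mag.shape[1] / 2.0
    r_max = min(cy, cx) - 1
    rho = np.exp(np.linspace(0.0, np.log(r_max), n_rho))   # log-spaced radii
    theta = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    # half-plane suffices: the DFT magnitude of a real image is symmetric
    ys = (cy + rho[:, None] * np.sin(theta)[None, :]).astype(int)
    xs = (cx + rho[:, None] * np.cos(theta)[None, :]).astype(int)
    return np.log(mag[ys, xs]).sum(axis=0)       # one value per angle
```

Because the DFT magnitude is unchanged by a cyclic translation of the image, the resulting g(θ) inherits the translation invariance described above.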
In the proposed algorithm, the meanings of the RST are somewhat different from those in image watermarking algorithms. The PVTM is invariant to temporal attacks, such as frame-rate change and frame scaling, which may occur during transcoding, due to the interpolation along the temporal axis in the process of constructing the PVTM. A rotation in a frame yields a circular shift in the PVTM domain, which does not change the DFT magnitude of the PVTM but only its phase component. The DFT magnitude itself is invariant to the translation of a frame, and moreover, the PVTM domain and its DFT magnitude are immune to translation. For interframe collusion, the effect on the PVTM is the same as that of a frame-rate change, as mentioned before. Thus, this feature-based watermarking strategy is RST-invariant and reasonable for video watermarking, and we can expect the proposed approach to provide robustness against the aforementioned attacks as well as interframe collusion.
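The shift-invariance argument used here can be checked numerically: a circular shift of a 2D array (the analogue of a frame rotation in the PVTM domain) changes only the phase of its DFT, never the magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((16, 16))                 # stand-in for a PVTM
V_shifted = np.roll(V, shift=3, axis=0)  # circular shift (rotation analogue)

mag = np.abs(np.fft.fft2(V))
mag_shifted = np.abs(np.fft.fft2(V_shifted))
print(np.allclose(mag, mag_shifted))     # True: only the phase differs
```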

Watermark embedding
In the proposed scheme, a single-bit watermark vector of length N is embedded and detected, in which the presence of the watermark claims the ownership for the copyright material. As in Figures 1 and 2, the watermark embedding algorithm can be summarized as follows.
(1) Divide the full video sequence into scenes using the distance function in [19,20], in which the measuring functions employ the LPVs, instead of full frames, for efficiency. The LPV is the projection of the luminance image on the column or row axis. Let f_i denote the ith image of size M × N in a scene; then the luminance projections for the nth row and the mth column, denoted by l^r_n and l^c_m, respectively, are

l^r_n(f_i) = Σ_m f_i(n, m),    l^c_m(f_i) = Σ_n f_i(n, m).

So, the dissimilarity between the ith and jth frames can be defined as

d(f_i, f_j) = Σ_n |l^r_n(f_i) − l^r_n(f_j)| + Σ_m |l^c_m(f_i) − l^c_m(f_j)|.

In many cases, the LPV is extracted from the DC image, which is 1/64 of the original image size [19]. This strategy decreases the computational complexity and also guarantees robustness against video coding.
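A minimal sketch of step (1)'s building blocks, assuming the LPVs are plain row/column sums and the dissimilarity is the L1 distance between projection vectors (the exact normalization in [19,20] may differ):

```python
import numpy as np

def lpv(frame):
    """Luminance projection vectors: row and column sums of the
    luminance image (a common form; normalization is an assumption)."""
    l_row = frame.sum(axis=1)   # one value per row    (l^r_n)
    l_col = frame.sum(axis=0)   # one value per column (l^c_m)
    return l_row, l_col

def dissimilarity(f_i, f_j):
    """L1 distance between the projection vectors of two frames."""
    ri, ci = lpv(f_i)
    rj, cj = lpv(f_j)
    return np.abs(ri - rj).sum() + np.abs(ci - cj).sum()
```

A scene boundary would then be declared between frames i and i+1 when the dissimilarity exceeds a threshold.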
(2) In each scene, extract the FPVs from the frames. First, each frame is embedded in an l × l square image, padded with trailing zeros, where l is generally confined to powers of two for the fast Fourier transform (FFT). Second, we transform the zero-padded image of the kth frame i_k(x, y) into its Fourier transform I_k(ξ_1, ξ_2). Next, the zero-frequency component of I_k(ξ_1, ξ_2) is shifted to the center of the spectrum by swapping the first and third quadrants and the second and fourth quadrants. Finally, the FPV of the kth frame can be obtained by applying a projection operator R to |I_k(ξ_1, ξ_2)|. The symbol R, denoting the Radon transform operator, is also called the projection operator. For matrices, R_{θ_i} is the projection operator along a radial line oriented at an angle θ_i at a specific distance from the origin. More specifically, X can be projected to x(i) for the angle θ_i, and the resulting x is a column vector containing the Radon operation for some prespecified angles:

x(i) = R_{θ_i} log |I_k(ξ_1, ξ_2)|,    x = [x(1), x(2), . . . , x(N)]^T.   (4)

The Radon operation needs resampling and interpolation due to the coordinate conversions, and, in this work, we adopt bilinear interpolation.
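The zero padding and quadrant swap in step (2) can be sketched in a few lines; in NumPy the quadrant swap is exactly `np.fft.fftshift`, which moves the zero-frequency (DC) term to the centre of the spectrum (the toy sizes below are illustrative):

```python
import numpy as np

frame = np.ones((3, 5))          # toy "frame" of size M x N
l = 8                            # illustrative power-of-two padding size
padded = np.zeros((l, l))
padded[:3, :5] = frame           # trailing zero padding

F = np.fft.fft2(padded)
F_centred = np.fft.fftshift(F)   # swaps quadrants 1<->3 and 2<->4
print(np.abs(F_centred[l // 2, l // 2]))  # DC term (sum of pixels) now centred
```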
(3) The PVTM is constructed from the group of the FPVs in a scene; more specifically, the stacked FPVs go through interpolation along the temporal axis. As shown in Figure 2, the same process as step (2) is also applied to the 2D matrix PVTM, denoted by V_1. That is, the WFV v_2 is obtained by applying (4) to log |V_2(ξ_1, ξ_2)|, where V_2(ξ_1, ξ_2) is the DFT of V_1. The WFV v_2 can be written as

v_2 = R log |V_2(ξ_1, ξ_2)|,

where v_2 is a 1D vector, and we modify the vector with a watermark message by a mixing function f_wm(v_2, w_2).
(4) Compute the watermarked version v′_2 using the watermark mixing function f_wm(v_2, w_2) given by

v′_2 = f_wm(v_2, w_2) = v_2 + α w_2,

where α and w_2 are a weighting factor and the watermark message, respectively.
(5) The generated signal is in 1D vector form, and its inverse function, that is, the mapping from the lower-dimensional space back to the original Fourier magnitude, cannot be defined uniquely. Also, mapping the PVTM to the original video frames has a similar problem. It is often the case that linear programming can be employed in order to find the solution for these constrained problems. So, we adopt a linear programming method, which will be explained in Section 4.
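Steps (3) and (4) can be sketched as below, assuming linear interpolation along the temporal axis and an additive, weighted mixing function; the target length `m`, the weight `alpha`, and the interpolation method are illustrative assumptions of this sketch:

```python
import numpy as np

def build_pvtm(fpvs, m=64):
    """Stack per-frame FPVs as columns and interpolate along the
    temporal axis to a fixed length m (m is an illustrative choice)."""
    mat = np.asarray(fpvs).T                  # shape: (N, n_frames)
    t_old = np.linspace(0.0, 1.0, mat.shape[1])
    t_new = np.linspace(0.0, 1.0, m)
    # linear interpolation of each FPV component over time
    return np.stack([np.interp(t_new, t_old, row) for row in mat])

def mix_watermark(v2, w2, alpha=0.1):
    """Additive mixing: v2' = v2 + alpha * w2 (weight is illustrative)."""
    return v2 + alpha * w2
```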

Watermark detection
In order to determine the presence of the watermark, a correlation-based detection approach can be used in many cases. That is, a correlation coefficient, derived from a given watermark pattern and a signal with/without the watermark, is used to check the presence of the watermark. The watermark is determined to be present if the correlation value is larger than a specific threshold T, and absent otherwise. This strategy is simple and effective for single-bit watermarking systems [5,7,21], which holds true for the proposed algorithm. Moreover, in this paper, the 1D feature vector v_2 is adopted as the watermark space, which reduces the complexity of the detection procedure and thus makes real-time detection possible.
The procedure of watermark detection follows that of watermark embedding. First, video segments s are extracted from a suspected video content c. Then, the WFV v_2, generated by steps (1)-(3) of the watermark embedding procedure, is correlated with the expected watermark signal w_2 to obtain the distance metric d(v_2, w_2) given by

d(v_2, w_2) = (v_2 · w_2) / (‖v_2‖ ‖w_2‖).   (7)

If the metric d(v_2, w_2) is greater than a threshold T, which may be signal-dependent, the signal is declared to contain the watermark. Otherwise, the signal is declared not to be watermarked. As shown in Figure 3, however, the feature vector does not completely satisfy the properties of a random sequence. So, we cannot expect that (7) yields optimum results. According to detection theory, correlation detectors are optimum only for a signal modeled as additive white Gaussian noise (AWGN) [4,21,22]. Therefore, the detection performance can be improved by converting a nonwhite signal into a signal with a constant power spectrum. This can be achieved by a regression method using least-squares fitting [23]. In the proposed algorithm, the feature vector v_2 is predicted by a kth-degree polynomial

v̂_2 = a_0 + a_1 x + · · · + a_k x^k,   (8)

and the detector uses the regression residuals e_v of the feature vector v_2 given by

e_v = v_2 − v̂_2,

where v̂_2 is the polynomial prediction of v_2. The computer simulation shows that the detection performance can be improved by the regression method.
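A sketch of this detector with polynomial detrending, assuming the distance metric is a normalized correlation coefficient and using `np.polyfit` for the kth-degree regression; the degree and threshold values below are illustrative:

```python
import numpy as np

def detect(v2, w2, degree=3, threshold=0.55):
    """Correlation detector on regression residuals: fit a low-degree
    polynomial to the feature vector and correlate the residual with
    the expected watermark (degree/threshold are illustrative)."""
    x = np.arange(len(v2), dtype=float)
    coeffs = np.polyfit(x, v2, degree)
    e_v = v2 - np.polyval(coeffs, x)      # regression residuals
    c = np.corrcoef(e_v, w2)[0, 1]        # normalized correlation
    return c, c >= threshold
```

Detrending removes the slowly varying, nonwhite part of the feature vector so that the residual is closer to the AWGN model under which correlation detection is optimum.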

INVERSE FEATURE EXTRACTION
As shown in Figure 1, the watermarking procedure is divided into two stages: generation and modification of the 1D WFV, and its inverse. The forward processing, in which the watermark vector is weighted and added in the WFV domain, is simple. However, its inverse mapping, which cannot be obtained by straightforward methods, has no unique solution in general. So, we approach the inverse solution using a linear programming approach. A linear programming problem is defined, as its name implies, by linear functions of the unknowns; the objective is linear in the unknowns, and the constraints are linear equalities or linear inequalities in the unknowns. In this paper, a method for constrained linear least-squares problems is adopted to find the watermark mask. The watermarked signals v′_2, V′_1, and s′ in Figure 1 are obtained in the reverse order of the feature extraction. During processing, the 1D watermark vector w_2 is weighted and added in the WFV domain, yielding the WFV v′_2. It is difficult to map the 1D watermarked vector v′_2 to the 2D signal V′_1. In the same way, it is also difficult to map s′ to the corresponding video frame. In each domain, that is, S, F_1, and F_2, the modified signals can be represented as weighted sums of the original signal and its watermarking mask. That is, since the feature extraction and its inverse mapping are linear operations, the watermark, embedded in the WFV and mapped to video frames, can be represented in a masking form:

s′ = s + α W_0,    V′_1 = V_1 + α W_1,    v′_2 = v_2 + α w_2.   (11)

In the inverse feature extraction, the watermark signals w_1 and w_0 are constructed from the watermark w_2 in the WFV; they lie in the 2D feature domain of the PVTM and in the original video domain, respectively. So, we concentrate only on the watermark mask and not on the watermarked signals.

Inverse log-polar projection
In order to find the optimal solution of w_1 from w_2, we follow the forward processing. Note that the watermark w_2 modifies only the magnitude of the Fourier transform of the cover data V_1, as shown in Figure 4, and hence the Fourier transforms of W_1 and V_1 have the same phase in common. The vectors w_1 and v_1 can be written as

w_1 = O_c(W_1),    v_1 = O_c(V_1),

where W_1 and V_1 are n_f × n_t matrices, n_f and n_t are the size of the FPV and the number of video frames in the scene, respectively, and O_c is a column stacking operator [24]. A column stacking operation on an m × n matrix X generates a 1D mn × 1 column vector x; the (i, j) element of X is mapped to the (m(j − 1) + i, 1) element of x. Each matrix is reconstructed to an l × l square matrix by W^p_1 = I_{l,m} W_1 I_{n,l} and V^p_1 = I_{l,m} V_1 I_{n,l}.
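The column-stacking operator O_c corresponds exactly to Fortran-order (column-major) flattening, as a quick check shows:

```python
import numpy as np

# O_c maps an m x n matrix to an mn x 1 vector, sending element (i, j)
# (1-based) to position m*(j-1) + i. In NumPy this is order='F' flattening.
X = np.arange(6).reshape(2, 3)      # m = 2, n = 3
x = X.flatten(order='F')            # columns stacked on top of each other
print(x)                            # [0 3 1 4 2 5]
```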
As shown in Figure 4, assuming that the PVTM is an image, the watermarked data V′_1 can be written as V′_1 = V_1 + αW_1 from (11). The cover data V_1 and the unknown watermark mask W_1 have the same dimension. The 1D watermarked vector v′_2 used for detection cannot be exactly identical to the watermarked vector obtained by feature extraction from the 2D matrix V′_1. The reason is that the inverse feature extraction function from w_1 to W_1 is ill-conditioned, and it is not practical to perform this inversion precisely. Instead, we use a linear least-squares optimization method. We construct the 2D DFT magnitude W_1 from the 1D vector w_2 with two constraints: one is the feature extraction condition from W_1 to w_1, and the other is that the inverse DFT (IDFT) values of the generated W_1, which have the same phase components as V_1, should be zero in the zero-padding area, as in Figure 4(a).
The log-polar projection of the Fourier-transform magnitude of W_1 should be the watermark vector w_2, which can be written as R log |W_1| = w_2. As mentioned above, W_1 has the same phase as V_1, that is,

∠W_1(ξ_1, ξ_2) = ∠V_1(ξ_1, ξ_2).

Thus, we define the constrained problem as

minimize w_1^T H w_1   subject to   R log |W_1| = w_2,   (15)

where H is a weighting matrix and positive semidefinite, considering the human visual system (HVS) and the conversion from the feature domain to the DFT domain [25]. In case the matrix H is an identity, the objective function w_1^T H w_1 becomes the squared Euclidean (l_2) norm of w_1. The magnitude of low frequencies can be much larger than the magnitude of mid and high frequencies, in which case the low frequencies can be too dominant. To avoid this problem, Lin et al. sum the logs of the magnitudes of the frequencies along the columns of the log-polar Fourier transform, rather than summing the magnitudes themselves. A beneficial side effect of this is that a desired change in a given frequency is expressed as a fraction of the frequency's current magnitude rather than as an absolute value. In the proposed approach, a weighting matrix H can be substituted for the logarithm operation, which is better from a fidelity perspective.
Note that the zero padding is applied before the Fourier transform to increase the resolution. In order to obtain an optimal watermark mask, additional constraints are required besides the aforementioned one. That is, for the inverse Fourier transform of the generated watermark mask with the same phase as the PVTM, the values corresponding to the region outside of the PVTM should be zero. This strategy minimizes the loss of energy that leaks outside the image during the IDFT. So, (15) has another constraint given by

F^{-1}{W_1}(x, y) = 0   for (x, y) outside the PVTM region.   (17)

Equation (17) can be rewritten in linear matrix form as

B w_1 = 0,   (18)

where B selects the zero-padding region. Finally, from (15), (16), and (18), we have

minimize w_1^T H w_1   subject to   A w_1 = b,   (19)

which is a least-squares optimization problem with linear constraint equations. So, we can solve this problem using quadratic programming [26,27]. The construction of W_0 from W_1 follows a similar procedure.

Uniqueness and existence of the solution
In the proposed scheme, the feature extraction and its inverse can be formulated as a linearly constrained problem of the form

minimize x^T H x   subject to   A x = b,   (20)

where x, A, and b can be thought of as the watermark in the inverse-feature domain, the feature extraction matrix, and the watermark in the feature domain, respectively. Since the constraints of (20) are all linear and the Hessian H is positive semidefinite, the objective function is convex, and its solution is known to exist uniquely in optimization theory. Thus, (20) can be solved through simple convex quadratic programming [26]. This problem has a unique global minimum, and thus we can obtain the unique solution of this problem.
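A sketch of solving this equality-constrained quadratic program directly via its KKT system, assuming the KKT matrix is nonsingular (i.e., H is positive definite on the nullspace of A); a production implementation would use a general quadratic-programming solver instead:

```python
import numpy as np

def constrained_min(H, A, b):
    """Minimize x^T H x subject to A x = b by solving the KKT system
    [2H  A^T; A  0] [x; lam] = [0; b]."""
    n, m = H.shape[0], A.shape[0]
    K = np.block([[2.0 * H, A.T],
                  [A, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), b])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]                    # drop the Lagrange multipliers
```

For example, minimizing x_1^2 + x_2^2 subject to x_1 + x_2 = 2 yields x = (1, 1), the unique global minimum guaranteed by convexity.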

SIMULATION RESULTS
In order to evaluate the invisibility and robustness of the proposed algorithm, we use four H.263 videos: Foreman, Carphone, Mobile, and Paris, which are in the standard CIF format (352 × 288) with a frame rate of 25 frames/s. We intentionally construct four scenes from the above video sequences, in which the first 180, 120, 175, and 125 frames from Foreman, Carphone, Mobile, and Paris are employed for the tests, respectively. Watermark signals are embedded only in the luminance for each scene. Also, we use MPEG-2 (704 × 480) sequences, Football (125 frames) and Flower Garden (85 frames), which have a frame rate of 30 frames/s. The robustness against incidental or intentional distortions can be measured by the correlation values. In the proposed scheme, two aspects should be considered: one is the positive detection ability when a watermark is present, in which case the correlation values should be above a given threshold, and the other is the negative detection ability when a watermark is not present. In the computer simulation, various attacks, including video compression as well as intentional RST distortions, are applied to test the robustness. For these attacks, the overall performance may be evaluated by the relative difference between the correlation values when a watermark is present or not. As a result, the overall correlation value is compared with a threshold to determine whether the test video is watermarked. An experimental threshold is chosen to be 0.55; that is, a correlation value greater than or equal to 0.55 indicates the presence of the copyright information, while a correlation value less than 0.55 indicates the absence of a watermark. Due to restricted transmission bandwidth or storage space, video data might suffer from lossy compression. More specifically, video coding standards, such as MPEG-1/2/4 and H.26x, exploit the temporal and spatial correlations in the video sequence to achieve high compression ratios.
We test the ability of the watermark to survive video coding at various compression rates. Each sequence is considered as a scene, in which an identical watermark signal is embedded, and each watermarked scene is encoded with the H.263 or MPEG-2 coder. First, we employ H.263 to encode the CIF videos at variable bit rates (VBR); the results are summarized in Tables 1 and 2. The PSNR and bit-rate results vary according to the characteristics of each sequence. For example, a watermarked Foreman video frame encoded at 324.67 kbps is shown in Figure 5b, which has an average objective quality of 33.45 dB. However, the Carphone sequence is encoded at 259.06 kbps with the same quantizer. Note that the Foreman sequence has faster motion than the Carphone sequence, and as a result, it requires a higher bit rate to encode the video. Figure 5a is the original frame of Figure 5b. Figure 5c shows the 2D DFT magnitude of the watermarked frame in log scale. The equalized watermark mask is shown in Figure 5d. As shown in Table 1, the watermarked Foreman sequence coded at compression ratios from 37:1 to 163:1 yields correlation values from 0.91 to 0.66. Also, the results on the watermarked Carphone, Mobile, and Paris sequences are summarized in Tables 1 and 2, in which the corresponding correlation values range from 0.90 to 0.71, from 0.87 to 0.57, and from 0.91 to 0.67, respectively. The detection results for the MPEG-2 video sequences are shown in Table 3. Each test is performed with 500 watermark keys. The detection results for the correct key are always above the given threshold 0.55, and the correlation values are below about 0.4 in the case of no watermark. Next, we illustrate the robustness of the proposed scheme against RST distortions. In most cases, RST distortions are accompanied by cropping. Figures 6a, 6b, 6c, and 6d show examples of rotation, rotation with cropping, and scaling for the Carphone sequence.
With the proposed algorithm, since cropping does not lead to a loss of synchronization, the disturbance from cropping can be classified as a signal processing attack. So, the distortion due to cropping can be viewed as additive noise, which may degrade the detection value, but not severely. In the simulation, each frame is modified with rotations of −5° and 5°, without or with cropping of maximum 16%, and scaled back up to the original image size, as shown in Figure 6. Also, translation and scaling of each frame are performed. The detection results after rotation without cropping for the Foreman sequence are shown in Figure 7. Figure 7a shows the correlation values without rotation for 500 watermark keys, and Figures 7b and 7c show the correlation values after rotation by −5° and 5°, respectively. Figure 7 also shows the correlation values over the 500 runs in the case of no watermark. In Figures 8 and 9, the detection results after rotation without cropping for the Carphone, Mobile, and Paris sequences are presented. The correlation values after rotation with cropping for various video sequences are shown in Figure 10. In all cases, the presence of a watermark is easily observed, and the maximum correlation values without a watermark are below about 0.4. The DFT itself might be RST-invariant, but it is often the case that rotation with or without cropping yields noise-like distortions on the image. The simulation results show that these distortions affect the correlation values only slightly in the proposed watermarking strategy. Correlation detections under translation attacks are performed, and the plots are shown in Figure 11. In the case of translation, we crop the upper-left part of each frame, so the reference position is translated; the translation ratio in Figure 11 means the noncropping ratio. Figure 12 shows the correlation values after scaling for various video sequences. Again, the presence of the embedded watermark is easily determined.
Despite a loss of 50% or more by translation or scaling, the correlation results are maintained without much variance. In the proposed scheme, rotation and scaling in the frame domain yield a circular shift in the corresponding FPVs and decrease their power, respectively. They do not change the DFT magnitude of the PVTM, but only the phase component. As a result, in spite of noise-like distortions due to the RST in the image domain, the WFV is almost invariant.
Some of the distortions of particular interest in video watermarking are those associated with temporal processing, for example, frame-rate change, temporal cropping, frame dropping, and frame interpolation. These uniform temporal attacks commonly occur in ordinary video processing, such as transcoding. To test frame dropping and interpolation, we dropped the odd-indexed frames from the test sequences, which means that the frame rate decreases to half. For the case of frame averaging, the missing frames are replaced with the average of the two neighboring frames. In these cases, the proposed algorithm detects the watermark perfectly. In the proposed algorithm, a frame-rate change means scaling in the PVTM space. That is, most of the aforementioned temporal distortions are represented as distortions in the PVTM space. So, since these temporal attacks do not change the DFT magnitude of the PVTM, the WFV, extracted from the PVTM, is also invariant. The detection results after frame-rate changes, achieved by dropping every nth frame, are shown in Figure 13 for various video sequences. Also, those after frame dropping and averaging are shown in Figure 14, in which every nth frame is interpolated by averaging its neighboring frames. As shown in Figures 13 and 14, the detection value is reduced as the deformation increases due to frame averaging or frame dropping. Nevertheless, this simulation shows that a uniform frame-rate change cannot prevent the proposed algorithm from detecting the watermark signal, even when half of the frames are lost. However, in the case of random temporal attacks, that is, random frame dropping, the proposed algorithm has a weakness: it can neither recover the synchronization nor compensate for the lost frames. Generally, a uniform deformation can be recovered by the DFT without loss of synchronization, as shown in the computer simulation.
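The uniform temporal attacks used in this test can be simulated in a few lines; this is a toy sketch on a stack of frames, not the paper's test harness:

```python
import numpy as np

def drop_odd_frames(frames):
    """Halve the frame rate by dropping the odd-indexed frames."""
    return frames[::2]

def average_replace(frames):
    """Replace every odd-indexed frame with the average of its two
    neighbours (a simple frame-averaging attack)."""
    out = frames.copy()
    for k in range(1, len(frames) - 1, 2):
        out[k] = 0.5 * (frames[k - 1] + frames[k + 1])
    return out
```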
On the other hand, random loss in the time or spatial domain causes the DFT signals to become uncorrelated with the original. That is, the properties of the DFT cannot guarantee that the algorithm can recover the lost signals without exact information about their positions. Thus, the proposed algorithm, based on the DFT, cannot cope with random frame-dropping attacks and does not cover all general temporal attacks.

CONCLUSION
This paper presented a novel feature-based watermarking scheme for video sequences. In order to cope with video-oriented attacks, such as frame averaging, frame-rate changes, and interframe collusion, we employ a temporal watermarking algorithm, in which a watermark is embedded temporally into 1D projection vectors of the log-polar map, which is generated from the DFT of the 2D PVTM. Each PVTM is segmented using well-known scene-change detection algorithms. This strategy is very effective in coping with uniform temporal attacks. However, the proposed algorithm is not robust against random temporal attacks. In this paper, the feature extraction as well as its inverse processing were defined, and it was shown that the inverse problem yields a unique optimal solution subject to a few constraints. The computer simulation results demonstrated that the proposed scheme yields acceptable performance in terms of transparency and robustness against MC-DCT-based compression. Also, it was shown that the proposed scheme provides robustness to some video-oriented attacks, including frame-rate change and frame averaging, as well as interframe collusion.