 Research
 Open access
Counteracting geometrical attacks on robust image watermarking by constructing a deformable pyramid transform
EURASIP Journal on Advances in Signal Processing volume 2013, Article number: 119 (2013)
Abstract
Counteracting geometrical attacks remains one of the most challenging problems in robust watermarking. In this paper, we resist rotation, scaling, and translation (RST) by constructing a deformable pyramid transform (DPT) that is shift-invariant, steerable, and scalable. The DPT is extended from a closed-form, polar-separable steerable pyramid transform (SPT). The radial component of the SPT's basis filters is taken as the kernel of the scalable basis filters, and the angular component is used for the steerable basis filters. Shift-invariance is inherited from the SPT by retaining undecimated high-pass and band-pass subbands. Based on the designed DPT, we theoretically derive interpolation functions for steerability and scalability, together with synchronization mechanisms for translation, rotation, and scaling. Exploiting these properties of the DPT, we develop a new template-based robust image watermarking scheme that is resilient to RST. Translation invariance is achieved by taking the Fourier magnitude of the cover image as the DPT's input. Resilience to rotation and scaling is obtained through the rotation and scaling synchronization mechanisms, for which an efficient template-matching algorithm has been devised. Extensive simulations show that the proposed scheme is highly robust to geometrical attacks, such as RST, cropping, and row/column line removal, as well as to common signal processing attacks such as JPEG compression, additive white Gaussian noise, and median filtering.
1. Introduction
Counteracting geometrical attacks such as rotation, scaling, and translation (RST) remains one of the most challenging problems for robust watermarking. This is because geometrical attacks easily desynchronize the watermark, degrading its robustness dramatically. To address such problems, a number of RST-invariant blind watermarking schemes have been developed over the past two decades. These schemes can be roughly categorized into five paradigms, namely exhaustive search, invariant domain, autocorrelation, feature-based implicit synchronization, and geometrical correction [1–4]. These are briefly described below.
The exhaustive search method [5–7] iteratively corrects each candidate geometrical distortion in the search space and then evaluates the watermark extracted from the corrected carrier. This method generally incurs high computational complexity and a high probability of false positives. The invariant domain approach [8–14] avoids identifying geometrical distortions by embedding the watermark in a domain that is invariant to such distortions. However, this approach can suffer from interpolation errors in the geometrically invariant transform. Autocorrelation-based techniques insert the watermark periodically into the cover and use a cross-correlation function to locate the periodic autocorrelation peaks, whose arrangement reveals the geometrical transform that has been applied. The schemes in [15–17] are typical examples of this category. The fourth category exploits salient features to achieve geometrical synchronization, as in the schemes of [3, 10] and [18–20]. Under this approach, the embedder binds the watermark to geometrically invariant salient features, and the receiver recovers the watermark by locating the salient features that survive even severe geometrical distortion. This category is generally somewhat robust against geometrical distortions, but its performance may degrade greatly if salient feature detection fails. The final category estimates the geometrical distortion parameters, thus permitting geometrical correction and watermark extraction. Typically, a template is constructed and embedded in the cover, and the geometrical parameters are then estimated by a dedicated search technique; examples are the schemes described in [21–24]. In addition to templates, support vector machines (SVMs) have also been used to obtain geometrical parameters. For example, several recently developed schemes [25–28] extract patterns, such as the inserted template or Zernike moments, from geometrically attacked watermarked images. These patterns are input to an SVM to train a classification model, and the trained model is then used to predict the geometrical parameters of the image under test.
In this paper, we develop a new geometrical-correction-based robust image watermarking scheme by constructing a deformable pyramid transform (DPT) that is shift-invariant, steerable, and scalable. This work is motivated by the scheme of [24]. In that scheme, a steerable pyramid transform (SPT) with shift-invariance and steerability [29] is used, together with an auxiliary inserted template, to estimate the rotation angle, which enables rotation correction and watermark extraction. Although the scheme in [24] is highly robust against rotation, it cannot resist scaling attacks because the SPT lacks scalability. Counteracting both rotation and scaling requires a pyramid transform (PT) that is shift-invariant, steerable, and scalable. To the best of our knowledge, however, no such PT has been reported in the literature. We therefore design a shift-invariant, steerable, and scalable DPT by extending the SPT, as follows.
We start by introducing the SPT, which is essentially a variant of the wavelet transform (WT). As illustrated in [29], the conventional orthogonal or biorthogonal WT is sensitive to translation because of its critical sampling. That is, once the input signal is translated slightly, its wavelet coefficients are no longer translated versions of the original wavelet coefficients. The information represented within a wavelet subband of the translated signal therefore differs from that in the corresponding original subband. To address this issue, Freeman and Adelson [30] proposed steerable filters. From these, a filter at an arbitrary orientation can be synthesized by linear interpolation; this property is termed steerability. Furthermore, Perona [31, 32] developed scalable filters, from which a filter at any scale within a certain range can be interpolated; this property is called scalability. These steerable and scalable filters were further combined into deformable filters.
In [29], Simoncelli et al. analyzed the translation invariance of the WT and then generalized it to the concept of shiftability. In brief, shiftability means that a filter at an arbitrary position, orientation, or scale can be obtained by linearly interpolating suitably designed shiftable, steerable, or scalable basis filters, respectively. Shiftability in orientation and in scale is essentially equivalent to the steerability and scalability proposed in [30–32], respectively. Simoncelli et al. also proposed joint shiftability, which achieves shiftability in any subset of position, orientation, and scale simultaneously. As an illustration of these concepts, they designed an SPT that is shift-invariant and shiftable in orientation (i.e., steerable).
In [33], Karasaridis and Simoncelli analyzed the constraints for the SPT and designed an SPT satisfying these constraints numerically. Unfortunately, this SPT does not achieve perfect reconstruction. In contrast, Portilla et al. [34] developed an SPT with perfect reconstruction.
In summary, the filters developed in [30, 33, 34] are mainly steerable analysis filters, or SPTs that are steerable but not scalable. Although the filters designed in [30, 31] are both steerable and scalable, they lack synthesis filters for reconstruction and thus cannot be considered PTs. To the best of our knowledge, no PT with shift-invariance, steerability, and scalability has been reported in the literature. To counteract RST in robust image watermarking, we therefore extend the shift-invariant, steerable SPT to include scalability. For convenience, the resulting transform is termed the DPT. To this end, we adopt the SPT with perfect reconstruction developed in [34]. The steerable filters of this SPT are polar-separable, and their angular components are designed to achieve steerability. Consequently, scaling a steerable filter is equivalent to dilating its radial component. According to the shiftability framework in [29], we can therefore take the radial component of the SPT's steerable filters as the kernel for constructing the scalable filters. Combining the scalable and steerable filters derived from the radial and angular components, respectively, gives the DPT its scalability and steerability. Its shift-invariance is inherited from the SPT by retaining undecimated high-pass and band-pass coefficients. In this way, we construct a DPT with shift-invariance, steerability, and scalability.
To apply the DPT to counteract RST in robust watermarking, we first exploit its shift-invariance, steerability, and scalability to theoretically derive a mechanism for RST synchronization. As will be shown, the DPT coefficients of a translated signal are translated versions of those of the original signal, which is the essence of shift-invariance. The relationship between the DPT coefficients of the original signal and those of the rotated and scaled signal is characterized by a linear interpolation function parameterized by the rotation angle and scaling factor.
Based on the derived RST synchronization mechanism, we develop a new robust image watermarking scheme that is resilient to RST. The DPT coefficients of a translated input are merely translated copies of the original coefficients, so a translation of the input signal would interfere with the synchronization of rotation and scaling. To decouple translation from rotation and scaling, we take the Fourier magnitude of the cover image as the input to the DPT, which achieves true translation invariance. Rotation and scaling attacks are counteracted by first using the rotation and scaling synchronization mechanism to estimate their parameters. The estimates are then used to undo the rotation and scaling. In blind watermarking, the original signal is unavailable to the receiver, so we rely on a template to estimate the rotation and scaling parameters. Specifically, in the embedding process we insert the template at level 1 and the watermark at levels 2 and 3 of the DPT pyramid. During detection, we use the rotation and scaling synchronization mechanism to identify the rotation angle and scaling factor. We then use these estimates to correct the rotation and scaling distortions and recover the watermark from the geometrically corrected signal. Extensive experimental results demonstrate that the proposed algorithm is highly robust against geometrical attacks such as RST. It also performs favorably against common signal processing attacks, including JPEG compression, median filtering, Gaussian noise, and low-pass filtering. In our simulations, its robustness is also comparable to or higher than that of other algorithms.
The rest of this paper is organized as follows. Section 2 reviews the SPT with shift-invariance and steerability, and Section 3 details the construction of the DPT with shift-invariance, steerability, and scalability. In Section 4, we theoretically derive the RST synchronization mechanism. We describe the proposed robust image watermarking scheme in Section 5 and present our experimental results in Section 6. Finally, Section 7 concludes the paper.
2. Steerable pyramid transform with shift-invariance and steerability
In this section, we review the SPT with shift-invariance and steerability. We first describe the constraints for the SPT given in [33] and then introduce the closed-form SPT presented in [34].
In [33], Karasaridis and Simoncelli derived the constraints for the recursive multiscale SPT. Figure 1 illustrates a single-stage SPT. The multistage version is formed by recursively inserting the block enclosed in the dashed box at the filled circle. To achieve perfect reconstruction, the SPT should meet the following constraints [33]:
where $\boldsymbol{\omega} = (\omega_x, \omega_y)$ is the frequency vector in the Fourier domain, $H_0(\boldsymbol{\omega})$ and $L_0(\boldsymbol{\omega})$ denote the non-oriented high-pass and low-pass filters, respectively, and $L_1(\boldsymbol{\omega})$ and $B_k(\boldsymbol{\omega})$ $(k = 0, \dots, K-1)$ represent the narrowband low-pass filter and the oriented band-pass filters, respectively. Eqs. (1-1), (1-2), and (1-3) describe the unit system response amplitude, the recursion relationship, and aliasing cancellation, respectively. Furthermore, the following constraint must hold to achieve steerability:
where $\theta = \arg(\boldsymbol{\omega})$, $\theta_k = \pi k / K$, and $B(\boldsymbol{\omega}) = \sqrt{\sum_{k=0}^{K-1} |B_k(\boldsymbol{\omega})|^2}$.
Under the constraints in Eqs. (1) and (2), Karasaridis and Simoncelli used a numerical technique to design the SPT [33], but the resulting SPT does not achieve perfect reconstruction. In contrast, Portilla et al. [34] devised an SPT with perfect reconstruction that satisfies the above constraints and has a closed form. Because perfect reconstruction is a natural requirement for watermarking, we introduce only the closed-form SPT of [34]. This SPT is defined in the Fourier domain by polar-separable filters:
where $r = |\boldsymbol{\omega}| = \sqrt{\omega_x^2 + \omega_y^2}$, $\theta = \arg(\boldsymbol{\omega})$, and $H(r)$ and $G_k(\theta)$ are defined as:
The filters $L_0(r, \theta)$ and $H_0(r, \theta)$ are then constructed as:
Note that the high-pass filter $H_0(r, \theta)$ in [34] is also split into a number of oriented subbands, i.e., $H(r/2)\,G_k(\theta)$. Because the oriented high-pass subbands are not used in our scheme, $H_0(r, \theta)$ has been equivalently simplified to Eq. (8).
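As a minimal numerical sketch, not the authors' implementation, the radial kernel $H(r)$ and the $K = 2$ angular functions can be evaluated on a discrete frequency grid. $H(r)$ is stated piecewise in Section 3.1, and the angular functions are $G_k(\theta) = \cos(\theta - \pi k/2)$, as used in Section 3.2. The check is that the combined band-pass magnitude $B(\boldsymbol{\omega})$ reduces to $H(r)$ when $B_k(r, \theta) = H(r)\,G_k(\theta)$, the polar-separable form implied by Eq. (4), because $\cos^2\theta + \sin^2\theta = 1$. The grid size, the helper names `radial_highpass` and `angular`, and the choice $H = 1$ for corner frequencies with $r > \pi$ are assumptions made for illustration.

```python
import numpy as np

def radial_highpass(r):
    """Radial kernel H(r) as stated piecewise in Section 3.1:
    0 for r <= pi/4, cos((pi/2) log2(2r/pi)) on (pi/4, pi/2), 1 for r >= pi/2."""
    r = np.asarray(r, dtype=float)
    h = np.zeros_like(r)
    band = (r > np.pi / 4) & (r < np.pi / 2)
    h[band] = np.cos(0.5 * np.pi * np.log2(2.0 * r[band] / np.pi))
    h[r >= np.pi / 2] = 1.0  # assumption: also applied to corner frequencies with r > pi
    return h

def angular(theta, k):
    """Angular part G_k(theta) = cos(theta - pi*k/2) for the K = 2 case."""
    return np.cos(theta - np.pi * k / 2)

# Centred frequency grid covering [-pi, pi) in both directions
N = 64
w = 2 * np.pi * (np.arange(N) - N // 2) / N
wy, wx = np.meshgrid(w, w, indexing="ij")
r = np.hypot(wx, wy)
theta = np.arctan2(wy, wx)

H = radial_highpass(r)
B = [H * angular(theta, k) for k in range(2)]  # B_k(r, theta) = H(r) G_k(theta)
B_mag = np.sqrt(B[0] ** 2 + B[1] ** 2)         # B(omega) as defined after Eq. (2)
```

Because $G_0^2 + G_1^2 = 1$ for $K = 2$, `B_mag` coincides with `H` everywhere on the grid. This is why the radial component alone can serve as the kernel for the scalable filters.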
3. Design of deformable pyramid transform with shift-invariance, steerability, and scalability
To counteract RST in robust watermarking, we design a DPT with shift-invariance, steerability, and scalability, taking the shift-invariant, steerable SPT in [34] as the starting point. On top of this SPT, we achieve scalability by constructing scalable basis filters from the steerable ones, $B_k(r, \theta)$. According to the shiftability theory in [29], scalable basis filters are essentially scaled versions of $B_k(r, \theta)$. Because scaling $B_k(r, \theta)$ is equivalent to scaling $H(r)$ according to Eq. (4), $H(r)$ can be taken as the kernel for constructing the scalable basis filters. That is, the radial component $H(r)$ in Eq. (4) is used to achieve scalability, and the angular components $G_k(\theta)$ are used to achieve steerability. Together, they yield joint steerability and scalability. Furthermore, keeping the high-pass and band-pass subbands undecimated, as in the SPT, yields shift-invariance. Following this line of thought, we construct the DPT shown in Figure 2, which has the desired shift-invariance, steerability, and scalability. In Figure 2, $C_j(r)$ $(j = 0, 1, \dots, J-1)$ denote the scalable filters designed from the kernel $H(r)$.
It can be observed from Figure 2 that perfect reconstruction requires the following constraint to be satisfied:
By comparing Eq. (9) with Eqs. (1-1) and (4), we have
Below, we determine a suitable number $J$ of scalable basis filters, derive closed-form expressions for $C_j(r)$, and obtain the interpolation functions for steerability and scalability.
3.1. Construction of scalable basis filters
According to the necessary and sufficient condition for shiftability in [29], the number of basis filters must be equal to or greater than the number of Fourier frequencies with nonzero magnitude. Here, a Fourier frequency is the frequency of one of the complex-exponential components of the kernel. Because $H(r)$ is piecewise in the Fourier domain, we determine the number of scalable basis filters piecewise, as follows.
First, consider the case $r \in (\pi/4, \pi/2)$, where $H(r) = \cos\left(\frac{\pi}{2}\log_2\frac{2r}{\pi}\right)$. Here, $H(r)$ can be treated as a function that has undergone a logarithmic warping, i.e., $H(r) = \cos(\rho(2r/\pi))$, where $\rho(2r/\pi) = \frac{\pi}{2}\log_2(2r/\pi) \in (-\pi/2, 0)$. According to [29], warping does not affect shiftability. The number of scalable basis filters for $r \in (\pi/4, \pi/2)$ therefore depends on the unwarped kernel $\tilde{H}(r) = \cos(2r/\pi) = \left(e^{j2r/\pi} + e^{-j2r/\pi}\right)/2$. Clearly, there are two Fourier frequencies with nonzero magnitude, so the number of scalable basis filters for $r \in (\pi/4, \pi/2)$ must satisfy $J \geq 2$. For simplicity, we choose $J = 2$ and, following [29], construct the two scalable basis filters $C_j(r)$ $(j = 0, 1)$ as:
where $a_j$ $(a_j > 0)$ satisfies the constraint in Eq. (10), and $R_j \in (-\pi/2, 0)$ is set as:
which is intended to generate frequency subbands of equal size on a logarithmic axis. To make the scalable basis filters reflection-shiftable [29], we further design $C_j(r)$ $(j = 0, 1)$ as:
By substituting Eqs. (13) and (12) into Eq. (10), we have
Because Eq. (14) is underdetermined, many values of $a_j$ satisfy it. Simply setting $a_0 = a_1$, we obtain the following solution:
Therefore, the two scalable basis filters for the case $r \in (\pi/4, \pi/2)$ are constructed as:
We proceed to the case $r \in (0, \pi/4]$, where the kernel is $H(r) = 0$, and thus $J \geq 0$. For the case $r \in [\pi/2, \pi]$, the kernel is $H(r) = 1$, and hence $J \geq 1$. For convenience of construction, we uniformly adopt $J = 2$ scalable basis filters for all three cases. Under the constraint of Eq. (10), the two scalable basis filters are then $C_0(r) = C_1(r) = 0$ for $r \in (0, \pi/4]$ and $C_0(r) = C_1(r) = 1/\sqrt{2}$ for $r \in [\pi/2, \pi]$.
In summary, the two scalable basis filters are constructed as follows:
3.2. Derivation of interpolation function
Under the shiftability framework [29], the interpolation function is parameterized by the translation distance, rotation angle, or scaling factor. It is used to interpolate the filter (response) at an arbitrary spatial position, orientation, or scale. Because the designed DPT is shift-invariant, we derive only the interpolation functions for steerability and scalability.
We start with the interpolation function for steerability. To reduce the computational complexity of geometrical synchronization, we adopt $K = 2$ steerable basis filters, i.e., $G_0(\theta) = \cos(\theta)$ and $G_1(\theta) = \cos(\theta - \pi/2)$ according to Eq. (6). From the necessary and sufficient condition for shiftability [29], the steerable interpolation functions $b_k(\varphi)$ satisfy the following equation:
where $\varphi$ denotes an arbitrary rotation angle. Requiring the real and imaginary parts of Eq. (18) to agree, we obtain the following interpolation functions for steerability:
We proceed to the interpolation functions for scalability. As mentioned previously, both $H(r)$ and $C_j(r)$ $(j = 0, 1)$ are piecewise. The scalable interpolation functions, say $s_j(\sigma)$, where $\sigma > 0$ is an arbitrary scaling factor, should therefore also be piecewise. We first handle the case $r \in (\pi/4, \pi/2)$. As analyzed in Section 3.1, the Fourier frequencies with nonzero amplitude depend only on the unwarped kernel, so the Fourier frequency in this case is $k = 2/\pi$. According to [29], $s_j(\sigma)$ satisfies the following equation:
where $R_j$ $(j = 0, 1)$ is defined in Eq. (12). Requiring the real and imaginary parts of Eq. (20) to agree, we obtain
For the case $r \in (0, \pi/4]$, no Fourier frequency has nonzero amplitude, so $s_j(\sigma)$ can take any value; in our scheme, we simply set $s_j(\sigma) = 0$ for $r \in (0, \pi/4]$. For the case $r \in [\pi/2, \pi]$, the only Fourier frequency with nonzero amplitude is $k = 0$. As a result, we have
Because $C_0(r) = C_1(r)$ has been adopted for this case in the DPT construction, we similarly set $s_0(\sigma) = s_1(\sigma)$ and obtain $s_0(\sigma) = s_1(\sigma) = 1/2$.
By summarizing the aforementioned results, we derive the following interpolation functions for scalability:
Using the steerable and scalable interpolation functions, we can interpolate the deformable filter $F^{\varphi,\sigma}(r, \theta)$ at an arbitrary orientation $\varphi$ and scale $\sigma$ as follows:
where (r, θ) are the polar coordinates in the Fourier domain. For convenience, Eq. (24) is called the deformable interpolation.
Suppose that $Q_{jk}^{l}(r, \theta)$ $(j, k \in \{0, 1\};\ l = 1, 2, \dots)$ denotes the DPT basis subbands at the $l$th pyramid level. The filter response at orientation $\varphi$ and scale $\sigma$ can then be obtained via the deformable interpolation as:
Although Eqs. (24) and (25) are expressed in the Fourier domain, applying the inverse Fourier transform to them yields a straightforward interpolation in the spatial-frequency domain.
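For $K = 2$, the steerable weights are $b_0(\varphi) = \cos\varphi$ and $b_1(\varphi) = \sin\varphi$; these are the weights that multiply the two orientation branches in Eq. (34) of Section 5. This follows from $\cos(\theta - \varphi) = \cos\varphi\cos\theta + \sin\varphi\sin\theta$. The sketch below checks this identity numerically and writes the double sum of Eq. (25) as a small helper. The helper name `deformable_interpolate` and the 2 × 2 nested-list layout of the subbands are illustrative assumptions. The scalable weights $s_j$ may be passed as arrays because they are piecewise in $r$.

```python
import numpy as np

def steer_weights(phi):
    """Steering weights (b_0, b_1) for the K = 2 basis G_0 = cos(theta), G_1 = sin(theta)."""
    return np.cos(phi), np.sin(phi)

def deformable_interpolate(Q, s, phi):
    """Double sum over j, k of s_j * b_k(phi) * Q[j][k], in the style of Eq. (25).

    Q   : 2 x 2 nested list of equally shaped arrays, Q[j][k] with j = scale index
          and k = orientation index
    s   : (s_0, s_1), scalars or arrays broadcastable to Q[j][k]
    phi : steering angle in radians
    """
    b = steer_weights(phi)
    return sum(s[j] * b[k] * Q[j][k] for j in range(2) for k in range(2))

theta = np.linspace(-np.pi, np.pi, 721)
G = [np.cos(theta), np.cos(theta - np.pi / 2)]  # G_0, G_1 from Eq. (6) with K = 2
phi = 0.3
b0, b1 = steer_weights(phi)
steered = b0 * G[0] + b1 * G[1]                 # should equal cos(theta - phi)

# Toy use of the helper: same angular basis at both scales, s_0 = s_1 = 1/2
combined = deformable_interpolate([G, G], (0.5, 0.5), phi)
```

With $s_0 = s_1 = 1/2$ and identical scale branches, the scalable part sums to one. `combined` then reduces to the pure steering result $\cos(\theta - \varphi)$.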
4. Mechanism for geometrical synchronization
In this section, to counteract geometrical attacks in robust watermarking, we exploit the shift-invariance, steerability, and scalability of the DPT to theoretically derive synchronization mechanisms for translation, rotation, and scaling. The derivation is as follows.
4.1. Synchronization for translation
Let $I(x, y)$ and $I^{x_0,y_0}(x, y)$ be the original image and its translated version, respectively, i.e., $I^{x_0,y_0}(x, y) = \mathcal{T}_{x_0,y_0}[I(x, y)] = I(x - x_0, y - y_0)$. Here, $(x_0, y_0)$ is the translation distance and $\mathcal{T}_{x_0,y_0}[\cdot]$ is the translation operator. Their Fourier transforms (FTs) are denoted as $I(\omega_x, \omega_y)$ and $I(\omega_x, \omega_y)\,e^{-j(\omega_x x_0 + \omega_y y_0)}$, respectively.
Assume that $\boldsymbol{\omega}^1 = (\omega_x, \omega_y)$ denotes the coordinate at the first (finest) level of the DPT pyramid. The corresponding coordinate at the $l$th $(l \geq 1)$ pyramid level is then $\boldsymbol{\omega}^l = (\omega_x^l, \omega_y^l) = (\omega_x/2^{l-1}, \omega_y/2^{l-1})$ (see also Figure 2). Suppose that $Q_{jk}^{l}(\boldsymbol{\omega}^l)$ and $Q_{jk}^{l,x_0,y_0}(\boldsymbol{\omega}^l)$ $(j, k \in \{0, 1\};\ l = 1, 2, \dots)$ are the DPT basis subbands in the Fourier domain for $I(x, y)$ and $I^{x_0,y_0}(x, y)$, respectively. According to Figure 2, we have
From Eqs. (26) and (27), it follows that
where $q_{jk}^{l,x_0,y_0}(x, y)$ and $q_{jk}^{l}(x, y)$ are the inverse FTs of $Q_{jk}^{l,x_0,y_0}(\omega_x, \omega_y)$ and $Q_{jk}^{l}(\omega_x, \omega_y)$, respectively.
Equation (28) implies that the DPT basis subband $q_{jk}^{l,x_0,y_0}(x, y)$ of the translated input $I^{x_0,y_0}(x, y)$ in the spatial-frequency domain is a translated version of the subband $q_{jk}^{l}(x, y)$ of the original input. This is the essence of shift-invariance in the DPT.
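The shift-invariance in Eq. (28) holds for any filter that is applied without decimation. The minimal sketch below uses circular (periodic) translation and an arbitrary frequency response `F`, a hypothetical stand-in for one undecimated DPT basis filter. It checks that filtering a translated image gives the translated subband.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32
img = rng.standard_normal((N, N))

# Hypothetical stand-in for one undecimated DPT basis filter: an arbitrary
# oriented band-pass response defined on the DFT frequency grid.
w = 2 * np.pi * np.fft.fftfreq(N)
wy, wx = np.meshgrid(w, w, indexing="ij")
F = np.exp(-(np.hypot(wx, wy) - np.pi / 2) ** 2) * np.cos(np.arctan2(wy, wx))

def subband(x):
    """Filter x in the Fourier domain and keep every sample (no decimation)."""
    return np.fft.ifft2(np.fft.fft2(x) * F)

x0, y0 = 5, 3                                             # translation in pixels
shifted = np.roll(img, shift=(y0, x0), axis=(0, 1))       # circular translation
lhs = subband(shifted)                                    # subband of the translated image
rhs = np.roll(subband(img), shift=(y0, x0), axis=(0, 1))  # translated subband
```

If the subband were decimated (e.g., keeping every second sample), this equality would generally fail for odd shifts. That failure is the translation sensitivity of critically sampled transforms mentioned in Section 1.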
4.2. Synchronization for rotation and scaling
With this form of shift-invariance, a translation of the input shifts the DPT coefficients and would therefore interfere with the synchronization of rotation and scaling. To decouple translation from rotation and scaling, we use the Fourier magnitude of the input signal as the DPT's input, which achieves true translation invariance. Under this setting, we derive the synchronization mechanism for rotation and scaling as follows.
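The translation invariance gained by this choice can be illustrated with the discrete Fourier transform: a circular translation changes only the phase, so the magnitude is unchanged. The sketch assumes circular translation. For an actual translation with cropping or padding at the borders, the invariance holds only approximately.

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((64, 64))
moved = np.roll(img, shift=(7, -12), axis=(0, 1))  # circular translation by (7, -12)

M = np.abs(np.fft.fft2(img))          # Fourier magnitude of the original
M_moved = np.abs(np.fft.fft2(moved))  # Fourier magnitude of the translated image
```

The complex spectra of `img` and `moved` differ (the translation appears as a linear phase term), but their magnitudes match. This is why translation no longer affects the DPT input.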
Denote by $I^{\varphi,\sigma}(x, y) = \mathcal{G}_{\varphi,\sigma}[I(x, y)]$ the rotated and scaled version of the original image $I(x, y)$, where $\mathcal{G}_{\varphi,\sigma}[\cdot]$ is an operator that rotates counterclockwise by $\varphi$ and dilates by $\sigma$ about the origin. Let $M(\omega_x, \omega_y)$ and $M^{\varphi,1/\sigma}(\omega_x, \omega_y)$ be the Fourier magnitudes of $I(x, y)$ and $I^{\varphi,\sigma}(x, y)$, respectively. By the rotation and scaling properties of the FT, we then have $M^{\varphi,1/\sigma}(\omega_x, \omega_y) = \mathcal{G}_{\varphi,1/\sigma}[M(\omega_x, \omega_y)]$, up to a global amplitude factor introduced by the dilation.
As defined in Section 4.1, let $\boldsymbol{\omega}^l = (\omega_x/2^{l-1}, \omega_y/2^{l-1})$ denote the frequency coordinate at the $l$th $(l \geq 1)$ level of the DPT pyramid. Assume that $M^{\varphi,1/\sigma}(\boldsymbol{\omega}^1)$ and $M(\boldsymbol{\omega}^1)$ are decomposed by the DPT into $l$ $(l \geq 1)$ pyramid levels, yielding the basis subbands $Q_{jk}^{l,\varphi,\sigma}(\boldsymbol{\omega}^l)$ and $Q_{jk}^{l}(\boldsymbol{\omega}^l)$ $(j, k \in \{0, 1\};\ l = 1, 2, \dots)$, respectively. By virtue of the steerability and scalability expressed in Eq. (25), we use $Q_{jk}^{l}(\boldsymbol{\omega}^l)$ to interpolate the response at orientation $\psi$ and scale $\lambda$ as:
where $F^{l,\psi,\lambda}(\boldsymbol{\omega}^l) = \sum_{j=0}^{1} s_j(\lambda) \sum_{k=0}^{1} b_k(\psi)\, L_0(\boldsymbol{\omega}^l) \left(L_1(\boldsymbol{\omega}^l)\right)^{l-1} C_j(\boldsymbol{\omega}^l)\, G_k(\boldsymbol{\omega}^l)$. Similarly, we use $Q_{jk}^{l,\varphi,\sigma}(\boldsymbol{\omega}^l)$ to obtain the response at orientation $\varphi + \psi$ and scale $\lambda/\sigma$ as:
where $F^{l,\varphi+\psi,\lambda/\sigma}(\boldsymbol{\omega}^l) = \sum_{j=0}^{1} s_j(\lambda/\sigma) \sum_{k=0}^{1} b_k(\psi + \varphi)\, L_0(\boldsymbol{\omega}^l) \left(L_1(\boldsymbol{\omega}^l)\right)^{l-1} C_j(\boldsymbol{\omega}^l)\, G_k(\boldsymbol{\omega}^l)$. In the shiftability framework [29], $F^{l,\psi,\lambda}(\boldsymbol{\omega}^l)$ represents the filter at orientation $\psi$ and scale $\lambda$ in the $l$th level of the multiscale DPT (see also Figure 2). It is the rotated and scaled version of the kernel $F^{l,0,R_0}(\boldsymbol{\omega}^l) = L_0(\boldsymbol{\omega}^l) \left(L_1(\boldsymbol{\omega}^l)\right)^{l-1} C_0(\boldsymbol{\omega}^l)\, G_0(\boldsymbol{\omega}^l)$ at orientation 0 and scale $R_0$ (see also Eq. (12)). In other words, $F^{l,\psi,\lambda}(\boldsymbol{\omega}^l) = \mathcal{G}_{\psi,\lambda/R_0}[F^{l,0,R_0}(\boldsymbol{\omega}^l)]$, and likewise $F^{l,\varphi+\psi,\lambda/\sigma}(\boldsymbol{\omega}^l) = \mathcal{G}_{\varphi+\psi,\lambda/(\sigma R_0)}[F^{l,0,R_0}(\boldsymbol{\omega}^l)]$. Therefore, we have
Taking Eq. (31) and $M^{\varphi,1/\sigma}(\boldsymbol{\omega}^l) = \mathcal{G}_{\varphi,1/\sigma}[M(\boldsymbol{\omega}^l)]$ into account, Eqs. (29) and (30) imply the following synchronization mechanism for rotation and scaling:
Applying the inverse FT yields the rotation and scaling synchronization mechanism in the spatial-frequency domain:
where $\mathbf{s}^l = (x/2^{l-1}, y/2^{l-1})$ is the coordinate in the spatial-frequency domain at pyramid level $l$. The terms $q^{l,\varphi+\psi,\sigma/\lambda}(\mathbf{s}^l)$ and $q^{l,\psi,1/\lambda}(\mathbf{s}^l)$ are the inverse FTs of $Q^{l,\varphi+\psi,\lambda/\sigma}(\boldsymbol{\omega}^l)$ and $Q^{l,\psi,\lambda}(\boldsymbol{\omega}^l)$, respectively.
Based on Eq. (32), synchronization for rotation and scaling proceeds as follows. First, decompose the Fourier magnitude of $I^{\varphi,\sigma}(x, y)$ into an $l$-level DPT pyramid to generate the basis subbands $Q_{jk}^{l,\varphi,\sigma}(\boldsymbol{\omega}^l)$ $(j, k \in \{0, 1\};\ l = 1, 2, \dots)$. Then, interpolate the response $Q^{l,\varphi+\psi,\lambda/\sigma}(\boldsymbol{\omega}^l)$ at orientation $\varphi + \psi$ and scale $\lambda/\sigma$. Finally, rotate this interpolated subband counterclockwise by $-\varphi$ (i.e., clockwise by $\varphi$) and dilate it by $\sigma$ to yield the response $Q^{l,\psi,\lambda}(\boldsymbol{\omega}^l)$ at orientation $\psi$ and scale $\lambda$. This $Q^{l,\psi,\lambda}(\boldsymbol{\omega}^l)$ equals the subband at orientation $\psi$ and scale $\lambda$ synthesized from the DPT basis subbands $Q_{jk}^{l}(\boldsymbol{\omega}^l)$ of the original image $I(x, y)$. Rotation and scaling synchronization based on Eq. (33) proceeds analogously to that based on Eq. (32).
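The rotation part of this procedure can be checked numerically in a deliberately simplified setting. The sketch assumes a single pyramid level, the kernel $H(r)$ with the $K = 2$ steerable basis, and no scaling ($\sigma = 1$). It uses a rotation of $\varphi = \pi/2$, which is an exact index permutation on an odd-sized, centered frequency grid. A random nonnegative array stands in for the Fourier magnitude $M$, and the helper names are hypothetical. Steering the attacked subbands to $\varphi + \psi$ and rotating back by $-\varphi$ should reproduce the response of the original at $\psi$.

```python
import numpy as np

N = 65                                     # odd size: the grid is symmetric about its centre
c = 2 * np.pi * (np.arange(N) - N // 2) / N
wy, wx = np.meshgrid(c, c, indexing="ij")  # rows carry omega_y, columns carry omega_x
r = np.hypot(wx, wy)
theta = np.arctan2(wy, wx)

def radial_highpass(r):
    """H(r): 0 for r <= pi/4, cos((pi/2) log2(2r/pi)) on (pi/4, pi/2), 1 for r >= pi/2."""
    h = np.zeros_like(r)
    band = (r > np.pi / 4) & (r < np.pi / 2)
    h[band] = np.cos(0.5 * np.pi * np.log2(2 * r[band] / np.pi))
    h[r >= np.pi / 2] = 1.0
    return h

H = radial_highpass(r)
G = (np.cos(theta), np.sin(theta))         # K = 2 basis: G_0, G_1

def basis_subbands(M):
    """Single-level stand-in for the basis subbands: M * H(r) * G_k(theta)."""
    return [M * H * g for g in G]

def steer(Q, psi):
    """Interpolate the response at orientation psi with b_0 = cos, b_1 = sin."""
    return np.cos(psi) * Q[0] + np.sin(psi) * Q[1]

def rotate_ccw_quarter(a, turns=1):
    """Rotate array content counterclockwise by turns * pi/2 in the frame where
    theta = arctan2(row, column); np.rot90 with k = -turns does exactly this."""
    return np.rot90(a, k=-turns)

rng = np.random.default_rng(2)
M = rng.random((N, N))                     # stand-in for the original Fourier magnitude
phi, psi = np.pi / 2, 0.4
M_attacked = rotate_ccw_quarter(M)         # magnitude of the image rotated by phi

q_original = steer(basis_subbands(M), psi)                 # original response at psi
q_attacked = steer(basis_subbands(M_attacked), phi + psi)  # attacked response at phi + psi
q_resync = rotate_ccw_quarter(q_attacked, turns=-1)        # rotate back by -phi
```

For arbitrary angles and scales, rotating and dilating a sampled subband requires interpolation on the grid (for example, with `scipy.ndimage.rotate` and `scipy.ndimage.zoom`). The equality would then hold only approximately.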
5. Proposed robust watermarking scheme
In this section, we present the proposed RST-resilient robust image watermarking algorithm. Translation invariance is achieved by taking the Fourier magnitude of the cover image $I(x, y)$ as the DPT input. Rotation and scaling are counteracted using the inserted template together with the rotation and scaling synchronization mechanism. Only $K = 2$ steerable basis filters are adopted, to reduce the computational complexity of rotation and scaling synchronization. The details are given below.
5.1. Template and watermark insertion
Assume that the cover image $I(x, y)$ is of size $H \times W$. To obtain adequate resolution for template matching, we symmetrically pad (or crop) the rows/columns of $I(x, y)$ to a size of 1,024 if the height/width is smaller (larger) than 1,024. We then calculate the Fourier magnitude $M(\omega_x, \omega_y)$ and phase $\Psi(\omega_x, \omega_y)$ of the result and decompose $M(\omega_x, \omega_y)$ into a three-level DPT pyramid. This generates the spatial-frequency basis subbands $q_{jk}^{l}(x, y)$ $(j, k \in \{0, 1\};\ l = 1, 2, 3)$. The subbands at the first (finest) level, $q_{jk}^{1}(x, y)$, are used for template insertion, whereas those at the other two levels, $q_{jk}^{l}(x, y)$ $(l = 2, 3)$, are used for watermark embedding. We embed in the spatial-frequency domain rather than the Fourier domain because the symmetry of the Fourier magnitude would reduce the number of candidate coefficients for watermarking and thus the embedding capacity. The template and watermark embedding process, illustrated in Figure 3, is explained as follows.
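The preprocessing in the first two sentences of the preceding paragraph can be sketched with NumPy. This is an illustration under assumptions: "symmetrically pad" is read here as padding or cropping equal amounts at both ends, with zeros as the padding value. Mirror padding (`np.pad(..., mode="symmetric")`) is an equally plausible reading. The names `fit_axis` and `preprocess` are hypothetical, and the DPT decomposition itself is not shown.

```python
import numpy as np

TARGET = 1024  # working size stated in the text

def fit_axis(img, axis, target=TARGET):
    """Zero-pad equally on both ends (if shorter) or centre-crop (if longer) one axis."""
    n = img.shape[axis]
    if n < target:
        before = (target - n) // 2
        pad = [(0, 0), (0, 0)]
        pad[axis] = (before, target - n - before)
        return np.pad(img, pad, mode="constant")
    if n > target:
        start = (n - target) // 2
        return np.take(img, np.arange(start, start + target), axis=axis)
    return img

def preprocess(img):
    """Return the 1024 x 1024 working image and its Fourier magnitude and phase."""
    work = fit_axis(fit_axis(img.astype(float), 0), 1)
    spectrum = np.fft.fft2(work)
    return work, np.abs(spectrum), np.angle(spectrum)

# Example: a 512 x 1500 cover is padded vertically and cropped horizontally
cover = np.random.default_rng(0).random((512, 1500))
work, M, Psi = preprocess(cover)
```

Keeping the phase `Psi` allows the working image to be rebuilt exactly as the inverse FFT of `M * exp(1j * Psi)` after the magnitude has been modified.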
5.1.1. Template embedding

(1)
Generate, using a secret key $\mathrm{KEY}_{t1}$, a random sequence $P = \{p_i \in \{+1, -1\},\ i = 1, \dots, N_t\}$ of length $N_t$ as the template.

(2)
To enhance security, we tune $q_{jk}^{1}(x, y)$ to a predefined secret orientation $\theta_t$ and scale $\sigma_t$ to obtain $q^{1,\theta_t,\sigma_t}(x, y)$. According to the steerability and scalability in Eq. (25), we have

$$q^{1,\theta_t,\sigma_t} = \cos\theta_t \cdot \mathcal{F}^{-1}\left(s_0(\sigma_t)\, Q_{00}^{1} + s_1(\sigma_t)\, Q_{10}^{1}\right) + \sin\theta_t \cdot \mathcal{F}^{-1}\left(s_0(\sigma_t)\, Q_{01}^{1} + s_1(\sigma_t)\, Q_{11}^{1}\right), \quad (34)$$

where $Q_{jk}^{1}(\omega_x, \omega_y)$ denotes the FT of $q_{jk}^{1}(x, y)$ and $\mathcal{F}^{-1}(\cdot)$ is the inverse FT. The coordinates in Eq. (34) are omitted for compactness.

(3)
Randomly select $N_t$ template positions $PS = \{(x_i, y_i),\ i = 1, 2, \dots, N_t\}$ in $q^{1,\theta_t,\sigma_t}(x, y)$ using a secret key $\mathrm{KEY}_{t2}$. As a trade-off between robustness and imperceptibility, we prefer positions $(x_i, y_i)$ located in the spatial-frequency region with normalized radius $r \in (\pi/4, \pi/2)$. Then, embed the template at the selected positions using

$$u^{1,\theta_t,\sigma_t}(x_i, y_i) = q^{1,\theta_t,\sigma_t}(x_i, y_i) + \beta_t\, p_i, \quad (35)$$

where $\beta_t$ is the embedding strength.

(4) Tune $u^{1,\theta_t,\sigma_t}(x_i, y_i)$ backward to obtain the watermarked basis subbands $u_{jk}^{1}(x, y)$. This, however, is nontrivial for two reasons. First, it is difficult to interpolate Eq. (35) backward to yield the four embedded basis subbands $u_{jk}^{1}(x, y)$ ($j, k \in \{0, 1\}$). Second, according to Eq. (23), $s_j(\sigma_t)$ is a piecewise function of the normalized radius $r$, so it cannot be moved outside $\mathcal{F}^{-1}(\cdot)$, and the interpolation in Eq. (34) cannot be implemented directly in the spatial-frequency domain. The latter means that multiple FTs are required to complete the template insertion, and likewise for every candidate evaluated during brute-force template matching at the receiver, which would make template matching computationally unaffordable.
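To simplify both template insertion and template matching, we adopt a non-piecewise $s_j(\sigma_t)$, for example by setting $s_j(\sigma_t)$ to a fixed value $u$ ($u > 0$). Because the inverse FT is linear, this turns Eq. (34) into
$$q^{1,\theta_t,\sigma_t} = u\cos\theta_t\left(q_{00}^{1} + q_{10}^{1}\right) + u\sin\theta_t\left(q_{01}^{1} + q_{11}^{1}\right). \tag{36}$$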
Because $s_j(\sigma_t)$ is piecewise, we determine a suitable $u$ piece by piece. As pointed out in Section 3.2, $s_j(\sigma_t)$ can take any value for $r \in [0, \pi/4]$, so only the cases $r \in (\pi/4, \pi/2)$ and $r \in [\pi/2, \pi]$ need to be considered. For $r \in [\pi/2, \pi]$, the setting $s_j(\sigma_t) = 1/2$ is already a fixed value. For $r \in (\pi/4, \pi/2)$, taking Eqs. (16) and (5) into account, we calculate the expression $s_0(\sigma_t)Q_{0k}^{1} + s_1(\sigma_t)Q_{1k}^{1}$ in Eq. (34) as:
Given that the scale range considered in our scheme is [0.5, 2] (a broader range would degrade the robustness to scaling attacks), the value of $\left(s_0(\sigma_t) + \sqrt{3}\,s_0(\sigma_t)\right)/\left(1 + \sqrt{3}\right)$ lies in the range [0.69, 0.95]. We therefore roughly set $s_j(\sigma_t) = u = 0.7$, which approximates $s_j(\sigma_t)$ in both cases $r \in (\pi/4, \pi/2)$ and $r \in [\pi/2, \pi]$. Although this approximation introduces interpolation errors, the extensive experimental results in Section 6 show that it is feasible.
Via the simplified Eq. (37), we equivalently embed the template in the DPT basis subbands $q_{jk}^{1}(x_i, y_i)$ as follows:
which avoids both the forward and backward interpolations and thereby resolves the backward-interpolation problem described above.
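Because the inverse FT is linear, fixing $s_0 = s_1 = u$ lets the steering of Eq. (34) be evaluated as a plain linear combination of the basis subbands, which is what Eq. (36) expresses. The sketch below checks this numerically against the literal FFT route and then applies Eq. (35) at given template positions. Assumptions: random arrays stand in for the DPT basis subbands (the DPT is not implemented here), an integer seed for NumPy's generator stands in for the secret key, and the names `make_template`, `steer_fixed_u`, `steer_via_fft`, and `embed_template` are hypothetical. The equivalent backward embedding into the four basis subbands is not reproduced.

```python
import numpy as np

U = 0.7  # fixed value adopted for s_j(sigma_t)

def make_template(key, n_t):
    """Keyed +/-1 template P; the integer seed is a stand-in for KEY_t1."""
    return np.random.default_rng(key).choice([-1, 1], size=n_t)

def steer_fixed_u(q, theta, u=U):
    """Eq. (36): steered level-1 response as a direct linear combination of q[(j, k)]."""
    return u * np.cos(theta) * (q[0, 0] + q[1, 0]) + u * np.sin(theta) * (q[0, 1] + q[1, 1])

def steer_via_fft(q, theta, u=U):
    """Eq. (34) with s_0 = s_1 = u, evaluated literally through forward and inverse FTs."""
    Q = {jk: np.fft.fft2(band) for jk, band in q.items()}
    even = np.fft.ifft2(u * Q[0, 0] + u * Q[1, 0]).real
    odd = np.fft.ifft2(u * Q[0, 1] + u * Q[1, 1]).real
    return np.cos(theta) * even + np.sin(theta) * odd

def embed_template(steered, positions, p, beta_t):
    """Eq. (35): add beta_t * p_i at each template position (x_i, y_i); positions must be distinct."""
    out = steered.copy()
    out[positions[:, 0], positions[:, 1]] += beta_t * p
    return out
```

The two steering functions agree to floating-point precision, but only `steer_fixed_u` avoids the four FFTs, which is what makes repeated evaluation during template matching cheap.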
5.1.2. Watermark embedding

(1) Generate $N_m$ random bits $b = \{b_i,\ i = 1, \dots, N_m\}$ as the message using a secret key $KEY_{w1}$.

(2) Encode $b$ with a repeat-accumulate (RA) code [35] of code rate $\mathit{rate}$ to generate the encoded binary sequence $e = \{e_i,\ i = 1, \dots, N_m/\mathit{rate}\}$; RA codes offer excellent encoding and decoding performance.

(3) Each level-3 coefficient $q_{jk}^{3}(x, y)$ ($j, k \in \{0, 1\}$) has a natural quadtree relationship with the four level-2 coefficients $\{q_{jk}^{2}(2x-1, 2y-1), q_{jk}^{2}(2x, 2y-1), q_{jk}^{2}(2x-1, 2y), q_{jk}^{2}(2x, 2y)\}$. We group the four quadtrees from the four subbands $q_{jk}^{l}(x, y)$ to form a 20-element vector tree $\mathbf{T}_i = \{T_{iv},\ v = 1, \dots, 20\}$ ($i = 1, \dots, 1024 \times 1024/16$), i.e., four subbands each contributing one parent and four children, as illustrated in Figure 4 (only the child coefficients of $q_{00}^{3}(x, y)$ are shown there, for compactness). Each vector tree is taken as the basic unit for watermarking, in an attempt to achieve a reasonable tradeoff between robustness and embedding capacity.

(4) To resist cropping, we choose, via a secret key $KEY_{w2}$, $N_m/\mathit{rate}$ vector trees located in the central region for watermark insertion. Each vector tree $\mathbf{T}_i$ carries one encoded bit $e_i \in \{0, 1\}$, which is represented by a 20-element vector. To improve detection performance, we use $\mathbf{w}_0 = \{\underbrace{-1, \dots, -1}_{20}\}$ for $e_i = 0$ and $\mathbf{w}_1 = \{\underbrace{+1, \dots, +1}_{20}\}$ for $e_i = 1$; this antipodal pair maximizes the codeword distance and thus reduces the detection error probability.

(5) Map the allocated bit $e_i$ to $\mathbf{w}_{e_i}$ and perform the embedding as follows:
$$\mathbf{Y}_i = \mathbf{T}_i + \beta_w \mathbf{w}_{e_i}, \tag{39}$$
where $\mathbf{Y}_i$ is the watermarked vector tree and $\beta_w$ is the embedding strength. $\beta_w$ is kept non-adaptive because, to the best of our knowledge, no human visual model suited to this embedding domain has been reported in the literature. Equation (39) can be written equivalently as:
$$u_{jk}^{l}(x_{iv}, y_{iv}) = q_{jk}^{l}(x_{iv}, y_{iv}) + \beta_w (2e_i - 1), \quad v = 1, \dots, 20,\ l = 2, 3, \tag{40}$$
where $(x_{iv}, y_{iv})$ is the coordinate corresponding to the $v$th element of $\mathbf{T}_i$.

(6) Repeat step (5) until all encoded bits $e_i$ have been embedded in the chosen vector trees.

(7) Perform the inverse DPT on the watermark-embedded basis subbands $u_{jk}^{l}$ ($l = 2, 3$) and the template-embedded subbands $u_{jk}^{1}$ from Section 5.1.1 to obtain the watermarked Fourier magnitude $M_w(\omega_x, \omega_y)$.

(8) Combine $M_w(\omega_x, \omega_y)$ with the original phase $\Psi(\omega_x, \omega_y)$, i.e., form $M_w(\omega_x, \omega_y)\,e^{j\Psi(\omega_x, \omega_y)}$, and perform the inverse FT to obtain the watermarked image $I_w^{\mathrm{pre}}(x, y)$ of size 1,024 × 1,024.

(9) Invert the padding (cropping) operation on $I_w^{\mathrm{pre}}(x, y)$ to obtain the final watermarked image $I_w(x, y)$ of size $H \times W$.
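Steps (2) to (5) can be sketched as below. All of the following are assumptions made for illustration: the RA encoder is the textbook repeat, interleave, accumulate construction, with a seeded random permutation standing in for the interleaver of [35]; arrays are 0-based, so the children of level-3 position $(x, y)$ sit at rows $2x, 2x+1$ and columns $2y, 2y+1$ of the level-2 subband; and the order of the 20 elements within a tree is our own choice. The names `ra_encode`, `vector_tree`, and `embed_bit` are hypothetical.

```python
import numpy as np

def ra_encode(bits, q, rng):
    """Rate-1/q repeat-accumulate encoding: repeat each bit q times, interleave, then
    accumulate mod 2 (running XOR). The permutation from `rng` is a stand-in interleaver."""
    repeated = np.repeat(np.asarray(bits, dtype=np.int64), q)
    interleaved = repeated[rng.permutation(repeated.size)]
    return np.cumsum(interleaved) % 2

def vector_tree(q2, q3, x, y):
    """20-element vector tree rooted at level-3 position (x, y): for each subband (j, k),
    the parent q3[j, k][x, y] followed by its four level-2 children."""
    tree = []
    for jk in ((0, 0), (0, 1), (1, 0), (1, 1)):
        child = q2[jk]
        tree.append(q3[jk][x, y])
        tree += [child[2 * x, 2 * y], child[2 * x + 1, 2 * y],
                 child[2 * x, 2 * y + 1], child[2 * x + 1, 2 * y + 1]]
    return np.array(tree)

def embed_bit(q2, q3, x, y, e, beta_w):
    """Eq. (40): add beta_w * (2e - 1) to all 20 coefficients of the tree, in place."""
    delta = beta_w * (2 * e - 1)
    for jk in q3:
        q3[jk][x, y] += delta
        q2[jk][2 * x:2 * x + 2, 2 * y:2 * y + 2] += delta  # the 2 x 2 block of children
```

With level-3 subbands of 256 × 256 coefficients, consistent with the count $1{,}024 \times 1{,}024/16 = 65{,}536$ trees, a rate of 1/12 and $N_m = 60$ would occupy 720 of those trees.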
5.2. Efficient template-matching algorithm
Because translation invariance has already been achieved by taking the Fourier magnitude of the cover image as the DPT input, the inserted template is used only to estimate the rotation angle and scaling factor, which are then used to correct rotation and scaling before watermark extraction. Based on the synchronization mechanisms for rotation and scaling in Section 4.2, we develop the following efficient template-matching algorithm.
Assume that the received image is $I_r(x, y)$. We first preprocess $I_r(x, y)$ in the same way as in the embedding stage to obtain a 1,024 × 1,024 image. We then calculate the Fourier magnitude of the preprocessed image and decompose it into a one-level DPT pyramid, because only the template inserted at level 1 is needed for template matching. This yields the DPT basis subbands $q_{jk}^{1}(x, y)$ ($j, k \in \{0, 1\}$). According to Eq. (36), template matching for rotation and scaling estimation proceeds as follows: the basis subbands $q_{jk}^{1}(x, y)$ are tuned to each candidate orientation and scale, the tuned subband is inversely rotated and dilated, and the template is extracted and correlated with the original template. After all candidate rotation angles and scaling factors have been searched in this way, the orientation and scale that give the maximum correlation are adopted as the estimated rotation and scaling parameters.
As the process above shows, only a limited number of template points are involved in template matching. This allows us to simplify template matching by interpolating only at the relevant template points, as described below.

(1) Set the range $[-180^\circ, 180^\circ)$ with step $\Delta_\phi$ (e.g., $\Delta_\phi = 0.5$) as the search space for the rotation angle, and $[\sigma_1, \sigma_2]$ (e.g., [0.5, 2.0]) with step $\Delta_\sigma$ (e.g., $\Delta_\sigma = 0.01$) as that for the scaling factor. Initialize the search parameters as $\phi = -180$ and $\sigma = \sigma_1$.

(2) For each parameter pair $(\phi, \sigma)$, compute the candidate template positions as:
$$\begin{aligned} x'_i &= \mathrm{round}\left(\frac{(x_i - \mathit{cx})\cos\phi + (y_i - \mathit{cy})\sin\phi}{\sigma} + \mathit{cx}\right),\\ y'_i &= \mathrm{round}\left(\frac{(y_i - \mathit{cy})\cos\phi - (x_i - \mathit{cx})\sin\phi}{\sigma} + \mathit{cy}\right), \end{aligned} \tag{41}$$
where $(x_i, y_i)$ ($i = 1, 2, \dots, N_t$) denote the original template coordinates determined by key $KEY_{t2}$, $(\mathit{cx}, \mathit{cy})$ is the geometrical center, and $\mathrm{round}(\cdot)$ is the rounding operation.

(3) Using the steerability and scalability in Eq. (36), obtain the coefficients at locations $(x'_i, y'_i)$ as:
$$q^{1,\phi,\sigma}(x'_i, y'_i) = u\cos\phi\left(q_{00}^{1}(x'_i, y'_i) + q_{10}^{1}(x'_i, y'_i)\right) + u\sin\phi\left(q_{01}^{1}(x'_i, y'_i) + q_{11}^{1}(x'_i, y'_i)\right). \tag{42}$$

(4) Calculate the correlation between the extracted and original templates as:
$$\mathit{Corr}(\phi, \sigma) = \sum_{i=1}^{N_t} q^{1,\phi,\sigma}(x'_i, y'_i) \cdot p_i. \tag{43}$$

(5) Increase the candidate scale to $\sigma = \sigma + \Delta_\sigma$ while keeping $\phi$ unchanged, and repeat steps (2) to (4) until $\sigma \ge \sigma_2$.

(6) Increase the candidate rotation angle to $\phi = \phi + \Delta_\phi$, reset $\sigma = \sigma_1$, and re-execute steps (2) to (5) until $\phi \ge 180$.

(7) Find the maximum correlation value $\mathit{Corr}(\phi, \sigma)_{\max}$ and take the corresponding geometrical parameters $(\phi_{\mathrm{est}}, \sigma_{\mathrm{est}})$ as the estimated rotation angle and scaling factor.

(8) Calculate the actual rotation and dilation parameters of the attack as $\phi_{\mathrm{attack}} = \phi_{\mathrm{est}} - \theta_t$ and $\sigma_{\mathrm{attack}} = \sigma_{\mathrm{est}}/\sigma_t$, respectively. This is because, according to Section 4.2, $\phi_{\mathrm{est}} = \theta_t + \phi_{\mathrm{attack}}$ and $\sigma_{\mathrm{est}} = \sigma_t\,\sigma_{\mathrm{attack}}$, where $\theta_t$, $\phi_{\mathrm{attack}}$, $\sigma_t$, and $\sigma_{\mathrm{attack}}$ correspond to $\psi$, $\phi$, $1/\lambda$, and $\sigma$ in Section 4.2, respectively.
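Steps (1) to (8) can be sketched as a brute-force loop in Python/NumPy. This is an illustrative sketch, not the authors' MATLAB code: `q` is a hypothetical dict mapping $(j, k)$ to the level-1 basis subband, angles are in degrees, the two scale-summed subbands in Eq. (42) are computed once because they do not depend on the candidate, and candidate positions that fall outside the grid are simply skipped (an assumption, since the text does not say how such points are handled). The function names are hypothetical.

```python
import numpy as np

def candidate_positions(xy, phi_deg, sigma, center):
    """Eq. (41): rotate the original template coordinates by phi and divide by sigma."""
    phi = np.deg2rad(phi_deg)
    dx = xy[:, 0] - center[0]
    dy = xy[:, 1] - center[1]
    xp = np.round((dx * np.cos(phi) + dy * np.sin(phi)) / sigma + center[0])
    yp = np.round((dy * np.cos(phi) - dx * np.sin(phi)) / sigma + center[1])
    return xp.astype(int), yp.astype(int)

def match_template(q, xy, p, center, u=0.7, d_phi=0.5, sigma_range=(0.5, 2.0), d_sigma=0.01):
    """Exhaustive search over (phi, sigma) with Eqs. (41)-(43).
    Returns (phi_est, sigma_est, corr_max)."""
    even = q[0, 0] + q[1, 0]  # k = 0 subbands summed over scale, reused by every candidate
    odd = q[0, 1] + q[1, 1]   # k = 1 subbands summed over scale
    h, w = even.shape
    n_phi = int(round(360 / d_phi))
    n_sigma = int(round((sigma_range[1] - sigma_range[0]) / d_sigma))
    best = (None, None, -np.inf)
    for a in range(n_phi):  # integer counters avoid floating-point drift in the grid
        phi = -180.0 + a * d_phi
        c, s = np.cos(np.deg2rad(phi)), np.sin(np.deg2rad(phi))
        for b in range(n_sigma):
            sigma = sigma_range[0] + b * d_sigma
            xp, yp = candidate_positions(xy, phi, sigma, center)
            inside = (xp >= 0) & (xp < h) & (yp >= 0) & (yp < w)  # assumed: skip off-grid points
            vals = u * (c * even[xp[inside], yp[inside]] + s * odd[xp[inside], yp[inside]])  # Eq. (42)
            corr = float(np.dot(vals, p[inside]))  # Eq. (43)
            if corr > best[2]:
                best = (phi, sigma, corr)
    return best

def attack_parameters(phi_est, sigma_est, theta_t, sigma_t):
    """Step (8): remove the secret tuning (theta_t, sigma_t) from the estimates."""
    return phi_est - theta_t, sigma_est / sigma_t
```

Precomputing `even` and `odd` means each candidate costs only $N_t$ lookups and one dot product, which is the saving obtained by interpolating at the template points only.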
Although the above template-matching algorithm addresses only uniform scaling, i.e., the same scaling factor along the x- and y-axes, it can easily be extended to different scaling factors along the two axes. To this end, set the parameter space $(\phi, \sigma_x, \sigma_y)$ with $\phi \in [-180^\circ, 180^\circ)$ and $\sigma_x, \sigma_y \in [\sigma_1, \sigma_2]$, and search each parameter in turn, as in the above algorithm. However, this significantly increases the computational complexity.
5.3. Geometrical correction and watermark extraction
Let $M^{\theta,1/\sigma}(\omega_x, \omega_y)$ denote the Fourier magnitude of the preprocessed received image of size 1,024 × 1,024. We correct $M^{\theta,1/\sigma}(\omega_x, \omega_y)$ by rotating it counterclockwise by $-\phi_{\mathrm{attack}}$ and scaling it by $\sigma_{\mathrm{attack}}$ about the origin, which yields the Fourier magnitude $M^{0,1}(\omega_x, \omega_y)$ corresponding to the original watermarked image at orientation 0 and scale 1. The watermark is then recovered from $M^{0,1}(\omega_x, \omega_y)$ as follows.

(1) Decompose $M^{0,1}(\omega_x, \omega_y)$ into a three-level pyramid via the DPT, which yields the DPT basis subbands $q_{jk}^{l}(x, y)$ ($l = 1, 2, 3$; $j, k \in \{0, 1\}$).

(2) Use $q_{jk}^{l}(x, y)$ to construct the 20-element vector trees $\mathbf{Z}_i$ ($i = 1, \dots, 1{,}024 \times 1{,}024/16$) in the same way as in watermark embedding, and choose the $N_m/\mathit{rate}$ vector trees located in the central area via the secret key $KEY_{w2}$.

(3) For each selected vector tree $\mathbf{Z}_i$, estimate the encoded bit as follows:
$$\sum_{v=1}^{20} Z_{iv}\, w_{0v} \;\begin{array}{c}\stackrel{b=0}{>}\\ \underset{b=1}{<}\end{array}\; \sum_{v=1}^{20} Z_{iv}\, w_{1v}, \tag{44}$$
i.e., decide $b = 0$ if the left-hand side is larger and $b = 1$ otherwise, where $\mathbf{w}_b$ ($b = 0, 1$) are the vectors used in Eq. (39).

(4) After extracting the encoded bits from all $N_m/\mathit{rate}$ selected vector trees, run RA decoding to recover the message $\hat{b}$.
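Because $\mathbf{w}_0 = -\mathbf{w}_1$, the comparison in Eq. (44) reduces to the sign of $\sum_v Z_{iv}$: the left-hand side equals $-\sum_v Z_{iv}$ and the right-hand side $+\sum_v Z_{iv}$. A minimal sketch of this decision for a batch of trees follows; the name `detect_bits` is hypothetical, breaking ties toward 1 is an assumption, and the subsequent RA decoding is not shown.

```python
import numpy as np

W0 = -np.ones(20)  # w_0: twenty -1s, used for e_i = 0
W1 = np.ones(20)   # w_1: twenty +1s, used for e_i = 1

def detect_bits(trees):
    """Eq. (44) for each row Z_i of `trees` (shape: n_trees x 20):
    0 if sum_v Z_iv * w_0v > sum_v Z_iv * w_1v, otherwise 1."""
    trees = np.asarray(trees, dtype=float)
    return np.where(trees @ W0 > trees @ W1, 0, 1)
```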
6. Experimental results and discussion
In this section, we assess the proposed watermarking scheme via simulations on 20 grey-scale images of size 512 × 512 with different textures. For each test image, we decompose its Fourier magnitude into a three-level DPT pyramid; the first (finest) level is used for template insertion, and the other two levels for watermark embedding. The template consists of $N_t = 105$ random bits inserted at positions with normalized radii $r \in \{0.3, 0.35, 0.4\}$ and angles $\theta \in \{1{:}1{:}10,\ 15{:}10{:}95,\ 100{:}17{:}359\}$ (start:step:stop notation), where the steps 1, 10, and 17 are secret. The watermark is a 720-bit sequence obtained by encoding $N_m = 60$ random message bits with an RA code of rate $\mathit{rate} = 1/12$. The embedding strengths $\beta_t$ and $\beta_w$ are adjusted image by image so that the peak signal-to-noise ratio (PSNR) is 40 dB. Figure 5 shows several watermarked images, demonstrating that the proposed scheme preserves good visual fidelity. The mean and variance of all PSNRs are 40.01 dB and 0.01, respectively.
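The template layout above can be checked with a few lines of Python: the three angle sets contain 10, 9, and 16 angles, and 35 angles on three radii give exactly $N_t = 105$ positions, while $60 \times 12 = 720$ matches the watermark length. Treating the normalized radii as multiples of $\pi$ (which places them inside the preferred band $r \in (\pi/4, \pi/2)$ of Section 5.1.1) and mapping $r = \pi$ to half the grid width are assumptions, as is the helper name `template_positions`.

```python
import numpy as np

# Angle sets from the text (start:step:stop); range(start, stop + 1, step) reproduces them.
ANGLES_DEG = (list(range(1, 11, 1))        # 1:1:10     -> 10 angles
              + list(range(15, 96, 10))    # 15:10:95   -> 9 angles
              + list(range(100, 360, 17)))  # 100:17:359 -> 16 angles (last one is 355)
RADII = (0.3, 0.35, 0.4)  # normalized radii, assumed here to be in units of pi

def template_positions(size=1024):
    """Hypothetical (r, theta) -> pixel mapping on a size x size grid,
    assuming r = 1 (i.e. pi) corresponds to half the grid width from the centre."""
    c = size // 2
    pts = [(int(round(c + r * c * np.cos(np.deg2rad(a)))),
            int(round(c + r * c * np.sin(np.deg2rad(a)))))
           for r in RADII for a in ANGLES_DEG]
    return np.array(pts)

n_message_bits, inverse_rate = 60, 12  # N_m and 1/rate from the text
print(len(ANGLES_DEG), len(template_positions()), n_message_bits * inverse_rate)  # prints: 35 105 720
```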
For the 20 watermarked images, we apply geometrical attacks (e.g., rotation, scaling, cropping) and common signal processing attacks (e.g., JPEG compression, additive white Gaussian noise (AWGN), median filtering, convolution filtering). We then use the efficient template-matching algorithm of Section 5.2 to achieve rotation and scaling synchronization, with search spaces $\phi \in [-180^\circ, 180^\circ)$ with step $\Delta_\phi = 0.5$ for rotation and $\sigma \in [0.5, 2.0]$ with step $\Delta_\sigma = 0.01$ for scaling. The performance against these attacks is summarized below.
6.1. Performance against geometrical attacks
As translation invariance is theoretically ensured by taking the Fourier magnitude of the cover image as the DPT input, translation attacks are not assessed separately in this paper. We mainly examine the performance against geometrical attacks such as rotation, scaling, cropping, and row/column line removal, as implemented in StirMark [36, 37].
StirMark provides three types of rotation attack: rotation without auto-cropping, rotation with auto-cropping, and rotation with auto-cropping and scaling. For all three types, we set the rotation angles to ±2°, ±1°, ±0.75°, ±0.5°, ±0.25°, 45°, and 90°. We then use the efficient template-matching algorithm of Section 5.2 to estimate the rotation angle and scaling factor, followed by geometrical correction and watermark extraction. The bit error rates (BERs) for all tested parameters are exactly 0, which demonstrates the high robustness of the proposed scheme to these different rotation implementations.
For scaling attacks in StirMark, we set the scaling factors in the range [0.5, 2.0] with step 0.1. The performance is shown in Figure 6, where the BER is averaged over all 20 test images. It is observed that the proposed scheme achieves BER = 0 for scaling factors from 0.7 to 1.6 and BER < 0.1 for scaling factors from 1.7 to 1.9. However, it is vulnerable to scaling factors of 0.5 and 2. The failure to counter scaling with a factor of 0.5 can be attributed to the loss of 75% of the image information, and the weakness to scaling with a factor of 2 comes from the interpolation approximation via Eq. (36).
Figure 7 summarizes the averaged performance against cropping attacks in the cropping ratio range [40%, 100%] with step 5%, where the ratio is that of the cropped image to the original image. It is found that the proposed scheme achieves BER = 0 for cropping ratios in the range [75%, 100%] and BER < 0.03 for ratios from 60% to 75%. Nevertheless, it is sensitive to cropping with ratios below 60%, which is to be expected as such cropping would lose more than 60% of the image information. In this sense, our scheme has sufficient robustness to cropping.
In addition, we also examine row/column line removal attacks, which are considered to be a kind of geometrical manipulation resulting in local distortions. The frequencies of the removed row and column lines are set in the range [10,100] with step size 10. Simulation results show that the proposed scheme can successfully counteract this attack by achieving BER = 0.
6.2. Performance against common signal processing attacks
Common signal processing attacks include JPEG compression, AWGN, median filtering, and convolution filtering. We first evaluate the performance against JPEG compression. In the simulation, we set the quality factor (QF) range of JPEG compression to be [10, 38] with step size 2. Figure 8 plots the performance averaged over 20 test images. It is shown that the proposed scheme has excellent performance against JPEG compression with QF ≥ 22, as well as favorable robustness for QF∊ [16, 22). This preferable performance may be attributed to the redundant representation of the DPT, which is, from a mathematical viewpoint, essentially a frame expansion with promising robustness to added noise.
We further impose AWGN on the aforementioned watermarked images. We set the noise levels for AWGN in StirMark from 1 to 7 and then examine the performance in terms of BER. Figure 9 depicts the averaged performance. It is seen that the BER is 0 for noise levels of less than or equal to 3 and smaller than 0.05 for noise level 4, although it is large for the other cases. This indicates that the proposed scheme has sufficient robustness to AWGN.
In examining the performance against median filtering (cut), we set the size of the median filter in the range [2, 5] with step 1. The simulation results are summarized in Table 1. The proposed scheme is highly robust to median filtering with sizes 2 and 3, but its performance degrades for sizes 4 and 5. This implies that the proposed scheme is only moderately robust to median filtering.
Finally, we test the robustness of the proposed scheme to convolution filtering, including sharpening and Gaussian filtering. The average BER is exactly 0 for sharpening and 0.143 for Gaussian filtering, showing that the proposed scheme is fully robust to sharpening but not sufficiently robust to Gaussian filtering.
6.3. Computation time evaluation
In this section, we evaluate the computation time of the proposed scheme. As described in Section 5, the proposed scheme consists of message embedding and extraction processes. As the computation time for the message embedding process is much less than that for the message extraction process, we mainly examine the computation time for message extraction.
The message extraction process includes two stages, namely template matching and message recovery. Because template matching takes up most of the computation time of message extraction, the analysis below focuses on it. According to Section 5.2, there are $N_\phi = 360/\Delta_\phi$ candidate rotation angles and $N_\sigma = (\sigma_2 - \sigma_1)/\Delta_\sigma$ candidate scaling factors in total, and all candidate scaling factors must be searched for each candidate rotation angle. Therefore, the computational complexity of template matching, measured in units of $N_t$-dimensional correlation calculations, is $O(N_\phi N_\sigma)$.
To illustrate the computation time of template matching, we perform the following simulation. As set in Section 6.1, we adopt $\Delta_\phi = 0.5$, $\sigma_1 = 0.5$, $\sigma_2 = 2.0$, $\Delta_\sigma = 0.01$, and $N_t = 105$. Under these settings, we run the MATLAB code on an Intel personal computer (Intel, Santa Clara, CA, USA) with a 2.2-GHz Core 2 Duo CPU and 2 GB of memory. The computation time averaged over 10 runs is 4.34 s. Thus, although a brute-force search is employed for template matching, its computation time is lower than one might intuitively expect, because only the template points are involved in each correlation.
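The complexity estimate can be made concrete with the settings above: $N_\phi = 360/0.5 = 720$ and $N_\sigma = (2.0 - 0.5)/0.01 = 150$, i.e., 108,000 correlations of length $N_t = 105$, or about $1.1 \times 10^7$ multiply-adds (ignoring the cost of computing the candidate positions). The small helper below, whose name is hypothetical, reproduces this count.

```python
def template_matching_cost(d_phi=0.5, sigma1=0.5, sigma2=2.0, d_sigma=0.01, n_t=105):
    """Count the Eq. (43) correlations in the brute-force search and their multiply-adds."""
    n_phi = round(360 / d_phi)                    # candidate rotation angles
    n_sigma = round((sigma2 - sigma1) / d_sigma)  # candidate scaling factors
    n_corr = n_phi * n_sigma                      # O(N_phi * N_sigma) correlations
    return n_phi, n_sigma, n_corr, n_corr * n_t

print(template_matching_cost())  # prints: (720, 150, 108000, 11340000)
```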
6.4. Comparison to related schemes
To further evaluate the proposed watermarking scheme, we compare it with the schemes in [21], [38], and [14], denoted for notational convenience as PPAFFINE, WNZHDMST, and NIKOSFND, respectively. As surveyed in [1], PPAFFINE is a typical template-based watermarking algorithm with high robustness against affine attacks. WNZHDMST is another template-based watermarking approach that incorporates the deformable multiscale transform (DMST), which is somewhat similar to the DPT. NIKOSFND is a salient-feature and normalization-based watermarking scheme with excellent performance against both local and global distortions.
We start with the comparison to PPAFFINE. In [21], three 512 × 512 grey-scale images, namely Baboon, Lena, and Boat, are used for performance assessment. Both a 60-bit message and a 14-point template are inserted in the Fourier domain, and the resulting watermarked images have PSNRs no greater than 38 dB. For a fair comparison, we also embed a 60-bit message with the proposed scheme and adjust the embedding strength adaptively so that the PSNRs are close to 38 dB. We then use the same evaluation as [21] for the comparison.
Table 2 summarizes the simulation results for PPAFFINE and the proposed scheme. Both schemes are equally robust against enhancement, row/column removal, rotation with auto-cropping and scaling, and random geometrical distortions. Interestingly, the proposed scheme is significantly more robust to JPEG compression. PPAFFINE takes the Fourier magnitude of the test image directly as the cover signal, whereas the proposed scheme feeds this magnitude to the DPT and uses the resulting DPT subbands as the cover signal; this implies that the DPT subbands help improve watermarking robustness. Nevertheless, the proposed scheme is worse than PPAFFINE at counteracting scaling, cropping, and shearing, for the following reasons. Unlike PPAFFINE, the proposed scheme cannot resist a scaling attack with factor 2, because of the interpolation approximation in Eq. (36). It fails for cropping ratios of 50% and 75% because at least 75% of the image information is lost in these two cases. Its weakness to shearing is to be expected, as it is designed to counteract RST rather than affine transforms. In this sense, the proposed scheme is comparable to PPAFFINE in terms of performance against RST and common signal processing attacks.
We next compare the proposed scheme with WNZHDMST [38], adopting the same settings as WNZHDMST for a fair comparison. In particular, we test the same five images (Tank, Globe, Lena, Man, and Zelda), insert a 60-bit message in each, and slightly adjust the embedding strength so that the PSNRs of the watermarked images equal those of WNZHDMST. We then apply common signal processing attacks and geometrical attacks to the watermarked images via StirMark 4.1 [36, 37]. The comparison between the proposed scheme (denoted as DPT) and WNZHDMST, summarized in Tables 3 and 4, shows that the proposed scheme performs comparably to or better than WNZHDMST.
According to [38], the DMST has only analysis filters and thus is not a pyramid transform. For this reason, WNZHDMST uses the SPT for template/watermark insertion and message extraction and uses the DMST to estimate the rotation angle and scaling factor. Because the DMST is similar to the DPT's analysis filters, the template matching of WNZHDMST is similar to that of the proposed scheme. In light of the comparison above, it is reasonable to conclude that the proposed scheme is a promising way to achieve better performance.
For the comparison with NIKOSFND [14], we adopt the same settings as NIKOSFND for an impartial evaluation. In [14], ten 512 × 512 grey-scale images, namely Airplane, Boat, House, Peppers, Splash, Baboon, Couple, Lena, Elaine, and Lake, were used. For each image, a 50-bit message was inserted and the PSNR was held at 40 dB. The watermarked images were then subjected to local and global geometrical attacks and common signal processing attacks, and NIKOSFND was compared with state-of-the-art approaches of the same category, showing comparable or even better performance. For a fair comparison, we apply the same settings to our scheme and compare it against three state-of-the-art schemes: the SIFT-based NIKOSFND and the schemes of Dong et al. [39] and Tian et al. [40]. The SIFT-based NIKOSFND is one of the best-performing variants among those using different salient features.
1. Local geometrical attacks: According to [14], this type of attack includes row/column line removal, jitter, and cropping. The performance against these attacks is summarized in Figures 10, 11, and 12, respectively, where "Nikolaidis" denotes NIKOSFND [14], as in all subsequent figures. The BERs shown are averaged over the 10 test images, and the search space $(\phi, \sigma_x, \sigma_y)$ (see Section 5.2) is used for the row/column line removal attack. The proposed scheme significantly outperforms the three compared approaches against row/column line removal and cropping, whereas it is markedly weaker than them against jitter. This weakness arises because jitter attacks are outside the scope of the proposed scheme.

2. Global geometrical attacks: In [14], NIKOSFND was evaluated against global geometrical attacks such as rotation, scaling, downsampling followed by upsampling, shearing, and general affine transforms. The corresponding comparisons are given in Figures 13, 14, 15, 16, and 17, respectively. Figure 13 shows that the proposed scheme successfully estimates and corrects all tested rotation angles, thereby outperforming the methods of Dong et al. and Tian et al. and markedly improving on NIKOSFND.
Based on the performance against scaling shown in Figure 14, the proposed scheme exhibits a considerable improvement over the three compared schemes for scaling factors of 0.75, 0.9, 1.1, and 1.5. However, it is much worse for scaling factors of 0.5 and 2.
Figure 15 shows that the proposed scheme has high robustness to the downsampling and upsampling pairs (0.5, 1.5), (1.5, 0.5), (0.7, 1.3), and (1.3, 0.7), outperforming the approaches of Dong et al., Tian et al., and Nikolaidis. Nevertheless, these three approaches outperform the proposed scheme for other cases with a high ratio of information loss.
Figures 16 and 17 show that the proposed scheme is vulnerable to shearing and general affine transforms, because it is designed to counteract RST attacks rather than affine transforms. The proposed scheme and the approach of Tian et al. perform similarly poorly here, and both are significantly worse than NIKOSFND and the approach of Dong et al.

3. Signal processing attacks: The signal processing attacks considered in [14] are JPEG compression, H.264 intraframe compression, Gaussian noise addition, and lowpass filtering. The performance against these attacks is shown in Figures 18, 19, 20, and 21, respectively.
It is found from Figure 18 that the proposed scheme has better performance than the compared schemes for JPEG QFs from 20 to 50 but is weaker for other cases. Because situations with QF < 20 seldom occur in practice, the proposed scheme is more favorable than the three compared approaches in counteracting JPEG compression.
Figure 19 presents the performance comparison against H.264 intraframe compression. The proposed scheme is as robust as the three compared approaches for quality factor values below 25 and significantly better for the other values.
As shown in Figures 20 and 21, the performance against Gaussian noise addition and lowpass filtering is somewhat similar. According to Figure 20, the proposed scheme has significant superiority over the schemes of Tian et al. and Nikolaidis. It is similar to the scheme of Dong et al. for noise variances from 0.001 to 0.005 but is slightly better for a noise variance of 0.006. The lowpass filtering results shown in Figure 21 demonstrate that the robustness of the proposed scheme is higher than that of the schemes presented by Tian et al. and Nikolaidis, while it is equivalent to that of Dong et al.'s scheme.
7. Conclusions
In this paper, we have presented a DPT-based robust image watermarking scheme resilient to rotation, scaling, and translation. We first constructed a DPT with shift-invariance, steerability, and scalability by extending an SPT expressed in closed, polar-separable form. The radial component of the SPT's basis filters was taken as the kernel for designing the scalable basis filters, which were combined with the steerable basis filters corresponding to the angular component of the SPT's basis filters, resulting in joint scalability and steerability. Shift-invariance was inherited from the SPT by retaining undecimated highpass and bandpass basis subbands. We also derived interpolation functions for steerability and scalability, which allow any filter (response) at an arbitrary orientation and scale to be interpolated as a linear combination of the DPT's basis filters (responses). By exploiting shift-invariance, steerability, and scalability, we further derived theoretical synchronization mechanisms for translation, rotation, and scaling.
Based on the constructed DPT, we developed a robust image watermarking scheme resilient to translation, rotation, and scaling. Translation invariance is achieved by taking the Fourier magnitude of the cover image as the DPT input, and resilience to rotation and scaling is obtained via the synchronization mechanisms for rotation and scaling. At the transmitter, the template is inserted in the first level of the DPT pyramid and the watermark in the other two levels. At the receiver, the rotation angle and scaling factor are estimated via an efficient template-matching algorithm and used to correct the rotation and scaling of the received image, after which the watermark is extracted from the corrected image. Extensive simulations show that the proposed scheme is highly robust to geometrical attacks, such as rotation, scaling, translation, cropping, and row/column line removal, as well as to common signal processing attacks such as JPEG compression, AWGN, median filtering, and convolution filtering. In addition, comparisons with several strong related schemes show that the proposed scheme has comparable performance against rotation, scaling, translation, cropping, and row/column line removal attacks, while generally achieving higher robustness to JPEG compression, AWGN, and lowpass filtering.
References
Zheng D, Liu Y, Zhao J, Saddik AE: A survey of RST invariant image watermarking algorithms. ACM Comput. Surv. 2007, 39(2, Article 5):1–91.
Kumar A, Santhi V: A review on geometric invariant digital image watermarking techniques. Int. J. Comp. Appl. 2011, 12(9):31–36.
Bas P, Chassery JM, Macq B: Geometrically invariant watermarking using feature points. IEEE Trans. Image Processing 2002, 11(9):1014–1028. 10.1109/TIP.2002.801587
Wang X, Wang C, Yang Y, Niu P: A robust blind color image watermarking in quaternion Fourier transform domain. J. Syst. Softw. 2013, 86:255–277. 10.1016/j.jss.2012.08.015
Wang Y, Doherty JF, Van Dyck RE: A rotation, scaling and translation resilient image watermarking algorithm using circular Gaussian filters. In Proc. of the IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing. Baltimore, MD; 2001.
Lichtenauer J, Setyawan I, Kalker T, Lagendijk R: Exhaustive geometrical search and the false positive watermark detection probability. In Proc. of SPIE, Security and Watermarking of Multimedia Contents V. Volume 5020. Santa Clara, CA; 2003:203–214. 10.1117/12.503186
Barni M: Effectiveness of exhaustive search and template matching against watermark desynchronization. IEEE Signal Processing Letters 2005, 12(2):158–161.
O’Ruanaidh JJK, Pun T: Rotation, scale and translation invariant spread spectrum digital image watermarking. Signal Process. 1998, 66(3):303–317. 10.1016/S0165-1684(98)00012-7
Kim HS, Lee HK: Invariant image watermark using Zernike moments. IEEE Trans. Circuit Syst. Video Technol. 2003, 13(8):766–775. 10.1109/TCSVT.2003.815955
Tang CW, Hang HM: A feature-based robust digital image watermarking scheme. IEEE Trans. Signal Processing 2003, 51(4):1123–1129.
Teague M: Image analysis via the general theory of moments. J. Opt. Soc. Am. 1980, 70(8):920–930. 10.1364/JOSA.70.000920
Zhang H, Shu H, Coatrieux G, Zhu J, Wu J, et al.: Affine Legendre moment invariants for image watermarking robust to geometrical distortions. IEEE Trans. Image Process. 2011, PP(99):1055–1068.
Xiang S, Kim HJ, Huang J: Invariant image watermarking based on statistical features in the low-frequency domain. IEEE Trans. Circuit Syst. Video Technol. 2008, 18(6):777–790.
Nikolaidis A: Local distortion resistant image watermarking relying on salient feature extraction. EURASIP J. Adv. Signal Processing 2012, 2012:97. 10.1186/1687-6180-2012-97
Kutter M: Watermarking resistance to translation, rotation, and scaling. In Proc. of the International Society for Optical Engineering (SPIE): Multimedia Systems and Applications. Volume 3528. Boston, MA; 1998:423.
Voloshynovskiy S, Deguillaume F, Pun T: Content adaptive watermarking based on a stochastic multiresolution image modeling. In Proc. of the 10th European Signal Processing Conference (EUSIPCO’2000). Tampere, Finland; 2000:58.
Voloshynovskiy S, Deguillaume F, Pun T: Multibit digital watermarking robust against local nonlinear geometrical distortions. In Proc. of the International Conference on Image Processing. Volume 3. Thessaloniki, Greece; 2001:999.
Zheng Z, Wang S, Zhao J: RST invariant image watermarking algorithm with mathematical modeling and analysis of the watermarking processes. IEEE Trans. Image Process. 2009, 18(5):1055–1068.
Tsai J, Huang W, Kuo Y: On the selection of optimal feature region set for robust digital image watermarking. IEEE Trans. Image Process. 2011, 20(3):735–743.
Tsai JS, Huang WB, Kuo YH, Horng MF: Joint robustness and security enhancement for feature-based image watermarking using invariant feature regions. Signal Process. 2012, 92:1431–1445. 10.1016/j.sigpro.2011.11.033
Pereira S, Pun T: Robust template matching for affine resistant image watermarks. IEEE Trans. Image Processing 2000, 9(6):1123–1129. 10.1109/83.846253
Kang X, Huang J, Shi YQ, Lin Y: A DWT-DFT composite watermarking scheme robust to both affine transform and JPEG compression. IEEE Trans. Circuit Syst. Video Technol. 2003, 13(8):776–786. 10.1109/TCSVT.2003.815957
Ni J, Wang C, Huang J: A RST-invariant robust DWT-HMM watermarking algorithm incorporating Zernike moments and template. In KES 2005: Knowledge-Based Intelligent Information & Engineering Systems. Volume 3681. Edited by: Khosla R, Howlett R, Jain LC. Heidelberg: Springer; 2005:1233–1239. 10.1007/11552413_176
Ni J, Zhang R, Huang J, Wang C, Li Q: A rotation-invariant secure image watermarking algorithm incorporating steerable pyramid transform. In IWDW 2006: Digital Watermarking, 5th International Workshop on Digital Watermarking. Lecture Notes in Computer Science, Volume 4283. Edited by: Shi YQ, Jeon B. Heidelberg: Springer; 2006:446–460. 10.1007/11922841_36
Bogumił D: An asymmetric image watermarking scheme resistant against geometrical distortions. Signal Processing: Image Communication 2006, 21:59–66. 10.1016/j.image.2005.06.005
Fu YG, Shen R, Lu H: Watermarking scheme based on support vector machine for color images. IEE Electronics Letters 2004, 40(16):986–987. 10.1049/el:20040600
Tsai HH, Sun DW: Color image watermark extraction based on support vector machines. Inf. Sci. 2007, 177(2):550–569. 10.1016/j.ins.2006.05.002
Peng H, Wang J, Wang W: Image watermarking method in multiwavelet domain based on support vector machines. J. Syst. Softw. 2010, 83(8):1470–1477. 10.1016/j.jss.2010.03.006
Simoncelli EP, Freeman WT, Adelson EH, Heeger DJ: Shiftable multiscale transforms. IEEE Trans. Information Theory 1992, 38(2):587–607. 10.1109/18.119725
Freeman WT, Adelson EH: The design and use of steerable filters. IEEE Trans. PAMI 1991, 13(9):891–906. 10.1109/34.93808
Perona P: Deformable kernels for early vision. In Proc. of the third Int. Conf. on Computer Vision and Pattern Recognition (CVPR). Lahaina, Maui; 1991:222–227.
Perona P: Deformable kernels for early vision. IEEE Trans. Pattern Anal. Machine Intelligence 1995, 17(5):488–499. 10.1109/34.391394
Karasaridis A, Simoncelli EP: A filter design technique for steerable pyramid transform. In Proc. of the 21st International Conference on Acoustics, Speech, and Signal Processing. Volume 4. Atlanta, GA; 1996:2387–2390.
Portilla J, Strela V, Wainwright MJ, Simoncelli EP: Image denoising using scale mixtures of Gaussians in the wavelet domain. IEEE Trans. Image Processing 2003, 12(11):1338–1351. 10.1109/TIP.2003.818640
Divsalar D, Jin H, McEliece RJ: Coding theorems for turbo-like codes. In Proc. of the 36th Annual Allerton Conf. on Communication, Control and Computing. Monticello, IL; 1998:525–539.
Petitcolas FAP: Watermarking schemes evaluation. IEEE Signal Processing Magazine 2000, 17(5):58–64. 10.1109/79.879339
Petitcolas FAP: StirMark. http://www.cl.cam.ac.uk/~fapp2/watermarking/stirmark/. Accessed 16 Feb. 2013.
Wang C, Ni J, Zhuo H, Huang J: A geometrically resilient robust image watermarking scheme using deformable multiscale transform. In Proc. of the Intl. Conf. on Image Processing 2010. Hong Kong; 2010:3677–3680.
Dong P, Brankov JG, Galatsanos NP, Yang Y, Davoine F: Digital watermarking robust to geometric distortions. IEEE Trans. Image Process. 2005, 14(12):2140–2150.
Tian H, Zhao Y, Ni R, Pan JS: Spread spectrum-based image watermarking resistant to rotation and scaling using Radon transform. In Proc. of the Sixth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP 2010). Darmstadt; 2010:442–445.
Acknowledgments
This work is supported by NSFC (nos. 61202467 and 61100170), the National Research Foundation for the Doctoral Program of Higher Education of China (no. 20120171110037), the Key Program of Natural Science Foundation of Guangdong (no. S2012020011114), and the Scientific Research Foundation for Returned Overseas Chinese Scholars (State Education Ministry).
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Wang, C., Ni, J. & Zhang, D. Counteracting geometrical attacks on robust image watermarking by constructing a deformable pyramid transform. EURASIP J. Adv. Signal Process. 2013, 119 (2013). https://doi.org/10.1186/1687-6180-2013-119
DOI: https://doi.org/10.1186/1687-6180-2013-119