- Research
- Open Access

# Counteracting geometrical attacks on robust image watermarking by constructing a deformable pyramid transform

- Chuntao Wang^{1, 2} (Email author)
- Jiangqun Ni^{2}
- Dong Zhang^{2}

**2013**:119

https://doi.org/10.1186/1687-6180-2013-119

© Wang et al.; licensee Springer. 2013

**Received:** 18 March 2013 **Accepted:** 25 May 2013 **Published:** 24 June 2013

## Abstract

Counteracting geometrical attacks remains one of the most challenging problems in robust watermarking. In this paper, we resist rotation, scaling, and translation (RST) by constructing a kind of deformable pyramid transform (DPT) that is shift-invariant, steerable, and scalable. The DPT is extended from a closed-form polar-separable steerable pyramid transform (SPT). The radial component of the SPT's basis filters is taken as the kernel of the scalable basis filters, and the angular component is used for the steerable basis filters. The shift-invariance is inherited from the SPT by retaining undecimated high-pass and band-pass subbands. Based on the designed DPT, we theoretically derive interpolation functions for steerability and scalability and synchronization mechanisms for translation, rotation, and scaling. By exploiting the preferable characteristics of DPT, we develop a new template-based robust image watermarking scheme that is resilient to RST. Translation invariance is achieved by taking the Fourier magnitude of the cover image as the DPT's input. The resilience to rotation and scaling is obtained using the synchronization mechanisms for rotation and scaling, for which an efficient template-matching algorithm has been devised. Extensive simulations show that the proposed scheme is highly robust to geometrical attacks, such as RST, cropping, and row/column line removal, as well as common signal processing attacks such as JPEG compression, additive white Gaussian noise, and median filtering.

## Keywords

- Robust watermarking
- RST invariant
- Deformable pyramid transform
- Template
- Robustness

## 1. Introduction

Counteracting geometrical attacks such as rotation, scaling, and translation (RST) remains one of the most challenging problems for robust watermarking. This is because geometrical attacks easily desynchronize the watermark, degrading its robustness dramatically. To address such problems, a number of RST-invariant blind watermarking schemes have been developed over the past two decades. These schemes can be roughly categorized into five paradigms, namely exhaustive search, invariant domain, auto-correlation, feature-based implicit synchronization, and geometrical correction [1–4]. These are briefly described below.

The exhaustive search method [5–7] iteratively corrects each geometrical distortion in the search space and then evaluates the watermark extracted from the geometrically corrected carrier accordingly. This method generally leads to high computational complexity and a large probability of false positives. The invariant domain approach [8–14] eliminates the need to identify geometrical distortions by embedding the watermark in a domain that is invariant to such distortions. However, this method may encounter the issue of interpolation approximation during the geometrically invariant transform. Auto-correlation-based techniques periodically insert the watermark in the cover and use a cross-correlation function to locate the periodic autocorrelation peaks, which indicate the geometrical transform that has been performed. The schemes discussed in [15–17] are typical examples of this category. The fourth category exploits salient features to achieve geometrical synchronization, as presented in the schemes of [3, 10], and [18–20]. Under this approach, the embedder binds the watermark with the geometrically invariant salient features. The watermark is recovered inversely by the receiver, who seeks the salient features that still exist, even after severe geometrical distortion. In general, this category is somewhat robust against geometrical distortions, but it may degrade greatly if the salient feature detection fails. The final category estimates the geometrical distortion parameters, thus permitting geometrical correction and watermark extraction. The template is generally constructed and embedded in the cover, and the geometrical parameters are sought via a particular technique. Several examples are shown in the schemes described in [21–24]. In addition to the template, support vector machines (SVMs) have also been incorporated to obtain geometrical parameters. 
For example, a number of recently developed schemes [25–28] generate patterns, such as the inserted template and Zernike moments, from geometrically attacked watermarked images. These patterns are then input to the SVM to train the classification model, and finally, the trained model is used to predict the geometrical parameters of the to-be-checked image.

In this paper, we develop a new geometrical correction-based robust image watermarking scheme by constructing a deformable pyramid transform (DPT) that is shift-invariant, steerable, and scalable. This is motivated by the scheme of [24], in which a steerable pyramid transform (SPT) with shift-invariance and steerability [29] is exploited to estimate, with an auxiliary inserted template, the rotation angle. This allows for rotation correction and watermark extraction. Although the scheme in [24] is highly robust against rotation, it cannot resist scaling attacks because of the lack of scalability in SPT. To counteract both the rotation and scaling, a kind of pyramid transform (PT) with shift-invariance, steerability, and scalability is needed. However, such a PT has not, to the best of our knowledge, been reported in the literature. Inspired by this situation, we design a shift-invariant, steerable, and scalable DPT. This is extended from SPT as follows.

We start by introducing the SPT. In essence, SPT is a variant of the wavelet transform (WT). As illustrated in [29], the conventional orthogonal or bi-orthogonal WT is sensitive to translation because of its critical sampling. That is, once the input signal has been translated slightly, its wavelet coefficients are not the translated versions of the original wavelet coefficients, and the information represented within a wavelet subband of the translated signal is not the same as that in the original wavelet subband. To address this issue, Freeman and Adelson [30] proposed a kind of steerable filter that can be used to synthesize any filter at an arbitrary orientation via a linear interpolation. This is termed *steerability*. Furthermore, Perona [31, 32] developed scalable filters that can be used to interpolate any filter on a scale within a certain range, which is called *scalability*. These steerable and scalable filters were further integrated to give deformable filters.

In [29], Simoncelli et al. analyzed the translation invariance of WT and then generalized it to the concept of shiftability. In brief, shiftability implies that any filter at an arbitrary position, orientation, and scale can be obtained through a linear interpolation of the designed shiftable, steerable, and scalable basis filters, respectively. The shiftability in orientation and scale is essentially equivalent to the steerability and scalability proposed in [30–32], respectively. In addition, Simoncelli et al. also proposed the concept of joint shiftability, which allows shiftability in the subset of position, orientation, and scale to be achieved simultaneously. As an illustration of these concepts, they also designed a kind of SPT that is shift-invariant and has shiftability in orientation (i.e., steerability).

In [33], Karasaridis and Simoncelli analyzed constraints for SPT and subsequently designed an SPT under these constraints via a numerical approach. Unfortunately, this SPT does not achieve perfect reconstruction. In contrast, Portilla et al. [34] developed an SPT with perfect reconstruction.

In summary, the filters developed in [30, 33, 34] are mainly steerable analysis filters or SPTs with steerability but without scalability. Although the filters designed in [30, 31] have both steerability and scalability, they do not incorporate synthesis filters for reconstruction and thus cannot be considered as PTs. To the best of our knowledge, no PTs with shift-invariance, steerability, and scalability have been reported in the literature. In the interest of counteracting RST in robust image watermarking, we are motivated to extend the SPT with shift-invariance and steerability to include scalability. This is termed the DPT for convenience. To this end, we adopt the SPT with perfect reconstruction developed in [34]. The steerable filters of this SPT are represented in a polar-separable form, where the angular components are designed so as to achieve the steerability. This implies that scaling the steerable filters would be equivalent to dilating the radial component. Thus, according to the shiftability framework in [29], we can take the radial component of the SPT's steerable filters as the kernel for constructing the scalable filters. Furthermore, combining the scalable and steerable filters derived from the radial and angular components, respectively, gives rise to the scalability and steerability of the DPT. Its shift-invariance is inherited from the SPT by retaining undecimated high-pass and band-pass coefficients. In this way, we construct a DPT with shift-invariance, steerability, and scalability.

In an attempt to apply the DPT in robust watermarking to counteract RST, we first exploit the shift-invariance, steerability, and scalability of DPT to theoretically derive a mechanism for RST synchronization. As will be shown, the DPT coefficients of the translated signal are the translated versions of those of the original signal, which is the essence of shift-invariance. The relationship between the DPT coefficients of the original signal and those of the rotated and scaled signal is characterized by a linear interpolation function parameterized using the rotation angle and scaling factor.

In this paper, based on the derived RST synchronization mechanism, we develop a new robust image watermarking scheme that is resilient to RST. Because shift-invariance in the DPT means that the coefficients translate together with the input, a translation of the input signal would affect the synchronization of rotation and scaling. To uncouple the translation from rotation and scaling, we take the Fourier magnitude of the cover image as the input to the DPT. This achieves true translation invariance. Rotation and scaling attacks are counteracted by first deploying the rotation and scaling synchronization mechanism to estimate their parameters. The estimated parameters are then used to correct the rotation and scaling that has been performed. In blind watermarking, the original signal is not available to the receiver, so we resort to the template to estimate the parameters of rotation and scaling attacks. Specifically, we insert the template and watermark at level 1 and levels 2 to 3, respectively, of the DPT pyramid in the embedding process. During the detection process, we exploit the rotation and scaling synchronization mechanism to identify the rotation angle and scaling factor, and further use these estimated parameters to correct the rotation and scaling distortions and recover the watermark from the geometrically corrected signal. Extensive experimental results demonstrate that the proposed algorithm is highly robust against geometrical attacks such as RST and exhibits favorable performance against common signal processing, including JPEG compression, median filtering, Gaussian noise, and low-pass filtering. In addition, we observe comparable or higher robustness with respect to other algorithms in the simulation.

The rest of this paper is organized as follows. Section 2 reviews the SPT with shift-invariance and steerability, and the construction of the DPT with shift-invariance, steerability, and scalability is detailed in Section 3. In Section 4, we theoretically derive the RST synchronization mechanism. We describe the proposed robust image watermarking scheme in Section 5, and present our experimental results in Section 6. Finally, the conclusions are discussed in Section 7.

## 2. Steerable pyramid transform with shift-invariance and steerability

In this section, we review the SPT with shift-invariance and steerability. We first describe the constraints for SPT given in [33] and then introduce one closed-form SPT presented in [34].

Here, $\mathbf{\omega} = (\omega_x, \omega_y)$ is the frequency vector in the Fourier domain, $H_0(\mathbf{\omega})$ and $L_0(\mathbf{\omega})$ denote the non-oriented high-pass and low-pass filters, respectively, and $L_1(\mathbf{\omega})$ and $B_k(\mathbf{\omega})$ $(k = 0, \dots, K-1)$ represent the narrow-band low-pass filter and the oriented band-pass filter, respectively. Eqs. (1-1), (1-2), and (1-3) describe the unit system response amplitude, recursion relationship, and aliasing cancellation, respectively. Furthermore, the following constraint must hold to achieve steerability:

Here, $\theta = \arg(\mathbf{\omega})$, $\theta_k = \pi k / K$, and $B(\mathbf{\omega}) = \sqrt{\sum_{k=0}^{K-1} \left|B_k(\mathbf{\omega})\right|^2}$.

Here, $\theta = \arg(\mathbf{\omega})$, and $H(r)$ and $G_k(\theta)$ are defined as:

$L_0(r, \theta)$ and $H_0(r, \theta)$ are thus constructed as:

Note that the high-pass filter $H_0(r, \theta)$ in [34] is also split into a number of oriented subbands, i.e., $H_0(r, \theta) = H(r/2)\,G_k(\theta)$. Because the oriented high-pass subbands will not be used in our scheme, they have been equivalently simplified as Eq. (8).

## 3. Design of deformable pyramid transform with shift-invariance, steerability, and scalability

According to the theory of shiftability in [29], scalable basis filters are essentially scaled versions of $B_k(r, \theta)$. As scaling $B_k(r, \theta)$ is equivalent to scaling $H(r)$ according to Eq. (4), $H(r)$ can be taken as the kernel for constructing scalable basis filters. That is, the radial component $H(r)$ in Eq. (4) is used to achieve scalability, and the angular components $G_k(\theta)$ are used to satisfy steerability. Together, these result in joint steerability and scalability. Furthermore, keeping the high-pass and band-pass subbands undecimated, as in the SPT, yields the property of shift-invariance. Continuing with this line of thought, we can construct the DPT, as shown in Figure 2. This achieves the desired characteristics of shift-invariance, steerability, and scalability. The $C_j(r)$ $(j = 0, 1, \dots, J-1)$ in Figure 2 denote the scalable filters designed from the kernel $H(r)$.

Below, we determine a suitable number of scalable basis filters, $J$, derive the closed-form $C_j(r)$, and obtain the interpolation functions for steerability and scalability.

### 3.1. Construction of scalable basis filters

According to the sufficient and necessary condition of shiftability in [29], the number of basis filters is equal to or greater than the number of Fourier frequencies with non-zero magnitude, where the Fourier frequency denotes the kernel's frequency in the form of an imaginary exponent. Because *H*(*r*) is a piecewise function in the Fourier domain, we determine the number of scalable basis filters in a piecewise fashion, as follows.

We first handle the case $r \in (\pi/4, \pi/2)$, where $H(r) = \cos\left((\pi/2)\log_2(2r/\pi)\right)$. Here, $H(r)$ can be treated as a function that has undergone a logarithmic warping operation, i.e., $H(r) = \cos\left(\rho(2r/\pi)\right)$, where $\rho(2r/\pi) = \pi\log_2(2r/\pi)/2 \in (-\pi/2, 0)$. Because warping operations do not, according to [29], affect the property of shiftability, the number of scalable basis filters for $r \in (\pi/4, \pi/2)$ depends on the non-warping kernel $\tilde{H}(r) = \cos(2r/\pi) = \left(e^{j2r/\pi} + e^{-j2r/\pi}\right)/2$. Clearly, there are two Fourier frequencies with non-zero magnitude, and thus, the number of scalable basis filters for $r \in (\pi/4, \pi/2)$ satisfies $J \ge 2$. For simplicity, we choose $J = 2$ and construct, according to [29], the two scalable basis filters $C_j(r)$ $(j = 0, 1)$ as:

Here, $a_j$ $(a_j > 0)$ meets the constraint in Eq. (10), and $R_j \in (-\pi/2, 0)$ is set as:

$C_j(r)$ $(j = 0, 1)$ as:

$a_j$ that satisfy Eq. (14). By simply setting $a_0 = a_1$, we have the following solutions:

$r \in (\pi/4, \pi/2)$ are constructed as:

We proceed to handle the case $r \in (0, \pi/4]$. Here, the kernel is $H(r) = 0$, and thus, $J \ge 0$ holds. For the case $r \in [\pi/2, \pi]$, the kernel is $H(r) = 1$, and hence, we have $J \ge 1$. For convenience of construction, we uniformly adopt $J = 2$ scalable basis filters for all three cases. Under the constraint of Eq. (10), the two scalable basis filters for $r \in (0, \pi/4]$ and $r \in [\pi/2, \pi]$ are derived as $C_0(r) = C_1(r) = 0$ and $C_0(r) = C_1(r) = 1/\sqrt{2}$, respectively.
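The piecewise definition of the radial kernel used in this subsection can be checked numerically. The following minimal Python sketch evaluates $H(r)$ as stated in the text (0 on $(0, \pi/4]$, the warped cosine on $(\pi/4, \pi/2)$, and 1 on $[\pi/2, \pi]$) together with the warped radius $\rho(2r/\pi)$. The function names `radial_kernel` and `warped_radius` are hypothetical and chosen only for illustration.

```python
import math


def warped_radius(r):
    """rho(2r/pi) = (pi/2) * log2(2r/pi); lies in (-pi/2, 0) on the transition band."""
    return (math.pi / 2) * math.log2(2 * r / math.pi)


def radial_kernel(r):
    """Radial kernel H(r) for 0 < r <= pi, following the piecewise description."""
    if r <= math.pi / 4:
        return 0.0  # stop band
    if r >= math.pi / 2:
        return 1.0  # pass band
    # Transition band: cosine of the logarithmically warped radius.
    return math.cos(warped_radius(r))
```

The kernel is continuous at both band edges: the warped radius tends to $-\pi/2$ as $r \to \pi/4$ (so the cosine tends to 0) and to 0 as $r \to \pi/2$ (so the cosine tends to 1).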

### 3.2. Derivation of interpolation function

Under the shiftability framework [29], the interpolation function is parameterized by translation distance, rotation angle, or scaling factor, and will be used to interpolate the filter (response) at an arbitrary spatial position, orientation, or scale. Because the designed DPT is shift-invariant, we mainly derive interpolation functions for steerability and scalability.

We adopt $K = 2$ steerable basis filters, i.e., $G_0(\theta) = \cos(\theta)$ and $G_1(\theta) = \cos(\theta - \pi/2)$ according to Eq. (6). From the sufficient and necessary condition of shiftability [29], the steerable interpolation function $b_k(\varphi)$ satisfies the following equation:

Here, $\varphi$ denotes an arbitrary rotation angle. By requiring that both the real and imaginary parts of Eq. (18) agree, we obtain the following interpolation function for steerability:

$H(r)$ and $C_j(r)$ $(j = 0, 1)$ are piecewise. Thus, the scalable interpolation functions, say $s_j(\sigma)$, should also be piecewise, where $\sigma$ $(\sigma > 0)$ is an arbitrary scaling factor. We first handle the case $r \in (\pi/4, \pi/2)$. As analyzed in Section 3.1, the Fourier frequency with non-zero amplitude depends only on that of the non-warped kernel. Therefore, the Fourier frequency in this case is equal to $k = 2/\pi$. According to [29], $s_j(\sigma)$ satisfies the following equation:

Here, $R_j$ $(j = 0, 1)$ is defined in Eq. (12). Given that both the real and imaginary parts of Eq. (20) agree, we obtain

For the case $r \in (0, \pi/4]$, no Fourier frequency has non-zero amplitude, and hence, $s_j(\sigma)$ can be any value. In our scheme, we simply set $s_j(\sigma) = 0$ for $r \in (0, \pi/4]$. For the case $r \in [\pi/2, \pi]$, the Fourier frequency with non-zero amplitude is $k = 0$. As a result, we have

Because $C_0(r) = C_1(r)$ has been adopted in the DPT construction, we similarly set $s_0(\sigma) = s_1(\sigma)$ and obtain $s_0(\sigma) = s_1(\sigma) = 1/2$.

$\varphi$ and scale $\sigma$, say $F^{\varphi,\sigma}(r, \theta)$, via the following construction:

where (*r*, *θ*) are the polar coordinates in the Fourier domain. For convenience, Eq. (24) is called the *deformable interpolation*.

Here, $Q_{jk}^{l}$ $(j, k \in \{0, 1\};\ l = 1, 2, \dots)$ denotes the DPT basis subband at the $l$th pyramid level. The filter response at orientation $\varphi$ and scale $\sigma$ can then be obtained via the deformable interpolation as:

Although both Eqs. (24) and (25) are represented in the Fourier domain, performing the inverse Fourier transform on them leads to a straightforward interpolation expressed in the spatial-frequency domain.
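The steerable half of the deformable interpolation can be illustrated for $K = 2$. The sketch below assumes the interpolation weights $b_0(\varphi) = \cos\varphi$ and $b_1(\varphi) = \sin\varphi$, which are the coefficients that multiply the two orientation branches in Eq. (34). It checks only the angular identity and does not model the scalable weights $s_j(\sigma)$; the function names are hypothetical.

```python
import math


def steer_weights(phi):
    """Assumed interpolation weights (b_0, b_1) for the K = 2 angular basis."""
    return math.cos(phi), math.sin(phi)


def steered_response(theta, phi):
    """Synthesize the angular filter at orientation phi from the two basis filters."""
    g0 = math.cos(theta)                # G_0(theta)
    g1 = math.cos(theta - math.pi / 2)  # G_1(theta)
    b0, b1 = steer_weights(phi)
    return b0 * g0 + b1 * g1
```

Under these assumptions the weighted sum equals $\cos(\theta - \varphi)$, i.e., the basis filter $G_0$ rotated by $\varphi$, which is the angle-difference identity for the cosine.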

## 4. Mechanism for geometrical synchronization

In this section, in an attempt to counteract geometrical attacks in robust watermarking, we exploit the characteristics of shift-invariance, steerability, and scalability in the DPT to theoretically derive synchronization mechanisms for translation, rotation, and scaling. The derivation is as follows.

### 4.1. Synchronization for translation

Let $I(x, y)$ and $I^{x_0,y_0}(x, y)$ be the original image and its translated version, respectively, i.e., $I^{x_0,y_0}(x, y) = \mathcal{T}_{x_0,y_0}\left[I(x, y)\right] = I(x - x_0, y - y_0)$, where $(x_0, y_0)$ is the translation distance and $\mathcal{T}_{x_0,y_0}[\cdot]$ is the translation operator. The corresponding Fourier transforms (FTs) are denoted as $I(\omega_x, \omega_y)$ and $I(\omega_x, \omega_y)\,e^{-j(\omega_x x_0 + \omega_y y_0)}$, respectively.

Here, $\mathbf{\omega}^{1} = (\omega_x, \omega_y)$ represents the coordinate at the first (finest) level of the DPT pyramid. Its corresponding coordinate at the $l$th $(l \ge 1)$ pyramid level is then computed as $\mathbf{\omega}^{l} = (\omega_x^{l}, \omega_y^{l}) = (\omega_x/2^{l-1}, \omega_y/2^{l-1})$ (see also Figure 2). Suppose that $Q_{jk}^{l}(\mathbf{\omega}^{l})$ and $Q_{jk}^{l,x_0,y_0}(\mathbf{\omega}^{l})$ $(j, k \in \{0, 1\};\ l = 1, 2, \dots)$ are the DPT basis subbands in the Fourier domain for $I(x, y)$ and $I^{x_0,y_0}(x, y)$, respectively. According to Figure 2, we have

where ${q}_{\mathit{jk}}^{l,{x}_{0},{y}_{0}}\left(x,y\right)$ and ${q}_{\mathit{jk}}^{l}\left(x,y\right)$ are inverse FTs of ${Q}_{\mathit{jk}}^{l,{x}_{0},{y}_{0}}\left({\omega}_{x},{\omega}_{y}\right)$ and ${Q}_{\mathit{jk}}^{l}\left({\omega}_{x},{\omega}_{y}\right)$ respectively.

Equation (28) implies that the DPT basis subband ${q}_{\mathit{jk}}^{l,{x}_{0},{y}_{0}}\left(x,y\right)$ in the spatial-frequency domain for the translated input signal ${I}^{{x}_{0},{y}_{0}}\left(x,y\right)$ is also the translated version of ${q}_{\mathit{jk}}^{l}\left(x,y\right)$ for the original input signal. This is the essence of shift-invariance in the DPT.
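The property stated above, that an undecimated subband of a translated input is the translated subband of the original input, can be illustrated in one dimension. The following sketch is a simplified stand-in rather than the DPT itself: it assumes circular (wrap-around) translation and an arbitrary three-tap kernel, and the helper names are hypothetical.

```python
def circular_filter(signal, kernel):
    """Undecimated circular convolution: the output keeps the input length."""
    n = len(signal)
    return [sum(kernel[m] * signal[(i - m) % n] for m in range(len(kernel)))
            for i in range(n)]


def circular_shift(signal, x0):
    """Translate by x0 samples with wrap-around: out[i] = signal[i - x0]."""
    n = len(signal)
    return [signal[(i - x0) % n] for i in range(n)]


signal = [3, 1, 4, 1, 5, 9, 2, 6]
kernel = [1, -2, 1]  # arbitrary illustrative kernel, not a DPT filter

filter_after_shift = circular_filter(circular_shift(signal, 3), kernel)
shift_after_filter = circular_shift(circular_filter(signal, kernel), 3)
```

Because no subsampling is applied, the two results coincide sample by sample. A critically sampled transform would discard samples after filtering, and the equality would generally fail for shifts that are not multiples of the decimation factor.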

### 4.2. Synchronization for rotation and scaling

Because shift-invariance in the DPT means that the subbands translate together with the input (Eq. (28)), a translation of the input would affect the synchronization of rotation and scaling. To uncouple the translation from rotation and scaling, we adopt the Fourier magnitude of the input signal as the DPT's input, which in turn achieves true translation invariance. Under such a setting, we derive the synchronization mechanism for rotation and scaling as follows.
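The reason the Fourier magnitude removes the translation can be seen from the shift theorem: a translation multiplies the spectrum by a unit-modulus phase factor. The following one-dimensional sketch, with a direct DFT written out for self-containment, illustrates this under the assumption of a circular translation; for a translated and cropped image the invariance holds only approximately. The helper names are hypothetical.

```python
import cmath


def dft(signal):
    """Direct O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fourier_magnitude(signal):
    """Magnitude spectrum |DFT(signal)|."""
    return [abs(c) for c in dft(signal)]


signal = [3, 1, 4, 1, 5, 9, 2, 6]
shifted = signal[-3:] + signal[:-3]  # circular translation by 3 samples

mag_original = fourier_magnitude(signal)
mag_shifted = fourier_magnitude(shifted)
```

The two magnitude spectra agree up to floating-point error, while the phases differ by the linear term introduced by the shift.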

Denote $I^{\varphi,\sigma}(x, y) = \mathcal{G}_{\varphi,\sigma}\left[I(x, y)\right]$ as the rotated and scaled version of the original image $I(x, y)$, where $\mathcal{G}_{\varphi,\sigma}[\cdot]$ is an operator that rotates counter-clockwise by $\varphi$ and dilates by $\sigma$ about the origin. Let $M(\omega_x, \omega_y)$ and $M^{\varphi,1/\sigma}(\omega_x, \omega_y)$ be the Fourier magnitudes of $I(x, y)$ and $I^{\varphi,\sigma}(x, y)$, respectively. Then, we have $M^{\varphi,1/\sigma}(\omega_x, \omega_y) = \mathcal{G}_{\varphi,1/\sigma}\left[M(\omega_x, \omega_y)\right]$ according to the property of the FT.
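The scaling part of this FT property (dilation by $\sigma$ in space corresponds to dilation by $1/\sigma$ in frequency) can be checked with a one-dimensional analogue. The sketch below is an illustration under stated assumptions, not part of the scheme: it uses a Gaussian-windowed cosine with centre frequency `w0` as a hypothetical test pulse, approximates the continuous FT by a Riemann sum, and locates the spectral peak on a frequency grid. Rotation is not covered by this one-dimensional example.

```python
import math


def spectrum_peak(sigma, w0=4.0):
    """Grid frequency at which |FT| of the dilated pulse f(x / sigma) peaks."""
    dx = 0.05
    xs = [i * dx for i in range(-400, 401)]  # x in [-20, 20]
    # f(u) = exp(-u^2 / 2) * cos(w0 * u), evaluated at u = x / sigma.
    samples = [math.exp(-0.5 * (x / sigma) ** 2) * math.cos(w0 * x / sigma)
               for x in xs]
    best_w, best_mag = 0.0, -1.0
    for k in range(1, 161):  # w in (0, 8]
        w = k * 0.05
        # The pulse is real and even, so its FT reduces to a cosine transform.
        mag = abs(sum(s * math.cos(w * x) for s, x in zip(samples, xs)) * dx)
        if mag > best_mag:
            best_w, best_mag = w, mag
    return best_w
```

For example, comparing `spectrum_peak(1.0)` with `spectrum_peak(2.0)` shows the peak moving from about `w0` to about `w0 / 2` when the pulse is dilated by a factor of 2.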

Let $\mathbf{\omega}^{l} = (\omega_x/2^{l-1}, \omega_y/2^{l-1})$ denote the frequency coordinate at the $l$th $(l \ge 1)$ level of the DPT pyramid. Assume that $M^{\varphi,1/\sigma}(\mathbf{\omega}^{1})$ and $M(\mathbf{\omega}^{1})$ are decomposed via DPT into $l$ $(l \ge 1)$ pyramid levels to yield the basis subbands $Q_{jk}^{l,\varphi,\sigma}(\mathbf{\omega}^{l})$ and $Q_{jk}^{l}(\mathbf{\omega}^{l})$ $(j, k \in \{0, 1\};\ l = 1, 2, \dots)$, respectively. By virtue of the steerable and scalable properties in Eq. (25), we use $Q_{jk}^{l}(\mathbf{\omega}^{l})$ to interpolate the response at orientation $\psi$ and scale $\lambda$ as:

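Similarly, we use $Q_{jk}^{l,\varphi,\sigma}(\mathbf{\omega}^{l})$ to interpolate the response at orientation $\varphi + \psi$ and scale $\lambda/\sigma$ as: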

Here, $F^{l,\psi,\lambda}(\mathbf{\omega}^{l})$ represents the filter at orientation $\psi$ and scale $\lambda$ in the $l$th level of the multi-scale DPT (see also Figure 2). This is actually the rotated and scaled version of the kernel $F^{l,0,R_0}(\mathbf{\omega}^{l}) = L_0(\mathbf{\omega}^{l})\left(L_1(\mathbf{\omega}^{l})\right)^{l-1} C_0(\mathbf{\omega}^{l})\,G_0(\mathbf{\omega}^{l})$ at orientation 0 and scale $R_0$ (see also Eq. (12)). In other words, $F^{l,\psi,\lambda}(\mathbf{\omega}^{l}) = \mathcal{G}_{\psi,\lambda/R_0}\left[F^{l,0,R_0}(\mathbf{\omega}^{l})\right]$ holds and so does $F^{l,\varphi+\psi,\lambda/\sigma}(\mathbf{\omega}^{l}) = \mathcal{G}_{\varphi+\psi,\lambda/(\sigma R_0)}\left[F^{l,0,R_0}(\mathbf{\omega}^{l})\right]$. Therefore, we have

where $\mathbf{s}^{l} = (x/2^{l-1}, y/2^{l-1})$ is the coordinate in the spatial-frequency domain at pyramid level $l$, and $q^{l,\varphi+\psi,\sigma/\lambda}(\mathbf{s}^{l})$ and $q^{l,\psi,1/\lambda}(\mathbf{s}^{l})$ are the inverse FTs of $Q^{l,\varphi+\psi,\lambda/\sigma}(\mathbf{\omega}^{l})$ and $Q^{l,\psi,\lambda}(\mathbf{\omega}^{l})$, respectively.

Based on Eq. (32), the synchronization for rotation and scaling can be performed as follows: decompose the Fourier magnitude of $I^{\varphi,\sigma}(x, y)$ into an $l$-level DPT pyramid to generate the basis subbands $Q_{jk}^{l,\varphi,\sigma}(\mathbf{\omega}^{l})$ $(j, k \in \{0, 1\};\ l = 1, 2, \dots)$. Then, interpolate the response at orientation $\varphi + \psi$ and scale $\lambda/\sigma$ as $Q_{jk}^{l,\varphi+\psi,\lambda/\sigma}(\mathbf{\omega}^{l})$. Finally, successively rotate counter-clockwise by $-\varphi$ and dilate by $\sigma$ the interpolated subband $Q_{jk}^{l,\varphi+\psi,\lambda/\sigma}(\mathbf{\omega}^{l})$ to yield the response $Q^{l,\psi,\lambda}(\mathbf{\omega}^{l})$ at orientation $\psi$ and scale $\lambda$. The $Q^{l,\psi,\lambda}(\mathbf{\omega}^{l})$ is equivalent to the subband at orientation $\psi$ and scale $\lambda$ that is synthesized from the DPT basis subbands $Q_{jk}^{l}(\mathbf{\omega}^{l})$ of the original image $I(x, y)$. The rotation and scaling synchronization using Eq. (33) is similar to that based on Eq. (32).

## 5. Proposed robust watermarking scheme

In this section, we present the proposed robust image watermarking algorithm, which is RST-resilient. The translation invariance is achieved by taking the Fourier magnitude of the cover image *I*(*x*, *y*) as the DPT input, and the rotation and scaling are counteracted using the inserted template and the rotation and scaling synchronization. The details are given below, where only *K* = 2 steerable basis filters are adopted to reduce the computational complexity of the rotation and scaling synchronization.

### 5.1. Template and watermark insertion

Suppose that the size of the cover image $I(x, y)$ is $H \times W$. To obtain favorable resolution for template matching, we symmetrically pad (crop) the rows/columns of $I(x, y)$ to the size of 1,024 if the height/width, $H$/$W$, is smaller (larger) than 1,024. We then calculate its Fourier magnitude $M(\omega_x, \omega_y)$ and phase $\Psi(\omega_x, \omega_y)$, and further decompose $M(\omega_x, \omega_y)$ into a three-level DPT pyramid to generate the spatial-frequency basis subbands $q_{jk}^{l}(x, y)$ $(j, k \in \{0, 1\};\ l = 1, 2, 3)$. Among these, the subbands at the first (finest) level, $q_{jk}^{1}(x, y)$, are used for template insertion, whereas those at the other two levels, $q_{jk}^{l}(x, y)$ $(l = 2, 3)$, are for watermark embedding. We chose to embed in the spatial-frequency domain instead of the Fourier domain because the symmetry of the Fourier magnitude would decrease the number of candidate coefficients for watermarking and thus the embedding capacity. The template and watermark embedding process is illustrated in Figure 3 and explained as follows.
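The symmetric pad/crop step described above can be sketched as follows. This is a minimal illustration on nested Python lists; the padding value is not specified in the text, so zero padding is an assumption, and the names `fit_length` and `fit_image` are hypothetical.

```python
def fit_length(row, target, fill=0):
    """Symmetrically pad (with `fill`) or centre-crop a 1-D list to `target` entries."""
    n = len(row)
    if n >= target:
        start = (n - target) // 2
        return row[start:start + target]
    before = (target - n) // 2
    after = target - n - before
    return [fill] * before + list(row) + [fill] * after


def fit_image(image, size=1024, fill=0):
    """Pad or crop columns and then rows so that the result is size x size."""
    rows = [fit_length(r, size, fill) for r in image]
    n = len(rows)
    if n >= size:
        start = (n - size) // 2
        return rows[start:start + size]
    before = (size - n) // 2
    after = size - n - before
    top = [[fill] * size for _ in range(before)]     # independent blank rows
    bottom = [[fill] * size for _ in range(after)]
    return top + rows + bottom
```

Each dimension is handled independently, so an image that is too wide but too short is cropped horizontally and padded vertically.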

#### 5.1.1. Template embedding

- (1) Generate, via a secret key $KEY_{t1}$, a random sequence $\mathbf{P} = \{p_i \in \{+1, -1\},\ i = 1, \dots, N_t\}$ of length $N_t$ as the template.
- (2) To enhance the security, we tune $q_{jk}^{1}(x, y)$ to the predefined secret orientation $\theta_t$ and scale $\sigma_t$ and obtain $q^{1,\theta_t,\sigma_t}(x, y)$. According to the steerability and scalability in Eq. (25), we have

  $$q^{1,\theta_t,\sigma_t} = \cos\theta_t \cdot \mathcal{F}^{-1}\left(s_0(\sigma_t)\,Q_{00}^{1} + s_1(\sigma_t)\,Q_{10}^{1}\right) + \sin\theta_t \cdot \mathcal{F}^{-1}\left(s_0(\sigma_t)\,Q_{01}^{1} + s_1(\sigma_t)\,Q_{11}^{1}\right), \tag{34}$$

  where $Q_{jk}^{1}(\omega_x, \omega_y)$ denotes the FT of $q_{jk}^{1}(x, y)$ and $\mathcal{F}^{-1}(\cdot)$ is the inverse FT. Note that the coordinates in Eq. (34) are omitted for compactness.
- (3) Randomly select $N_t$ template positions from $q^{1,\theta_t,\sigma_t}(x, y)$ using a secret key $KEY_{t2}$, which is denoted as $PS = \{(x_i, y_i),\ i = 1, 2, \dots, N_t\}$. As a trade-off between robustness and imperceptibility, we prefer the $(x_i, y_i)$ located in the spatial-frequency region with normalized radius $r \in (\pi/4, \pi/2)$. Then, embed the template in the selected positions using

  $$u^{1,\theta_t,\sigma_t}(x_i, y_i) = q^{1,\theta_t,\sigma_t}(x_i, y_i) + \beta_t\,p_i, \tag{35}$$

  where $\beta_t$ is the embedding strength.
- (4) Tune $u^{1,\theta_t,\sigma_t}(x_i, y_i)$ backward to obtain the watermarked basis subbands $u_{jk}^{1}(x, y)$. This, however, is non-trivial for the following two reasons. First, it is difficult to interpolate Eq. (35) backward to yield four embedded basis subbands $u_{jk}^{1}(x, y)$ $(j, k \in \{0, 1\})$. Second, $s_j(\sigma_t)$ is a piecewise function with respect to $\mathcal{F}^{-1}(\cdot)$, according to Eq. (23), and thus, the interpolation in Eq. (34) cannot be implemented directly in the spatial-frequency domain. The latter situation implies that multiple FTs are required to complete the template insertion. This will significantly degrade the performance of brute-force template matching by the receiver and consequently make the template matching unaffordable.

$s_j(\sigma_t)$, e.g., setting $s_j(\sigma_t)$ to a fixed value $u$ $(u > 0)$, which turns Eq. (34) into

Because $s_j(\sigma_t)$ is piecewise, we determine a suitable $u$ in a piecewise manner. As pointed out in Section 3.2, for $r \in [0, \pi/4]$, $s_j(\sigma_t)$ can be any value. Thus, we merely consider the cases of $r \in (\pi/4, \pi/2)$ and $r \in [\pi/2, \pi]$. For the case $r \in [\pi/2, \pi]$, the setting $s_j(\sigma_t) = 1/2$ is already a fixed value. For $r \in (\pi/4, \pi/2)$, taking Eqs. (16) and (5) into account, we calculate the expression $s_0(\sigma_t)\,Q_{0k}^{1} + s_1(\sigma_t)\,Q_{1k}^{1}$ in Eq. (34) as:

Given that the scale range concerned in our scheme is [0.5, 2] (a broader scale range would degrade the robustness to scaling attacks), the value of $\left(s_0(\sigma_t) + \sqrt{3}\,s_0(\sigma_t)\right)/\left(1 + \sqrt{3}\right)$ is in the range [0.69, 0.95]. Thus, we roughly set $s_j(\sigma_t) = u = 0.7$, which approximates the $s_j(\sigma_t)$ in the cases of $r \in (\pi/4, \pi/2)$ and $r \in [\pi/2, \pi]$. Although such an approximation will lead to interpolation errors, it is demonstrated to be feasible by the extensive experimental results in Section 6.

*q*_{ jk }^{1}(*x*_{ i }, *y*_{ i }) as follows:

which avoids both the forward and backward interpolations and solves the problem that exists in the backward interpolation.

#### 5.1.2. Watermark embedding

- (1) Generate *N*_{ m } random bits **b** = {*b*_{ i }, *i* = 1, …, *N*_{ m }} as the message using a secret key *KEY*_{ w1 }.
- (2) Encode **b** with the repeat-accumulate (RA) code of rate *rate* [35] to generate the encoded binary sequence **e** = {*e*_{ i }, *i* = 1, …, *N*_{ m }/*rate*}, where RA is a kind of code with excellent coding performance.
- (3) Because there exists a natural quad-tree structure between *q*_{ jk }^{3}(*x*, *y*) (*j*, *k* ∊ {0, 1}) and {*q*_{ jk }^{2}(2*x* − 1, 2*y* − 1), *q*_{ jk }^{2}(2*x*, 2*y* − 1), *q*_{ jk }^{2}(2*x* − 1, 2*y*), *q*_{ jk }^{2}(2*x*, 2*y*)}, we group the four quad-trees from the four different subbands *q*_{ jk }^{ l }(*x*, *y*) together to form a 20-element vector tree **T**_{ i } = {**T**_{ iv }, *v* = 1, …, 20} (*i* = 1, …, 1,024 × 1,024/16), as illustrated in Figure 4, where the child coefficients of *q*_{00}^{3}(*x*, *y*) are listed but the other child coefficients are omitted from the figure for compactness. In our scheme, each vector tree is taken as the basic unit for watermarking. This is an attempt to achieve a reasonable trade-off between robustness and embedding capacity.
- (4) To resist cropping, we choose, via a secret key *KEY*_{ w2 }, *N*_{ m }/*rate* vector trees located in the central region for watermark insertion. Assume that each vector tree **T**_{ i } is inserted with one encoded bit *e*_{ i } (*e*_{ i } = 0, 1). We then need a 20-element vector to represent *e*_{ i }. To enhance the watermark detection performance, we set the 20-element vector ${\mathbf{w}}_{0}=\{\underbrace{-1,\dots ,-1}_{20}\}$ for *e*_{ i } = 0 and ${\mathbf{w}}_{1}=\{\underbrace{+1,\dots ,+1}_{20}\}$ for *e*_{ i } = 1, which achieves the maximum codeword distance and thus decreases the detection error probability.
- (5) Associate the allocated bit *e*_{ i } with ${\mathbf{w}}_{{e}_{i}}$ and perform the embedding as follows:

  ${\mathbf{Y}}_{i}={\mathbf{T}}_{i}+{\beta}_{w}{\mathbf{w}}_{{e}_{i}},$ (39)

  where **Y**_{ i } is the watermarked vector tree and *β*_{ w } is a non-adaptive embedding strength because, to the best of our knowledge, no suitable human visual model has been reported in the literature for the situation in our scheme. Equation (39) is equivalently written as:

  ${u}_{jk}^{l}\left({x}_{iv},{y}_{iv}\right)={q}_{jk}^{l}\left({x}_{iv},{y}_{iv}\right)+{\beta}_{w}\left(2{e}_{i}-1\right),\quad v=1,\dots ,20,\quad l=2,3,$ (40)

  where (*x*_{ iv }, *y*_{ iv }) is the coordinate corresponding to the *v*th element of **T**_{ i }.
- (6) Embed all bits *e*_{ i } into the chosen vector trees by iteratively implementing step 5.
- (7) Finally, perform the inverse DPT on the watermark-inserted basis subbands *u*_{ jk }^{ l } (*l* = 2, 3) and the template-embedded ones *u*_{ jk }^{1} in Section 5.1.1 to obtain the watermarked Fourier magnitude *M*_{ w }(*ω*_{ x }, *ω*_{ y }).
- (8) Multiply *M*_{ w }(*ω*_{ x }, *ω*_{ y }) by the original phase *Ψ*(*ω*_{ x }, *ω*_{ y }) and perform the inverse FT to obtain the watermarked image *I*_{ w }^{ pre }(*x*, *y*) of size 1,024 × 1,024.
- (9) Execute the inverse padding (cropping) operation on *I*_{ w }^{ pre }(*x*, *y*) to obtain the final watermarked image, *I*_{ w }(*x*, *y*), of size *H* × *W*.
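The additive rule in Eqs. (39) and (40) can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the selected vector trees are assumed to be already gathered into an array with one 20-element row per encoded bit, and the names `embed_bits`, `trees`, and `beta_w` are hypothetical rather than taken from the authors' implementation.

```python
import numpy as np

def embed_bits(trees, bits, beta_w):
    """Additive embedding of Eq. (39): Y_i = T_i + beta_w * w_{e_i}.

    trees  : array of shape (n_bits, 20), one vector tree per row (assumed layout)
    bits   : array of n_bits encoded bits e_i in {0, 1}
    beta_w : scalar embedding strength
    """
    trees = np.asarray(trees, dtype=float)
    bits = np.asarray(bits)
    signs = 2 * bits - 1                    # 0 -> -1 and 1 -> +1, as in Eq. (40)
    return trees + beta_w * signs[:, None]  # the same offset on all 20 elements

# Toy usage with random stand-ins for the DPT coefficients (illustrative only).
rng = np.random.default_rng(0)
trees = rng.normal(size=(4, 20))
bits = np.array([0, 1, 1, 0])
marked = embed_bits(trees, bits, beta_w=0.5)
```

Every element of a tree receives the same signed offset, which is what makes the two codewords maximally distant.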

### 5.2. Efficient template-matching algorithm

Because translation invariance has been achieved by taking the Fourier magnitude of the cover image as the DPT input, we merely use the inserted template to estimate the rotation angle and scaling factor. These will be used to correct the rotation and scaling before watermark extraction. Based on the synchronization mechanisms for rotation and scaling in Section 4.2, we develop the efficient template-matching algorithm as follows.

Assume that the received image is *I*_{ r }(*x*, *y*). We first preprocess *I*_{ r }(*x*, *y*) with the same method as in the embedding stage to give an image size of 1,024 × 1,024. We then calculate the Fourier magnitude of the preprocessed image and decompose the resulting magnitude into a one-level DPT pyramid. This is because only the template inserted at level 1 is required for template matching. This yields the DPT basis subbands *q*_{ jk }^{1}(*x*, *y*) (*j*, *k* ∊ {0, 1}). According to Eq. (36), the template matching for rotation and scaling estimation can be performed as follows. The basis subbands *q*_{ jk }^{1}(*x*, *y*) are tuned to any candidate orientation and scale, and the tuned subband is then inversely rotated and dilated. The template is extracted accordingly to compute the correlation with the original template. After all candidate rotation angles and scaling factors have been searched in this way, the orientation and scale corresponding to the maximum correlation are adopted as the estimated parameters for rotation and scaling.

- (1) Set the range [−180, 180) with step *Δ*_{ ϕ } (e.g., *Δ*_{ ϕ } = 0.5) as the search space for the rotation angle, and [*σ*_{1}, *σ*_{2}] (e.g., [0.5, 2.0]) with step *Δ*_{ σ } (e.g., *Δ*_{ σ } = 0.01) as that for the scaling factor. Initialize the search parameters as ϕ = −180 and *σ* = *σ*_{1}.
- (2) For each parameter pair (ϕ, *σ*), compute the candidate template position as:

  $\begin{array}{l}{x}_{i}^{\prime}=\mathit{round}\left(\left(\left({x}_{i}-\mathit{cx}\right)\cos \varphi +\left({y}_{i}-\mathit{cy}\right)\sin \varphi \right)/\sigma +\mathit{cx}\right)\\ {y}_{i}^{\prime}=\mathit{round}\left(\left(\left({y}_{i}-\mathit{cy}\right)\cos \varphi -\left({x}_{i}-\mathit{cx}\right)\sin \varphi \right)/\sigma +\mathit{cy}\right),\end{array}$ (41)

  where (*x*_{ i }, *y*_{ i }) (*i* = 1, 2, …, *N*_{ t }) denotes the original template coordinates determined by key *KEY*_{ t2 }, (*cx*, *cy*) is the geometrical center, and *round*(⋅) is the rounding operation.
- (3) Obtain, via the steerability and scalability in Eq. (36), the coefficients at location (*x*′_{ i }, *y*′_{ i }) as:

  ${q}^{1,\varphi ,\sigma}\left({x}_{i}^{\prime},{y}_{i}^{\prime}\right)=u\cos \varphi \left({q}_{00}^{1}\left({x}_{i}^{\prime},{y}_{i}^{\prime}\right)+{q}_{10}^{1}\left({x}_{i}^{\prime},{y}_{i}^{\prime}\right)\right)+u\sin \varphi \left({q}_{01}^{1}\left({x}_{i}^{\prime},{y}_{i}^{\prime}\right)+{q}_{11}^{1}\left({x}_{i}^{\prime},{y}_{i}^{\prime}\right)\right).$ (42)

- (4) Calculate the correlation between the extracted and original templates as:

  $\mathit{Corr}\left(\varphi ,\sigma \right)=\sum _{i=1}^{{N}_{t}}{q}^{1,\varphi ,\sigma}\left({x}_{i}^{\prime},{y}_{i}^{\prime}\right)\cdot {p}_{i}$ (43)

- (5) Increase the candidate scale *σ* to *σ* = *σ* + *Δ*_{ σ } while keeping ϕ unchanged. Repeat steps 2 to 4 until *σ* ≥ *σ*_{2}.
- (6) Augment the candidate rotation angle ϕ by *Δ*_{ ϕ }, i.e., ϕ = ϕ + *Δ*_{ ϕ }, and re-execute steps 2 to 5 until ϕ ≥ 180.
- (7) Find the maximum correlation value Corr(ϕ, *σ*)_{max} and take the corresponding geometrical parameters (ϕ_{est}, *σ*_{est}) as the estimated rotation angle and scaling factor.
- (8) Calculate the real parameters of rotation and dilation attacks as ϕ_{attack} = ϕ_{est} − *θ*_{ t } and *σ*_{attack} = *σ*_{est}/*σ*_{ t }, respectively. This is because ϕ_{est} and *σ*_{est} are essentially, according to Section 4.2, equal to ϕ_{est} = *θ*_{ t } + ϕ_{attack} and *σ*_{est} = *σ*_{ t }*σ*_{attack}, respectively, where *θ*_{ t }, ϕ_{attack}, *σ*_{ t }, and *σ*_{attack} correspond to *ψ*, ϕ, 1/*λ*, and *σ* in Section 4.2, respectively.
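The exhaustive search in steps 1 to 7 can be sketched as follows. This is a minimal illustration of Eqs. (41) to (43), not the authors' code. Several things are assumptions here: the four level-1 basis subbands are NumPy arrays indexed as `[y, x]`, candidate positions that fall outside the array are simply skipped, and all names (`match_template`, `phis_deg`, `sigmas`) are hypothetical.

```python
import numpy as np

def match_template(q00, q10, q01, q11, xs, ys, p, center, phis_deg, sigmas, u=0.7):
    """Return (max correlation, phi_est in degrees, sigma_est) by brute-force search."""
    xs, ys, p = np.asarray(xs), np.asarray(ys), np.asarray(p, dtype=float)
    cx, cy = center
    h, w = q00.shape
    sum_cos = q00 + q10          # subbands weighted by cos(phi) in Eq. (42)
    sum_sin = q01 + q11          # subbands weighted by sin(phi) in Eq. (42)
    dx, dy = xs - cx, ys - cy
    best = (-np.inf, None, None)
    for phi_deg in phis_deg:
        phi = np.deg2rad(phi_deg)
        c, s = np.cos(phi), np.sin(phi)
        for sigma in sigmas:
            # Eq. (41): candidate template positions for this (phi, sigma)
            xp = np.rint((dx * c + dy * s) / sigma + cx).astype(int)
            yp = np.rint((dy * c - dx * s) / sigma + cy).astype(int)
            ok = (xp >= 0) & (xp < w) & (yp >= 0) & (yp < h)   # assumed border handling
            # Eq. (42): steered and scaled coefficient at each candidate position
            q = u * (c * sum_cos[yp[ok], xp[ok]] + s * sum_sin[yp[ok], xp[ok]])
            corr = float(np.sum(q * p[ok]))                    # Eq. (43)
            if corr > best[0]:
                best = (corr, float(phi_deg), float(sigma))
    return best
```

The attack parameters of step 8 then follow from the returned estimates as ϕ_{est} − *θ*_{ t } and *σ*_{est}/*σ*_{ t }. Only the *N*_{ t } template positions are visited per candidate, so the cost of each candidate does not depend on the image size.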

Although the above template-matching algorithm only addresses symmetrical scaling, i.e., the scaling factors along the *x*- and *y*-axes are the same, it can easily be extended to the situation with different scaling factors. To this end, set the parameter space (*ϕ*, *σ*_{ x }, *σ*_{ y }) with *ϕ* ∊ [−180, 180) and *σ*_{ x }, *σ*_{ y } ∊ [*σ*_{1}, *σ*_{2}], and then search each parameter space successively, similar to the above algorithm. Nevertheless, this would significantly increase the computational complexity.

### 5.3. Geometrical correction and watermark extraction

*M*^{ θ,1/σ }(*ω*_{ x }, *ω*_{ y }) denotes the Fourier magnitude of the preprocessed image of size 1,024 × 1,024. We then correct *M*^{ θ,1/σ }(*ω*_{ x }, *ω*_{ y }) by rotating counter-clockwise by −*ϕ*_{attack} and scaling by *σ*_{attack} about the origin, and obtain the Fourier magnitude *M*^{0,1}(*ω*_{ x }, *ω*_{ y }) corresponding to the original watermarked image at orientation 0 and scale 1. Next, the watermark is recovered from *M*^{0,1}(*ω*_{ x }, *ω*_{ y }) as follows.

- (1) Decompose *M*^{0,1}(*ω*_{ x }, *ω*_{ y }) into a three-level pyramid via the DPT, which yields the DPT basis subbands *q*_{ jk }^{ l }(*x*, *y*) (*l* = 1, 2, 3; *j*, *k* ∊ {0, 1}).
- (2) Use *q*_{ jk }^{ l }(*x*, *y*) to construct the 20-element vector trees **Z**_{ i } (*i* = 1, …, 1,024 × 1,024/16) in the same way as watermark embedding. Choose *N*_{ m }/*rate* vector trees located in the central area via the secret key *KEY*_{ w2 }.
- (3) For each selected vector tree **Z**_{ i }, extract the encoded message bits as follows:

  $\sum _{v=1}^{20}{\mathbf{Z}}_{iv}{\mathbf{w}}_{0v}\;\underset{b=1}{\overset{b=0}{\gtrless }}\;\sum _{v=1}^{20}{\mathbf{Z}}_{iv}{\mathbf{w}}_{1v},$ (44)

  where **w**_{ b } (*b* = 0, 1) are as in Eq. (39).
- (4) After completing the extraction of encoded message bits from all *N*_{ m }/*rate* selected vector trees, run the RA decoding to recover the raw message $\widehat{\mathbf{b}}$.
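The decision rule of Eq. (44) amounts to comparing two correlations per vector tree. The sketch below illustrates it under assumptions that are not stated in the paper: the received trees are stacked in an array of shape (number of bits, 20), a tie is resolved as bit 0, and the function name `extract_bits` is hypothetical. RA decoding is not included.

```python
import numpy as np

def extract_bits(trees):
    """Hard decisions from Eq. (44): b = 0 if <Z_i, w_0> > <Z_i, w_1>, else b = 1."""
    trees = np.asarray(trees, dtype=float)   # shape (n_bits, 20), assumed layout
    w0 = -np.ones(20)                        # codeword for bit 0
    w1 = np.ones(20)                         # codeword for bit 1
    corr0 = trees @ w0
    corr1 = trees @ w1
    # Ties (corr0 == corr1) are mapped to 0 here; Eq. (44) leaves them undefined.
    return (corr1 > corr0).astype(int)

# Toy usage: small host coefficients plus an offset of +/- 1 per tree.
rng = np.random.default_rng(2)
sent = np.array([1, 0, 0, 1, 1])
received = 0.1 * rng.normal(size=(5, 20)) + (2 * sent - 1)[:, None]
decoded = extract_bits(received)
```

Because **w**_{0} = −**w**_{1}, the comparison reduces to the sign of the sum of the 20 received coefficients.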

## 6. Experimental results and discussion

*N*_{ t } = 105 random bits and is inserted in positions with normalized radii *r* ∊ {0.3, 0.35, 0.4} and angles *θ* ∊ {1:1:10, 15:10:95, 100:17:359}, where 1, 10, and 17 denote the secret step. The watermark is a sequence of 720 bits that is formed by encoding the *N*_{ m } = 60 random message bits with the RA code of rate *rate* = 1/12. The embedding strengths *β*_{ t } and *β*_{ w } are adjusted image-by-image such that the peak signal-to-noise ratio (PSNR) is 40 dB. Figure 5 illustrates several watermarked images, which demonstrates that the images watermarked by the proposed scheme have acceptable visual fidelity. The mean and variance of all PSNRs are calculated as 40.01 dB and 0.01, respectively.
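One simple way to hit a target PSNR is sketched below. It is an assumption-laden illustration, not the authors' calibration procedure: it treats the whole spatial-domain watermark signal as a single perturbation scaled by one gain, ignores pixel rounding and clipping, and assumes 8-bit images (peak value 255). The function names are hypothetical.

```python
import numpy as np

def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal size."""
    diff = np.asarray(original, dtype=float) - np.asarray(distorted, dtype=float)
    mse = np.mean(diff ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

def gain_for_target_psnr(perturbation, target_db, peak=255.0):
    """Gain g such that adding g * perturbation yields the target PSNR.

    Scaling a perturbation by g scales its mean squared error by g**2,
    so g follows directly from the target MSE.
    """
    mse = np.mean(np.asarray(perturbation, dtype=float) ** 2)
    target_mse = peak ** 2 / 10.0 ** (target_db / 10.0)
    return float(np.sqrt(target_mse / mse))

# Toy usage with random stand-ins for the cover image and the watermark signal.
rng = np.random.default_rng(3)
cover = rng.random((32, 32)) * 255.0
pattern = rng.normal(size=(32, 32))
g = gain_for_target_psnr(pattern, target_db=40.0)
achieved = psnr(cover, cover + g * pattern)
```

In practice, rounding to integer pixel values and the separate strengths *β*_{ t } and *β*_{ w } mean that the achieved PSNR only approximates the target, which is consistent with the small spread reported above.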

For the 20 generated watermarked images, we impose geometrical attacks (e.g., rotation, scaling, cropping) and common signal processing attacks (e.g., JPEG compression, additive white Gaussian noise (AWGN), median filtering, convolution filtering). We then deploy the efficient template-matching algorithm in Section 5.2 to achieve rotation and scaling synchronization, where the search spaces for rotation and scaling are set as *ϕ* ∊ [−180, 180) with step *Δ*_{ ϕ } = 0.5 and *σ* ∊ [0.5, 2.0] with *Δ*_{ σ } = 0.01, respectively. The performance against these attacks is summarized below.

### 6.1. Performance against geometrical attacks

As translation invariance can be theoretically ensured by taking the Fourier magnitude of the cover image as the DPT input, translation is not assessed in this paper. We mainly examine the performance against geometrical attacks such as rotation, scaling, cropping, and row/column line removal, as implemented in StirMark [36, 37].
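The translation invariance of the Fourier magnitude can be checked numerically with a short NumPy sketch. Note the assumption: the identity holds exactly for a circular shift of the discrete Fourier transform, whereas a real translation attack that introduces new border content is only approximately covered. The array sizes and shift values below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.random((64, 64))                           # stand-in for a cover image
shifted = np.roll(image, shift=(5, -9), axis=(0, 1))   # circular translation

magnitude = np.abs(np.fft.fft2(image))
magnitude_shifted = np.abs(np.fft.fft2(shifted))

# A circular shift only multiplies each DFT coefficient by a unit-modulus
# phase factor, so the two magnitudes agree up to floating-point error.
max_diff = float(np.max(np.abs(magnitude - magnitude_shifted)))
```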

In StirMark, rotation attacks include rotation without auto cropping, rotation with auto cropping, and rotation with auto cropping and scaling. For these three types of attack, we set the rotation angles as ±2°, ±1°, ±0.75°, ±0.5°, ±0.25°, 45°, and 90°. We then use the efficient template-matching algorithm in Section 5.2 to estimate the rotation angle and scaling factor, followed by geometrical correction and watermark extraction. The experimental simulation shows that the bit error rates (BERs) for all concerned parameters are exactly 0, which demonstrates the high robustness of the proposed scheme to differently implemented rotations.

In addition, we also examine row/column line removal attacks, which are considered to be a kind of geometrical manipulation resulting in local distortions. The frequencies of the removed row and column lines are set in the range [10,100] with step size 10. Simulation results show that the proposed scheme can successfully counteract this attack by achieving BER = 0.

### 6.2. Performance against common signal processing attacks

**Averaged performance against median filtering**

Filter size | 2 | 3 | 4 | 5 |
---|---|---|---|---|
Average BER | 0 | 0 | 0.276 | 0.276 |

Finally, we test the robustness of the proposed scheme to convolution filtering, which includes sharpening and Gaussian filtering. The simulation result shows that the average BER for sharpening is exactly 0 and that for Gaussian filtering is 0.143. This demonstrates that the proposed scheme is totally insensitive to sharpening but is not sufficiently robust to Gaussian filtering.

### 6.3. Computation time evaluation

In this section, we evaluate the computation time of the proposed scheme. As described in Section 5, the proposed scheme consists of message embedding and extraction processes. As the computation time for the message embedding process is much less than that for the message extraction process, we mainly examine the computation time for message extraction.

The message extraction process includes two stages, namely template matching and message recovery. As the former stage takes up most of the computation time of the message extraction process, we focus below on the computation time of template matching. According to Section 5.2, there are in total *N*_{ ϕ } = 360/*Δ*_{ ϕ } candidate rotation angles and *N*_{ σ } = (*σ*_{2} − *σ*_{1})/*Δ*_{ σ } candidate scaling factors. For each candidate rotation angle, all candidate scaling factors must be searched. Therefore, the computational complexity of template matching, measured in units of *N*_{ t }-dimensional correlation calculations, is *O*(*N*_{ ϕ }*N*_{ σ }).

To illustrate the computation time of template matching, we perform the following experimental simulation. As set in Section 6.1, we adopt *Δ*_{ ϕ } = 0.5, *σ*_{1} = 0.5, *σ*_{2} = 2.0, *Δ*_{ σ } = 0.01, and *N*_{ t } = 105 for simulation. Under these settings, we evaluate the computation time of template matching by executing the Matlab code on an Intel personal computer (Intel, Santa Clara, CA, USA) with a 2.2-GHz Core(TM) 2 Duo CPU and 2 GB of memory. The computation time averaged over 10 runs is 4.34 s. This implies that although a brute-force search is employed for template matching, its computation time is not as high as intuition might suggest. This is because only a limited number of template points are involved in template matching, which keeps the computation time feasible.
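As a rough worked count under these settings, taking the candidate counts exactly as defined in the text (so that the end point of each interval is not counted twice):

$$N_{\phi}=\frac{360}{0.5}=720,\qquad N_{\sigma}=\frac{2.0-0.5}{0.01}=150,\qquad N_{\phi}N_{\sigma}=720\times 150=108{,}000.$$

With *N*_{ t } = 105 template points, the correlations in Eq. (43) therefore require about $108{,}000\times 105\approx 1.13\times 10^{7}$ multiply-accumulate operations in total. For comparison, if the asymmetric search space (*ϕ*, *σ*_{ x }, *σ*_{ y }) of Section 5.2 were scanned with the same steps (an assumption made here for illustration), the number of candidates would grow to $720\times 150\times 150=1.62\times 10^{7}$, i.e., 150 times more.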

### 6.4. Comparison to related schemes

To further evaluate the proposed watermarking scheme, we compare it with those in [21], [38], and [14], which are denoted as PP-AFFINE, WNZH-DMST, and NIKO-SFND for notational convenience, respectively. PP-AFFINE is, as surveyed in [1], a typical template-based watermarking algorithm with high robustness against affine attacks. WNZH-DMST is another template-based watermarking approach incorporating the deformable multi-scale transform (DMST) that is somewhat similar to the DPT. NIKO-SFND is a kind of salient-feature and normalization-based watermarking scheme with excellent performance against both local and global distortions.

We start with the comparison to PP-AFFINE. In this scheme, three 512 × 512-grey images, namely Baboon, Lena, and Boat, are adopted for performance assessment. Both a 60-bit message and a 14-point template are inserted in the Fourier domain, and the resulting watermarked images have PSNRs no greater than 38 dB. To ensure a fair comparison, we also embed a 60-bit message via the proposed scheme and adjust the embedding strength adaptively to make the PSNRs close to 38 dB. We then employ the same evaluation as [21] for performance comparison.

**Performance against common processing attacks in StirMark 4.1**

Attack | Tank (DMST) | Tank (DPT) | Globe (DMST) | Globe (DPT) | Lena (DMST) | Lena (DPT) | Man (DMST) | Man (DPT) | Zelda (DMST) | Zelda (DPT) |
---|---|---|---|---|---|---|---|---|---|---|
JPEG QF | 19 | 16 | 12 | 11 | 17 | 15 | 20 | 15 | 13 | 14 |
AWGN | 4 | 4 | 5 | 5 | 5 | 4 | 5 | 5 | 4 | 4 |
MedianCut | 3 × 3 | 3 × 3 | 3 × 3 | 3 × 3 | 3 × 3 | 3 × 3 | 3 × 3 | 3 × 3 | 3 × 3 | 5 × 5 |
GaussFilt. | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok |
Sharpening | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok |

**Performances against global geometrical attacks in StirMark 4.1**

Attack | Tank (DMST) | Tank (DPT) | Globe (DMST) | Globe (DPT) | Lena (DMST) | Lena (DPT) | Man (DMST) | Man (DPT) | Zelda (DMST) | Zelda (DPT) |
---|---|---|---|---|---|---|---|---|---|---|
RotCrp. | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok |
RotScl. | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok | Ok |
Scaling | 0.5 to 1.81 | 0.6 to 2 | 0.5 to 1.5 | 0.6 to 2 | 0.5 to 1.63 | 0.7 to 1.9 | 0.5 to 1.51 | 0.7 to 1.9 | 0.5 to 1.57 | 0.7 to 1.85 |
Crop | 0.62 to 1 | 0.51 to 1 | 0.61 to 1 | 0.55 to 1 | 0.59 to 1 | 0.51 to 1 | 0.76 to 1 | 0.59 to 1 | 0.51 to 1 | 0.51 to 1 |

According to [38], the DMST only has analysis filters and thus is not a pyramid transform. For this reason, WNZH-DMST uses the SPT for template/watermark insertion and message extraction and adopts the DMST to estimate the rotation angle and scaling factor. As the DMST is similar to the DPT's analysis filters, the template matching of WNZH-DMST is similar to that in the proposed scheme. In light of the above performance comparison, it is reasonable to expect that the proposed scheme can achieve better performance.

For the comparison with NIKO-SFND [14], we adopt the same settings as NIKO-SFND for impartial evaluation. In [14], ten 512 × 512 grey images, namely Airplane, Boat, House, Peppers, Splash, Baboon, Couple, Lena, Elaine, and Lake, were used for performance examination. For each image, a 50-bit message was inserted and the PSNR was held at 40 dB. The watermarked images were then subjected to both local and global geometrical attacks and common signal processing attacks. The performance was evaluated by comparing the NIKO-SFND scheme to state-of-the-art approaches belonging to the same category. It was found that NIKO-SFND demonstrated comparable or even better performance than the schemes it was compared with. To ensure a fair comparison, these settings are similarly applied to our scheme, and the performance comparison is then carried out accordingly. In the simulation, we compare with three state-of-the-art schemes, i.e., the SIFT-based NIKO-SFND and the schemes presented by Dong et al. [39] and Tian et al. [40]. The SIFT-based NIKO-SFND is one of the best of its class using different salient features.

1. Local geometrical attacks: According to [14], this type of attack includes row/column line removal, jitter, and cropping. The performance of the proposed scheme against these attacks is summarized in Figures 10, 11, and 12, respectively, where Nikolaidis denotes the NIKO-SFND in [14], as in all other figures below. The BERs shown in the figures are averaged over the 10 test images, and the search space (ϕ, σ_{x}, σ_{y}) (see Section 5.2) is adopted in evaluating the row/column line removal attack. It can be seen that the proposed scheme significantly outperforms the three comparison approaches in counteracting row/column line removal and cropping, whereas it is remarkably weaker than the compared schemes against jitter attacks. This weakness comes from the fact that jitter attacks are outside the scope of the proposed scheme.

2. Global geometrical attacks: In [14], the author evaluated the performance of NIKO-SFND against global geometrical attacks such as rotation, scaling, downsampling followed by upsampling, shearing, and general affine transforms. Performance comparisons are given in Figures 13, 14, 15, 16, and 17, respectively. It can be observed from Figure 13 that the proposed scheme successfully estimates and corrects all checked rotation angles. This outperforms the method of Dong et al., obtains a remarkable improvement over NIKO-SFND, and has superiority over Tian et al.

Based on the performance against scaling shown in Figure 14, the proposed scheme exhibits a considerable improvement over the three compared schemes for scaling factors of 0.75, 0.9, 1.1, and 1.5. However, it is much worse for scaling factors of 0.5 and 2.

Figure 15 shows that the proposed scheme has high robustness to the downsampling and upsampling pairs (0.5, 1.5), (1.5, 0.5), (0.7, 1.3), and (1.3, 0.7), outperforming the approaches of Dong et al., Tian et al., and Nikolaidis. Nevertheless, these three approaches outperform the proposed scheme for other cases with a high ratio of information loss.

It is found from Figure 18 that the proposed scheme has better performance than the compared schemes for JPEG QFs from 20 to 50 but is weaker for other cases. Because situations with QF < 20 seldom occur in practice, the proposed scheme is more favorable than the three compared approaches in counteracting JPEG compression.

Figure 19 presents the performance comparison against H.264 intra-frame compression. It can be observed that the proposed scheme is as robust as the three compared approaches for quality factor values below 25 and performs significantly better in the other cases.

As shown in Figures 20 and 21, the performance against Gaussian noise addition and low-pass filtering is somewhat similar. According to Figure 20, the proposed scheme has significant superiority over the schemes of Tian et al. and Nikolaidis. It is similar to the scheme of Dong et al. for noise variances from 0.001 to 0.005 but is slightly better for a noise variance of 0.006. The low-pass filtering results shown in Figure 21 demonstrate that the robustness of the proposed scheme is higher than that of the schemes presented by Tian et al. and Nikolaidis, while it is equivalent to that of Dong et al.'s scheme.

## 7. Conclusions

In this paper, we have presented a DPT-based robust image watermarking scheme resilient to rotation, scaling, and translation. We first constructed a DPT with shift-invariance, steerability, and scalability by extending an SPT represented in a closed and polar-separable form. The radial component of the SPT's basis filters was taken as the kernel for designing the scalable basis filters. These were further combined with the steerable basis filters corresponding to the angular components of the SPT's basis filters, resulting in joint scalability and steerability. The shift-invariance was inherited from the SPT by retaining undecimated high-pass and band-pass basis subbands. We also derived interpolation functions for steerability and scalability. These allow the interpolation of any filter (response) at an arbitrary orientation and scale via a linear combination of the DPT's basis filters (responses). By exploiting the characteristics of shift-invariance, steerability, and scalability, we further derived the theoretical synchronization mechanisms for translation, rotation, and scaling.

Based on the constructed DPT with preferable characteristics, we developed a robust image watermarking scheme that is resilient to translation, rotation, and scaling. The translation invariance is achieved by taking the Fourier magnitude of the cover image as the DPT input. The resilience to rotation and scaling is obtained via the synchronization mechanisms for rotation and scaling. At the transmitter, the template and watermark are inserted in the first level of the DPT pyramid and the other two levels, respectively. At the receiver, the rotation angle and scaling factor are estimated via an efficient template-matching algorithm, and these are further used to correct the rotation and scaling attacks on the received image followed by watermark extraction from the corrected image. Extensive simulations show that the proposed scheme is highly robust to geometrical attacks, such as rotation, scaling, translation, cropping, and row/column line removal, as well as common signal processing attacks such as JPEG compression, AWGN, median filtering, and convolution filtering. In addition, the comparison to some excellent related schemes demonstrated that the proposed scheme has a comparable performance against rotation, scaling, translation, cropping, and row/column line removal attacks, whereas it generally achieves a higher robustness to JPEG compression, AWGN, and low-pass filtering.

## Declarations

### Acknowledgments

This work is supported by NSFC (nos. 61202467 and 61100170), the National Research Foundation for the Doctoral Program of Higher Education of China (no. 20120171110037), the Key Program of Natural Science Foundation of Guangdong (no. S2012020011114), and the Scientific Research Foundation for Returned Overseas Chinese Scholars (State Education Ministry).

## References

1. Zheng D, Liu Y, Zhao J, Saddik AE: A survey of RST invariant image watermarking algorithms. *ACM Comput. Surv.* 2007, 39(2, Article 5):1-91.
2. Kumar A, Santhi V: A review on geometric invariant digital image water-marking techniques. *Int. J. Comp. Appl.* 2011, 12(9):31-36.
3. Bas P, Chassery JM, Macq B: Geometrically invariant watermarking using feature points. *IEEE Trans. Image Processing* 2002, 11(9):1014-1028. 10.1109/TIP.2002.801587
4. Wang X, Wang C, Yang Y, Niu P: A robust blind color image watermarking in quaternion Fourier transform domain. *J. Syst. Softw.* 2013, 86: 255-277. 10.1016/j.jss.2012.08.015
5. Wang Y, Doherty JF, Van Dyck RE: A rotation, scaling and translation resilient image watermarking algorithm using circular Gaussian filters. In *Proc. of the IEEE-EURASIP Workshop on Nonlinear Signal and Image Processing*. Baltimore, MD; 2001.
6. Lichtenauer J, Setyawana I, Kalker T, Lagendijka R: Exhaustive geometrical search and the false positive watermark detection probability. In *Proc. of the SPIE-Security and Watermarking of Multimedia Contents V*. *Volume 5020*. Santa Clara, CA; 2003:203-214. 10.1117/12.503186
7. Barni M: Effectiveness of exhaustive search and template matching against watermark desynchronization. *IEEE Trans. Signal Proc. Letters* 2005, 12(2):158-161.
8. O’Ruanaidh JJK, Pun T: Rotation, scale and translation invariant spread spectrum digital image watermarking. *Signal Process.* 1998, 66(3):303-317. 10.1016/S0165-1684(98)00012-7
9. Kim HS, Lee H-K: Invariant image watermark using Zernike moments. *IEEE Trans. Circuit Syst. Video Technol.* 2003, 13(8):766-775. 10.1109/TCSVT.2003.815955
10. Tang C-W, Hang H-M: A featured-based robust digital image watermarking scheme. *IEEE Trans. Signal Processing* 2003, 51(4):1123-1129.
11. Teague M: Image analysis via the general theory of moment. *J. Opt. Soc. Am.* 1980, 70(8):920-930. 10.1364/JOSA.70.000920
12. Zhang H, Shu H, Coatrieux G, Zhu J, Wu J, *et al*.: Affine Legendre moment invariants for image watermarking robust to geometrical distortions. *IEEE Trans. Image Process.* 2011, PP(99):1055-1068.
13. Xiang S, Kim H-J, Huang J: Invariant image watermarking based on statistical features in the low-frequency domain. *IEEE Trans. Circuit Syst. Video Technol.* 2008, 18(6):777-790.
14. Nikolaidis A: Local distortion resistant image watermarking relying on salient feature extraction. *EURASIP J. Adv. Signal Processing* 2012, 2012: 97. 10.1186/1687-6180-2012-97
15. Kutter M: Watermarking resistance to translation, rotation, and scaling. In *Proc. of the International Society for Optical Engineering (SPIE): Multimedia Systems Applications*. *Volume 3528*. Boston, MA; 1998:423.
16. Voloshynovskiy S, Deguillaume F, Pun T: Content adaptive watermarking based on a stochastic multiresolution image modeling. In *Proc. of the 10th European Signal Processing Conference (EUSIPCO’2000)*. Tampere, Finland; 2000:5-8.
17. Voloshynovskiy S, Deguillaume F, Pun T: Multibit digital watermarking robust against local nonlinear geometrical distortions. In *Proc. of the International Conference on Image Processing*. *Volume 3*. Thessaloniki, Greece; 2001:999.
18. Zheng Z, Wang S, Zhao J: RST invariant image watermarking algorithm with mathematical modeling and analysis of the watermarking processes. *IEEE Trans. Image Process.* 2009, 18(5):1055-1068.
19. Tsai J, Huang W, Kuo Y: On the Selection of optimal feature region set for robust digital image watermarking. *IEEE Trans. Image Process.* 2011, 20(3):735-743.
20. Tsai J-S, Huang W-B, Kuo Y-H, Horng M-F: Joint robustness and security enhancement for feature-based image watermarking using invariant feature regions. *Signal Process.* 2012, 92: 1431-1445. 10.1016/j.sigpro.2011.11.033
21. Pereia S, Pun T: Robust template matching for affine resistant image watermarks. *IEEE Trans. Image Processing* 2000, 9(6):1123-1129. 10.1109/83.846253
22. Kang X, Huang J, Shi YQ, Lin Y: A DWT-DFT composite watermarking scheme robust to both affine transform and JPEG compression. *IEEE Trans. Circuit Syst Video Technol* 2003, 13(8):776-786. 10.1109/TCSVT.2003.815957
23. Ni J, Wang C, Huang J: A RST-Invariant robust DWT-HMM watermarking algorithm incorporating Zernike moments and template. In *KES2005: Knowledge-Based Intelligent Information & Engineering Systems*. *Volume 3681*. Edited by: Khosla R, Howlett R, Jain LC. Heidelberg: Springer; 2005:1233-1239. 10.1007/11552413_176
24. Ni J, Zhang R, Huang J, Wang C, Li Q: A rotation-Invariant secure image watermarking algorithm incorporating steerable pyramid transform. In *IWDW2006: Digital Watermarking*. *Volume 4283*. 5th edition. Edited by: Shi YQ, Jeon B. Heidelberg: Springer; 2006:446-460. Int’l Workshop on Digital Watermarking, Lecture Notes Computer Science. 10.1007/11922841_36
25. Bogumi D: An asymmetric image watermarking scheme resistant against geometrical distortions. *Signal Processing: Image Communication* 2006, 21: 59-66. 10.1016/j.image.2005.06.005
26. Fu YG, Shen R, Lu H: Watermarking scheme based on support vector machine for color images. *IEE Electronics Letters* 2004, 40(16):986-7. 10.1049/el:20040600
27. Tsai HH, Sun DW: Color image watermark extraction based on support vector machines. *Inf. Sci.* 2007, 177(2):550-69. 10.1016/j.ins.2006.05.002
28. Peng H, Wang J, Wang W: Image watermarking method in multi-wavelet domain based on support vector machines. *J. Syst. Softw.* 2010, 83(8):1470-7. 10.1016/j.jss.2010.03.006
29. Simoncelli EP, Freeman WT, Adelson EH, Heeger DJ: Shiftable multi-scale transform. *IEEE Trans. Information Theory* 1992, 38(2):587-607. 10.1109/18.119725
30. Freeman WT, Adelson EH: The design and use of steerable filters. *IEEE Trans. PAMI* 1991, 13(9):891-906. 10.1109/34.93808
31. Perona P: Deformable kernels for early vision. In *Proc. of the third Int. Conf. on Computer Vision and Pattern Recognition (CVPR)*. Lahaina, Maui; 1991:222-227.
32. Perona P: Deformable kernels for early vision. *IEEE Trans. Pattern Anal. Machine Intelligence* 1995, 17(5):488-499. 10.1109/34.391394
33. Karasaridis A, Simoncelli EP: A filter design technique for steerable pyramid transform. In *Proc. of the 21th International Conference on Acoustics, Speech, and Signal Processing*. *Volume 4*. Atlanta, GA; 1996:2387-2390.
34. Portilla J, Strela V, Wainwright MJ, Simoncelli EP: Image denoising using scale mixtures of Gaussians in the wavelet domain. *IEEE Trans. Image Processing* 2003, 12(11):1338-1351. 10.1109/TIP.2003.818640
35. Divsalar D, Jin H, Mceliece RJ: Coding theorems for turbo-like codes. In *Proc of the 36th Annual Allerton Conf. on Communication, Control and Comp*. Monticello, IL; 1998:525-539.
36. Petitcolas FAP: Watermarking schemes evaluation. *IEEE Trans. Signal Processing* 2000, 17(5):58-64. 10.1109/79.879339
37. Petitcolas FAP, Stir M: *IOP Publishing PhysicsWeb, 2012*. 2013. http://www.cl.cam.ac.uk/~fapp2/watermarking/stirmark/ Accessed 16 Feb. 2013.
38. Wang C, Ni J, Zhuo H, Huang J: A geometrically resilient robust image watermarking scheme using deformable multi-scale transform. In *Proc. of the Intl. Conf. on Image Processing 2010*. Hong Kong; 2010:3677-3680.
39. Dong P, Brankov JG, Galatsanos NP, Yang Y, Davoine F: Digital Watermarking Robust to Geometric Distortions. *IEEE Trans. Image Process.* 2005, 14(12):2140-2150.
40. Tian H, Zhao Y, Ni R, Pan J-S: Spread spectrum-based image watermarking resistant to rotation and scaling using radon transform. In *Proc of the Sixth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP 2010)*. Darmstadt; 2010:442-445.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.