Both watermark embedding and detection procedures require that a proper preprocessing of the original image has taken place, so that the watermark embedding or detection areas can be located. Section 2.1 describes the first preprocessing step where the original image is transformed geometrically to a standard form. Section 2.2 briefly overviews the four different feature extraction methods that will alternatively act upon the normalized image to produce the reference points both for watermark embedding and detection.
2.1 Image normalization
The first step prior to watermark embedding and detection is image normalization. This serves to provide the next step of feature extraction with a standard form of the original image, in which to search for strong feature points. The difference from other methods in the literature is that they employ image normalization in circular patches that have already been extracted from the original image. The problem, as stated in Section 1, is that the normalized and afterwards watermarked patches have to be inversely normalized and overlayed on the original image, leading to interpolation errors and, thus, visible artifacts. In the current article, we implement the image normalization method proposed in [19]. Here we should point out that the method described in [19] is the first step of a watermarking technique which, however, affects the whole of the image. Our aim in the present article is to provide a technique that only affects the image regionally, since we wish to cope with local image distortions. If we let I(x, y) be the original image, then the normalized image is g(x, y) = I (x
α
, y
α
), where
(1)
and is a translation matrix, is a x-shearing matrix, is a y-shearing matrix, and is a scaling matrix.
The values of the parameters d1, d2 are calculated as
where m10, m01, m00 are geometric moments of the original image I(x, y)
(3)
If we let I
T
(x, y) be the image after translation normalization, the value of the parameter β is calculated as a root of
(4)
where are the central moments of I
T
(x, y)
(5)
In case of a singe real root and two complex conjugate roots, the value of β is chosen as the real one. In case of three real roots, the value is chosen as the median. The value of γ is calculated as
(6)
where are the central moments of I
XT
(x, y) which is the image I
T
(x, y) after x-shearing normalization. Finally, the values of α and δ are derived given that I
Y XT
(x, y) (the image I
XT
(x, y) after y-shearing normalization) is resized to a specific size (e.g., 512 × 512 in our experiments) to provide the final normalized image I
SY XT
(x, y). The signs of these parameter values are determined by the constraint that both and are positive. Examples of the original "Lena" and "Lake" images and respective normalized images using the above described method are shown in Figure 1.
This normalized representation of the original image is the input for the next step of preprocessing that is necessary for both watermark embedding and detection.
2.2 Feature extraction
The second step of the preprocessing stage is the feature extraction step. A great variety of feature extraction methods has been proposed in the literature. Lately, there is a tendency of using the so-called scale-space methods such as SIFT [20] for watermarking purposes [18, 21–23]. In our study, we employed this as well as other feature detectors proposed in the literature, but not in the context of image watermarking, during the past few years. These detectors are, more specifically, the radial symmetry transform (RST) introduced in [24], the speeded up robust features (SURF) [25, 26] and the features from accelerated segment test (FAST) [27, 28]. As we will show in the experimental results section, all of them perform adequately well for our application, although their relative performance varies.
2.2.1 Radial symmetry transform
To compute the RST first we have to construct two images, the magnitude projection image M
n
and the orientation projection image O
n
of the normalized image at every radius n that we have selected. These images are initialized to zero and are subsequently updated at each point depending on how the point is affected by the gradient vector at a point a distance n away. Let p = (x, y) be a point and g(p) the gradient vector at that point, determined by applying the 3 × 3 Sobel operator at the respective point of the normalized image. The coordinates of the so-called positively-affected pixel are
(7)
and those of the negatively-affected pixel are
(8)
The pixel values of the magnitude projection and orientation projection images are updated as follows
(9)
(10)
(11)
(12)
Next, we have to define
(13)
where k
n
is a scaling factor to normalize M
n
and O
n
across different radii. Once Õ
n
is defined, we compute
(14)
where α is the radial strictness parameter. The larger the value of α, the stricter the required radial symmetry. Finally, F
n
is convolved with a 2D Gaussian filter A
n
to produce the radial symmetry contribution at radius n
The overall RST (symmetry map) is calculated by simply averaging radial symmetry contributions for all of the radii considered
(16)
where N is the set of radii. A non-maximum suppression and thresholding algorithm [29] is applied to the symmetry map S to localize the strongly symmetric points of the normalized image. An example for the images of Figure 1 is depicted in Figure 2 for N = {1, 3, 5} and α = 1. The value of the radius for non-maximum suppression was chosen to be 3 and that of the threshold to be 5.
2.2.2 Scale-invariant feature transform
The main idea of this detector is to search for candidate stable feature points across a series of image scales. First, the so-called scale space of the normalized image is constructed by convolving the image I(x, y) with a variable-scale Gaussian
(17)
The potentially stable feature points are detected as local extrema of the function D(x, y, σ) constructed as follows
(18)
that is, a convolution of the image with a difference of Gaussians. k is a factor that determines the difference between consecutive scales. An octave of scale space is a series of D(x, y, σ) functions spanning a doubling of σ. Each octave is divided in s intervals and, thus, k = 21/s. For each new octave, the Gaussian image produced with the doubled value of σ at the previous octave is first downsampled by a factor of 2 at each dimension. The local minima and maxima are found by 3D search in the 8 neighbors of the current scale and the respective 9 neighbors in each of the previous and the next scale.
To correctly localize feature points, candidate points are fitted to the nearby data by interpolation. The Taylor expansion of the function D(x, y, σ) is given by
(19)
where D and its derivatives are calculated at the candidate feature point and x = (x, y, σ) T is the offset from this point. The location of the extremum is found by taking the derivative of this expansion and setting it to zero, giving
(20)
If the offset is larger than 0.5 in any dimension, then the extremum should be closer to another candidate feature point. If so, the interpolation is again performed around a different point. Otherwise the offset is added to the candidate point to produce the interpolated estimate of the extremum.
To discard feature points of low contrast, the value of the second-order Taylor expansion is computed at the offset . If this value is less than 0.03 then the candidate point is discarded. Otherwise it is kept, and its final location and scale are, respectively, y + and σ, where y is the original location of the candidate point at scale σ.
Another action that should be taken is to eliminate feature points with strong edge response. To do so, we first have to compute the second-order Hessian matrix H
(21)
whose eigenvalues are proportional to the principal curvatures of D. If we let α be the larger eigenvalue and β the smaller one, then it can be shown that
(22)
where r = α/β, Tr(H) = D
xx
+ D
yy
= α + β is the trace of H and Det(H) = D
xx
D
yy
-(D
xy
)2 = αβ is the determinant of H. If the ratio R for a certain candidate feature point is larger than (r
th
+ 1)2/r
th
, then the feature point is rejected. The method sets the threshold eigenvalue ratio to r
th
= 10.
In our experiments the values of the various parameters involved in this method were chosen in accordance with [20]. Only the strength threshold for local maxima of the scale space was chosen to be equal to 0.05 to reduce the number of produced feature points. Examples of feature points extracted from the normalized versions of "Lena" and "Lake" are shown in Figure 3.
2.2.3 Speeded up robust features
This method was introduced as an alternative to SIFT focusing on computational cost reduction. A fast way of computing the Hessian matrix using integral images is proposed. This approach approximates the second order Gaussian derivatives by box filters. These, in turn, are used to compute the approximate determinant of the Hessian matrix. Instead of subsampling the filtered image of a previous layer, the scale space is constructed by increasing the filter size. For each new octave, the filter size increase per layer is doubled, and so is the sampling interval for the extracted feature points.
In the experiments that we conducted, the number of octaves that were analyzed was 5, the initial sampling interval was 2 and the Hessian response threshold was chosen to be 0.004. The feature points extracted from the normalized versions of "Lena" and "Lake" are presented in Figure 4.
2.2.4 Features from accelerated segment test
This feature detector should be more precisely called a corner detector. To test if a certain pixel p is a corner, 16 pixels lying on a circle centered at this pixel (specifically, a Bresenham circle of radius 3) are tested for similarity of intensity to the center pixel. If N contiguous pixels lying on this circle are all brighter than the center pixel by a quantity T (that is Ip→x≥ I
p
+T, x ∈ {1 . . . 16}) or darker than it by the same quantity (that is Ip→x≤ I
p
- T, x ∈ {1 . . . 16}), then the center pixel is considered a corner. A non-maximum suppression step follows to reduce the number of corner points. Since there is no score function on which to apply the suppression, we define one as [28]
(23)
where
(24)
After suppression only the candidates having score value greater than all their 8 neighbors are preserved. The parameter values used in our experiments where N = 12 and T = 60. For the "Lena" and "Lake" images, the feature points extracted from their normalized versions are shown in Figure 5.