The analysis in Section 3 shows that the DCT coefficient distribution of SDJPEG patches follows a weighted summation of Gaussian components with the same standard deviation (as shown in (7) and Figure 5). However, in order to detect SDJPEG patches based on (7), two questions have to be addressed: (i) How to obtain discriminative features that capture the differences in the (m,n)th DCT coefficient distributions between SDJPEG and SSJPEG patches? (ii) How to select DCT modes that provide high discriminative power, since the differences in the DCT coefficient distributions of SDJPEG and SSJPEG patches vary across DCT modes? In the following, we address these questions.
4.1 Discriminative feature extraction based on the (m,n)th DCT coefficients
For a specific DCT mode, say the (m,n)th, the following detection algorithm is carried out by determining whether the histogram hm,n(f) is similar to that of .
Given a specific quantization step qm,n, we project the histogram hm,n(f) onto the interval , and the sum of the histogram function within the interval is defined as
(8)
where Nqm,n is the maximum absolute value of the (m,n)th DCT coefficient.
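The projection can be illustrated with a minimal sketch. It assumes that the interval is [-q/2, q/2) and that the projection sums the histogram bins sharing the same offset from the nearest multiple of the quantization step; both are assumptions made for illustration, and the name `fold_histogram` is hypothetical.

```python
from collections import Counter


def fold_histogram(coeffs, q):
    """Fold integer DCT coefficients onto offsets in [-q/2, q/2).

    Assumption: each coefficient c is written as c = k*q + f, with k the
    index of the nearest multiple of q, and the counts of all coefficients
    sharing the same offset f are summed.
    """
    folded = Counter()
    for c in coeffs:
        k = (2 * c + q) // (2 * q)  # nearest multiple; ties go to the upper one
        folded[c - k * q] += 1
    return folded


# Coefficients clustered around multiples of q = 6 pile up near f = 0.
print(fold_histogram([0, 6, 12, -6, 1, 7, -5, 3, 9], 6))
```

Integer arithmetic is used for the nearest-multiple index so that half-way values are always assigned to the same side of the interval.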
According to (7), for SDJPEG-compressed image patches, follows a weighted summation of Gaussian components with a standard deviation of . Hence, would follow a specific distribution determined by and qm,n. Based on the different ratios of , can be obtained as follows. It should be noted that in our discussions, the Gaussian distribution, G(0, σ), is assumed to be bounded in [-3σ, 3σ], and any outliers are omitted.
If , i.e., all the Gaussian components in are isolated, would follow a Gaussian distribution of (as shown in Figure 6a), i.e.,
(9)
If , some Gaussian components would overlap with each other and would follow a distribution of a mixture of (p + 1) Gaussian distributions (as shown in Figure 6b,c), i.e.,
(10)
where .
From (9) and (10) and Figure 6, it is observed that for the isolated or slightly overlapping cases (p = 0 or p = 1), has a distinctive peak at small |f|. The peak at small |f| becomes more prominent with larger . Such a phenomenon does not occur for SSJPEG patches. Figure 7 illustrates the shm,n(f) distributions of the image Lena.bmp of Figure 5 (c1) after SDJPEG and SSJPEG with various parameter settings.
From Figure 7, it is observed that the energy of shm,n(f) for SDJPEG image patches is concentrated in the central region, whereas the energy of shm,n(f) for SSJPEG patches is almost evenly distributed. It should be noted that when the image patch is small, e.g., 64 × 64, it contains only 64 blocks of 8 × 8 pixels and hence only 64 coefficients for each DCT mode, so it is difficult to estimate the actual distribution of the patch accurately and robustly from such limited data. To address this problem of inadequate data, a 1-D feature similar to that in [23] is adopted to differentiate between SDJPEG and SSJPEG image patches as follows:
(11)
where representing the central region and representing the peripheral region.
For SDJPEG patches, the reference value of the feature, denoted by , can be derived from and is determined by and qm,n. The reference value for SSJPEG patches, denoted by , is obtained as follows. Since the SSJPEG patch has never been compressed with the block structure starting from the top-left corner of the image patch, the DCT coefficients of SSJPEG can be assumed to be distributed approximately like those of the original uncompressed image patch [8, 14]. Moreover, is equivalent to the distribution of the quantization error of the (m,n)th DCT component with the quantization step qm,n. Hence, can be approximated using the aligned JPEG quantization error estimation approach introduced in Section 3.1. Then can be derived from (8) and (11). Note that for the low-frequency DCT modes, the quantization step qm,n is relatively small compared with the dynamic range of the DCT coefficients and thus is almost evenly distributed and .
The final discriminative feature indicating the likelihood of an image patch having been SDJPEG compressed is derived by normalizing the extracted feature with the reference values and , i.e., . Note that snm,n is small for SDJPEG patches and large for SSJPEG patches.
4.2 Discriminative feature extraction based on all the DCT coefficients
The analysis in Section 4.1 shows that snm,n for the SDJPEG image patches is smaller than that for the SSJPEG patches. This phenomenon becomes more prominent with the increase of , and thus the DCT components with larger would be more discriminative in SDJPEG detection. Our analysis in Section 3 shows that for a specific image, can be estimated theoretically when QF2 and the shift coordinates (xS, yS) are known. Figure 8 gives some examples of with different QF2 and (xS, yS).
Figure 8 shows the following: (i) Shifted JPEG compression with lower quality (QF2) introduces a larger spread in the DCT coefficients. (ii) for different DCT modes varies and is not proportional to the corresponding quantization step. For instance, in Figure 8 (b2), the quantization steps for (2,1) and (2,2) are the same while and are quite different. (iii) Even for the same DCT mode, the variations caused by shifted JPEG compression do not remain constant under different coordinate shifts, e.g., for the (2,3)th DCT coefficient, is quite different when (xS, yS) changes between (4,4) and (1,6). Hence, can be obtained by a table lookup with the knowledge of QF2 and (xS, yS). In order to have a large value of , smaller values of or larger values of qm,n are preferred. For any value of the quality factor, Q = [qm,n] = λQdefault, where Qdefault is the default quantization table defined by the Independent JPEG Group (IJG) and λ is a constant determined by the quality factor. Hence, we have . It should be noted that in practice, only QF2 is known. We will show how QF1 can be estimated later.
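The relation Q = λQdefault can be made concrete with a short sketch of the quality scaling used by the IJG libjpeg implementation for the default luminance table. The function name `ijg_quant_table` is hypothetical; the rounding and the clamping to [1, 255] follow libjpeg's baseline behavior, so the proportionality to Qdefault holds only up to rounding.

```python
# Default luminance quantization table (JPEG standard, Annex K), as used by IJG.
Q_DEFAULT = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def ijg_quant_table(qf):
    """Scale Q_DEFAULT for a quality factor in 1..100 (IJG convention)."""
    qf = max(1, min(100, qf))
    # Percentage scale factor: 100 at QF = 50, 0 at QF = 100.
    scale = 5000 // qf if qf < 50 else 200 - 2 * qf
    return [
        [min(255, max(1, (q * scale + 50) // 100)) for q in row]
        for row in Q_DEFAULT
    ]


table_75 = ijg_quant_table(75)
# With 1-based indexing, modes (2,1) and (2,2) share the same step.
print(table_75[1][0], table_75[1][1])
```

In this convention λ corresponds to `scale / 100`, so a lower quality factor yields proportionally larger quantization steps for every DCT mode.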
The discriminative feature extraction considering all the DCT coefficients runs as follows:
1. Input the prior information: the image patch, the quality factor of the final compression QF2, and the coordinate shift (xS, yS) (which is given by an exhaustive enumeration of the 63 possible coordinate shifts; see Section 4.3).
2. According to the image patch, QF2, and (xS, yS), estimate which is introduced by the shifted double JPEG compression. Calculate the discriminative table DIS = [dis(m,n)] for the low frequencies (where m + n < 8).
3. Sort all the DCT modes by their discriminative values in descending order and select the first Nc (Nc = 3 in our experiment) components to construct the candidate set, i.e., {}. Estimate the quantization step (for the unknown QF1) of these DCT modes by analyzing their histograms. It should be noted that if the coefficients of a DCT mode in the candidate set have too many zero values (>80% of the total number of coefficients), the quantization step cannot be estimated accurately. Hence, considering the quantization noise caused by shifted compression, for any (mi, ni) whose coefficients are concentrated in the range of , i.e., , the mode is discarded and replaced with the mode with the next largest discriminative value outside the candidate set.
4. For the ith mode (mi, ni) in the candidate set, extract the normalized discriminative feature for the (mi, ni)th mode, i.e., snmi,ni, using the approach described in Section 4.1 with the estimated and . The final discriminative feature for the image patch, denoted by snall, is obtained by averaging over .
In step 3, the quantization step qm,n(QF1) can be obtained by an exhaustive search among the reference SDJPEG's where shm,n(f) is the most similar to. However, this approach is computationally expensive. Since exhibits a periodic-like pattern with a period of qm,n(QF1), an approach similar to that in [17] is adopted to reduce the complexity, i.e., the fast Fourier transform is applied to the histogram , and the peak of the spectrum with the DC removed is extracted to estimate the quantization step of the (m,n)th DCT mode, qm,n(QF1). With qm,n(QF1) estimated, the quality factor QF1 can be estimated by comparing qm,n(QF1) with the default quantization table Qdefault. In order to improve the robustness, all qm,n(QF1) values in the candidate set are estimated. The median value of the predicted QF1 is taken as the quality factor of the first compression, and all qm,n(QF1) values in the candidate set are refined using the estimated quality factor.
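The spectral estimation of the quantization step can be sketched as follows. This is an illustrative sketch rather than the exact procedure of [17]: the function name `estimate_quant_step` and its parameters are hypothetical, a direct discrete Fourier transform stands in for the FFT to keep the example dependency-free (the spectrum is the same), and restricting the peak search to periods between `min_step` and `max_step` is an added assumption that keeps the slowly varying envelope of the histogram from dominating.

```python
import cmath
from collections import Counter


def estimate_quant_step(coeffs, min_step=2, max_step=32):
    """Estimate the period of a comb-like coefficient histogram."""
    counts = Counter(int(round(c)) for c in coeffs)
    max_abs = max(abs(v) for v in counts)
    length = 2 * max_abs + 1
    hist = [counts.get(f, 0) for f in range(-max_abs, max_abs + 1)]

    # Remove the DC component by subtracting the mean bin height.
    mean = sum(hist) / length
    centered = [h - mean for h in hist]

    best_k, best_mag = None, -1.0
    for k in range(1, length // 2 + 1):
        period = length / k
        if not (min_step <= period <= max_step):
            continue  # assumption: only plausible quantization steps are searched
        coef = sum(
            x * cmath.exp(-2j * cmath.pi * k * n / length)
            for n, x in enumerate(centered)
        )
        if abs(coef) > best_mag:
            best_k, best_mag = k, abs(coef)

    if best_k is None:
        return None
    # A spectral peak at frequency index k corresponds to a period of length / k.
    return round(length / best_k)


# Synthetic coefficients clustered around multiples of 6 with a decaying envelope.
demo = []
for c in range(-8, 9):
    n = 50 - 5 * abs(c)
    demo += [6 * c] * n + [6 * c - 1] * (n // 2) + [6 * c + 1] * (n // 2)
print(estimate_quant_step(demo))
```

In practice the per-mode estimates would then be mapped to candidate quality factors and combined by their median, as described above; that step is omitted here.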
4.3 Crop-and-paste image tampering detection by detecting the SDJPEG effects
In order to detect crop-and-paste image tampering, the JPEG image is divided into a series of B × B subimages. Each B × B subimage is examined to detect whether it contains any image patch having SDJPEG effects, which runs as follows:
(i) For a specific coordinate shift (xS, yS) (0 ≤ xS, yS ≤ 7 and (xS, yS) ≠ (0, 0)), crop an image patch from the subimage with the size of (B - 8) × (B - 8) and starting from (xS, yS).
(ii) With (xS, yS) and QF2, extract the discriminative feature snall for . Then the SDJPEG effect map (SEM) for (xS, yS) is set to snall, i.e., SEM(xS, yS) = snall.
(iii) Repeat steps (i) and (ii) for all the 63 possible coordinate shifts to obtain the SEM for the B × B subimage.
(iv) Compare the SEM with those of the positive (containing SDJPEG patches) and negative (not containing any SDJPEG patch) samples in the training database to detect whether the subimage has been tampered with. Fisher's linear discriminant analysis (LDA) [24] is adopted as the classifier in our approach.
(v) Loop through steps (i) to (iv) for all B × B subimages in the image to detect the suspicious regions that might have been tampered with.
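Steps (i) to (iii) can be sketched as a loop over the 63 coordinate shifts. The sketch below is a hedged illustration: `build_sem` is a hypothetical name, the feature extractor of Section 4.2 is passed in as a hypothetical callable `extract_sn_all(patch, qf2, x_s, y_s)`, and treating xS as the column offset and yS as the row offset is an assumption about the coordinate convention.

```python
def build_sem(subimage, extract_sn_all, qf2):
    """Build the SDJPEG effect map (SEM) of one B x B subimage.

    subimage: list of B rows, each a list of B pixel values.
    extract_sn_all: hypothetical callable returning the patch feature.
    Returns a dict mapping each shift (x_s, y_s) to its feature value.
    """
    b = len(subimage)
    sem = {}
    for y_s in range(8):
        for x_s in range(8):
            if (x_s, y_s) == (0, 0):
                continue  # the unshifted grid is not an SDJPEG candidate
            # Crop a (B - 8) x (B - 8) patch starting at the shift.
            rows = subimage[y_s:y_s + b - 8]
            patch = [row[x_s:x_s + b - 8] for row in rows]
            sem[(x_s, y_s)] = extract_sn_all(patch, qf2, x_s, y_s)
    return sem


# Usage with a placeholder extractor that only reports the patch size.
subimage = [[r * 64 + c for c in range(64)] for r in range(64)]
sem = build_sem(subimage, lambda p, qf2, x, y: (len(p), len(p[0])), 75)
print(len(sem), sem[(1, 0)])
```

The resulting 63 values form the feature vector that step (iv) would pass to the classifier.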