The analysis in Section 3 shows that the DCT coefficient distribution of SDJPEG patches follows a weighted summation of Gaussian components with the same standard deviation (as shown in (7) and Figure 5). However, to detect SDJPEG patches based on (7), two questions have to be addressed: (i) How can discriminative features be obtained that capture the differences in the (m,n)th DCT coefficient distributions between SDJPEG and SSJPEG patches? (ii) How should the DCT modes with high discriminative power be selected, given that the differences between the DCT coefficient distributions of SDJPEG and SSJPEG patches vary across DCT modes? In the following, we address these questions.
4.1 Discriminative feature extraction based on the (m,n)th DCT coefficients
For a specific DCT mode, say the (m,n)th, the detection algorithm determines whether the histogram h_{m,n}(f) is similar to h_{m,n}^{\mathrm{SD}}(f).
Given a specific quantization step q_{m,n}, we project the histogram h_{m,n}(f) onto the interval \left[-\frac{q_{m,n}}{2},\frac{q_{m,n}}{2}\right), and the sum of the histogram function within the interval is defined as
sh_{m,n}(f)=\sum_{i=-N}^{N}h_{m,n}\left(f+i\times q_{m,n}\right),\quad -\frac{q_{m,n}}{2}\le f<\frac{q_{m,n}}{2}
(8)
where Nq_{m,n} is the maximum absolute value of the (m,n)th DCT coefficient.
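The folding in (8) can be sketched in a few lines of Python. This is a minimal illustration only, assuming that the coefficients are rounded to integers and that q_{m,n} is an integer step; the function name `folded_histogram` and its return layout are hypothetical and not part of the original method description.

```python
import numpy as np

def folded_histogram(coeffs, q):
    """Fold DCT coefficients of one mode into [-q/2, q/2) and count them, as in (8).

    Assumption: coefficients are rounded to integers and q is an integer step.
    """
    coeffs = np.rint(np.asarray(coeffs)).astype(int)
    half = q // 2
    # Signed residue modulo q; every multiple of q maps to 0.
    folded = (coeffs + half) % q - half
    bins = np.arange(-half, -half + q)
    counts = np.bincount(folded + half, minlength=q)
    return bins, counts

# Coefficients clustered around multiples of q = 10 pile up near f = 0.
bins, counts = folded_histogram([0, 10, -10, 20, 3, 13, -7], 10)
```

In this toy input, the four multiples of 10 fall into the bin f = 0 and the remaining three values into f = 3, so the folded histogram sums the contributions of all periods as (8) prescribes.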
According to (7), for SDJPEG-compressed image patches, h_{m,n}^{\mathrm{SD}}(f) follows a weighted summation of Gaussian components with a standard deviation of \sigma_{m,n}^{\mathrm{SJR}}. Hence, sh_{m,n}^{\mathrm{SD}}(f) would follow a specific distribution determined by \sigma_{m,n}^{\mathrm{SJR}} and q_{m,n}. Based on the different ratios of q_{m,n}/\sigma_{m,n}^{\mathrm{SJR}}, sh_{m,n}^{\mathrm{SD}}(f) can be obtained as follows. It should be noted that in our discussions, the Gaussian distribution, G(0, σ), is assumed to be bounded in [−3σ, 3σ], and any outliers are omitted.
If \frac{{\mathit{q}}_{\mathit{m},\mathit{n}}}{2}\ge 3{\mathit{\sigma}}_{\mathit{m},\mathit{n}}^{\mathrm{SJR}}, i.e., all the Gaussian components in {\mathit{h}}_{\mathit{m},\mathit{n}}^{\mathrm{SD}}\left(\mathit{f}\right) are isolated, \mathit{s}{\mathit{h}}_{\mathit{m},\mathit{n}}^{\mathrm{SD}}\left(\mathit{f}\right) would follow a Gaussian distribution of \mathit{G}\left(0,{\mathit{\sigma}}_{\mathit{m},\mathit{n}}^{\mathrm{SJR}}\right) (as shown in Figure 6a), i.e.,
sh_{m,n}^{\mathrm{SD}}(f)=\sum_{i=-N}^{N}w_{i}G\left(0,\sigma_{m,n}^{\mathrm{SJR}}\right)=G\left(0,\sigma_{m,n}^{\mathrm{SJR}}\right),\quad -\frac{q_{m,n}}{2}\le f<\frac{q_{m,n}}{2}
(9)
If \mathit{p}\times \frac{{\mathit{q}}_{\mathit{m},\mathit{n}}}{2}<3{\mathit{\sigma}}_{\mathit{m},\mathit{n}}^{\mathrm{SJR}}\le \left(\mathit{p}+1\right)\times \frac{{\mathit{q}}_{\mathit{m},\mathit{n}}}{2}\phantom{\rule{1em}{0ex}}\mathit{p}=1,2,\dots, some Gaussian components would overlap with each other and \mathit{s}{\mathit{h}}_{\mathit{m},\mathit{n}}^{\mathrm{SD}}\left(\mathit{f}\right) would follow a distribution of a mixture of (p + 1) Gaussian distributions (as shown in Figure 6b,c), i.e.,
sh_{m,n}^{\mathrm{SD}}(f)=\left\{\begin{array}{ll}\sum_{i=-N}^{N}w_{i}\left[\sum_{j=0}^{p}G\left(f+j\times q_{m,n},\sigma_{m,n}^{\mathrm{SJR}}\right)\right]=\sum_{j=0}^{p}G\left(f+j\times q_{m,n},\sigma_{m,n}^{\mathrm{SJR}}\right), & -q_{m,n}/2\le f\le 0\\ \sum_{i=-N}^{N}w_{i}\left[\sum_{j=0}^{p}G\left(f-j\times q_{m,n},\sigma_{m,n}^{\mathrm{SJR}}\right)\right]=\sum_{j=0}^{p}G\left(f-j\times q_{m,n},\sigma_{m,n}^{\mathrm{SJR}}\right), & 0\le f\le q_{m,n}/2\end{array}\right.
(10)
where \mathit{p}+1=\mathrm{ceiling}\phantom{\rule{0.2em}{0ex}}\left(\frac{3{\mathit{\sigma}}_{\mathit{m},\mathit{n}}^{\mathrm{SJR}}}{{\mathit{q}}_{\mathit{m},\mathit{n}}/2}\right).
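As a quick numerical check of the case distinction above, the overlap order p can be computed directly from q_{m,n} and \sigma_{m,n}^{\mathrm{SJR}}. The helper below is a sketch under the 3σ truncation assumed in the text; the name `overlap_order` is introduced here for illustration only.

```python
import math

def overlap_order(q, sigma):
    """Return p with p * q/2 < 3*sigma <= (p + 1) * q/2.

    p = 0 corresponds to the isolated case q/2 >= 3*sigma of (9);
    p >= 1 corresponds to the overlapping case of (10).
    """
    return max(math.ceil(3.0 * sigma / (q / 2.0)) - 1, 0)

# q = 12: sigma = 2 is still isolated, sigma = 3 already overlaps once.
isolated = overlap_order(12, 2.0)
overlapping = overlap_order(12, 3.0)
```

For example, with q = 4 and σ = 2 the bound 3σ = 6 equals 3 × q/2, so p = 2 and the folded histogram is a mixture of three Gaussian components.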
From (9), (10), and Figure 6, it is observed that for the isolated or slightly overlapping cases (p = 0 or p = 1), sh_{m,n}^{\mathrm{SD}}(f) has a distinctive peak at small f. The peak at small f becomes more prominent with larger q_{m,n}/\sigma_{m,n}^{\mathrm{SJR}}. Such a phenomenon does not occur for SSJPEG patches. Figure 7 illustrates the sh_{m,n}(f) distributions of the image Lena.bmp of Figure 5 (c1) after SDJPEG and SSJPEG with various parameter settings.
From Figure 7, it is observed that the energy of sh_{m,n}(f) for SDJPEG image patches is concentrated in the central region, whereas the energy of sh_{m,n}(f) for SSJPEG patches is almost evenly distributed. It should be noted that when the image patch is small, e.g., 64 × 64, the number of coefficients available for each DCT mode is small (64 coefficients in total), and it is difficult to estimate the actual distribution of the patch accurately and robustly from such limited data. To address this problem of inadequate data, a 1D feature similar to that in [23] is adopted to differentiate between SDJPEG and SSJPEG image patches as follows:
{\mathit{s}}_{\mathit{m},\mathit{n}}={\displaystyle \underset{{\mathit{R}}_{2}}{\int}\mathit{s}{\mathit{h}}_{\mathit{m},\mathit{n}}\left(\mathit{f}\right)\mathit{df}}/{\displaystyle \underset{{\mathit{R}}_{1}}{\int}\mathit{s}{\mathit{h}}_{\mathit{m},\mathit{n}}\left(\mathit{f}\right)\mathit{df}}
(11)
where R_{1}=\left(-\frac{q_{m,n}}{6},\frac{q_{m,n}}{6}\right) represents the central region and R_{2}=\left(-\frac{q_{m,n}}{2},-\frac{q_{m,n}}{3}\right)\cup\left(\frac{q_{m,n}}{3},\frac{q_{m,n}}{2}\right) represents the peripheral region.
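On a discrete folded histogram, the ratio in (11) reduces to a quotient of two bin sums. The sketch below assumes unit-width integer bins covering [−q/2, q/2) and counts the boundary bin at −q/2 as peripheral; both choices, as well as the function name, are assumptions made for illustration.

```python
import numpy as np

def peripheral_to_central_ratio(bins, counts, q):
    """Discrete version of (11): mass in R2 (|f| > q/3) over mass in R1 (|f| < q/6)."""
    bins = np.asarray(bins, dtype=float)
    counts = np.asarray(counts, dtype=float)
    central = counts[np.abs(bins) < q / 6.0].sum()
    peripheral = counts[np.abs(bins) > q / 3.0].sum()
    # An empty central region gives an unbounded ratio.
    return peripheral / central if central > 0 else float("inf")

q = 12
bins = np.arange(-6, 6)
flat = np.ones(12)            # evenly spread histogram
peaked = np.zeros(12)
peaked[6] = 10                # all mass at f = 0
s_flat = peripheral_to_central_ratio(bins, flat, q)
s_peaked = peripheral_to_central_ratio(bins, peaked, q)
```

An evenly spread histogram yields a ratio of 1 because R_{1} and R_{2} have the same total width q_{m,n}/3, while a histogram concentrated at f = 0 yields 0.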
For SDJPEG patches, the reference value of the feature, denoted by s_{m,n}^{\mathrm{SD}}, can be derived from sh_{m,n}^{\mathrm{SD}}(f) and is determined by \sigma_{m,n}^{\mathrm{SJR}} and q_{m,n}. The reference value for SSJPEG patches, denoted by s_{m,n}^{\mathrm{SS}}, is obtained as follows. Since the SSJPEG patch has never been compressed with the block structure starting from the top-left corner of the image patch, the DCT coefficients of SSJPEG can be assumed to be distributed approximately as those of the original uncompressed image patch [8, 14]. Moreover, h_{m,n}^{\mathrm{SS}}(f) is equivalent to the distribution of the quantization error of the (m,n)th DCT component with the quantization step q_{m,n}. Hence, h_{m,n}^{\mathrm{SS}}(f) can be approximated using the aligned JPEG quantization error estimation approach introduced in Section 3.1. Then s_{m,n}^{\mathrm{SS}} can be derived from (8) and (11). Note that for the low-frequency DCT modes, the quantization step q_{m,n} is relatively small compared with the dynamic range of the DCT coefficients, and thus sh_{m,n}^{\mathrm{SS}}(f) is almost evenly distributed and s_{m,n}^{\mathrm{SS}}\approx 1>s_{m,n}^{\mathrm{SD}}.
The final discriminative feature indicating the likelihood of an image patch having been SDJPEG compressed is derived by normalizing the extracted feature with the reference values s_{m,n}^{\mathrm{SD}} and s_{m,n}^{\mathrm{SS}}, i.e., sn_{m,n}=\left(s_{m,n}-s_{m,n}^{\mathrm{SD}}\right)/\left(s_{m,n}^{\mathrm{SS}}-s_{m,n}^{\mathrm{SD}}\right). Note that sn_{m,n} is small for SDJPEG patches and large for SSJPEG patches.
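The normalization maps the SDJPEG reference value to 0 and the SSJPEG reference value to 1. A minimal sketch follows, with a guard for coinciding reference values that is an added assumption rather than something the text specifies.

```python
def normalized_feature(s, s_sd, s_ss):
    """Normalize s so that s_sd maps to 0 and s_ss maps to 1."""
    denom = s_ss - s_sd
    if denom == 0:
        # Assumption: identical references mean this mode cannot discriminate.
        raise ValueError("reference values coincide; mode is not discriminative")
    return (s - s_sd) / denom

# A measured value halfway between the references.
sn_mid = normalized_feature(0.6, 0.2, 1.0)
```

With the references 0.2 and 1.0 used here, a measured value of 0.6 lies halfway between them, so the normalized feature is about 0.5.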
4.2 Discriminative feature extraction based on all the DCT coefficients
The analysis in Section 4.1 shows that sn_{m,n} for SDJPEG image patches is smaller than that for SSJPEG patches. This difference becomes more prominent as q_{m,n}/\sigma_{m,n}^{\mathrm{SJR}} increases, and thus the DCT components with larger q_{m,n}/\sigma_{m,n}^{\mathrm{SJR}} are more discriminative in SDJPEG detection. Our analysis in Section 3 shows that for a specific image, \sigma_{m,n}^{\mathrm{SJR}} can be estimated theoretically when QF_{2} and the shift coordinates (x_{S}, y_{S}) are known. Figure 8 gives some examples of \sigma_{m,n}^{\mathrm{SJR}}, 1\le m,n\le 8, with different QF_{2} and (x_{S}, y_{S}).
Figure 8 shows the following: (i) Shifted JPEG compression with lower quality (QF_{2}) introduces a larger spread to the DCT coefficients. (ii) The values of \sigma_{m,n}^{\mathrm{SJR}} differ across DCT modes and are not proportional to the corresponding quantization steps. For instance, in Figure 8 (b2), the quantization steps for (2,1) and (2,2) are the same while \sigma_{2,1}^{\mathrm{SJR}} and \sigma_{2,2}^{\mathrm{SJR}} are quite different. (iii) Even for the same DCT mode, the variation caused by shifted JPEG compression does not remain constant across coordinate shifts, e.g., for the (2,3)th DCT coefficient, \sigma_{2,3}^{\mathrm{SJR}} is quite different when (x_{S}, y_{S}) changes between (4,4) and (1,6). Hence, \sigma_{m,n}^{\mathrm{SJR}} can be obtained by a table lookup with the knowledge of QF_{2} and (x_{S}, y_{S}). To obtain a large value of q_{m,n}/\sigma_{m,n}^{\mathrm{SJR}}, smaller values of \sigma_{m,n}^{\mathrm{SJR}} or larger values of q_{m,n} are preferred. For any value of the quality factor, Q = [q_{m,n}] = λQ_{default}, where Q_{default} is the default quantization table defined by the Independent JPEG Group (IJG) and λ is a constant determined by the quality factor. Hence, we have \frac{q_{m,n}}{\sigma_{m,n}^{\mathrm{SJR}}}=\lambda\times\frac{q_{\mathrm{default}}(m,n)}{\sigma_{m,n}^{\mathrm{SJR}}}=\lambda\times\mathrm{dis}_{m,n},\quad 1\le m,n\le 8. It should be noted that in practice, only QF_{2} is known. We will show how QF_{1} can be estimated later.
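To make the ratio dis_{m,n} concrete, the sketch below divides the standard JPEG luminance table (used by the IJG as the default) by a table of \sigma_{m,n}^{\mathrm{SJR}} values and ranks the modes. The sigma table is a hypothetical input here, since its estimation is the subject of Section 3; the restriction m + n < 8 and the candidate count follow the procedure given later in this section, and the function names are illustrative.

```python
import numpy as np

# Standard JPEG luminance quantization table (IJG default, quality 50).
Q_DEFAULT = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=float)

def discriminative_table(sigma_sjr):
    """dis(m,n) = q_default(m,n) / sigma_SJR(m,n); sigma_sjr is an assumed 8x8 input."""
    return Q_DEFAULT / np.asarray(sigma_sjr, dtype=float)

def top_modes(dis, n_c=3, max_index_sum=8):
    """Return the n_c one-based modes (m, n) with the largest dis and m + n < max_index_sum."""
    dis = np.asarray(dis, dtype=float)
    m, n = np.indices(dis.shape)  # zero-based indices
    masked = np.where((m + 1) + (n + 1) < max_index_sum, dis, -np.inf)
    order = np.argsort(masked, axis=None)[::-1][:n_c]
    rows, cols = np.unravel_index(order, dis.shape)
    return [(int(r) + 1, int(c) + 1) for r, c in zip(rows, cols)]

# With a constant spread, the ranking simply follows the quantization steps.
dis = discriminative_table(np.ones((8, 8)))
modes = top_modes(dis)
```

With a constant sigma table the ranking is driven by the quantization steps alone, so the first selected mode is (1,6); a realistic sigma table would generally reorder the candidates.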
The discriminative feature extraction considering all the DCT coefficients runs as follows:

1.
Input the prior information: the image patch, the quality factor of the final compression QF_{2}, and the coordinate shift (x _{S}, y _{S}) (which is given by an exhaustive enumeration on 63 possible coordinate shifts, see Section 4.3).

2.
According to the image patch, QF_{2} and (x_{S}, y_{S}), estimate \sigma^{\mathrm{SJR}}=\left[\sigma_{m,n}^{\mathrm{SJR}}\right], which is introduced by the shifted double JPEG compression. Calculate the discriminative table DIS = [dis(m,n)] for low frequencies (where m + n < 8).

3.
Sort all the DCT modes according to their discriminative value in descending order, and select the first N_{c} (N_{c} = 3 in our experiment) components to construct the candidate set, i.e., \left\{\left(m_{1},n_{1}\right),\left(m_{2},n_{2}\right),\dots,\left(m_{N_{\mathrm{c}}},n_{N_{\mathrm{c}}}\right)\right\}. Estimate the quantization step q_{m_{i},n_{i}}\left(\mathrm{QF}_{1}\right) (for the unknown QF_{1}) of these DCT modes by analyzing their histograms. It should be noted that if the coefficients of a DCT mode in the candidate set have too many zero values (>80% of the total number of coefficients), the quantization step cannot be estimated accurately. Hence, considering the quantization noise caused by shifted compression, for any (m_{i}, n_{i}) whose coefficients are concentrated in the range of \left[-3\times\sigma_{m_{i},n_{i}}^{\mathrm{SJR}},3\times\sigma_{m_{i},n_{i}}^{\mathrm{SJR}}\right], i.e., \int_{f=-3\times\sigma_{m_{i},n_{i}}^{\mathrm{SJR}}}^{3\times\sigma_{m_{i},n_{i}}^{\mathrm{SJR}}}h_{m_{i},n_{i}}^{\mathrm{SD}}(f)\ge 80\%, the mode is discarded and replaced with the mode with the next largest discriminative value outside the candidate set.

4.
For the ith mode (m_{i}, n_{i}) in the candidate set, extract the normalized discriminative feature sn_{m_{i},n_{i}} using the approach described in Section 4.1 with the estimated \sigma_{m_{i},n_{i}}^{\mathrm{SJR}} and q_{m_{i},n_{i}}\left(\mathrm{QF}_{1}\right). The final discriminative feature for the image patch, denoted by sn_{all}, is obtained by averaging sn_{m_{i},n_{i}} over 1\le i\le N_{\mathrm{c}}.

In step 3, the quantization step q_{m,n}(QF_{1}) can be obtained by an exhaustive search for the reference sh_{m,n}^{\mathrm{SD}}(f) to which sh_{m,n}(f) is most similar. However, this approach is computationally expensive. Since h_{m,n}^{\mathrm{SD}}(f) exhibits a periodic-like pattern with a period of q_{m,n}(QF_{1}), an approach similar to that in [17] is adopted to reduce the complexity, i.e., the fast Fourier transform is applied to the histogram h_{m,n}^{\mathrm{SD}}(f), and the peak of the spectrum with the DC removed is extracted to estimate the quantization step of the (m,n)th DCT mode, q_{m,n}(QF_{1}). With q_{m,n}(QF_{1}) estimated, the quality factor QF_{1} can be estimated by comparing q_{m,n}(QF_{1}) with the default quantization table Q_{default}. To improve the robustness, all q_{m,n}(QF_{1}) values in the candidate set are estimated. The median of the predicted QF_{1} values is taken as the quality factor of the first compression, and all q_{m,n}(QF_{1}) values in the candidate set are refined using the estimated quality factor.
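The two estimation stages described above can be sketched as follows. This is an illustrative simplification, not the procedure of [17]: the histogram is assumed to have unit-width bins, the `max_step` bound that skips very low frequencies is an added assumption, and the mapping from a step to QF_{1} uses the IJG quality scaling rule (scale = 5000/QF for QF < 50, otherwise 200 − 2·QF) applied to a single default table entry. All function names are hypothetical.

```python
import math
import numpy as np

def estimate_step_fft(hist, max_step=64):
    """Estimate the dominant period of a histogram with unit-width bins."""
    h = np.asarray(hist, dtype=float)
    length = h.size
    # Subtracting the mean removes the DC term before taking the spectrum.
    spectrum = np.abs(np.fft.rfft(h - h.mean()))
    # Assumption: ignore periods longer than max_step (very low frequencies).
    k_min = max(1, math.ceil(length / max_step))
    k_peak = k_min + int(np.argmax(spectrum[k_min:]))
    return int(round(length / k_peak))

def ijg_step(qf, base_step):
    """Quantization step produced by IJG quality scaling of one default entry."""
    scale = 5000 // qf if qf < 50 else 200 - 2 * qf
    return min(max((base_step * scale + 50) // 100, 1), 255)

def candidate_qfs(step, base_step):
    """All quality factors in 1..100 whose scaled step is closest to the estimate."""
    errors = [abs(ijg_step(qf, base_step) - step) for qf in range(1, 101)]
    best = min(errors)
    return [qf for qf, err in zip(range(1, 101), errors) if err == best]

# Synthetic histogram: a smoothed peak repeated every 12 bins.
pattern = [4, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
hist = np.tile(pattern, 20)
step = estimate_step_fft(hist)
qfs = candidate_qfs(8, 16)
```

Several quality factors can map to the same step for a single mode (here a step of 8 for a default entry of 16 is produced by 74, 75, and 76), which is one reason why taking the median over all modes in the candidate set, as the text describes, is more robust than relying on one mode. On real histograms a smooth envelope may still dominate the low-frequency bins, so the simple peak search shown here may need further safeguards.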
4.3 Crop-and-paste image tampering detection by detecting the SDJPEG effects
In order to detect crop-and-paste image tampering, the JPEG image is divided into a series of B × B subimages. Each B × B subimage is examined to detect whether it contains any image patch having SDJPEG effects, which runs as follows:

(i)
For a specific coordinate shift (x_{S}, y_{S}) (0 ≤ x_{S}, y_{S} ≤ 7 and (x_{S}, y_{S}) ≠ (0, 0)), crop an image patch \mathrm{IMG}_{x_{\mathrm{S}},y_{\mathrm{S}}} of size (B − 8) × (B − 8) from the subimage, starting from (x_{S}, y_{S}).

(ii)
With (x _{S}, y _{S}) and QF_{2}, extract the discriminative feature sn _{all} for {\mathrm{IMG}}_{{\mathit{x}}_{\mathrm{S}},{\mathit{y}}_{\mathrm{S}}}. Then the SDJPEG effect map (SEM) for (x _{S}, y _{S}) is set to sn _{all}, i.e., SEM (x _{S}, y _{S}) = sn _{all}.

(iii)
Repeat steps (i) and (ii) for all the 63 possible coordinate shifts to obtain the SEM for the B × B subimage.

(iv)
Compare the SEM with those of the positive (containing SDJPEG patches) and negative (not containing any SDJPEG patch) samples in the training database to detect whether the subimage has been tampered with. Fisher's linear discriminant analysis (LDA) [24] is adopted as the classifier in our approach.

(v)
Loop through steps (i) to (iv) for all B × B subimages in the image to detect the suspicious regions that might have been tampered with.
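Steps (i) to (iii) amount to a loop over the 63 non-zero shifts. The sketch below shows only that loop; `feature_fn` is a hypothetical callable standing in for the sn_{all} extraction of Section 4.2, the convention that x indexes columns and y indexes rows is an assumption, and the entry for the excluded shift (0, 0) is left as NaN.

```python
import numpy as np

def sdjpeg_effect_map(subimage, qf2, feature_fn):
    """Build the 8x8 SEM of one B x B subimage.

    feature_fn(patch, qf2, (x_s, y_s)) is a placeholder for the sn_all
    extraction; it is not defined by this sketch.
    """
    subimage = np.asarray(subimage)
    b = subimage.shape[0]
    sem = np.full((8, 8), np.nan)
    for x_s in range(8):
        for y_s in range(8):
            if (x_s, y_s) == (0, 0):
                continue  # the aligned position is excluded
            # Crop a (B - 8) x (B - 8) patch starting at (x_s, y_s).
            patch = subimage[y_s:y_s + b - 8, x_s:x_s + b - 8]
            sem[x_s, y_s] = feature_fn(patch, qf2, (x_s, y_s))
    return sem

# Dummy feature: patch height plus the horizontal shift, just to show the loop.
sem = sdjpeg_effect_map(np.zeros((64, 64)), 75, lambda p, q, s: p.shape[0] + s[0])
```

The resulting 63 values form the feature vector that step (iv) passes to the classifier; the classification itself is not shown here.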