Detection of tampered region for JPEG images by using mode-based first digit features
© Li et al.; licensee Springer. 2012
Received: 5 March 2012
Accepted: 13 August 2012
Published: 30 August 2012
With the widespread availability of image editing software, digital images have become easy to manipulate and edit, even for non-professional users. For a tampered Joint Photographic Experts Group (JPEG) image, the tampered region usually has a different JPEG compression history from the authentic region, which can be used to detect and locate the tampered region. In this article, we propose to use the statistical features of the first digits of individual alternating current (AC) modes, together with a support vector machine, to detect and locate the tampered region. Experimental results show that the proposed method is effective for detecting three commonly used image manipulations. Its expected percentage of overlap between the detected tampered region and the true tampered region is higher than that of existing algorithms.
With the development of increasingly sophisticated digital image processing software, it has become easy to create image forgeries from one or multiple images without leaving visible clues. As a result, people’s confidence in the reliability and veracity of digital images is declining. Furthermore, in some applications forged images may lead to legal disputes. Therefore, developing technologies to identify whether the content of an image has been tampered with is becoming increasingly important.
Digital image forensic technologies include passive (blind) detection and active detection. Active detection includes fragile digital watermarking, digital signature technology, and others. However, active detection only works when prior information can be embedded into the original images. Because of this limitation, active detection cannot fundamentally prevent image tampering, and more attention should be paid to passive detection methods. Although a forged image may easily escape one or a few detection algorithms, it is difficult for it to escape all of them. Therefore, researchers have been developing more passive detection algorithms to detect tampered images.
Currently, Joint Photographic Experts Group (JPEG) is the most widely used image format. Human eyes are more sensitive to low-frequency signals than to high-frequency signals. By reducing the high-frequency information, JPEG compression achieves a high compression ratio while retaining satisfactory image quality. For a tampered JPEG image, the tampered region usually has a different JPEG compression history from the authentic region. A tampered digital image is generally difficult to identify by eye; however, tampering usually leaves behind invisible clues or statistical artifacts. Based on these clues or artifacts, JPEG digital forensic technologies have undergone continuous development and improvement.
Popescu and Farid proposed an efficient technique to detect image recompression with the resample effect, which always appears in the histogram of quantized discrete cosine transform (DCT) coefficients. Based on the DCT of small fixed-size image blocks, Huang et al. presented an efficient technique to automatically detect duplicated regions in a tampered image. This method fails if the tampered region comes from other images. Stamm and Liu proposed an algorithm for detecting forged images by statistical intrinsic fingerprints. This method can detect global and local contrast enhancement, identify histogram equalization, and detect the global addition of noise to a previously JPEG-compressed image. Peng et al. proposed a novel scheme to detect and locate the tampered region based on compound statistics features, which is effective for copy–paste image forensics between various images. However, the detection results in [3, 4] become unsatisfactory when local manipulations with small tampered regions are conducted.
Farid proposed a tampered region detection method for the copy–paste operation based on JPEG Ghost. This method only works when the original JPEG quality factor of the tampered region is lower than that of the untampered region and also lower than the resaved quality factor of the composite image, which limits its usage. Liu et al. proposed a passive copy–move forgery detection method that computes the averaged sum of absolute difference (SAD). The method fails when the original quality factors of the inserted region and the authentic region are equal or almost equal. In addition, the obtained SAD image is a grayscale image, and the authors detect the tampered region from it by thresholding and mathematical morphology, which significantly reduces the accuracy of locating the tampered region. Fan and de Queiroz proposed an algorithm to detect whether an image has previously been JPEG compressed and to further locate the position of the block artifacts. The detection result of this method is easily disturbed by mismatched block artifacts when a JPEG image is tampered with by copy–paste. Li et al. proposed a passive detection method for doctored JPEG images via block artifact grid extraction. This method is effective for copy–paste, inpainting, and cropping manipulations when the doctored image is saved in an uncompressed format, such as BMP or TIF. It fails if the image is saved in JPEG format after being manipulated. Zhao et al. presented a passive digital image forensic technique for detecting the tampered region of an inpainted JPEG image when the tampered image is saved in an uncompressed format or in JPEG format.
Lin et al. proposed an automatic detection method for tampered JPEG images that examines the double quantization effect hidden among the DCT coefficients. The authors calculated the block posterior probability map (BPPM) according to Bayesian statistical characteristics of the DCT coefficient histograms of a tampered JPEG image, and then located the tampered region by thresholding the BPPM. In this method, the obtained BPPM is only 1/64 of the size of the image under examination, which may affect the final location accuracy, especially for small tampered regions. Fu et al. proposed that all JPEG coefficients (quantized DCT coefficients) of a singly compressed JPEG image follow the generalized Benford’s law, and applied it to detect whether a bitmap image has previously been JPEG compressed and, if so, to estimate the original JPEG quality factor. Building on this work, Li et al. proposed mode-based first digit features (MBFDF) to detect whether a JPEG image has undergone double JPEG compression. This method is superior to all previous methods for distinguishing between single and double JPEG compression. However, both methods in [11, 12] can only reveal the compression history of a given image and cannot detect a local tampered region in it.
In this article, we propose a tampered region detection algorithm based on machine learning and the statistical properties of the first digits, which are obtained from the JPEG coefficients of individual AC modes. The rest of the article is organized as follows. The “Analysis of the first digits’ probability distribution by Benford’s law” section focuses on the first digits’ probability distribution of JPEG coefficients of singly and doubly compressed JPEG images. In the “Detection algorithm for the tampered region” section, we describe a technique to detect whether any part of the image under examination has a different compression history from the remaining region. In the “Experimental results and statistical analysis” section, we present experimental results and their statistical analysis. Conclusions are drawn in the last section.
Analysis of the first digits’ probability distribution by Benford’s law
JPEG image compression is divided into the following steps: 8 × 8 block extraction, DCT, quantization, and coding. An original uncompressed image is first partitioned into 8 × 8 pixel blocks. Then each block is converted to frequency space by a 2D DCT. The value located in the upper-left corner of the block is called the direct current (DC) coefficient, and the other 63 values are called alternating current (AC) coefficients. Next, the DCT coefficients of each block are quantized using the JPEG quantization table.
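The block transform and quantization steps described above can be sketched as follows. This is a minimal illustration rather than a JPEG codec: the function names are hypothetical, the flat quantization table with step 16 is an assumption chosen for readability (it is not a standard JPEG table), and the level shift by 128 follows the usual baseline JPEG convention.

```python
import math

BLOCK = 8  # JPEG block size in pixels


def dct2_block(block):
    """Orthonormal 2D DCT-II of one 8 x 8 block (list of 8 rows of 8 numbers)."""
    def alpha(k):
        return math.sqrt(1.0 / BLOCK) if k == 0 else math.sqrt(2.0 / BLOCK)

    out = [[0.0] * BLOCK for _ in range(BLOCK)]
    for u in range(BLOCK):
        for v in range(BLOCK):
            total = 0.0
            for x in range(BLOCK):
                for y in range(BLOCK):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * BLOCK))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * BLOCK)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def quantize_block(coeffs, table):
    """Divide each DCT coefficient by its quantization step and round."""
    return [[int(round(coeffs[u][v] / table[u][v])) for v in range(BLOCK)]
            for u in range(BLOCK)]


def split_dc_ac(qblock):
    """Return the DC coefficient and the list of the 63 AC coefficients."""
    dc = qblock[0][0]
    ac = [qblock[u][v] for u in range(BLOCK) for v in range(BLOCK)
          if (u, v) != (0, 0)]
    return dc, ac


# Assumed flat quantization table (illustration only, not a real JPEG table).
FLAT_TABLE = [[16] * BLOCK for _ in range(BLOCK)]

# A constant block: after the level shift by 128 every pixel equals 8.
pixels = [[136] * BLOCK for _ in range(BLOCK)]
shifted = [[p - 128 for p in row] for row in pixels]
dc, ac = split_dc_ac(quantize_block(dct2_block(shifted), FLAT_TABLE))
```

For a constant block all the energy falls into the DC coefficient and the 63 AC coefficients quantize to zero, which is why the first-digit statistics used later are collected from the non-zero AC coefficients of textured blocks.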
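Benford’s law states that the first digits of many naturally occurring sets of numbers follow the logarithmic distribution

$$p(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, 9,$$

where d is the value of the first digit and p(d) denotes the probability of digit d.

The generalized Benford’s law for JPEG coefficients takes the form

$$p(d) = N \log_{10}\left(1 + \frac{1}{s + d^{q}}\right), \quad d = 1, 2, \ldots, 9,$$

where N, s, and q are model parameters that precisely describe the distribution. Different compression factors correspond to different N, s, and q values.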
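As a small numerical illustration, the sketch below evaluates the standard logarithmic form of Benford’s law, p(d) = log10(1 + 1/d), and the three-parameter generalization p(d) = N log10(1 + 1/(s + d^q)), and computes an empirical first-digit distribution from a list of integer coefficients. The function names are hypothetical, and no fitted values of N, s, and q are implied; the parameters are left to the caller.

```python
import math


def benford(d):
    """Benford's law: p(d) = log10(1 + 1/d) for d = 1..9."""
    return math.log10(1.0 + 1.0 / d)


def generalized_benford(d, N, s, q):
    """Generalized form: p(d) = N * log10(1 + 1 / (s + d**q))."""
    return N * math.log10(1.0 + 1.0 / (s + d ** q))


def first_digit(value):
    """Most significant decimal digit of an integer, or None for zero."""
    value = abs(int(value))
    if value == 0:
        return None
    while value >= 10:
        value //= 10
    return value


def first_digit_distribution(values):
    """Empirical probabilities of first digits 1..9 over the non-zero values."""
    counts = [0] * 9
    for v in values:
        d = first_digit(v)
        if d is not None:
            counts[d - 1] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 9
    return [c / total for c in counts]
```

With N = 1, s = 0, and q = 1 the generalized form reduces to the original law, and the nine probabilities of the original law sum to log10(10) = 1.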
The Uncompressed Colour Image Database (UCID) is a color image database of 1,338 uncompressed TIFF images of size 512 × 384, which span a wide range of indoor and outdoor scenes. In our experiments, we apply single JPEG compression with three quality factors (QF = 70, 80, and 90) and double JPEG compression with three quality factor pairs ((QF1, QF2) = (55, 70), (65, 80), and (75, 90)) to all 1,338 images in the UCID database. Unless otherwise specified, double JPEG compression in this article means that an image is compressed twice in succession, with the same or different JPEG quality factors, in the 8 × 8 blocks.
Detection algorithm for the tampered region
Assuming that a JPEG image is saved in JPEG format after being tampered with, the un-tampered region usually has a different compression history from the tampered region(s). The aim of this study is to detect and locate the tampered region(s) in a manipulated image, and we put forward a novel detection method for this purpose. Figure 3 shows the workflow of our algorithm. The main detection steps are as follows:
Step 1. Train a two-class support vector machine (SVM) using the MBFDF described above, extracted from, say, 1,000 randomly selected singly JPEG-compressed images (the original uncompressed images are from UCID) and their counterparts, the doubly JPEG-compressed images with different QF values.
Step 2. Divide a test image into continuous non-overlapping 8 × 8 pixel blocks.
Step 3. Centering at each block, take a sub-image with the size of (2n + 1) × (2n + 1) blocks, where.
Step 4. For each sub-image, calculate the first digits’ probability distribution of the JPEG coefficients of the first i AC modes to obtain a feature vector of i × 9 dimensions, where each group of nine features gives the probabilities of the nine first digits of one AC mode.
From a statistical point of view, the larger n is, the more pronounced the statistical characteristics are. However, as n increases, the accuracy of locating the tampered region decreases. Therefore, to locate the tampered region accurately, the value of n should be small. On the other hand, the smaller n is, the more noise appears in the detection result. As a compromise, n is usually set to 1 or 2, and i ranges from 15 to 25.
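The sub-image feature computation of Steps 3 and 4 can be sketched as below. This is an illustration under stated assumptions, not the authors’ implementation: the input is taken to be a grid of already quantized 8 × 8 coefficient blocks, the "first i AC modes" are assumed to follow the usual JPEG zigzag order, sub-images are clipped at the image border, and all function names are hypothetical.

```python
def zigzag_order(size=8):
    """(row, col) positions of one block in JPEG zigzag order."""
    return sorted(
        ((u, v) for u in range(size) for v in range(size)),
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]),
    )


def first_digit(value):
    """Most significant decimal digit of an integer, or None for zero."""
    value = abs(int(value))
    if value == 0:
        return None
    while value >= 10:
        value //= 10
    return value


def mbfdf_features(blocks, row, col, n=1, num_modes=20):
    """First-digit probabilities of the first `num_modes` AC modes.

    `blocks[r][c]` is the quantized 8 x 8 coefficient block at block
    position (r, c). The sub-image spans (2n + 1) x (2n + 1) blocks
    centered at (row, col); clipping at the border is an assumption.
    Returns a list of num_modes * 9 probabilities.
    """
    modes = zigzag_order()[1:1 + num_modes]  # index 0 is the DC mode
    rows, cols = len(blocks), len(blocks[0])
    features = []
    for u, v in modes:
        counts = [0] * 9
        for r in range(max(0, row - n), min(rows, row + n + 1)):
            for c in range(max(0, col - n), min(cols, col + n + 1)):
                d = first_digit(blocks[r][c][u][v])
                if d is not None:
                    counts[d - 1] += 1
        total = sum(counts)
        features.extend(cnt / total if total else 0.0 for cnt in counts)
    return features
```

With n = 1 each AC mode contributes at most nine coefficients per sub-image, which illustrates the trade-off discussed above: small n gives fine localization but noisy histograms, while larger n gives smoother statistics over a coarser area.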
There are three commonly used manipulations: (1) copy–paste manipulation with the inserted region coming from uncompressed images (referred to as JPEG + uncompressed); (2) copy–paste manipulation with the inserted region coming from JPEG images (referred to as JPEG + JPEG); (3) inpainting manipulation on JPEG images (referred to as JPEG + inpainting). In each manipulation, the composite image is finally saved in JPEG format. We now introduce the tampered region detection method for each of these three manipulations in turn.
JPEG + uncompressed
For an original image with JPEG quality factor QF1, we insert a region from an uncompressed image, such as a TIF or BMP image, and then save the composite image at JPEG quality factor QF2 (). In this tampering scheme, the tampered region undergoes single JPEG compression, whereas the un-tampered region undergoes double JPEG compression.
JPEG + JPEG
When an image is tampered with by the JPEG + JPEG manipulation, the un-tampered region undergoes double JPEG compression with matching blocks. Although the inserted region also undergoes double JPEG compression, the probability that the 8 × 8 grid of the copy–paste inserted image matches that of the original image is only 1/64. Therefore, we can regard the tampered region of the composite image as a singly compressed region in our proposed algorithm.
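The 1/64 figure can be checked by counting grid offsets. The sketch below assumes that the horizontal and vertical shifts between the source grid and the destination grid are independent and uniformly distributed modulo 8; that uniformity assumption and the function names are illustrative, not taken from the article.

```python
from fractions import Fraction

BLOCK = 8  # JPEG block size in pixels


def grids_aligned(dx, dy, block=BLOCK):
    """True if a paste shifted by (dx, dy) pixels keeps the 8 x 8 grids aligned."""
    return dx % block == 0 and dy % block == 0


def grid_match_probability(block=BLOCK):
    """Fraction of the block x block possible relative offsets that stay aligned."""
    offsets = [(dx, dy) for dx in range(block) for dy in range(block)]
    aligned = sum(1 for dx, dy in offsets if grids_aligned(dx, dy, block))
    return Fraction(aligned, len(offsets))
```

Only the offset (0, 0) modulo 8 keeps the grids aligned, so 63 of the 64 equally likely offsets leave the inserted region with a mismatched grid.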
JPEG + inpainting
Inpainting is also a commonly used, imperceptible image tampering method, which selects some neighboring pixels to replace the original information in order to hide particular objects in the original image. In this case, the tampered region consists of some random pixels. When an original image with JPEG quality factor QF1 is manipulated by inpainting and then saved at JPEG quality factor QF2, we can consider that the tampered region undergoes single JPEG compression with quality factor QF2 and the un-tampered region undergoes double JPEG compression with the primary quality factor QF1 and the secondary quality factor QF2. Therefore, the tampered region can be distinguished from the un-tampered region by our proposed algorithm.
Experimental results and statistical analysis
To further verify the efficacy of our proposed algorithm, we randomly choose 1,000 singly compressed images and their doubly compressed counterparts from the UCID database to train a two-class SVM, and randomly choose 700 images, each of size 768 × 576, from another color image database as the test set. First, we apply single JPEG compression with quality factor QF1 to all 700 uncompressed images. A central portion of each singly compressed image is tampered with by the JPEG + uncompressed and JPEG + JPEG manipulations, respectively, and then the entire image is saved at JPEG quality factor QF2. Because the tampered and un-tampered regions generated by the JPEG + inpainting manipulation have the same compression histories, respectively, as those generated by the JPEG + uncompressed manipulation, we do not discuss the JPEG + inpainting manipulation separately. In all of our experiments, the size of the central tampered region is 150 × 150 pixels. We choose the top 20 AC modes to calculate the feature vector, and the parameter that determines the size of the sub-image is n = 1. The JPEG quality factor QF1 ranges from 50 to 65 in steps of 5, and the JPEG quality factor QF2 ranges from 75 to 95 in steps of 5. Next, we detect these tampered images by applying our proposed algorithm.
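The experimental settings above can be enumerated with a short sketch. The quality factor ranges, image size, and region size are taken from the text; the exact placement of the central region (integer-centered here) and the function names are assumptions made for illustration.

```python
IMAGE_W, IMAGE_H = 768, 576  # test image size in pixels
REGION = 150                 # side of the central tampered region in pixels
BLOCK = 8                    # JPEG block size in pixels


def quality_factor_pairs():
    """All (QF1, QF2) pairs: QF1 in 50..65 and QF2 in 75..95, both in steps of 5."""
    return [(qf1, qf2) for qf1 in range(50, 66, 5) for qf2 in range(75, 96, 5)]


def central_region(width=IMAGE_W, height=IMAGE_H, size=REGION):
    """Half-open pixel box (left, top, right, bottom); exact centering is assumed."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def blocks_touched(region, block=BLOCK):
    """Block coordinates (row, col) that contain at least one pixel of the region."""
    left, top, right, bottom = region
    cols = range(left // block, (right - 1) // block + 1)
    rows = range(top // block, (bottom - 1) // block + 1)
    return [(r, c) for r in rows for c in cols]


pairs = quality_factor_pairs()
region = central_region()
touched = blocks_touched(region)
```

Under the centering assumption the region starts at pixel (309, 213), which is not a multiple of 8, so the blocks on the region boundary mix tampered and un-tampered pixels; this is one reason a block-level detector cannot match the true region exactly.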
Table. The ME and STD of OL for the JPEG + uncompressed manipulation (tampered region size: 150 × 150)
Table. The ME and STD of DE for the JPEG + uncompressed manipulation (tampered region size: 150 × 150)
Table. The ME and STD of OL for the JPEG + JPEG manipulation (tampered region size: 150 × 150)
Table. The ME and STD of DE for the JPEG + JPEG manipulation (tampered region size: 150 × 150)
In this article, we focus on analyzing the first digits’ probability distributions of JPEG coefficients for images with different JPEG compression history, and further present an efficient and automatic detection method by using MBFDF to decide whether a given JPEG image has locally been manipulated or not, and if so, to locate the tampered region.
The proposed method has several advantages. First, it can accurately detect and locate the tampered region. Second, it is effective for different kinds of forgery techniques: (1) copy–paste manipulation with the inserted region coming from uncompressed images; (2) copy–paste manipulation with the inserted region coming from JPEG images; (3) inpainting manipulation on JPEG images. Third, it is an automatic method for detecting tampered JPEG images and does not require any prior knowledge. Finally, the detection accuracy is high and DE is small.
This study was supported by the National Natural Science Foundation of China (Grant No. 61172184) and the Hunan Provincial Natural Science Foundation of China (Grant No. 12JJ6062). The authors would like to thank Drs. Bin Li and Gang Yu for their valuable suggestions.
- Popescu A, Farid H: Statistical tools for digital forensics. Lecture Notes Comput. Sci. 2005, 3200: 395-407.
- Huang Y, Lu W, Sun W, Long D: Improved DCT-based detection of copy-move forgery in images. Forensic Sci. Int. 2011, 206: 178-184. 10.1016/j.forsciint.2010.08.001
- Stamm MC, Liu KJR: Forensic detection of image manipulation using statistical intrinsic fingerprints. IEEE Trans. Inf. Forensics Security 2011, 5(3):492-506.
- Peng F, Nie Y, Long M: A complete passive blind image copy-move forensics scheme based on compound statistic features. Forensic Sci. Int. 2011, 212: e21-e25. 10.1016/j.forsciint.2011.06.011
- Farid H: Exposing digital forgeries from JPEG ghosts. IEEE Trans. Inf. Forensics Security 2009, 4(1):154-160.
- Liu Z, Li X, Zhao Y: Passive detection of copy-paste tampering for digital image forensics. In Proc. Fourth Int. Conf. Intelligent Comput. Technol. Automation 2011, 2: 649-652.
- Fan Z, de Queiroz RL: Identification of bitmap compression history: JPEG detection and quantizer estimation. IEEE Trans. Image Process. 2003, 12(2):230-235. 10.1109/TIP.2002.807361
- Li W, Yuan Y, Yu N: Passive detection of doctored JPEG image via block artifact grid extraction. Signal Process. 2009, 89(9):1821-1829. 10.1016/j.sigpro.2009.03.025
- Zhao YQ, Liao M, Shih FY, Shi YQ: Tampered region detection of inpainting JPEG images. Optik – Int. J. Light Electron Opt. 2012. 10.1016/j.ijleo.2012.08.018
- Lin Z, He J, Tang X, Tang C-K: Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis. Pattern Recognit. 2009, 42: 2492-2501. 10.1016/j.patcog.2009.03.019
- Fu D, Shi YQ, Su Q: A generalized Benford’s law for JPEG coefficients and its applications in image forensics. Proc. SPIE 2007, 6505: 65051L1-65051L11.
- Li B, Shi YQ, Huang J: Detecting doubly compressed JPEG images by using mode based first digit features. In IEEE International Workshop on Multimedia Signal Processing. Cairns, Queensland, Australia; 2008:730-735.
- Schaefer G, Stich M: UCID—an uncompressed colour image database. Technical Report, School of Computing and Mathematics. Nottingham Trent University, UK; 2003.
- Criminisi A, Perez P, Toyama K: Region filling and object removal by exemplar-based inpainting. IEEE Trans. Image Process. 2004, 13(9):1200-1212. 10.1109/TIP.2004.833105
- Olmos A, Kingdom FAA: McGill Calibrated Colour Image Database. 2004. http://tabby.vision.mcgill.ca
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.