- Research
- Open Access
Modified BCH data hiding scheme for JPEG steganography
- Vasily Sachnev^{1} and
- Hyoung Joong Kim^{2}
https://doi.org/10.1186/1687-6180-2012-89
© Sachnev and Kim; licensee Springer. 2012
- Received: 14 June 2011
- Accepted: 26 April 2012
- Published: 26 April 2012
Abstract
In this article, a new Bose-Chaudhuri-Hocquenghem (BCH)-based data hiding scheme for JPEG steganography is presented. Traditional data hiding approaches hide data in each block, where the blocks do not overlap each other. In the proposed method, however, two consecutive blocks can overlap to form a combined block that is larger than a single block but smaller than two consecutive nonoverlapping blocks. In order to embed more data into the combined block than into a single block, the BCH-based data hiding scheme has to be redesigned. We propose a way to get a joint solution for hiding data in two blocks with intersected coefficients such that any modification of the intersected area does not disturb the data hidden in either block. Because the intersected area carries data for both blocks, embedding capacity is increased. In addition, the nonzero DCT coefficient stream is modified to better resist steganalysis and to reduce the distortion impact after data hiding. This approach carefully inserts or removes 1 or -1 coefficients into or from the DCT coefficient stream according to the rule proposed in this article. Experimental results show that the proposed algorithms work well and that the performance gain is significant.
Keywords
- BCH
- steganography
- less detectable data hiding
1. Introduction
One of the first steganography methods for JPEG images embeds data by changing the least-significant bit values of the quantized discrete cosine transform (DCT) coefficients. However, this method can easily be detected by statistical analysis. Thus, for a good while, evading statistical analysis has been a major concern. Provos [1] divides the DCT coefficients into two disjoint subsets, hides data in the first subset, and compensates for the distorted histogram by modifying the second subset. Other methods in [2, 3] use a similar approach. On the other hand, Solanki et al. [4] utilize a robust watermarking scheme for steganography purposes. They embed data into the image in the spatial domain by using a technique robust against JPEG compression. Their scheme degrades the features of the DCT coefficients less and, as a result, its detectability was low against older statistical steganalysis.
Another way to survive steganalysis is to reduce the number of modified coefficients. Traditionally, each nonzero DCT coefficient has been modified. As a result, the embedding capacity is as large as the number of nonzero DCT coefficients. However, maximizing the embedding capacity comes at the cost of higher detectability. Westfeld [5] has used a matrix encoding (ME) technique to lower detectability by sacrificing embedding capacity. The ME technique exploits the Hamming code, which is designed for error correction. His scheme hides many bits by flipping at most one coefficient in each block. This approach was the first instance of using an error correcting code for data hiding.
Fridrich et al. [6–13] use the concept of the "minimal distortion" to enhance the security (i.e., by reducing distortion). The perturbed quantization steganography utilizes the wet paper coding.
Later, Kim et al. [14] have improved the performance of the ME by reducing the distortion impact. In fact, their modified matrix encoding (MME) method changes more coefficients than the ME. However, they show that the distortion impact after modifying one coefficient may be larger than that after modifying two coefficients. Thus, choosing between modifying one coefficient or two per block can yield less distortion and lower detectability against steganalysis. Note that MME requires the original uncompressed image for data hiding, but not for decoding.
Schönfeld and Winkler [15] have proposed a new way to hide data using a more powerful error correction code. They use a structured Bose-Chaudhuri-Hocquenghem (BCH) code [2]. Zhang et al. [16] have significantly improved the original BCH-based data hiding scheme. Their improved method can easily find the flip positions and resists steganalysis better than the existing methods. Later, Sachnev et al. [17] apply a heuristic optimization technique to the data hiding scheme over the BCH coding and modify the stream of the input DCT coefficients to reduce the distortion. Their method considerably outperforms the steganography method proposed by Zhang et al. [16].
Recently, Filler and Fridrich [18] have proposed a remarkable framework which minimizes a distortion measure as a weighted norm of the difference between cover and stego feature vectors. In their approach, the distortion is not necessarily an additive function over the pixels because the features may contain higher-order statistics such as sample transition probability matrices of pixels or DCT coefficients modeled as Markov chains [19–21]. When the distortion measure is defined as a sum of local potentials, practical near-optimal embedding methods can be implemented with syndrome-trellis codes [22].
Most of the above-mentioned steganographic methods use nonoverlapping blocks of the DCT coefficients for hiding the secret message. Such a blockwise embedding scheme divides both the stream of the DCT coefficients and the hidden message into separate blocks and solves the equations for hiding data for each block individually. Recent methods like MME [14] and the BCH-based steganography methods [15–17] may produce several alternative solutions. Thus, such a data hiding method can choose the solution with the lowest distortion impact. Past investigation of the BCH data hiding scheme finds that BCH usually allows a redundant number of possible solutions. This means that a solution with acceptable distortion impact can be found in a reduced set of possible solutions. Hence, the embedding efficiency of the BCH steganographic methods can be increased by reducing the number of possible solutions while keeping a distortion impact similar to that of the original approach.
In the proposed method, two blocks of the DCT coefficients form a combined block sharing common coefficients in the intersected part between two consecutive blocks. Such a design achieves high embedding efficiency by hiding data twice in the intersected area. The number of possible joint solutions for both blocks (i.e., solutions which are valid for both blocks) is always smaller than the number of all possible solutions for two independent blocks. The reduced number of possible solutions can increase distortion, but not significantly. Besides, the number of possible solutions can easily be controlled by changing the size of the intersected area. The smaller the intersected area, the larger the number of possible joint solutions. A similar approach has been tested for the Hamming code in [23].
Conversely, the larger the intersected area, the higher the embedding efficiency of the proposed method. In the proposed method, the block of the DCT coefficients can be modified by inserting new nonzero coefficients 1 or -1, or by removing coefficients 1 or -1. Such modification is carried out carefully in order to reduce the distortion caused by excessive hiding.
The rest of the article is organized as follows. Section 2 explains the details of the BCH coding. Section 3 presents the BCH-based modified data hiding scheme. In Section 4, we propose the inserting-removing strategy. The encoder and decoder are presented in Section 5. Section 6 provides the experimental results. Finally, Section 7 concludes the article.
2. BCH syndrome coding
BCH codes are a well-known and widely used family of error correction codes. A BCH code (n, k, t) can correct up to t bit errors by appending n - k additional bits to the original k-bit message such that the syndrome of the resulting n bits is equal to 0. BCH codes were invented for error correction and cannot directly be used for data hiding. An efficient method of using powerful BCH codes for data hiding has been presented in [15–17].
2.1. BCH syndrome coding
Assume that the original stream of binary data is V = {v_{0}, v_{1}, v_{2}, ..., v_{n-1}}, and the modified stream of binary data after data hiding is R = {r_{0}, r_{1}, r_{2}, ..., r_{n-1}}. The streams V and R over GF(2^{ m }) can be represented as V(x) = v_{0} + v_{1}·x + v_{2}·x^{2} + v_{3}·x^{3} + ⋯ + v_{n-1}·x^{n-1}, and R(x) = r_{0} + r_{1}·x + r_{2}·x^{2} + r_{3}·x^{3} + ⋯ + r_{n-1}·x^{n-1}, respectively.
where u = {u_{0}, u_{1}, u_{2}, ..., u_{ l }} are the positions of the elements in V to be flipped in order to get R.
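As a minimal sketch of the syndrome computation, the following assumes the case m = 4 (a block of 15 binary values), the primitive polynomial x^4 + x + 1, and a two-component syndrome built from the sums of v_i·α^i and v_i·α^{3i}. The polynomial, the function names, and the 0-based indexing are illustrative assumptions and are not taken from the article.

```python
# Assumed field: GF(2^4) generated by x^4 + x + 1 (a common choice, not from the article).
M = 4
N = (1 << M) - 1      # block length 2^m - 1 = 15
PRIM = 0b10011        # x^4 + x + 1


def build_exp_table():
    """Return [alpha^0, alpha^1, ..., alpha^(N-1)] as integers."""
    exp, x = [], 1
    for _ in range(N):
        exp.append(x)
        x <<= 1
        if x & (1 << M):      # reduce modulo the primitive polynomial
            x ^= PRIM
    return exp


EXP = build_exp_table()


def syndrome(bits):
    """Syndrome (S1, S2) of a binary stream: sums of alpha^i and alpha^(3i) over set bits."""
    s1 = s2 = 0
    for i, v in enumerate(bits):
        if v:
            s1 ^= EXP[i % N]
            s2 ^= EXP[(3 * i) % N]
    return s1, s2
```

Because the syndrome is linear over GF(2), flipping the positions u in V changes the syndrome by the XOR of the contributions of those positions, which is what makes a search for flip positions possible.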
2.2. Lookup tables
In this article, we utilize the method of Zhao et al. [24], which uses fast lookup tables to find the roots of the quadratic and cubic polynomials σ(x). A similar approach has been used in [16, 17].
2.3. Solutions
Hiding message m in the binary stream V requires finding the positions of the coefficients to be flipped. In this article, we use the method presented in [16, 17] to get one-, two-, three-, or four-flip solutions. The sets of all possible solutions for one, two, three, or four flips are stored in the lookup tables J_{1}, J_{2}, J_{3}, and J_{4}, respectively. The notation J_{3}(S) returns all three-flip solutions for syndrome S = {S_{1}S_{2}}. Similarly, we can get all possible solutions for block n_{1} with syndrome S^{I} as J^{I} = {J_{1}(S^{I}) J_{2}(S^{I}) J_{3}(S^{I}) J_{4}(S^{I})} and for block n_{2} with syndrome S^{II} as J^{II} = {J_{1}(S^{II}) J_{2}(S^{II}) J_{3}(S^{II}) J_{4}(S^{II})}, respectively. The size of the lookup tables is (2^{2·m}- 1) × nS, where nS is the number of stored solutions.
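The lookup tables can be illustrated with a brute-force sketch. It assumes m = 4, the primitive polynomial x^4 + x + 1, 0-based positions, and plain Python dictionaries keyed by the syndrome pair; the methods in [16, 17, 24] build such tables far more efficiently, so this only shows what the tables contain.

```python
from collections import defaultdict
from itertools import combinations

M, PRIM = 4, 0b10011          # assumed field GF(2^4), x^4 + x + 1
N = (1 << M) - 1              # 15 positions per block


def alpha_powers():
    """Return [alpha^0, ..., alpha^(N-1)] as integers."""
    powers, x = [], 1
    for _ in range(N):
        powers.append(x)
        x <<= 1
        if x & (1 << M):
            x ^= PRIM
    return powers


EXP = alpha_powers()


def flip_syndrome(positions):
    """Syndrome change (S1, S2) caused by flipping the given positions."""
    s1 = s2 = 0
    for i in positions:
        s1 ^= EXP[i]
        s2 ^= EXP[(3 * i) % N]
    return s1, s2


def build_tables(max_flips=4):
    """tables[l][S] lists every l-flip solution for syndrome S (the role of J_l(S))."""
    tables = {}
    for l in range(1, max_flips + 1):
        table = defaultdict(list)
        for positions in combinations(range(N), l):
            table[flip_syndrome(positions)].append(positions)
        tables[l] = dict(table)
    return tables
```

With these assumptions, every one-flip and two-flip pattern has its own syndrome, while a syndrome generally has several three-flip and four-flip solutions; those alternatives are what the embedding step chooses among.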
3. Proposed data hiding scheme
One of the two main contributions of this article is a systematic algorithm for finding joint solutions. The proposed BCH-based data hiding scheme requires a joint solution for both blocks n_{1} and n_{2}, obtained using the guidelines from Section 2.1, such that the intersected area does not affect the result. For example, let 8 bits be hidden into 15 coefficients from a_{1} to a_{15} using the BCH-based steganography. Then, another 8 bits can be hidden into the next block having another 15 coefficients from a_{16} to a_{30}. This is the traditional approach. As a result, 16 bits can be hidden into 30 coefficients. However, our new approach hides the same amount of data into 25 coefficients a_{1} to a_{25}. Eight bits are hidden into the coefficients from a_{1} to a_{15}, and another eight bits into the coefficients from a_{11} to a_{25}. The data hiding algorithm requires finding the syndromes S^{I} and S^{II} (Equation 6) for the blocks n_{1} and n_{2}, respectively.
- 1.
Hiding data into the block n _{1} first.
- (a)
Some solutions for hiding data into the block n _{1} do not modify the coefficients in the intersected area. Thus, solutions for the block n _{2} have to be obtained using the original syndrome S ^{II}. Some solutions are valid since they do not modify the coefficients in the intersected area. These solutions are called specified solutions.
- (b)
Some solutions for the block n _{1} modify the coefficients in the intersected area. These modifications in the intersected area affect the syndrome for the block n _{2}. Thus, the new syndrome for the block n _{2} is obtained as ${S}_{new}^{\mathsf{\text{II}}}$. Some new solutions are valid since they do not modify the coefficients already modified by the n _{1} in the intersected area.
- 2.
Hiding data into the block n _{2} first.
- (a)
Some solutions for hiding data into the block n _{2} do not modify the coefficients in the intersected area. Thus, solutions for the block n _{1} have to be obtained using the original syndrome S ^{I} . Some solutions are valid since they do not modify the coefficients in the intersected area.
- (b)
Some solutions for the block n _{2} modify the coefficients in the intersected area. These modifications in the intersected area affect the syndromes for the block n _{1}. Thus, the new syndrome for the block n _{1} is obtained as ${S}_{new}^{\mathsf{\text{I}}}$. Some new solutions are valid since they do not modify the coefficients already modified by the n _{2} in the intersected area.
The joint solutions for a combined block unify the solutions for the block n_{2} and its syndrome S^{II} and the specified solutions for the block n_{1} and its syndrome ${S}_{new}^{\mathsf{\text{I}}}$ (in case of 2(a), ${S}_{new}^{\mathsf{\text{I}}}={S}^{\mathsf{\text{I}}}$).
In general, the proposed modified BCH data hiding scheme hides 4·m bits of data into a combined block of 2·(2^{ m }-1)-|I| coefficients by using the BCH scheme (2^{ m }-1, k, 2) for blocks n_{1} and n_{2}.
where m defines the proper BCH-based scheme for the proposed method, N is the number of nonzero DCT coefficients, M is the hidden message, n^{ p } = 2·(2^{ m }-1)-|I| is the size of the combined block, 4·m is the capacity of the combined block.
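The block size and capacity stated above can be checked with a small helper. The function name and the returned rate (bits per coefficient) are illustrative additions; only n^{p} = 2·(2^{m}-1)-|I| and the capacity 4·m come from the text.

```python
def combined_block(m, overlap):
    """Return (size, capacity_bits, bits_per_coefficient) of a combined block.

    size = 2 * (2**m - 1) - overlap and capacity = 4 * m, as stated in the text.
    """
    single = 2 ** m - 1
    if not 0 <= overlap <= single:
        raise ValueError("overlap must lie between 0 and the single block size")
    size = 2 * single - overlap
    bits = 4 * m
    return size, bits, bits / size


# The example from the text: m = 4, |I| = 5 gives 16 bits in 25 coefficients,
# while two nonoverlapping blocks (|I| = 0) need 30 coefficients for the same 16 bits.
overlapped = combined_block(4, 5)
separate = combined_block(4, 0)
```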
3.1. Data hiding algorithm
where R_{1} and R_{2} are the modified streams of the binary coefficients obtained from n_{1} and n_{2} (see Figure 1); H is a parity-check matrix from Equation (1).
Note that, hiding message m_{1} to block n_{1} modifies the block n_{2} and vice versa, due to the intersected part. Hence, we need proper positions to flip by solving Equation (9) for correct decoding.
Among all possible solutions, the proposed method unifies the solutions for blocks n_{1} and n_{2}, such that the flip positions cover only nonintersected area for both blocks (i.e., ${J}_{s}^{\mathsf{\text{I}}}={J}^{\mathsf{\text{I}}}\notin I$ and ${J}_{s}^{\mathsf{\text{II}}}={J}^{\mathsf{\text{II}}}\notin I$, for blocks n_{1} and n_{2}). In other words, it is desirable to hide data into the block n_{1} using the solutions from ${J}_{s}^{\mathsf{\text{I}}}$ that do not affect the block n_{2}, and vice versa. According to the above explanation, ${J}_{s}^{\mathsf{\text{I}}}$ and ${J}_{s}^{\mathsf{\text{II}}}$ unify the specified solutions for the blocks n_{1} and n_{2}, respectively. Here, note that superscript indexes X^{I} and X^{II} present different items for blocks n_{1} and n_{2}, respectively.
However, some flip positions j from the block n_{1} may belong to the intersected area I. In that case, we can account for the effect of those j to get new solutions for the block n_{2}, and vice versa.
where ${S}^{\mathsf{\text{II}}}=\left\{{S}_{1}^{\mathsf{\text{II}}}\;{S}_{2}^{\mathsf{\text{II}}}\right\}$ is the syndrome for the block n_{2}; ${S}_{new}^{\mathsf{\text{II}}}=\left\{{P}_{1}^{\mathsf{\text{II}}}\;{P}_{2}^{\mathsf{\text{II}}}\right\}$ is the new syndrome for the block n_{2} after hiding data in the block n_{1}; l is the number of flip positions (j_{1}, ..., j_{ l }) from the block n_{1} that belong to the intersected area I (i.e., j_{1}, ..., j_{ l } = J^{ I } (S^{ I } ) ∈ I); and the values β_{1}, ..., β_{ l } are computed using Equation (13) for the flipping positions $\left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right)=F\left({j}_{1},\dots ,{j}_{l}\right)$ from the intersected area I for the block n_{2}. Function F converts the indexes (j_{1}, ..., j_{ l }) of the intersected area from the block n_{1} to the corresponding indexes $\left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right)$ from the block n_{2}. For example, the solution for the block n_{1} illustrated in Figure 1 is ${J}^{\mathsf{\text{I}}}\left({S}^{\mathsf{\text{I}}}\right)=\left[\begin{array}{cc}\hfill 3\hfill & \hfill 11\hfill \end{array}\right]$. Here j_{1} = 11 ∈ I, where index 1 means the first coefficient from the intersected area I. Coefficient j_{1} = 11 is located in the 11th position of the combined block. However, the 11th coefficient in the combined block is the 15th coefficient in the block n_{2} (i.e., $F\left({j}_{1}\right)=F\left(11\right)={j}_{1}^{\prime}=15$; see Figure 1). Thus, even though the flip positions for blocks n_{1} and n_{2} differ (i.e., j_{1} = 11 and ${j}_{1}^{\prime}=15$), those coefficients have the same location in the combined block.
Finally, the solution for the block n_{2} can be obtained as $\left\{{j}_{1}^{\prime},\dots ,{j}_{l}^{\prime},\;{J}_{s}^{\mathsf{\text{II}}}\left({S}_{new}^{\mathsf{\text{II}}}\right)\right\}$. This solution suffices to hide message m_{2} in the block n_{2}.
The joint solution hides both messages m_{1} and m_{2} in the combined block. The joint solution $\left\{{J}^{\mathsf{\text{I}}}\left({S}^{\mathsf{\text{I}}}\right),\;{J}_{s}^{\mathsf{\text{II}}}\left({S}_{new}^{\mathsf{\text{II}}}\right)\right\}$ unifies the solutions for the blocks n_{1} and n_{2}. In this example, the flipping positions from the intersected area are part of J^{I}(S^{I}).
where ${S}^{\mathsf{\text{I}}}=\left\{\begin{array}{cc}\hfill {S}_{1}^{\mathsf{\text{I}}}\hfill & \hfill {S}_{2}^{\mathsf{\text{I}}}\hfill \end{array}\right\}$ is the syndrome for the block n_{1}; ${S}_{new}^{\mathsf{\text{I}}}=\left\{\begin{array}{cc}\hfill {P}_{1}^{\mathsf{\text{I}}}\hfill & \hfill {P}_{2}^{\mathsf{\text{I}}}\hfill \end{array}\right\}$ is the new syndrome of the block n_{1} after hiding data in the block n_{2}; l is the number of flip positions $\left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right)$ for the block n_{2} that belong to the intersected area I (i.e., $\left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right)={J}^{\mathsf{\text{II}}}\left({S}^{\mathsf{\text{II}}}\right)\in I$); β_{1}, ..., β_{ l } are computed using Equation (15) for the flipping positions $\left({j}_{1},...,{j}_{l}\right)={F}^{-1}\left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right)$ from the intersected area I for the block n_{1}; function F^{-1} (i.e., the inverse function of F) converts the indexes of the coefficients of the intersected area $\left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right)$ from the block n_{2} to the corresponding indexes (j_{1}, ..., j_{ l }) from the block n_{1}. For example, if ${J}^{\mathsf{\text{II}}}\left({S}^{\mathsf{\text{II}}}\right)=\left[\begin{array}{cc}\hfill 1\hfill & \hfill 15\hfill \end{array}\right]$, then ${j}_{1}^{\prime}=15\in I$ and ${j}_{1}={F}^{-1}\left({j}_{1}^{\prime}\right)={F}^{-1}\left(15\right)=11$ (see Figure 1).
The solution for the block n_{1} can be obtained as $\left\{\left({j}_{1},...,{j}_{l}\right),\;{J}_{s}^{\mathsf{\text{I}}}\left({S}_{new}^{\mathsf{\text{I}}}\right)\right\}$. This solution suffices to hide message m_{1} in the block n_{1}.
The joint solution for hiding both messages m_{1} and m_{2} is $\left\{{J}_{s}^{\mathsf{\text{I}}}\left({S}_{new}^{\mathsf{\text{I}}}\right),\;{J}^{\mathsf{\text{II}}}\left({S}^{\mathsf{\text{II}}}\right)\right\}$. Here, the flipping positions from the intersected area $\left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right)$ are part of J^{II}(S^{II}). The corresponding flipping positions $\left({j}_{1},...,{j}_{l}\right)={F}^{-1}\left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right)$ are part of the solution for the block n_{1}.
Note that there are several solutions in J^{I} and J^{II} for syndromes S^{I} and S^{II}, respectively. The presented method may generate one joint solution for each solution from J^{I}(S^{I}) and J^{II}(S^{II}).
The complete procedure for getting all possible joint solutions for any syndromes is presented as follows:
- (a)
Define two blocks of the DCT coefficients n _{1} and n _{2} (see Figure 1). Compute syndromes S ^{I} and S ^{II} using corresponding binary streams v' and v".
- (b)
Find all possible solutions j ^{I} = J ^{I}(S ^{I}) and j ^{II} = J ^{II}(S ^{II}) for blocks n _{1} and n _{2} by using the syndromes S ^{I} and S ^{II}.
- (c)
For each solution j ^{I}(p) (p = 1, 2, 3,...,k, where k is the number of solutions) proceed as follows:
- i.
Define flip positions j _{1}, ..., j_{ l } from the intersected area I.
- ii.
Convert j _{1}, ..., j_{ l } to ${j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}$ (corresponding flip positions from the block n _{2}). Compute corresponding β using Equation 13. Compute new syndrome ${S}_{new}^{\mathsf{\text{II}}}$ using Equation 10.
- iii.
Using a new syndrome ${S}_{new}^{\mathsf{\text{II}}}$ get new flips solutions as ${j}_{new}^{\mathsf{\text{II}}}={J}_{s}^{\mathsf{\text{II}}}\left({S}_{new}^{\mathsf{\text{II}}}\right)$.
- iv.
For each solution ${j}_{new}^{\mathsf{\text{II}}}\left(q\right)$ (q = 1, 2, 3,...,z, where z is the number of solutions) store the joint solution: $\left\{{j}^{\mathsf{\text{I}}}\left(p\right),{j}_{new}^{\mathsf{\text{II}}}\left(q\right)\right\}$.
- (d)
For each solution j ^{II} (p) (p = 1, 2, 3,...,k) proceed as follows:
- i.
Define flip positions ${j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}$ from the intersected area I for block n _{2}.
- ii.
Convert ${j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}$ to j _{1}, ..., j_{ l }. Compute corresponding β. Compute new syndrome ${S}_{new}^{\mathsf{\text{I}}}$ using Equation 11.
- iii.
Using a new syndrome ${S}_{new}^{\mathsf{\text{I}}}$ get new flips solutions as ${j}_{new}^{\mathsf{\text{I}}}={J}_{s}^{\mathsf{\text{I}}}\left({S}_{new}^{\mathsf{\text{I}}}\right).$
- iv.
For each solution ${j}_{new}^{\mathsf{\text{I}}}\left(q\right)$ (q = 1, 2, 3,...,z, where z is the number of solutions) store the joint solution: $\left\{{j}_{new}^{\mathsf{\text{I}}}\left(q\right),{j}^{\mathsf{\text{II}}}\left(p\right)\right\}$.
The stored joint solutions are used further to hide data with better performance. Note that the proposed method needs to search for the best solution among k·z possible candidates for each block (see steps c and d). Thus, the computational complexity of the proposed search algorithm is O(n^{2}).
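Steps (a) to (d) can be sketched by brute force as follows. The sketch assumes m = 4, the primitive polynomial x^4 + x + 1, an intersected area of 5 coefficients placed at the end of n_1 and the start of n_2, and 0-based positions in the combined block; under this assumed layout the index conversion F is a simple shift, which may differ from the layout of Figure 1. The syndrome arguments are the required syndrome changes for each block, and all names are hypothetical.

```python
from itertools import combinations

M, PRIM = 4, 0b10011          # assumed field GF(2^4), x^4 + x + 1
N = (1 << M) - 1              # single block size: 15
OVERLAP = 5                   # |I|, assumed size of the intersected area
OFFSET = N - OVERLAP          # assumed layout: n_2 starts at this combined-block position


def alpha_powers():
    powers, x = [], 1
    for _ in range(N):
        powers.append(x)
        x <<= 1
        if x & (1 << M):
            x ^= PRIM
    return powers


EXP = alpha_powers()


def syn(local_flips):
    """Syndrome change caused by flipping block-local positions."""
    s1 = s2 = 0
    for i in local_flips:
        s1 ^= EXP[i]
        s2 ^= EXP[(3 * i) % N]
    return (s1, s2)


def solutions(target, allowed, max_flips):
    """All flip sets of at most max_flips positions from `allowed` that reach `target`."""
    return [pos for l in range(max_flips + 1)
            for pos in combinations(allowed, l) if syn(pos) == target]


def joint_solutions(s_first, s_second, max_flips=3):
    """Joint flip sets (combined-block positions) giving s_first in n_1 and s_second in n_2."""
    free1 = range(OFFSET)         # n_1 positions outside I (local indexes)
    free2 = range(OVERLAP, N)     # n_2 positions outside I (local indexes)
    joint = set()
    # Step (c): solve n_1 freely, then repair n_2 with specified solutions only.
    for j1 in solutions(s_first, range(N), max_flips):
        shared = [i - OFFSET for i in j1 if i >= OFFSET]       # F: n_1 index -> n_2 index
        c = syn(shared)
        s_new = (s_second[0] ^ c[0], s_second[1] ^ c[1])
        for j2 in solutions(s_new, free2, max_flips):
            joint.add(frozenset(j1) | frozenset(i + OFFSET for i in j2))
    # Step (d): solve n_2 freely, then repair n_1 with specified solutions only.
    for j2 in solutions(s_second, range(N), max_flips):
        shared = [i + OFFSET for i in j2 if i < OVERLAP]       # F^-1: n_2 index -> n_1 index
        c = syn(shared)
        s_new = (s_first[0] ^ c[0], s_first[1] ^ c[1])
        for j1 in solutions(s_new, free1, max_flips):
            joint.add(frozenset(j1) | frozenset(i + OFFSET for i in j2))
    return joint


def is_valid(flips, s_first, s_second):
    """Check that a joint flip set produces the required syndrome in both blocks."""
    in_first = [g for g in flips if g < N]
    in_second = [g - OFFSET for g in flips if g >= OFFSET]
    return syn(in_first) == s_first and syn(in_second) == s_second
```

Each stored joint solution can then be scored by a distortion measure and the cheapest one kept. A flip inside the intersected area counts toward both blocks, which is why the second block is solved against the updated syndrome rather than the original one.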
3.2. Two-stage embedding technique
In order to enhance the performance of the blockwise methods (i.e., ME, MME, BCH-based data hiding, etc.), we utilize almost all the DCT coefficients for data hiding. The proposed method uses two different embedding schemes together. Two schemes use the different block sizes ${n}_{1}^{p}$ and ${n}_{2}^{p}$, and have different payloads ${m}_{1}^{p}$ and ${m}_{2}^{p}$.
This method divides the stream of the DCT coefficients (c_{1}, c_{2}, ..., c_{ N }) and the message M into two parts and hides data into each part separately. The optimal number of the blocks (k_{1} and k_{2}) for both schemes can be computed as follows:
where N is the number of DCT coefficients.
The presented two-scheme embedding method improves the performance of data hiding by properly distributing the available DCT coefficients between two different modified BCH schemes. The first scheme uses ${m}_{1}^{p}=4\cdot m$, with m obtained from inequality (8); the second scheme uses ${m}_{2}^{p}=4\cdot \left(m+1\right)$. Note that the second scheme has higher embedding efficiency. The efficiency of the two-scheme embedding depends on the ratio between the numbers of blocks k_{1} and k_{2} for schemes 1 and 2, respectively. The larger the value k_{2} (i.e., the smaller the ratio k_{1}/k_{2}), the higher the efficiency of the proposed two-scheme embedding for the same m.
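The expressions for k_{1} and k_{2} are not reproduced in this text, so the following is only one plausible way to split the blocks, under two assumed constraints: the blocks must fit into the N available coefficients, and together they must carry the whole message. It prefers as many scheme-2 blocks as possible, since the text states that the second scheme has the higher embedding efficiency. The function name and the search strategy are hypothetical.

```python
from math import ceil


def split_blocks(num_coeffs, msg_bits, n1, m1, n2, m2):
    """Return (k1, k2) with the largest k2 such that
    k1 * n1 + k2 * n2 <= num_coeffs and k1 * m1 + k2 * m2 >= msg_bits,
    or None when the message does not fit (assumed constraints)."""
    for k2 in range(num_coeffs // n2, -1, -1):
        remaining = max(0, msg_bits - k2 * m2)   # bits left for scheme 1
        k1 = ceil(remaining / m1)
        if k1 * n1 + k2 * n2 <= num_coeffs:
            return k1, k2
    return None


# Illustrative numbers: m = 4 with |I| = 5 gives blocks of 25 coefficients and 16 bits;
# m + 1 = 5 with |I| = 10 gives blocks of 52 coefficients and 20 bits.
k1, k2 = split_blocks(num_coeffs=1000, msg_bits=450, n1=25, m1=16, n2=52, m2=20)
```

In this illustration neither scheme alone is the best fit: scheme 2 alone cannot carry 450 bits in 1000 coefficients, while scheme 1 alone can but uses only the less efficient blocks.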
Table 1 Accuracy of the steganalysis [20] for different sizes of the intersected areas and payloads
Payload (bpc) | Proposed method: accuracy of the steganalysis (%) | | | | BCH [16] \ improvement
---|---|---|---|---|---
0.05 | I_{sh 1} \ I_{sh 2} | 10% | 30% | 50% | 50.12 \ 0.09
 | 30% | 50.11 | 50.08 | 50.04 |
 | 50% | 50.05 | 50.06 | 50.03 |
0.1 | I_{sh 1} \ I_{sh 2} | 10% | 30% | 50% | 51.54 \ 1.51
 | 30% | 50.11 | 50.06 | 50.04 |
 | 40% | 50.05 | 50.07 | 50.03 |
0.15 | I_{sh 1} \ I_{sh 2} | 30% | 35% | 40% | 57.13 \ 6.58
 | 10% | 50.25 | 50.31 | 50.48 |
 | 15% | 50.29 | 50.28 | 50.55 |
0.17 | I_{sh 1} \ I_{sh 2} | 25% | 35% | 45% | 60.03 \ 7.22
 | 5% | 53.87 | 54.01 | 53.96 |
 | 15% | 53.10 | 52.28 | 52.81 |
 | 30% | 53.91 | 54.12 | 53.89 |
0.2 | I_{sh 1} \ I_{sh 2} | 10% | 30% | 50% | 65.54 \ 6.62
 | 5% | 59.81 | 59.51 | 60.02 |
 | 15% | 59.26 | 58.19 | 59.01 |
 | 30% | 59.98 | 60.11 | 59.45 |
0.22 | I_{sh 1} \ I_{sh 2} | 25% | 35% | 50% | 73.06 \ 10.6
 | 30% | 62.18 | 62.01 | 62.24 |
 | 50% | 65.21 | 65.53 | 65.31 |
0.25 | I_{sh 1} \ I_{sh 2} | 30% | 40% | 50% | 80.45 \ 11.33
 | 50% | 69.38 | 69.25 | 69.13 |
Table 2 Accuracy of the steganalysis [25] for different sizes of the intersected areas and payloads
Payload (bpc) | Proposed method: accuracy of the steganalysis (%) | | | | BCH [16] \ improvement
---|---|---|---|---|---
0.05 | I_{sh 1} \ I_{sh 2} | 10% | 30% | 50% | 50.12 \ 0.1
 | 30% | 50.13 | 50.08 | 50.10 |
 | 50% | 50.06 | 50.05 | 50.02 |
0.1 | I_{sh 1} \ I_{sh 2} | 10% | 30% | 50% | 51.54 \ 1.48
 | 30% | 50.10 | 50.07 | 50.08 |
 | 50% | 50.09 | 50.11 | 50.06 |
0.15 | I_{sh 1} \ I_{sh 2} | 30% | 35% | 40% | 57.03 \ 3.92
 | 10% | 53.11 | 53.01 | 52.89 |
 | 15% | 52.71 | 52.88 | 52.78 |
0.17 | I_{sh 1} \ I_{sh 2} | 25% | 35% | 45% | 60.34 \ 3.29
 | 5% | 57.90 | 57.28 | 57.61 |
 | 15% | 57.46 | 57.05 | 57.82 |
 | 30% | 57.95 | 58.10 | 58.14 |
0.2 | I_{sh 1} \ I_{sh 2} | 10% | 30% | 50% | 66.43 \ 3.38
 | 5% | 63.88 | 63.57 | 64.21 |
 | 15% | 63.51 | 63.05 | 64.18 |
 | 30% | 64.22 | 64.30 | 64.12 |
0.22 | I_{sh 1} \ I_{sh 2} | 25% | 35% | 50% | 75.15 \ 7.65
 | 30% | 67.94 | 67.50 | 67.82 |
 | 50% | 68.33 | 68.52 | 68.12 |
0.25 | I_{sh 1} \ I_{sh 2} | 30% | 40% | 50% | 82.79 \ 8.39
 | 50% | 74.28 | 74.38 | 74.40 |
Table 3 The most appropriate intersected area size (%) versus payload size (bits per nonzero coefficient)
Scheme \ payload | 0.05 | 0.1 | 0.15 | 0.17 | 0.2 | 0.22 | 0.25
---|---|---|---|---|---|---|---
Scheme ${m}_{1}^{p}$ | 50 | 50 | 10 | 15 | 15 | 30 | 50
Scheme ${m}_{2}^{p}$ | 50 | 50 | 30 | 35 | 30 | 35 | 50
Boldface numbers in Tables 1 and 2 link to the lowest accuracy and show the most appropriate intersected area size for each tested payload. Data hiding by using the most appropriate intersected area always shows better results. Tables 1 and 2 also indicate a difference between the proposed method and the original BCH-based steganography method [16] in terms of performance of the steganalysis [20, 25]. The most appropriate intersected area size presented in Table 3 was used later for other experiments.
4. Inserting-removing strategy
The performance of the proposed method can be increased significantly by using the inserting-removing strategy. The strategy is based on the fact that the block of 2^{m} -1 DCT coefficients can be modified before data hiding by inserting or removing coefficients 1 and -1. Hiding data in the modified stream of DCT coefficients may result in lower distortion and, as a result, lower detectability by steganalysis. Such a modification has to be carried out carefully in order to reduce distortion.
where B is the 8 × 8 block of the image pixels; a' is the block of original DCT coefficients; a_{ q } is the block of DCT coefficients divided by corresponding coefficients from quantization matrix Q; a_{ r } is the block of quantized DCT coefficients; Q_{ f } is a quality factor.
According to the proposed inserting-removing strategy, the stream a of nonrounded DCT coefficients obtained from the blocks a_{ q } is divided into three sets: modifiable c_{ m } = a ∈ (-∞; -1.5) ∪ (1.5;∞), removable c_{ R } = a ∈ [-1.5; -0.5) ∪ (0.5;1.5], and insertable c_{ Ins } = a ∈ [-0.5; -0.25) ∪ (0.25;0.5]. Set c unifies modifiable, insertable, and removable sets (i.e., c = c_{ m } ∪ c_{ R } ∪ c_{ Ins }). The set C = c_{ m } ∪ c_{ R } contains all nonzero rounded DCT coefficients. According to Equation (17), only the nonzero DCT coefficients (i.e., set C) have the corresponding informative coefficients and can be used for hiding data.
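The three sets can be expressed as a simple classification of each nonrounded coefficient by its magnitude. The function name and the label "unused" for coefficients outside all three sets are illustrative; the interval bounds are the ones given above.

```python
def classify(a):
    """Classify a nonrounded (quantized but not yet rounded) DCT coefficient."""
    magnitude = abs(a)
    if magnitude > 1.5:
        return "modifiable"     # (-inf; -1.5) U (1.5; inf)
    if magnitude > 0.5:
        return "removable"      # [-1.5; -0.5) U (0.5; 1.5]
    if magnitude > 0.25:
        return "insertable"     # [-0.5; -0.25) U (0.25; 0.5]
    return "unused"             # too close to zero to be considered


def split_stream(stream):
    """Group the indexes of a coefficient stream by class."""
    groups = {"modifiable": [], "removable": [], "insertable": [], "unused": []}
    for index, value in enumerate(stream):
        groups[classify(value)].append(index)
    return groups
```

For example, `split_stream([2.3, -1.2, 0.4, 0.1])` places index 0 in the modifiable set, index 1 in the removable set, index 2 in the insertable set, and leaves index 3 unused.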
The proposed steganographic method uses the stream of n_{ p } nonzero DCT coefficients from the set C for data hiding. In general, set C is a subset of the unified set c. Thus, each block unifies the n_{ p } coefficients from set C and some insertable coefficients from the set c (i.e., ${c}_{b}={c}_{m}^{\prime}\cup {c}_{R}^{\prime}\cup {c}_{Ins}^{\prime}$, where ${C}^{\prime}={c}_{m}^{\prime}\cup {c}_{R}^{\prime}$ is the block of n_{ p } nonzero DCT coefficients from the set C). Inserting or removing any coefficient from ${c}_{Ins}^{\prime}$ or ${c}_{R}^{\prime}$ produces a new block C' with a new solution for data hiding. As a result, the inserting-removing strategy significantly increases the number of possible solutions and helps to find the most appropriate solution with the lowest distortion.
where Q is the corresponding quantization coefficient of the quantization table.
where l is the number of flipped coefficients.
5. Encoder and decoder
The encoder of the proposed steganographic method based on modified BCH data hiding scheme and inserting-removing strategy is organized as follows:
- 1.
Divide image I_{ m } into nonoverlapped 8 × 8 blocks of pixels and process DCT, quantization and rounding as presented in (16). Remove DC coefficients. Obtain a', a_{ q }, a_{ r } , and streams of DCT coefficients a. Permute stream a using K and any pseudo-random generator. Obtain stream c = a ∈ (-∞; -0.25) ∪ (0.25;∞) from the permuted stream a.
- 2.
Define sets: modifiable c_{ m } , insertable c_{ Ins } , and removable c_{ R } .
- 3.
Define parameters for schemes 1 and 2, and number of the blocks k _{1} and k _{2} using (14) and (15). Divide message M into two parts: ${M}_{1}={m}_{1}^{p}\cdot {k}_{1}$ and ${M}_{2}={m}_{2}^{p}\cdot {k}_{2}$.
- 4.
Start from the first block i = 1. Define the i th block of the DCT coefficients ${c}_{{b}_{i}}={c}_{{m}_{i}}^{\prime}\cup {c}_{{R}_{i}}^{\prime}\cup {c}_{{Ins}_{i}}^{\prime}$, where ${c}_{{m}_{i}}^{\prime}$, ${c}_{{R}_{i}}^{\prime}$, and ${c}_{{Ins}_{i}}^{\prime}$ are the modifiable, removable, and insertable subsets for the current block. If i = k _{1} +1 switch to the scheme 2.
- 5.
Define the block of nonzero rounded DCT coefficients ${\mathsf{\text{C}}}_{\mathsf{\text{i}}}^{\prime}={c}_{{m}_{i}}^{\prime}\cup {c}_{Ri}^{\prime}$.
- 6.
Get the solutions for the block ${C}_{i}^{\prime}$ using the modified BCH data hiding scheme (see the algorithm in Section 3). Compute the distortion D for each solution using Equation (20). Choose solution J_{ m } with the lowest distortion D_{ m } and store it.
- 7.
Modify the block ${C}_{i}^{\prime}$ by inserting or removing coefficients from the subsets ${c}_{{R}_{i}}^{\prime}$ and ${c}_{{Ins}_{i}}^{\prime}$. Obtain a new block: (i) after removing, ${C}_{i}^{\prime}={c}_{{m}_{i}}^{\prime}\cup {c}_{{R}_{i}}^{\prime\prime}$, where ${c}_{{R}_{i}}^{\prime\prime}={c}_{{R}_{i}}^{\prime}-{c}_{{R}_{i}}^{\prime}\left(p\right)$ is the modified removable set and ${c}_{{R}_{i}}^{\prime}\left(p\right)$ is the removed coefficient; (ii) after inserting, ${C}_{i}^{\prime}={c}_{{m}_{i}}^{\prime}\cup {c}_{{R}_{i}}^{\prime}\cup {c}_{{Ins}_{i}}^{\prime}\left(q\right)$, where ${c}_{{Ins}_{i}}^{\prime}\left(q\right)=\pm 1$ is the inserted coefficient. Here p and q are the current positions for removal and insertion, respectively.
- 8.
Repeat steps 5-6 for all insertable and removable coefficients from ${c}_{{R}_{i}}^{\prime}$, and ${c}_{{Ins}_{i}}^{\prime}$.
- 9.
Among all stored solutions J_{ m } choose solution with the lowest distortion D_{ m } . Modify one, two, or three coefficients according to the best solution (see explanation in Section 2) and, if necessary, insert or remove coefficient in the block ${c}_{{b}_{i}}$.
- 10.
Process all k _{1} + k _{2} blocks using steps 4-9. Obtain the modified stream ${c}^{\prime}=\left\{{c}_{{b}_{1}},{c}_{{b}_{2}},\dots ,{c}_{{b}_{{k}_{1}+{k}_{2}}}\right\}$.
- 11.
Recover the original sequence order of the DCT coefficients a from the modified stream c' using the secret key K and utilized pseudo-random generator. Add DC coefficients, round the coefficients a', and obtain the modified JPEG image ${I}_{m}^{\prime}$.
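The search over insertions and removals in steps 5-9 can be illustrated with a minimal sketch. It assumes a hypothetical callable `solve` that stands in for the modified BCH data hiding scheme together with the distortion of Equation (20): it takes a candidate block and returns the lowest distortion found for it (or `math.inf` when no solution exists), with the cost of the insertion or removal itself assumed to be included. The names `best_candidate`, `removable_idx`, and `insert_slots` are illustrative and do not come from the article.

```python
import math
from typing import Callable, List, Sequence, Tuple

def best_candidate(
    block: Sequence[int],
    removable_idx: Sequence[int],
    insert_slots: Sequence[int],
    solve: Callable[[List[int]], float],
) -> Tuple[float, str, List[int]]:
    """Try the unmodified block, every single removal, and every single
    +1/-1 insertion; return (distortion, label, block) of the cheapest one."""
    block = list(block)
    candidates = [("keep", block)]
    # (i) remove one coefficient at position p of the removable subset
    for p in removable_idx:
        candidates.append((f"remove@{p}", block[:p] + block[p + 1:]))
    # (ii) insert one coefficient +1 or -1 at position q
    for q in insert_slots:
        for value in (1, -1):
            candidates.append((f"insert{value:+d}@{q}", block[:q] + [value] + block[q:]))

    best = (math.inf, "none", block)
    for label, cand in candidates:
        distortion = solve(cand)  # hypothetical stand-in for BCH solving + Eq. (20)
        if distortion < best[0]:
            best = (distortion, label, cand)
    return best

# Toy stand-in for the solver, used only to exercise the search loop.
toy_solve = lambda b: float(abs(sum(b) - 4))
result = best_candidate([2, 1, -3, 1], removable_idx=[1, 3], insert_slots=[2], solve=toy_solve)
```

With the toy solver, the unmodified block costs 3, both removals cost 4, and inserting +1 at position 2 costs 2, so the insertion is selected. A real solver would replace `toy_solve` without changing the loop.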
The decoder of the proposed steganographic method is organized as follows:
- 1.
Read the DCT coefficients from the JPEG file. Permute them using the secret key K and utilized pseudo-random generator. Remove the DC coefficients. Obtain the stream of nonzero DCT coefficients C.
- 2.
Using Equations (15) and (16) define parameters of the schemes 1 and 2, and the number of blocks k _{1} and k _{2}. Here, N = |C|.
- 3.
Divide C into the blocks according to the k _{1} and k _{2}.
- 4.
Decode data from each block using (9).
The steganographic method based only on modified BCH data hiding scheme skips the steps 7 and 8.
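The key-dependent permutation shared by the encoder (step 11) and the decoder (step 1), and the division of the stream into blocks (decoder step 3), can be sketched as follows. This is an illustration under stated assumptions: Python's `random.Random` stands in for the unspecified pseudo-random generator and is not cryptographically secure; the block lengths `n1` and `n2` and the counts `k1` and `k2` are taken as given rather than derived from Equations (15) and (16); and the blocks are cut consecutively without modelling the intersected coefficients of the combined blocks. All function names are hypothetical.

```python
import random
from typing import List, Sequence

def keyed_order(n: int, key: int) -> List[int]:
    """Permutation of range(n) that depends only on the secret key."""
    order = list(range(n))
    random.Random(key).shuffle(order)
    return order

def permute(coeffs: Sequence[int], key: int) -> List[int]:
    return [coeffs[j] for j in keyed_order(len(coeffs), key)]

def unpermute(permuted: Sequence[int], key: int) -> List[int]:
    """Invert permute(): put every coefficient back at its original index."""
    original = [0] * len(permuted)
    for pos, j in enumerate(keyed_order(len(permuted), key)):
        original[j] = permuted[pos]
    return original

def nonzero_stream(permuted: Sequence[int]) -> List[int]:
    """Keep only the nonzero coefficients, preserving their order."""
    return [c for c in permuted if c != 0]

def split_blocks(stream: Sequence[int], n1: int, k1: int, n2: int, k2: int) -> List[List[int]]:
    """Cut k1 blocks of length n1 (scheme 1), then k2 blocks of length n2 (scheme 2)."""
    blocks, pos = [], 0
    for n, k in ((n1, k1), (n2, k2)):
        for _ in range(k):
            blocks.append(list(stream[pos:pos + n]))
            pos += n
    return blocks

coeffs = [5, 0, -1, 3, 0, 2, -4, 1]
roundtrip = unpermute(permute(coeffs, key=42), key=42)
blocks = split_blocks([1, 2, 3, 4, 5, 6, 7], n1=2, k1=2, n2=3, k2=1)
```

Because both sides derive the same order from the key K, the decoder obtains the same stream and the same block boundaries as the encoder without any side information.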
6. Experimental results
where P_{ a } is the probability of misdetection (i.e., the unmodified image is classified as modified) and P_{ b } is the probability of misclassification (i.e., the modified image is classified as unmodified).
In our experiments, we test two methods: (1) the method based only on the modified BCH-based data hiding scheme; and (2) the modified BCH-based data hiding scheme combined with the proposed inserting-removing strategy. Both methods achieve high error probabilities for all the tested payloads. For payloads up to 0.1 bpc, both methods have detectability close to 50%, meaning that the steganalysis cannot distinguish the unmodified images from the modified ones; its decision is almost as reliable as a coin toss. For higher payloads around 0.15 and 0.2 bpc, the proposed methods perform much better than the MME. This improvement over the MME is explained by the use of schemes with larger embedding efficiency (i.e., the BCH-based schemes with large m). The proposed method also shows better results than the methods based on the original BCH-based schemes. Moreover, the proposed method with the inserting-removing strategy improves on the method with the modified BCH-based data hiding scheme only, by 0.0363, 0.0414, and 0.0392 points in terms of error probability for payloads of 0.15, 0.2, and 0.25 bpc, respectively. For a payload of 0.25 bpc, the two methods reach error probabilities of 0.2961 (modified BCH-based scheme only) and 0.3353 (with the inserting-removing strategy), respectively. These error probabilities are better than those of the MME [14], the original BCH-based scheme [16], the heuristic BCH-based scheme [17], and the syndrome trellis code (STC) [22] proposed by Filler, Judas, and Fridrich. This improvement was achieved by combining the modified BCH-based data hiding scheme with the inserting-removing strategy.
7. Conclusion
In this article, an efficient data hiding technique for steganography is presented. The proposed BCH-based data hiding scheme uses two blocks to form a single combined block. A new data hiding strategy makes it possible to obtain a joint solution for two blocks with intersected coefficients. Owing to the intersection, the proposed method requires a smaller number of coefficients to hide the same amount of data than the original nonoverlapping blockwise approaches. As a result, the proposed method can use the BCH-based schemes with large m (i.e., larger capacity). Even when the proposed method has to use the same BCH-based scheme (for 0.17 and 0.2 bpc), the efficiency of data hiding is still high because the proposed two-scheme embedding has a lower ratio k_{1}/k_{2} than the original BCH-based scheme. The proposed BCH-based data hiding scheme significantly outperforms the MME and the original BCH-based steganography in terms of error probability and accuracy against the steganalysis. The proposed two-scheme embedding technique (see Equations 14 and 15) makes it possible to use almost all the available DCT coefficients. The proposed strategy based on inserting and removing coefficients 1 or -1 increases the number of possible solutions and significantly decreases the total distortion. The experimental results show that the inserting-removing strategy significantly improves the performance of the proposed method. The combination of the modified BCH-based scheme and the inserting-removing strategy yields higher error probabilities and lower detection accuracy for the powerful steganalysis.
Declarations
Acknowledgements
This study was supported by the Catholic University of Korea, National Research Foundation of Korea (grant 2011-0013695), ITRC and BK21 Project, Korea University and IT R&D program (Development of anonymity-based u-knowledge security technology, 2007-S001-01).
References
- Provos N: Defending against statistical steganalysis. In Proc of 10th USENIX Security Symposium. Washington, DC; 2001:24-24.
- Eggers J, Bauml R, Girod B: A communications approach to steganography. In Proc of EI SPIE. Volume 4675. San Jose, CA; 2002:26-37.
- Noda H, Niimi M, Kawaguchi E: Application of QIM with dead zone for histogram preserving JPEG steganography. In Proc of ICIP. Genoa, Italy; 2005.
- Solanki K, Sakar A, Manjunath BS: YASS: Yet another steganographic scheme that resists blind steganalysis. Lect Notes Comput Sci 2007, 2939: 154-167.
- Westfeld A: High capacity despite better steganalysis (F5--a steganographic algorithm). Lect Notes Comput Sci 2001, 2137: 289-302.
- Fridrich J: Minimizing the embedding impact in steganography. In Proc of ACM Multimedia and Security Workshop. Geneva, Switzerland; 2006:2-10.
- Fridrich J: Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes. Lect Notes Comput Sci 2005, 3200: 67-81.
- Fridrich J, Filler T: Practical methods for minimizing embedding impact in steganography. In Proc EI SPIE. Volume 6505. San Jose, CA; 2007:2-3.
- Fridrich J, Goljan M, Soukal D: Perturbed quantization steganography using wet paper codes. In Proc of ACM Workshop on Multimedia and Security. Magdeburg, Germany; 2004:4-15.
- Fridrich J, Goljan M, Soukal D: Perturbed quantization steganography. ACM Multimedia Secur J 2005, 11(2):98-107.
- Fridrich J, Pevny T, Kodovsky J: Statistically undetectable JPEG steganography: dead ends, challenges, and opportunities. In Proc of ACM Workshop on Multimedia and Security. Dallas, TX; 2007:3-15.
- Fridrich J, Goljan M, Soukal D: Perturbed quantization steganography. ACM Multimedia Secur J 2005, 11(2):98-107.
- Fridrich J, Goljan M, Soukal D: Wet paper coding with improved embedding efficiency. IEEE Trans Inf Secur Forensics 2005, 1(1):102-110.
- Kim YH, Duric Z, Richards D: Modified matrix encoding technique for minimal distortion steganography. Lect Notes Comput Sci 2006, 4437: 314-327.
- Schönfeld D, Winkler A: Reducing the complexity of syndrome coding for embedding. Lect Notes Comput Sci 2008, 4567: 145-158.
- Zhang R, Sachnev V, Kim HJ: Fast BCH syndrome coding for steganography. Lect Notes Comput Sci 2009, 5806: 48-58.
- Sachnev V, Kim HJ, Zhang R: Less detectable JPEG steganography method based on heuristic optimization and BCH syndrome coding. In Proc of ACM Workshop on Multimedia and Security. Princeton, NJ; 2009:131-139.
- Filler T, Fridrich J: Steganography using Gibbs random fields. In Proceedings of ACM Multimedia and Security Workshop. Rome, Italy; 2010:199-212.
- Upham D: [http://www.funet.fi/pub/crypt/stegangraphy/jpeg-jsteg-v4.diff.gz]
- Pevny T, Fridrich J: Merging Markov and DCT features for multi-class JPEG steganalysis. In Proc of SPIE. Volume 6505. San Jose, CA; 2007:3-4.
- Shi YQ, Chen C, Chen W: Markov process based approach to effective attacking JPEG steganography. Lect Notes Comput Sci 2006, 4437: 249-264.
- Filler T, Judas J, Fridrich J: Minimizing embedding impact in steganography using trellis-coded quantization. IEEE Trans Inf Secur Forensics 2011, 6(3):920-935.
- Rifa-Pous H, Rifa J: Product perfect codes and steganography. Digital Signal Process 2009, 19: 764-769.
- Zhao Z, Wu F, Yu S, Zhou J: A lookup table based fast algorithm for finding roots of quadratic or cubic polynomials in the GF(2^{m}). J Huazhong Univ Sci Technol (Nat Sci Ed.) 2005, 33(1):70-73.
- Kodovsky J, Fridrich J: Calibration revisited. In Proceedings of the 11th ACM Multimedia & Security Workshop. Edited by: Dittmann J, Craver S, Fridrich J. Princeton, NJ; 2009.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.