In this article, a new Bose-Chaudhuri-Hochquenghem (BCH)-based data hiding scheme for JPEG steganography is presented. Traditional data hiding approaches hide data in separate blocks that do not overlap each other. In the proposed method, two consecutive blocks overlap to form a combined block that is larger than a single block but smaller than two consecutive nonoverlapping blocks. To embed more data into the combined block than into a single block, the BCH-based data hiding scheme has to be redesigned. We propose a way to obtain a joint solution for hiding data into two blocks with intersected coefficients such that any modification of the intersected area does not disturb the data hidden in either block. Because the intersected area carries data for both blocks, the embedding capacity is increased. In addition, the stream of nonzero DCT coefficients is modified to improve resistance to steganalysis and to reduce the distortion caused by data hiding. This approach carefully inserts coefficients 1 or -1 into, or removes them from, the DCT coefficient stream according to the rule proposed in this article. Experimental results show that the proposed algorithms work well against the tested steganalysis methods.

1. Introduction

One of the first steganography methods for JPEG images embeds data by changing the least-significant bit values of the quantized discrete cosine transform (DCT) coefficients. However, this method can easily be detected by statistical analysis. Thus, for a good while, evading statistical analysis has been a major concern. Provos [1] divides the DCT coefficients into two disjoint subsets, hides data in the first subset, and compensates for the distorted histogram by modifying the second subset. Other methods in [2, 3] use a similar approach. On the other hand, Solanki et al. [4] utilize a robust watermarking scheme for steganography purposes. They embed data into the image in the spatial domain by using a technique robust against JPEG compression. Their scheme degrades the features of the DCT coefficients less, and, as a result, its detectability was low against older versions of statistical steganalysis.

Another way to survive steganalysis is to reduce the number of modified coefficients. Traditionally, each nonzero DCT coefficient has been modified. As a result, the embedding capacity is as large as the number of nonzero DCT coefficients. However, embedding at the maximum possible capacity comes at the cost of higher detectability. Westfeld [5] has used a matrix encoding (ME) technique to lower detectability by sacrificing embedding capacity. The ME technique exploits the Hamming code, which is designed for error correction. His scheme hides several bits by flipping at most one coefficient in each block. This approach was the first instance of using an error correcting code for data hiding.
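As a minimal sketch of matrix encoding, assume the (7, 4) Hamming code and a cover block of seven least-significant bits. The function names and the column ordering of the parity-check matrix are illustrative choices, not taken from [5]. Three message bits are embedded by flipping at most one of the seven cover bits.

```python
def syndrome(bits):
    """Syndrome of 7 bits under a (7, 4) Hamming parity-check matrix.

    Column i (1-based) of H is taken to be the binary representation of i,
    so the syndrome is the XOR of the 1-based positions of all set bits.
    """
    s = 0
    for pos, b in enumerate(bits, start=1):
        if b:
            s ^= pos
    return s


def me_embed(cover_bits, message):
    """Embed a 3-bit message (integer 0..7) by flipping at most one bit."""
    stego = list(cover_bits)
    diff = syndrome(stego) ^ message  # 1-based position to flip; 0 = no change
    if diff:
        stego[diff - 1] ^= 1
    return stego


def me_extract(stego_bits):
    """The receiver only needs to recompute the syndrome."""
    return syndrome(stego_bits)


cover = [1, 0, 1, 1, 0, 0, 1]
stego = me_embed(cover, 0b101)
changes = sum(c != s for c, s in zip(cover, stego))
```

The receiver needs no knowledge of which position was flipped, which is the property the BCH-based schemes discussed later generalize to more than one flip.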

Fridrich et al. [6–13] use the concept of the "minimal distortion" to enhance the security (i.e., by reducing distortion). The perturbed quantization steganography utilizes the wet paper coding.

Later, Kim et al. [14] have improved the performance of the ME by reducing the distortion impact. In fact, their modified matrix encoding (MME) method changes a larger number of coefficients than the ME. However, they show that the distortion impact of modifying one coefficient may be larger than that of modifying two coefficients. Thus, choosing between modifying one or two coefficients per block may yield less distortion and lower detectability against steganalysis. Note that MME requires the original uncompressed image for data hiding, but not for decoding.

Schönfeld and Winkler [15] have proposed a new way to hide data using a more powerful error correction code. They use a structured Bose-Chaudhuri-Hochquenghem (BCH) code [2]. Zhang et al. [16] have significantly improved the original BCH-based data hiding scheme. Their improved method can easily find the flip positions and resists steganalysis better than the earlier methods. Later, Sachnev et al. [17] apply a heuristic optimization technique to the data hiding scheme over the BCH coding and modify the stream of the input DCT coefficients to reduce the distortion. Their method considerably outperforms the steganography method proposed by Zhang et al. [16].

Recently, Filler and Fridrich [18] have proposed a remarkable framework which minimizes a distortion measure as a weighted norm of the difference between cover and stego feature vectors. In their approach, the distortion is not necessarily an additive function over the pixels because the features may contain higher-order statistics such as sample transition probability matrices of pixels or DCT coefficients modeled as Markov chains [19–21]. When the distortion measure is defined as a sum of local potentials, practical near-optimal embedding methods can be implemented with syndrome-trellis codes [22].

Most of the above-mentioned steganographic methods use nonoverlapping blocks of the DCT coefficients for hiding the secret message. Such a blockwise embedding scheme divides both the stream of the DCT coefficients and the hidden message into separate blocks and solves the equations for hiding data for each block individually. Recent methods like MME [14] and the BCH-based steganography methods [15–17] may produce several alternative solutions. Thus, such a data hiding method can choose the solution with the lowest distortion impact. Past investigation of the BCH data hiding scheme finds that BCH usually yields a redundant number of possible solutions. This means that a solution with acceptable distortion impact can be obtained from a reduced set of possible solutions. Hence, the embedding efficiency of the BCH steganographic methods can be increased by reducing the number of possible solutions while keeping a distortion impact similar to that of the original approach.

In the proposed method, two blocks of the DCT coefficients form a combined block sharing common coefficients in the intersected part between two consecutive blocks. Such a design achieves high embedding efficiency by hiding data twice in the intersected area. The number of possible joint solutions for both blocks (i.e., solutions which are valid for both blocks) is always smaller than the number of all possible solutions for two independent blocks. The reduced number of possible solutions can increase distortion, but not significantly. Besides, the number of possible solutions can easily be controlled by changing the size of the intersected area. The smaller the intersected area, the larger the number of possible joint solutions. A similar approach has been tested for the Hamming code in [23].

Conversely, the larger the intersected area, the higher the embedding efficiency of the proposed method. In the proposed method, the block of the DCT coefficients can also be modified by inserting new nonzero coefficients 1 or -1, or by removing coefficients 1 or -1. Such modification is carried out carefully in order to reduce the distortion caused by excessive hiding.

The rest of the article is organized as follows. Section 2 explains the details of the BCH coding. Section 3 presents the BCH-based modified data hiding scheme. In Section 4, we propose the inserting-removing strategy. The encoder and decoder are presented in Section 5. Section 6 provides the experimental results. Finally, Section 7 concludes the article.

2. BCH syndrome coding

The BCH codes are a well-known and widely used family of error correction codes. A BCH code (n, k, t) can correct t bit errors by appending n - k additional bits to the original message of k bits such that the syndrome of the resulting n bits is equal to 0. BCH codes were invented for error correction and cannot directly be used for data hiding. An efficient method of using powerful BCH codes for data hiding has been presented in [15–17].

2.1. BCH syndrome coding

The generalized parity-check matrix H for BCH coding is presented as follows:

Assume that the original stream of binary data is V = {v_{0}, v_{1}, v_{2}, ..., v_{n-1}}, and the modified stream of binary data after data hiding is R = {r_{0}, r_{1}, r_{2}, ..., r_{n-1}}. The streams V and R over GF(2^{m}) can be represented as V(x) = v_{0} + v_{1}·x + v_{2}·x^{2} + v_{3}·x^{3} + ⋯ + v_{n-1}·x^{n-1}, and R(x) = r_{0} + r_{1}·x + r_{2}·x^{2} + r_{3}·x^{3} + ⋯ + r_{n-1}·x^{n-1}, respectively.

The embedded message m can be computed as follows:

\mathit{m}=\mathbf{R}\cdot {\mathbf{H}}^{T}

(3)

Thus, hiding the message m in V requires finding R such that

\mathbf{R}\cdot {\mathbf{H}}^{T}=\mathit{m}

(4)

The difference between V and R shows the number and location of the elements in V to be flipped.
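The syndrome computation can be sketched as follows. This is only an illustration under stated assumptions: the field is GF(2^4) generated by the primitive polynomial x^4 + x + 1, and H is taken in the usual double-error-correcting form with rows α^i and α^{3i}, since the matrix of Equation (1) is not reproduced here. The names `EXP` and `syndrome` are hypothetical.

```python
M = 4                      # field GF(2^M); assumption for this sketch
N = (1 << M) - 1           # block length n = 2^m - 1 = 15
PRIM_POLY = 0b10011        # x^4 + x + 1, primitive over GF(2)

# EXP[i] holds alpha^i as an M-bit integer.
EXP = []
x = 1
for _ in range(N):
    EXP.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY


def syndrome(bits):
    """Return (S1, S2) where S1 = sum v_i*alpha^i and S2 = sum v_i*alpha^(3i)."""
    s1 = s2 = 0
    for i, b in enumerate(bits):
        if b:
            s1 ^= EXP[i]
            s2 ^= EXP[(3 * i) % N]
    return s1, s2


V = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1]
R = list(V)
R[2] ^= 1                  # flip two positions of V to obtain R
R[9] ^= 1
sV, sR = syndrome(V), syndrome(R)
delta = (sV[0] ^ sR[0], sV[1] ^ sR[1])   # depends only on the flipped positions
```

Because the syndrome is linear over GF(2), the change `delta` depends only on which positions were flipped, which is why the embedder can search over flip patterns rather than over whole streams R.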

In this article, we utilize the method of Zhao et al. [24], which is based on fast lookup tables for finding the roots of the quadratic and cubic polynomials σ(x). A similar approach has been used in [16, 17].

2.3. Solutions

Hiding message m in the binary stream V requires finding the positions of the coefficients to be flipped. In this article, we use the method presented in [16, 17] to get solutions with one, two, three, or four flips. The sets of all possible solutions with one, two, three, or four flips are stored in the lookup tables J_{1}, J_{2}, J_{3}, and J_{4}, respectively. The notation J_{3}(S) returns all three-flip solutions for the syndrome S = {S_{1} S_{2}}. Similarly, we can get all possible solutions for block n_{1} with syndrome S^{I} as J^{I} = {J_{1}(S^{I}) J_{2}(S^{I}) J_{3}(S^{I}) J_{4}(S^{I})} and for block n_{2} with syndrome S^{II} as J^{II} = {J_{1}(S^{II}) J_{2}(S^{II}) J_{3}(S^{II}) J_{4}(S^{II})}, respectively. The size of the lookup tables is (2^{2·m} - 1) × nS, where nS is the number of stored solutions.
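The lookup tables can be illustrated with a brute-force construction. This is a sketch under assumptions, not the table-building procedure of [16, 17]: it assumes m = 4, the primitive polynomial x^4 + x + 1, and syndromes of the form (Σα^i, Σα^{3i}); the names `flip_syndrome`, `build_tables`, and `lookup` are hypothetical.

```python
from itertools import combinations

M, N, PRIM_POLY = 4, 15, 0b10011   # GF(2^4), n = 15, x^4 + x + 1 (assumed)

EXP, x = [], 1
for _ in range(N):
    EXP.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY


def flip_syndrome(positions):
    """Syndrome change (S1, S2) caused by flipping the given positions."""
    s1 = s2 = 0
    for i in positions:
        s1 ^= EXP[i]
        s2 ^= EXP[(3 * i) % N]
    return s1, s2


def build_tables(max_flips=4):
    """tables[t][S] lists every set of exactly t flip positions producing S."""
    tables = {t: {} for t in range(1, max_flips + 1)}
    for t in range(1, max_flips + 1):
        for pos in combinations(range(N), t):
            tables[t].setdefault(flip_syndrome(pos), []).append(pos)
    return tables


TABLES = build_tables()


def lookup(t, s):
    """Plays the role of J_t(S): all t-flip solutions for syndrome S."""
    return TABLES[t].get(s, [])


three_flip = lookup(3, flip_syndrome((0, 5, 9)))   # contains (0, 5, 9)
```

Exhaustive enumeration is affordable only for short blocks; for larger m a root-finding approach such as the one cited above is the practical route.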

3. Proposed data hiding scheme

In the proposed BCH data hiding scheme, we combine two BCH blocks of 2^{m} - 1 DCT coefficients into one, such that the BCH blocks intersect each other. Figure 1 shows the block diagram of coefficients for the proposed scheme. In the presented example, (a_{1}, a_{2}, a_{3}, ..., a_{25}) is the combined block of the DCT coefficients; \left({v}_{1}^{\prime},{v}_{2}^{\prime},{v}_{3}^{\prime},\dots ,{v}_{15}^{\prime}\right) and \left({v}_{1}^{\prime\prime},{v}_{2}^{\prime\prime},{v}_{3}^{\prime\prime},\dots ,{v}_{15}^{\prime\prime}\right) are the corresponding binary coefficients for the BCH blocks n_{1} and n_{2}, respectively. The intersected area I covers five coefficients a_{11}, a_{12}, a_{13}, a_{14}, and a_{15} in this example. Such a scheme can hide more data by exploiting the intersected area, and it can be used with any kind of coding scheme.

One of the two main contributions of this article is a systematic algorithm for finding the joint solutions. The proposed BCH-based data hiding scheme requires finding a joint solution for both blocks n_{1} and n_{2} using the guidelines from Section 2.1 such that the intersected area does not affect the result. For example, let 8 bits be hidden into 15 coefficients from a_{1} to a_{15} using the BCH-based steganography. Then, another 8 bits can be hidden into the next block having another 15 coefficients from a_{16} to a_{30}. This is the traditional approach. As a result, 16 bits can be hidden into 30 coefficients. However, our new approach hides the same amount of data into 25 coefficients a_{1} to a_{25}. Eight bits are hidden into the coefficients from a_{1} to a_{15}, and another eight bits into the coefficients from a_{11} to a_{25}. The data hiding algorithm requires finding the syndromes S^{I} and S^{II} (Equation 6) for the blocks n_{1} and n_{2}, respectively.

There are two possible ways of hiding data into the combined block: hiding data into the block n_{1} first, or into the block n_{2} first. The proposed algorithm for getting a joint solution is designed as follows:

1. Hiding data into the block n_{1} first.

(a) Some solutions for hiding data into the block n_{1} do not modify the coefficients in the intersected area. Thus, solutions for the block n_{2} have to be obtained using the original syndrome S^{II}. Some of these solutions are valid since they do not modify the coefficients in the intersected area. These solutions are called specified solutions.

(b) Some solutions for the block n_{1} modify the coefficients in the intersected area. These modifications in the intersected area affect the syndrome for the block n_{2}. Thus, the new syndrome for the block n_{2} is obtained as {S}_{new}^{\mathsf{\text{II}}}. Some new solutions are valid since they do not modify the coefficients already modified by the block n_{1} in the intersected area.

Among all possible solutions for the block n_{2} and the new syndrome {S}_{new}^{\mathsf{\text{II}}} (in case of 1(a), {S}_{new}^{\mathsf{\text{II}}}={S}^{\mathsf{\text{II}}}), choose the solutions which do not have flipping positions in the intersected area (i.e., valid or specified solutions). Thus, the joint solutions for a combined block unify the solutions for the block n_{1} and its syndrome S^{I} and the specified solutions for the block n_{2} and its syndrome {S}_{new}^{\mathsf{\text{II}}}.

2. Hiding data into the block n_{2} first.

(a) Some solutions for hiding data into the block n_{2} do not modify the coefficients in the intersected area. Thus, solutions for the block n_{1} have to be obtained using the original syndrome S^{I}. Some of these solutions are valid since they do not modify the coefficients in the intersected area.

(b) Some solutions for the block n_{2} modify the coefficients in the intersected area. These modifications in the intersected area affect the syndrome for the block n_{1}. Thus, the new syndrome for the block n_{1} is obtained as {S}_{new}^{\mathsf{\text{I}}}. Some new solutions are valid since they do not modify the coefficients already modified by the block n_{2} in the intersected area.

The joint solutions for a combined block unify the solutions for the block n_{2} and its syndrome S^{II} and the specified solutions for the block n_{1} and its syndrome {S}_{new}^{\mathsf{\text{I}}} (in case of 2(a), {S}_{new}^{\mathsf{\text{I}}}={S}^{\mathsf{\text{I}}}).

In general, the proposed modified BCH data hiding scheme hides 4·m bits of data in a combined block of 2·(2^{m}-1)-|I| coefficients by using the BCH scheme (2^{m}-1, k, 2) for blocks n_{1} and n_{2}.
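The capacity figures above can be checked with a few lines of arithmetic. The helper name `combined_block` is hypothetical; the formulas are the ones stated in the text.

```python
def combined_block(m, overlap):
    """Return (size, bits) of a combined block for BCH blocks of 2^m - 1.

    size = 2*(2^m - 1) - |I| coefficients, bits = 4*m hidden bits.
    """
    n = (1 << m) - 1
    if not 0 <= overlap <= n:
        raise ValueError("overlap must lie between 0 and the block length")
    return 2 * n - overlap, 4 * m


size, bits = combined_block(4, 5)          # the example of Figure 1: 25, 16
rate_overlapped = bits / size              # bits per coefficient with overlap
rate_separate = bits / combined_block(4, 0)[0]   # two separate 15-blocks
```

For m = 4 and |I| = 5 this gives 16 bits in 25 coefficients, against 16 bits in 30 coefficients for two nonoverlapping blocks.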

The proper BCH-based data hiding scheme needs a suitable parameter m for hiding message M into the stream of N nonzero DCT coefficients. The parameter m can be obtained as follows:

where m defines the proper BCH-based scheme for the proposed method, N is the number of nonzero DCT coefficients, M is the hidden message, n^{p} = 2·(2^{m}-1)-|I| is the size of the combined block, 4·m is the capacity of the combined block.

3.1. Data hiding algorithm

The proposed method requires finding a solution for the two blocks n_{1} and n_{2} that hides the two messages m_{1} and m_{2} together such that

where R_{1} and R_{2} are the modified streams of the binary coefficients obtained from n_{1} and n_{2} (see Figure 1); H is a parity-check matrix from Equation (1).

Note that, hiding message m_{1} to block n_{1} modifies the block n_{2} and vice versa, due to the intersected part. Hence, we need proper positions to flip by solving Equation (9) for correct decoding.

Among all possible solutions, the proposed method unifies the solutions for blocks n_{1} and n_{2}, such that the flip positions cover only nonintersected area for both blocks (i.e., {J}_{s}^{\mathsf{\text{I}}}={J}^{\mathsf{\text{I}}}\notin I and {J}_{s}^{\mathsf{\text{II}}}={J}^{\mathsf{\text{II}}}\notin I, for blocks n_{1} and n_{2}). In other words, it is desirable to hide data into the block n_{1} using the solutions from {J}_{s}^{\mathsf{\text{I}}} that do not affect the block n_{2}, and vice versa. According to the above explanation, {J}_{s}^{\mathsf{\text{I}}} and {J}_{s}^{\mathsf{\text{II}}} unify the specified solutions for the blocks n_{1} and n_{2}, respectively. Here, note that superscript indexes X^{I} and X^{II} present different items for blocks n_{1} and n_{2}, respectively.

However, some flip positions j of a solution for the block n_{1} may belong to the intersected area I. In that case, we can account for the effect of those positions j to get new solutions for the block n_{2}, and vice versa.

For this purpose, Equation (9) can be rewritten as follows:

where {S}^{\mathsf{\text{II}}}=\left\{{S}_{1}^{\mathsf{\text{II}}}\;{S}_{2}^{\mathsf{\text{II}}}\right\} is the syndrome for the block n_{2}; {S}_{new}^{\mathsf{\text{II}}}=\left\{{P}_{1}^{\mathsf{\text{II}}}\;{P}_{2}^{\mathsf{\text{II}}}\right\} is the new syndrome for the block n_{2} after hiding data into the block n_{1}; l is the number of flip positions (j_{1}, ..., j_{l}) of the block n_{1} that belong to the intersected area I (i.e., j_{1}, ..., j_{l} = J^{I}(S^{I}) ∈ I); and the values β_{1}, ..., β_{l} are computed using Equation (13) for the flipping positions \left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right)=F\left({j}_{1},\dots ,{j}_{l}\right) from the intersected area I for the block n_{2}. Function F converts the indexes (j_{1}, ..., j_{l}) of the intersected area from the block n_{1} to the corresponding indexes \left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right) from the block n_{2}. For example, the solution for the block n_{1} illustrated in Figure 1 is {J}^{\mathsf{\text{I}}}\left({S}^{\mathsf{\text{I}}}\right)=\left[\begin{array}{cc}\hfill 3\hfill & \hfill 11\hfill \end{array}\right]. Here j_{1} = 11 ∈ I, where index 1 means the first coefficient from the intersected area I. Coefficient j_{1} = 11 is located in the 11th position of the combined block. However, the 11th coefficient in the combined block is the 15th coefficient in the block n_{2} (i.e., F\left({j}_{1}\right)=F\left(11\right)={j}_{1}^{\prime}=15; see Figure 1). Thus, even though the flip positions for blocks n_{1} and n_{2} are different (i.e., j_{1} = 11 and {j}_{1}^{\prime}=15), those coefficients have the same location in the combined block.

Finally, the solution for the block n_{2} can be obtained as \left\{{j}_{1}^{\prime},\dots ,{j}_{l}^{\prime},\;{J}_{s}^{\mathsf{\text{II}}}\left({S}_{new}^{\mathsf{\text{II}}}\right)\right\}. The presented solution is sufficient to hide message m_{2} in the block n_{2}.

The joint solution hides both messages m_{1} and m_{2} in the combined block. The joint solution \left\{{J}^{\mathsf{\text{I}}}\left({S}^{\mathsf{\text{I}}}\right),\;{J}_{s}^{\mathsf{\text{II}}}\left({S}_{new}^{\mathsf{\text{II}}}\right)\right\} unifies the solutions for the blocks n_{1} and n_{2}. In this example, the flipping positions from the intersected area are part of J^{I}(S^{I}).

Similarly, we can get a joint solution by using the current solution for block n_{2} (i.e., J^{II}(S^{II})). For this purpose, Equation (9) can be rewritten again as follows:

where {S}^{\mathsf{\text{I}}}=\left\{\begin{array}{cc}\hfill {S}_{1}^{\mathsf{\text{I}}}\hfill & \hfill {S}_{2}^{\mathsf{\text{I}}}\hfill \end{array}\right\} is the syndrome for the block n_{1}; {S}_{new}^{\mathsf{\text{I}}}=\left\{\begin{array}{cc}\hfill {P}_{1}^{\mathsf{\text{I}}}\hfill & \hfill {P}_{2}^{\mathsf{\text{I}}}\hfill \end{array}\right\} is the new syndrome of the block n_{1} after hiding data into the block n_{2}; l is the number of flip positions \left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right) of the block n_{2} that belong to the intersected area I (i.e., \left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right)={J}^{\mathsf{\text{II}}}\left({S}^{\mathsf{\text{II}}}\right)\in I); β_{1}, ..., β_{l} are computed using Equation (15) for the flipping positions \left({j}_{1},...,{j}_{l}\right)={F}^{-1}\left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right) from the intersected area I for the block n_{1}; function F^{-1} (i.e., the inverse function of F) converts the indexes \left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right) of the coefficients of the intersected area from the block n_{2} to the corresponding indexes (j_{1}, ..., j_{l}) from the block n_{1}. For example, if {J}^{\mathsf{\text{II}}}\left({S}^{\mathsf{\text{II}}}\right)=\left[\begin{array}{cc}\hfill 1\hfill & \hfill 15\hfill \end{array}\right], then {j}_{1}^{\prime}=15\in I and {j}_{1}={F}^{-1}\left({j}_{1}^{\prime}\right)={F}^{-1}\left(15\right)=11 (see Figure 1).

The solution for the block n_{1} can be obtained as \left\{\left({j}_{1},...,{j}_{l}\right),\;{J}_{s}^{\mathsf{\text{I}}}\left({S}_{new}^{\mathsf{\text{I}}}\right)\right\}. The presented solution is sufficient to hide message m_{1} in the block n_{1}.

The joint solution for hiding both messages m_{1} and m_{2} is \left\{{J}_{s}^{\mathsf{\text{I}}}\left({S}_{new}^{\mathsf{\text{I}}}\right),\;{J}^{\mathsf{\text{II}}}\left({S}^{\mathsf{\text{II}}}\right)\right\}. Here, the flipping positions from the intersected area \left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right) are part of J^{II}(S^{II}). The corresponding flipping positions \left({j}_{1},...,{j}_{l}\right)={F}^{-1}\left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right) are part of the solution for the block n_{1}.

Note that there are several solutions in J^{I} and J^{II} for syndromes S^{I} and S^{II}, respectively. Presented method may generate one joint solution for each solution from J^{I}(S^{I}) and J^{II}(S^{II}).

The proposed method requires finding the values β from the flip positions (j_{1}, ..., j_{l}) or \left({j}_{1}^{\prime},\dots ,{j}_{l}^{\prime}\right). The relationship between β and flip position j is presented as follows:

The complete procedure for getting all possible joint solutions for any syndromes is presented as follows:

For a given combined block of binary coefficients a and two messages m_{1} and m_{2}, the process is as follows:

(a) Define two blocks of the DCT coefficients n_{1} and n_{2} (see Figure 1). Compute the syndromes S^{I} and S^{II} using the corresponding binary streams v' and v''.

(b) Find all possible solutions j^{I} = J^{I}(S^{I}) and j^{II} = J^{II}(S^{II}) for blocks n_{1} and n_{2} by using the syndromes S^{I} and S^{II}.

(c) For each solution j^{I}(p) (p = 1, 2, 3, ..., k, where k is the number of solutions), proceed as follows:

i. Define the flip positions j_{1}, ..., j_{l} from the intersected area I.

ii. Convert j_{1}, ..., j_{l} to {j}_{1}^{\prime},\dots ,{j}_{l}^{\prime} (the corresponding flip positions from the block n_{2}). Compute the corresponding β using Equation 13. Compute the new syndrome {S}_{new}^{\mathsf{\text{II}}} using Equation 10.

iii. Using the new syndrome {S}_{new}^{\mathsf{\text{II}}}, get the new flip solutions as {j}_{new}^{\mathsf{\text{II}}}={J}_{s}^{\mathsf{\text{II}}}\left({S}_{new}^{\mathsf{\text{II}}}\right).

iv. For each solution {j}_{new}^{\mathsf{\text{II}}}\left(q\right) (q = 1, 2, 3, ..., z, where z is the number of solutions), store the joint solution \left\{{j}^{\mathsf{\text{I}}}\left(p\right),{j}_{new}^{\mathsf{\text{II}}}\left(q\right)\right\}.

(d) For each solution j^{II}(p) (p = 1, 2, 3, ..., k), proceed as follows:

i. Define the flip positions {j}_{1}^{\prime},\dots ,{j}_{l}^{\prime} from the intersected area I for block n_{2}.

ii. Convert {j}_{1}^{\prime},\dots ,{j}_{l}^{\prime} to j_{1}, ..., j_{l}. Compute the corresponding β. Compute the new syndrome {S}_{new}^{\mathsf{\text{I}}} using Equation 11.

iii. Using the new syndrome {S}_{new}^{\mathsf{\text{I}}}, get the new flip solutions as {j}_{new}^{\mathsf{\text{I}}}={J}_{s}^{\mathsf{\text{I}}}\left({S}_{new}^{\mathsf{\text{I}}}\right).

iv. For each solution {j}_{new}^{\mathsf{\text{I}}}\left(q\right) (q = 1, 2, 3, ..., z, where z is the number of solutions), store the joint solution \left\{{j}_{new}^{\mathsf{\text{I}}}\left(q\right),{j}^{\mathsf{\text{II}}}\left(p\right)\right\}.

The stored joint solutions are then used to choose the embedding with the lowest distortion. Note that the proposed method needs to search for the best solution among k·z possible candidates for each block (see steps c and d). Thus, the computational complexity of the proposed search algorithm is O(n^{2}).
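The search for joint solutions can be sketched as follows. Several choices here are assumptions made only for illustration: m = 4 with the primitive polynomial x^4 + x + 1, at most three flips per block, exhaustive tables instead of root finding, 0-based indexes, and a simple offset mapping between block indexes (the mapping F of Figure 1, where F(11) = 15, may order the block n_{2} differently). The names `syn` and `joint_solutions` are hypothetical.

```python
from itertools import combinations

M, N, PRIM_POLY = 4, 15, 0b10011
OVERLAP = 5
OFFSET = N - OVERLAP            # block n2 starts at combined index 10

EXP, x = [], 1
for _ in range(N):
    EXP.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY


def syn(positions):
    """Syndrome (S1, S2) of a pattern given by block-local positions."""
    s1 = s2 = 0
    for i in positions:
        s1 ^= EXP[i]
        s2 ^= EXP[(3 * i) % N]
    return s1, s2


# J[s] lists every flip set of size 0..3 whose syndrome equals s.
J = {}
for t in range(4):
    for pos in combinations(range(N), t):
        J.setdefault(syn(pos), []).append(pos)


def xor(a, b):
    return a[0] ^ b[0], a[1] ^ b[1]


def ones(block):
    return [i for i, b in enumerate(block) if b]


def joint_solutions(a, m1, m2):
    """Flip sets (combined indexes) that embed m1 in n1 and m2 in n2."""
    s1 = xor(syn(ones(a[:N])), m1)        # S^I: change required in n1
    s2 = xor(syn(ones(a[OFFSET:])), m2)   # S^II: change required in n2
    found = set()
    # Step (c): solve n1 first, then repair n2 outside the intersected area.
    for j1 in J.get(s1, []):
        shared = [j - OFFSET for j in j1 if j >= OFFSET]     # F(j)
        s2_new = xor(s2, syn(shared))
        for j2 in J.get(s2_new, []):
            if all(j >= OVERLAP for j in j2):                # J_s^II
                found.add(frozenset(j1) | {j + OFFSET for j in j2})
    # Step (d): solve n2 first, then repair n1 outside the intersected area.
    for j2 in J.get(s2, []):
        shared = [j + OFFSET for j in j2 if j < OVERLAP]     # F^-1(j')
        s1_new = xor(s1, syn(shared))
        for j1 in J.get(s1_new, []):
            if all(j < OFFSET for j in j1):                  # J_s^I
                found.add(frozenset(j1) | {j + OFFSET for j in j2})
    return found


a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1,
     1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
target = list(a)
for i in (3, 12, 20):           # messages chosen so a 3-flip answer exists
    target[i] ^= 1
m1 = syn(ones(target[:N]))
m2 = syn(ones(target[OFFSET:]))
solutions = joint_solutions(a, m1, m2)
```

With the flip count capped in this way, a joint solution is not guaranteed for every pair of messages; an embedder would then pick, among the stored candidates, the one with the lowest distortion.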

3.2. Two-stage embedding technique

In order to enhance the performance of the blockwise methods (i.e., ME, MME, BCH-based data hiding, etc.), we utilize almost all the DCT coefficients for data hiding. The proposed method uses two different embedding schemes together. Two schemes use the different block sizes {n}_{1}^{p} and {n}_{2}^{p}, and have different payloads {m}_{1}^{p} and {m}_{2}^{p}.

This method divides the stream of the DCT coefficients (c_{1}, c_{2}, ..., c_{N}) and the message M into two parts and hides data into each part separately. The optimal numbers of blocks (k_{1} and k_{2}) for the two schemes can be computed as follows:

The relation between the numbers of blocks for the schemes 1 and 2 is presented as follows:

The computed {k}_{1}^{\prime} and {k}_{2}^{\prime} are noninteger numbers. Thus, we have to choose the nearest integers {k}_{1}=\lceil {k}_{1}^{\prime}\rceil \pm 1 and {k}_{2}=\lceil {k}_{2}^{\prime}\rceil \pm 1 such that:

The presented two-scheme embedding method improves the performance of data hiding by properly distributing the available DCT coefficients between two different modified BCH schemes. The first scheme uses {m}_{1}^{p}=4\cdot m obtained from inequality (8); the second scheme uses {m}_{2}^{p}=4\cdot \left(m+1\right). Note that the second scheme has higher embedding efficiency. The efficiency of the two-scheme embedding depends on the ratio between the numbers of blocks k_{1} and k_{2} for the schemes 1 and 2, respectively. The larger the value k_{2} (i.e., the smaller the ratio k_{1}/k_{2}), the higher the efficiency of the proposed two-scheme embedding for the same m.
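One way to pick integer block counts can be sketched as follows. This is not the article's formula, whose equations are not reproduced here; it is a plain search that assumes the two natural constraints k_{1}·n_{1}^{p} + k_{2}·n_{2}^{p} ≤ N and k_{1}·m_{1}^{p} + k_{2}·m_{2}^{p} ≥ |M|, and prefers as many scheme-2 blocks as possible. The function name and the example parameters (|I| = 5 for both schemes) are illustrative assumptions.

```python
def block_counts(n_coeffs, msg_bits, n1, m1, n2, m2):
    """Return (k1, k2) fitting msg_bits into n_coeffs, maximizing k2.

    n1, m1: combined block size and payload of scheme 1
    n2, m2: combined block size and payload of scheme 2
    Returns None when the message does not fit at all.
    """
    for k2 in range(n_coeffs // n2, -1, -1):
        remaining = max(0, msg_bits - k2 * m2)
        k1 = -(-remaining // m1)            # ceiling division
        if k1 * n1 + k2 * n2 <= n_coeffs:
            return k1, k2
    return None


# Scheme 1: m = 4, |I| = 5 -> 25 coefficients, 16 bits per combined block.
# Scheme 2: m = 5, |I| = 5 -> 57 coefficients, 20 bits per combined block.
k1, k2 = block_counts(10000, 5000, 25, 16, 57, 20)
used = k1 * 25 + k2 * 57
capacity = k1 * 16 + k2 * 20
```

Maximizing k_{2} follows the observation above that the second scheme has the higher embedding efficiency.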

The two-scheme embedding method allows different sizes of the intersected area, I_{sh 1} and I_{sh 2}, to be used for the two schemes (see Tables 1 and 2). We test several sizes of the intersected area and several payloads. In the experiments, we hide data into a set of 4,000 natural images and measure the performance against the steganalysis methods [20, 25] for different sizes of the intersected area and different payloads. Results are presented in Tables 1, 2, and 3.

Boldface numbers in Tables 1 and 2 mark the lowest steganalysis accuracy and show the most appropriate intersected area size for each tested payload. Data hiding with the most appropriate intersected area always shows better results. Tables 1 and 2 also show the difference between the proposed method and the original BCH-based steganography method [16] in terms of the performance of the steganalysis [20, 25]. The most appropriate intersected area sizes presented in Table 3 were used for the other experiments.

4. Inserting-removing strategy

The performance of the proposed method can be increased significantly by using the inserting-removing strategy. The proposed strategy is based on the fact that the block of 2^{m} - 1 DCT coefficients can be modified before data hiding by inserting or removing coefficients 1 and -1. Hiding data in the modified stream of DCT coefficients may result in lower distortion and, as a result, lower detectability by steganalysis. Such a modification has to be carried out carefully in order to reduce distortion.

The proposed inserting-removing strategy uses the stream of nonrounded quantized DCT coefficients a_{q} computed as follows:

where B is the 8 × 8 block of the image pixels; a' is the block of original DCT coefficients; a_{q} is the block of DCT coefficients divided by the corresponding coefficients from the quantization matrix Q; a_{r} is the block of quantized DCT coefficients; Q_{f} is a quality factor.

Each nonzero integer DCT coefficient has a corresponding informative bit computed as follows:

According to the proposed inserting-removing strategy, the stream a of nonrounded DCT coefficients obtained from the blocks a_{q} is divided into three sets: modifiable c_{m} = a ∈ (-∞; -1.5) ∪ (1.5; ∞), removable c_{R} = a ∈ [-1.5; -0.5) ∪ (0.5; 1.5], and insertable c_{Ins} = a ∈ [-0.5; -0.25) ∪ (0.25; 0.5]. Set c unifies the modifiable, removable, and insertable sets (i.e., c = c_{m} ∪ c_{R} ∪ c_{Ins}). The set C = c_{m} ∪ c_{R} contains all nonzero rounded DCT coefficients. According to Equation (17), only the nonzero DCT coefficients (i.e., set C) have corresponding informative bits and can be used for hiding data.
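The partition into the three sets can be illustrated with a small classifier. The function name `classify` and the label "unused" for coefficients outside c are illustrative; the interval bounds are the ones given above, which are symmetric in |a|.

```python
def classify(a):
    """Assign a nonrounded quantized DCT coefficient to one of the sets."""
    mag = abs(a)
    if mag > 1.5:
        return "modifiable"    # c_m: (-inf; -1.5) U (1.5; inf)
    if mag > 0.5:
        return "removable"     # c_R: [-1.5; -0.5) U (0.5; 1.5]
    if mag > 0.25:
        return "insertable"    # c_Ins: [-0.5; -0.25) U (0.25; 0.5]
    return "unused"            # too close to zero to be inserted


stream = [3.2, -1.5, 0.5, -0.3, 0.1, -2.7, 0.9]
labels = [classify(a) for a in stream]
# Set C (coefficients that carry informative bits) = modifiable + removable.
C = [a for a in stream if classify(a) in ("modifiable", "removable")]
```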

The proposed steganographic method uses the stream of n_{p} nonzero DCT coefficients from the set C for data hiding. In general, set C is a subset of the unified set c. Thus, each block unifies the n_{p} coefficients from the set C and some insertable coefficients from the set c (i.e., {c}_{b}={c}_{m}^{\prime}\cup {c}_{R}^{\prime}\cup {c}_{Ins}^{\prime}, where {C}^{\prime}={c}_{m}^{\prime}\cup {c}_{R}^{\prime} is the block of n_{p} nonzero DCT coefficients from the set C). Inserting or removing any coefficient from {c}_{Ins}^{\prime} or {c}_{R}^{\prime} produces a new block C' with a new solution for data hiding. As a result, the inserting-removing strategy significantly increases the number of possible solutions and helps to find the most appropriate solution with the lowest distortion.

In the proposed method, we use the same distortion measure as MME [14]. The distortion for each DCT coefficient is computed as follows:

5. Encoder and decoder

The encoder of the proposed steganographic method, based on the modified BCH data hiding scheme and the inserting-removing strategy, is organized as follows:

For a given bitmap image I_{
m
}, payload P, quality factor Q_{
f
}, and secret key K process follows:

1.

Divide image I_{
m
} into nonoverlapped 8 × 8 blocks of pixels and process DCT, quantization and rounding as presented in (16). Remove DC coefficients. Obtain a', a_{
q
}, a_{
r
} , and streams of DCT coefficients a. Permute stream a using K and any pseudo-random generator. Obtain stream c = a∈ (-∞; -0.25) ∪ (0.25;∞) from the permuted stream a.

2.

Define sets: modifiable c_{
m
} , insertable c_{
Ins
} , and removable c_{
R
} .

3.

Define parameters for schemes 1 and 2, and number of the blocks k_{1} and k_{2} using (14) and (15). Divide message M into two parts: {M}_{1}={m}_{1}^{p}\cdot {k}_{1} and {M}_{2}={m}_{2}^{p}\cdot {k}_{2}.

4.

Start from the first block i = 1. Define the i th block of the DCT coefficients {c}_{{b}_{i}}={c}_{{m}_{i}}^{\prime}\cup {c}_{{R}_{i}}^{\prime}\cup {c}_{{Ins}_{i}}^{\prime}, where {c}_{{m}_{i}}^{\prime}, {c}_{{R}_{i}}^{\prime}, and {c}_{{Ins}_{i}}^{\prime} are the modifiable, removable, and insertable subsets for the current block. If i = k_{1} +1 switch to the scheme 2.

5.

Define the block of nonzero rounded DCT coefficients {\mathsf{\text{C}}}_{\mathsf{\text{i}}}^{\prime}={c}_{{m}_{i}}^{\prime}\cup {c}_{Ri}^{\prime}.

6.

Get the solutions for the block {C}_{i}^{\prime} using the modified BCH data hiding scheme (see the algorithm in Section 3). Compute the distortion D for each solution using Equation (20). Choose solution J_{
m
} with the lowest distortion D_{
m
} and store it.

7.

Modify the block {C}_{i}^{\prime} by inserting or removing coefficients from the subsets {c}_{{R}_{i}}^{\prime}, and {c}_{{Ins}_{i}}^{\prime}. Obtain a new block: (i) after removing {C}_{i}^{\prime}={c}_{{m}_{i}}^{\prime}\cup {c}_{{R}_{i}}^{\u2033}, where {c}_{{R}_{i}}^{\u2033}={c}_{{R}_{i}}^{\prime}-{c}_{{R}_{i}}^{\prime}\left(p\right) is the modified removable set and {c}_{{R}_{i}}^{\prime}\left(p\right) is the removed coefficient; (ii) after inserting {C}_{i}^{\prime}={c}_{{m}_{i}}^{\prime}\cup {c}_{{R}_{i}}^{\prime}\cup {c}_{{Ins}_{i}}^{\prime}\left(q\right), where {c}_{{Ins}_{i}}^{\prime}\left(q\right)=\pm 1 is the inserted coefficient. p and q are the current position for insertion and removing.

8.

Repeat steps 5-6 for all insertable and removable coefficients from {c}_{{R}_{i}}^{\prime}, and {c}_{{Ins}_{i}}^{\prime}.

9.

Among all stored solutions J_{m}, choose the solution with the lowest distortion D_{m}. Modify one, two, or three coefficients according to the best solution (see the explanation in Section 2) and, if necessary, insert or remove a coefficient in the block {c}_{{b}_{i}}.

10.

Process all k_{1} + k_{2} blocks using steps 4-9. Obtain the modified stream {c}^{\prime}=\left\{{c}_{{b}_{1}},{c}_{{b}_{2}},\dots ,{c}_{{b}_{{k}_{1}+{k}_{2}}}\right\}.

11.

Recover the original order of the DCT coefficients a from the modified stream c' using the secret key K and the same pseudo-random generator. Add the DC coefficients, round the coefficients a', and obtain the modified JPEG image {I}_{m}^{\prime}.
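Steps 5-9 amount to a search over candidate blocks: the unchanged block, the block with one removable coefficient dropped, and the block with one ±1 coefficient inserted. The following Python fragment is a minimal sketch of that search, not the authors' implementation. The names `candidate_blocks`, `best_candidate`, and `solve` are hypothetical. The `solve` callback stands in for the modified BCH data hiding scheme of Section 3 together with the distortion of Equation (20), neither of which is reproduced here. The sketch also assumes a single removal or insertion per candidate and ignores how such a change shifts the block boundary in the stream.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

Block = Dict[int, int]              # stream position -> rounded nonzero coefficient
Change = Tuple[str, Optional[int]]  # ("none", None), ("remove", p) or ("insert", q)


def candidate_blocks(modifiable: Block, removable: Block,
                     insertable: Block) -> Iterator[Tuple[Change, Block]]:
    """Yield the unchanged block, then every single removal and single insertion."""
    base = {**modifiable, **removable}      # block of step 5
    yield ("none", None), dict(base)
    for p in removable:                     # step 7 (i): drop the coefficient at p
        candidate = dict(base)
        del candidate[p]
        yield ("remove", p), candidate
    for q, value in insertable.items():     # step 7 (ii): insert +1 or -1 at q
        candidate = dict(base)
        candidate[q] = value
        yield ("insert", q), candidate


def best_candidate(modifiable: Block, removable: Block, insertable: Block,
                   solve: Callable[[Block], Optional[Tuple[object, float]]]):
    """Return (change, solution, distortion) with the lowest distortion, or None.

    `solve` is a hypothetical stand-in: it returns the best (solution, distortion)
    pair for one block, or None when the block admits no solution.
    """
    best = None
    for change, block in candidate_blocks(modifiable, removable, insertable):
        result = solve(block)               # step 6 for this candidate
        if result is None:
            continue
        solution, distortion = result
        if best is None or distortion < best[2]:
            best = (change, solution, distortion)   # step 9: keep the minimum
    return best
```

Passing the solver as a callback keeps the enumeration independent of the particular BCH scheme, so the same loop can serve both scheme 1 and scheme 2.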

The decoder of the proposed steganographic method is organized as follows:

Given the modified JPEG image {I}_{m}^{\prime}, the quality factor Q_{f}, the secret key K, and the payload size p = |P|, the process is as follows:

1.

Read the DCT coefficients from the JPEG file. Permute them using the secret key K and the same pseudo-random generator as the encoder. Remove the DC coefficients. Obtain the stream of nonzero DCT coefficients C.

2.

Using Equations (15) and (16), define the parameters of schemes 1 and 2 and the numbers of blocks k_{1} and k_{2}. Here, N = |C|.

3.

Divide C into blocks according to k_{1} and k_{2}.

4.

Decode data from each block using (9).

The steganographic method based only on the modified BCH data hiding scheme skips steps 7 and 8.
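Both the encoder (steps 1 and 11) and the decoder (step 1) rely on a permutation that can be reproduced from the secret key K. The fragment below is a minimal sketch of such a keyed permutation, its inverse, and the selection of coefficients in (-∞, -0.25) ∪ (0.25, ∞). The function names are hypothetical, and Python's `random.Random` is used only as a stand-in for "any pseudo-random generator"; it is not a cryptographic generator, so a real system would likely substitute a keyed cryptographic one.

```python
import random
from typing import List, Sequence, Tuple


def keyed_order(n: int, key: int) -> List[int]:
    """Return a permutation of range(n) that depends only on n and the key."""
    order = list(range(n))
    random.Random(key).shuffle(order)   # seeded generator: same key, same order
    return order


def permute(stream: Sequence[float], key: int) -> List[float]:
    """Reorder the coefficient stream with the key-dependent permutation."""
    order = keyed_order(len(stream), key)
    return [stream[i] for i in order]


def unpermute(permuted: Sequence[float], key: int) -> List[float]:
    """Invert permute(): put every coefficient back at its original index."""
    order = keyed_order(len(permuted), key)
    restored = [0.0] * len(permuted)
    for dst, src in enumerate(order):
        restored[src] = permuted[dst]
    return restored


def embeddable(permuted: Sequence[float],
               threshold: float = 0.25) -> List[Tuple[int, float]]:
    """Keep (index, value) pairs whose magnitude exceeds the threshold."""
    return [(i, a) for i, a in enumerate(permuted) if abs(a) > threshold]
```

Because the order depends only on the stream length and the key, the decoder can rebuild the same permutation without any side information beyond K.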

6. Experimental results

In these experiments, we hide different amounts of data in a set of uncompressed images using the proposed BCH-based data hiding scheme with and without the inserting-removing strategy. The sets of modified and original compressed images are analyzed by two powerful steganalysis algorithms proposed by Pevny and Fridrich [20] and Kodovsky and Fridrich [25]. These methods use 274 and 548 different features of the DCT coefficients, respectively. The 274 or 548 features extracted from the unmodified and modified images are used to build the models for the support vector machine (SVM) with parameter C = 10^{4} and kernel width γ = 10^{-4}. A set of 4,000 natural uncompressed images (768 × 512) downloaded from Corel Draw and obtained from several digital cameras is used in our experiments. The proposed method needs 1-5 min to hide data in each image. Experiments are carried out for seven different payloads (0.05, 0.1, 0.15, 0.17, 0.20, 0.22, and 0.25 bits per nonzero coefficient, abbreviated bpc) and quality factor 75. For each of the seven payload sizes, the SVM is trained on a set of 3,000 images (1,500 original and 1,500 stego images). Each of the seven resulting models is then tested on a set of 1,000 images (500 original and 500 stego) for the corresponding payload. The results show the error probabilities of the steganalysis for each tested payload (see Figures 2 and 3).

The error probability is computed as follows:

e=\frac{1}{2}\left({P}_{a}+{P}_{b}\right),

(28)

where P_{a} is the probability of a false alarm (i.e., an unmodified image is classified as modified) and P_{b} is the probability of a missed detection (i.e., a modified image is classified as unmodified).
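As a small illustration of Equation (28), the fragment below computes e from the classifier's mistakes on a test set. The function and argument names are hypothetical, and the counts in the usage note are invented for illustration; only the test-set sizes (500 original and 500 stego images) come from the experimental setup.

```python
def error_probability(covers_flagged: int, n_cover: int,
                      stegos_missed: int, n_stego: int) -> float:
    """Average of the two error rates, as in Equation (28).

    covers_flagged: unmodified images classified as modified
    stegos_missed:  modified images classified as unmodified
    """
    p_a = covers_flagged / n_cover   # estimate of P_a
    p_b = stegos_missed / n_stego    # estimate of P_b
    return 0.5 * (p_a + p_b)
```

For example, with invented counts of 150 flagged originals and 180 missed stego images out of 500 each, the call `error_probability(150, 500, 180, 500)` gives an error probability of about 0.33. A classifier that is wrong on half of each class gives 0.5, the coin-toss level.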

In our experiments, we test both methods: (1) the method based only on the modified BCH-based data hiding scheme; and (2) the modified BCH-based data hiding scheme with the proposed inserting-removing strategy. The proposed methods achieve a high error probability for all the tested payloads. For payloads up to 0.1 bpc, both methods are detected with an accuracy close to 50%, meaning that the steganalysis cannot distinguish the unmodified images from the modified ones; its decisions are almost as reliable as a coin toss. For higher payloads around 0.15 and 0.2 bpc, the proposed methods perform much better than the MME. This improvement over the MME is explained by the use of schemes with higher embedding efficiency (i.e., the BCH-based schemes with large m). The proposed method also shows better results than the methods based on the original BCH-based schemes. Moreover, the proposed method with the inserting-removing strategy improves on the method with the modified BCH-based data hiding scheme only, by 0.0363, 0.0414, and 0.0392 points in terms of error probability for payloads of 0.15, 0.2, and 0.25 bpc, respectively. For a payload of 0.25 bpc, the two methods achieve error probabilities of 0.2961 and 0.3353, respectively. These error probabilities are better than those of the MME [14], the original BCH-based scheme [16], the heuristic BCH-based scheme [17], and the syndrome trellis code (STC) [22] proposed by Kodovsky and Fridrich. This improvement was achieved by using the modified BCH-based data hiding scheme and the inserting-removing strategy.

7. Conclusion

In this article, an efficient data hiding technique for steganography is presented. The proposed BCH-based data hiding scheme uses two blocks to form a single combined block. A new data hiding strategy makes it possible to obtain a joint solution for two blocks with intersected coefficients. Owing to the intersection, the proposed method requires a smaller number of coefficients to hide the same amount of data than the original nonoverlapping blockwise approaches. As a result, the proposed method can use BCH-based schemes with large m (i.e., larger capacity). Even when the proposed method has to use the same BCH-based scheme (for 0.17 and 0.2 bpc), the efficiency of data hiding is still high because the proposed two-scheme embedding has a lower ratio k_{1}/k_{2} than the original BCH-based scheme. The proposed BCH-based data hiding scheme significantly outperforms the MME and the original BCH-based steganography: the steganalysis shows higher error probabilities and lower accuracy against it. The proposed two-scheme embedding technique (see Equations 14 and 15) makes it possible to use almost all the available DCT coefficients. The proposed strategy based on inserting and removing coefficients 1 or -1 increases the number of possible solutions and significantly decreases the total distortion. The experimental results show that the inserting-removing strategy significantly improves the performance of the proposed method. With the combination of the modified BCH-based scheme and the inserting-removing strategy, the powerful steganalysis algorithms show even higher error probabilities and lower accuracy.

References

Provos N: Defending against statistical steganalysis. In Proc of 10th USENIX Security Symposium. Washington, DC; 2001:24-24.

Fridrich J: Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes. Lect Notes Comput Sci 2005, 3200: 67-81.

Fridrich J, Goljan M, Soukal D: Perturbed quantization steganography using wet paper codes. In Proc of ACM Workshop on Multimedia and Security. Magdeburg, Germany; 2004:4-15.

Fridrich J, Pevny T, Kodovsky J: Statistically undetectable JPEG steganography: dead ends, challenges, and opportunities. In Proc of ACM Workshop on Multimedia and Security. Dallas, TX; 2007:3-15.

Sachnev V, Kim HJ, Zhang R: Less detectable JPEG steganography method based on heuristic optimization and BCH syndrome coding. In Proc of ACM Workshop on Multimedia and Security. Princeton, NJ; 2009:131-139.

Zhao Z, Wu F, Yu S, Zhou J: A lookup table based fast algorithm for finding roots of quadratic or cubic polynomials in the GF(2^{m}). J Huazhong Univ Sci Technol (Nat Sci Ed.) 2005, 33(1):70-73.

This study was supported by the Catholic University of Korea, National Research Foundation of Korea (grant 2011-0013695), ITRC and BK21 Project, Korea University and IT R&D program (Development of anonymity-based u-knowledge security technology, 2007-S001-01).

Author information

Authors and Affiliations

School of Information, Communications, and Electronic Engineering, The Catholic University of Korea, Bucheon, 420-743, Republic of Korea

Vasily Sachnev

CIST, Korea University, Seoul, 136-701, Republic of Korea

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Sachnev, V., Kim, H.J. Modified BCH data hiding scheme for JPEG steganography. EURASIP J. Adv. Signal Process. 2012, 89 (2012). https://doi.org/10.1186/1687-6180-2012-89