Modified BCH data hiding scheme for JPEG steganography

Sachnev, Vasily; Kim, Hyoung Joong

doi:10.1186/1687-6180-2012-89

Research
Open access
Published: 26 April 2012

EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 89 (2012)

Abstract

In this article, a new Bose-Chaudhuri-Hocquenghem (BCH)-based data hiding scheme for JPEG steganography is presented. Traditional data hiding approaches hide data in each block, where no two blocks overlap. In the proposed method, however, two consecutive blocks can overlap to form a combined block that is larger than a single block but smaller than two consecutive nonoverlapping blocks. In order to embed more data into the combined block than into a single block, the BCH-based data hiding scheme has to be redesigned. In this article, we propose a way to obtain a joint solution for hiding data in two blocks with intersected coefficients such that any modification of the intersected area does not disturb data hiding in either block. Because more data is hidden in the intersected area, the embedding capacity is increased. In addition, the nonzero DCT coefficient stream is modified to improve resistance to steganalysis and to reduce the distortion impact after data hiding. This approach carefully inserts or removes 1 or -1 coefficients into or from the DCT coefficient stream according to the rule proposed in this article. Experimental results show that the proposed algorithms work well and that their performance gain is significant.

1. Introduction

One of the first steganography methods for JPEG images embeds data by changing the least-significant bit values of the quantized discrete cosine transform (DCT) coefficients. However, this method can easily be detected by statistical analysis. Thus, for a good while, evading statistical analysis has been a major concern. Provos [1] divides the DCT coefficients into two disjoint subsets, hides data in the first subset, and compensates for the distorted histogram by modifying the second subset. Other methods in [2, 3] use a similar approach. On the other hand, Solanki et al. [4] utilize a robust watermarking scheme for steganography purposes. They embed data into the image in the spatial domain by using a technique robust against JPEG compression. Their scheme causes less degradation of the features of the DCT coefficients and, as a result, its detectability was low against older statistical steganalysis.

Another way to withstand steganalysis is to reduce the number of modified coefficients. Traditionally, each nonzero DCT coefficient has been modified. As a result, the embedding capacity is as large as the number of nonzero DCT coefficients. However, maximizing the embedding capacity comes at the cost of higher detectability. Westfeld [5] has used a matrix encoding (ME) technique to lower detectability by sacrificing embedding capacity. The ME technique exploits the Hamming code, which is designed for error correction. His scheme hides several bits by flipping at most one coefficient in each block. This approach was the first instance of using an error correcting code for data hiding.
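The matrix encoding idea can be illustrated with a small sketch. The code below is not taken from [5]; it is a minimal illustration, assuming a block of 7 cover bits, a 3-bit message given as an integer, and hypothetical function names `me_syndrome` and `me_embed`. The syndrome is the XOR of the 1-based indices of all bits equal to 1, and at most one bit is flipped to make the syndrome equal to the message.

```python
def me_syndrome(bits):
    """XOR of the 1-based indices of all positions that hold a 1."""
    s = 0
    for i, b in enumerate(bits, start=1):
        if b:
            s ^= i
    return s


def me_embed(bits, message):
    """Embed a 3-bit message (integer 0..7) into 7 cover bits with at most one flip."""
    stego = list(bits)
    pos = me_syndrome(bits) ^ message  # 0 means the block already carries the message
    if pos:
        stego[pos - 1] ^= 1            # flipping bit `pos` changes the syndrome by `pos`
    return stego


cover = [1, 0, 0, 1, 1, 0, 1]
stego = me_embed(cover, 0b101)
changed = sum(c != s for c, s in zip(cover, stego))
decoded = me_syndrome(stego)           # the decoder only needs the syndrome
```

The decoder never needs the cover block: it recomputes the syndrome of the received bits, which is the property that the BCH-based schemes discussed later rely on as well.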

Fridrich et al. [6–13] use the concept of the "minimal distortion" to enhance the security (i.e., by reducing distortion). The perturbed quantization steganography utilizes the wet paper coding.

Later, Kim et al. [14] improved the performance of ME by reducing the distortion impact. In fact, their modified matrix encoding (MME) method may change more coefficients than ME. However, they show that the distortion impact after modifying one coefficient may be larger than that after modifying two coefficients. Thus, choosing between modifying one coefficient or two per block, whichever distorts less, can lower both the distortion and the detectability against steganalysis. Note that MME requires the original uncompressed image for data hiding, but not for decoding.

Schönfeld and Winkler [15] have proposed a new way to hide data using a more powerful error correction code. They use a structured Bose-Chaudhuri-Hocquenghem (BCH) code [2]. Zhang et al. [16] have significantly improved the original BCH-based data hiding scheme. Their improved method can easily find the flip positions and resists steganalysis better than the existing methods. Later, Sachnev et al. [17] applied a heuristic optimization technique to the data hiding scheme over the BCH coding and modified the stream of the input DCT coefficients to reduce the distortion. Their method considerably outperforms the steganography method proposed by Zhang et al. [16].

Recently, Filler and Fridrich [18] have proposed a remarkable framework which minimizes a distortion measure as a weighted norm of the difference between cover and stego feature vectors. In their approach, the distortion is not necessarily an additive function over the pixels because the features may contain higher-order statistics such as sample transition probability matrices of pixels or DCT coefficients modeled as Markov chains [19–21]. When the distortion measure is defined as a sum of local potentials, practical near-optimal embedding methods can be implemented with syndrome-trellis codes [22].

Most of the above-mentioned steganographic methods use nonoverlapping blocks of DCT coefficients for hiding the secret message. Such a blockwise embedding scheme divides both the stream of DCT coefficients and the hidden message into separate blocks and solves the data hiding equations for each block individually. Recent methods such as MME [14] and the BCH-based steganography methods [15–17] may produce several alternative solutions, so the data hiding method can choose the solution with the lowest distortion impact. Past investigation of the BCH data hiding scheme finds that BCH usually yields more possible solutions than are needed. This means that a solution with acceptable distortion impact can still be found in a reduced set of possible solutions. Hence, the embedding efficiency of the BCH steganographic methods can be increased by reducing the number of possible solutions while keeping a distortion impact similar to that of the original approach.

In the proposed method, two blocks of DCT coefficients form a combined block that shares common coefficients in the intersected part between two consecutive blocks. Such a design achieves high embedding efficiency by hiding data twice in the intersected area. The number of possible joint solutions for both blocks (i.e., solutions which are valid for both blocks) is always smaller than the number of all possible solutions for two independent blocks. The reduced number of possible solutions can increase distortion, but not significantly. Besides, the number of possible solutions can easily be controlled by changing the size of the intersected area: the smaller the intersected area, the larger the number of possible joint solutions. A similar approach has been tested for the Hamming code in [23].

However, the larger the intersected area, the higher the embedding efficiency of the proposed method. In the proposed method, the block of DCT coefficients can also be modified by inserting new nonzero coefficients 1 or -1, or by removing coefficients 1 or -1. Such modification is carried out carefully in order to reduce the distortion caused by excessive hiding.

The rest of the article is organized as follows. Section 2 explains the details of the BCH coding. Section 3 presents the BCH-based modified data hiding scheme. In Section 4, we propose the inserting-removing strategy. The encoder and decoder are presented in Section 5. Section 6 provides the experimental results. Finally, Section 7 concludes the article.

2. BCH syndrome coding

The BCH codes are a well-known and widely used family of error correction codes. A BCH code (n, k, t) can correct t bit errors by appending n - k additional bits to the original k-bit message such that the syndrome of the resulting n bits is equal to 0. BCH codes were invented for error correction and cannot directly be used for data hiding. An efficient method of using powerful BCH codes for data hiding has been presented in [15–17].

2.1. BCH syndrome coding

The generalized parity-check matrix H for BCH coding is presented as follows:

H = [\begin{matrix} 1 & α & α^{2} & \dots & α^{n - 1} \\ 1 & (α^{3}) & {(α^{3})}^{2} & \dots & {(α^{3})}^{n - 1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & (α^{2 t - 1}) & {(α^{2 t - 1})}^{2} & \dots & {(α^{2 t - 1})}^{n - 1} \end{matrix}]

(1)

Let t be 2. Then, the parity-check matrix is expressed as follows:

H = [\begin{matrix} 1 & α & α^{2} & \dots & α^{n - 1} \\ 1 & (α^{3}) & {(α^{3})}^{2} & \dots & {(α^{3})}^{n - 1} \end{matrix}]

(2)

Assume that the original stream of binary data is V = {v₀, v₁, v₂, ..., v_{n-1}}, and the modified stream of binary data after data hiding is R = {r₀, r₁, r₂, ..., r_{n-1}}. The streams V and R over GF(2^m) can be represented as V(x) = v₀ + v₁·x + v₂·x² + v₃·x³ + ⋯ + v_{n-1}·x^{n-1}, and R(x) = r₀ + r₁·x + r₂·x² + r₃·x³ + ⋯ + r_{n-1}·x^{n-1}, respectively.

The embedded message m can be computed as follows:

m = R \cdot H^{T}

(3)

Thus, hiding the message m in V requires finding R such that

R \cdot H^{T} = m

(4)

The difference between V and R shows the number and location of the elements in V to be flipped.

R = V + E

(5)

with

E(x) = x^{u_{1}} + x^{u_{2}} + x^{u_{3}} + \dots + x^{u_{l}},

where u = {u₁, u₂, ..., u_l} are the positions of the elements in V to be flipped in order to get R.

Using Equations (4) and (5), the syndrome S can be computed as follows:

S = m - V \cdot H^{T} = E \cdot H^{T} .

(6)

If t is 2, then

S = {[\begin{matrix} S_{1} & S_{2} \end{matrix}]}^{T} = E \cdot H^{T} .

(7)
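As an illustration of Equations (2) and (5)-(7), the following sketch computes the two syndrome components for n = 15 and checks that the syndrome of R = V + E is the sum of the syndromes of V and E. It is a minimal sketch under stated assumptions: the field GF(2⁴) is built with the primitive polynomial x⁴ + x + 1, positions are 0-based, field elements are stored as integers, and the names `EXP` and `syndrome` are hypothetical.

```python
M = 4                       # field GF(2^M); block length n = 2^M - 1
N = (1 << M) - 1            # 15
PRIM_POLY = 0b10011         # x^4 + x + 1, an assumed primitive polynomial

# EXP[i] holds alpha^i as an integer bit pattern.
EXP = []
x = 1
for _ in range(N):
    EXP.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY


def syndrome(bits):
    """Return (S1, S2) = bits * H^T for the t = 2 matrix of Equation (2)."""
    s1 = s2 = 0
    for i, b in enumerate(bits):
        if b:
            s1 ^= EXP[i]             # row 1 entry: alpha^i
            s2 ^= EXP[(3 * i) % N]   # row 2 entry: (alpha^3)^i
    return s1, s2


V = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1]   # cover bits
E = [0] * N
E[2] = E[9] = 1                                      # flip positions u = {2, 9}
R = [v ^ e for v, e in zip(V, E)]                    # Equation (5) over GF(2)

sV, sE, sR = syndrome(V), syndrome(E), syndrome(R)
```

Because addition in GF(2^m) is XOR, `sR` equals the component-wise XOR of `sV` and `sE`, which is the linearity that Equation (6) uses.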

2.2. Lookup tables

In this article, we utilized the method of Zhao et al. [24], which is based on fast lookup tables for finding the roots of the quadratic and cubic polynomials σ(x). A similar approach has been used in [16, 17].

2.3. Solutions

Hiding the message m in the binary stream V requires finding the positions of the coefficients to be flipped. In this article, we used the method presented in [16, 17] to get one-, two-, three-, or four-flip solutions. The sets of all possible solutions for one, two, three, or four flips are stored in the lookup tables J₁, J₂, J₃, and J₄, respectively. The notation J₃(S) returns all three-flip solutions for syndrome S = {S₁ S₂}. Similarly, we can get all possible solutions for block n₁ with syndrome S^I as J^I = {J₁(S^I) J₂(S^I) J₃(S^I) J₄(S^I)} and for block n₂ with syndrome S^II as J^II = {J₁(S^II) J₂(S^II) J₃(S^II) J₄(S^II)}. The size of the lookup tables is (2^{2·m} - 1) × n_S, where n_S is the number of stored solutions.
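The role of the tables J₁ to J₄ can be illustrated by a brute-force stand-in. The sketch below does not reproduce the fast lookup method of [16, 17, 24]; it simply enumerates every set of up to four flip positions in a block of 15 bits and keeps those whose syndrome equals the target. The field construction (x⁴ + x + 1), the 0-based positions, and the names `flip_syndrome` and `solutions` are assumptions made for the illustration.

```python
from itertools import combinations

M = 4
N = (1 << M) - 1            # block length 15
PRIM_POLY = 0b10011         # x^4 + x + 1, an assumed primitive polynomial

EXP = []
x = 1
for _ in range(N):
    EXP.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY


def flip_syndrome(positions):
    """Syndrome (S1, S2) produced by flipping the given 0-based positions."""
    s1 = s2 = 0
    for j in positions:
        s1 ^= EXP[j]
        s2 ^= EXP[(3 * j) % N]
    return s1, s2


def solutions(target, max_flips=4):
    """Return {w: list of w-flip solutions} for the target syndrome (brute force)."""
    found = {w: [] for w in range(1, max_flips + 1)}
    for w in range(1, max_flips + 1):
        for pos in combinations(range(N), w):
            if flip_syndrome(pos) == target:
                found[w].append(pos)
    return found


target = flip_syndrome((3, 11))
J = solutions(target)       # J[1], J[2], J[3], J[4] play the role of J1(S) .. J4(S)
```

For this target, `J[2]` contains the single pair `(3, 11)` and `J[1]` is empty, since a double-error-correcting code cannot map two different patterns of at most two flips to the same syndrome.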

3. Proposed data hiding scheme

In the proposed BCH data hiding scheme, we combine two BCH blocks of 2^m - 1 DCT coefficients into one such that the BCH blocks intersect each other. Figure 1 shows the block diagram of coefficients for the proposed scheme. In the presented example, (a₁, a₂, a₃, ..., a₂₅) is the combined block of the DCT coefficients; $(v_{1}^{'}, v_{2}^{'}, v_{3}^{'}, \dots, v_{15}^{'})$ and $(v_{1}^{''}, v_{2}^{''}, v_{3}^{''}, \dots, v_{15}^{''})$ are the corresponding binary coefficients for the BCH blocks n₁ and n₂, respectively. The intersected area I covers five coefficients a₁₁, a₁₂, a₁₃, a₁₄, and a₁₅ in this example. Such a scheme can hide more data by exploiting the intersected area, and it can be used with any kind of coding scheme.

One of the two main contributions of this article is a systematic algorithm for finding the joint solutions. The proposed BCH-based data hiding scheme requires finding a joint solution for both blocks n₁ and n₂ using the guidelines from Section 2.1 such that the intersected area does not affect the result. For example, let 8 bits be hidden into 15 coefficients from a₁ to a₁₅ using the BCH-based steganography. In the traditional approach, another 8 bits are then hidden into the next nonoverlapping block of 15 coefficients from a₁₆ to a₃₀. As a result, 16 bits are hidden into 30 coefficients. However, our new approach hides the same amount of data into 25 coefficients a₁ to a₂₅: eight bits are hidden into the coefficients from a₁ to a₁₅, and another eight bits into the coefficients from a₁₁ to a₂₅. The data hiding algorithm requires finding the syndromes S^I and S^II (Equation 6) for the blocks n₁ and n₂, respectively.

There are two possible ways of hiding data into the combined block: hiding data into the block n₁ first, or into the block n₂ first. The proposed algorithm for getting a joint solution is designed as follows:

1. Hiding data into the block n₁ first.
(a) Some solutions for hiding data into the block n₁ do not modify the coefficients in the intersected area. Thus, solutions for the block n₂ have to be obtained using the original syndrome S^II. Some of these solutions are valid since they do not modify the coefficients in the intersected area. These solutions are called specified solutions.
(b) Some solutions for the block n₁ modify the coefficients in the intersected area. These modifications in the intersected area affect the syndrome for the block n₂. Thus, the new syndrome for the block n₂ is obtained as $S_{new}^{II}$. Some new solutions are valid since they do not modify the coefficients already modified by the n₁ in the intersected area.

Among all possible solutions for the block n₂ and the new syndrome $S_{new}^{II}$ (in case of 1(a), $S_{new}^{II} = S^{II}$), choose the solutions which do not have flipping positions in the intersected area (i.e., valid or specified solutions). Thus, the joint solutions for a combined block unify the solutions for the block n₁ and its syndrome S^I and the specified solutions for the block n₂ and its syndrome $S_{new}^{II}$.

2. Hiding data into the block n₂ first.
(a) Some solutions for hiding data into the block n₂ do not modify the coefficients in the intersected area. Thus, solutions for the block n₁ have to be obtained using the original syndrome S^I. Some of these solutions are valid since they do not modify the coefficients in the intersected area.
(b) Some solutions for the block n₂ modify the coefficients in the intersected area. These modifications in the intersected area affect the syndrome for the block n₁. Thus, the new syndrome for the block n₁ is obtained as $S_{new}^{I}$. Some new solutions are valid since they do not modify the coefficients already modified by the n₂ in the intersected area.

The joint solutions for a combined block unify the solutions for the block n₂ and its syndrome S^II and the specified solutions for the block n₁ and its syndrome $S_{new}^{I}$ (in case of 2(a), $S_{new}^{I} = S^{I}$).

In general, the proposed modified BCH data hiding scheme hides 4·m bits of data in a combined block of 2·(2^m-1)-|I| coefficients by using the BCH scheme (2^m-1, k, 2) for blocks n₁ and n₂.

The proper BCH-based data hiding scheme needs a suitable parameter m for hiding message M into the stream of N nonzero DCT coefficients. The parameter m can be obtained as follows:

\frac{4 \cdot m \cdot N}{2 \cdot (2^{m} - 1) - | I |} \geq |M|,

(8)

where m defines the proper BCH-based scheme for the proposed method, N is the number of nonzero DCT coefficients, M is the hidden message, n^p = 2·(2^m-1)-|I| is the size of the combined block, 4·m is the capacity of the combined block.
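Inequality (8) can be checked numerically with a few lines of code. The sketch below is only an illustration: the function names are hypothetical, the intersected area size is held fixed across all candidate values of m, and picking the largest feasible m is an assumption (the text only asks for a suitable m).

```python
def combined_block_size(m, overlap):
    """Size n^p = 2*(2^m - 1) - |I| of a combined block."""
    return 2 * (2 ** m - 1) - overlap


def capacity_bits(m, overlap, n_coeffs):
    """Left-hand side of inequality (8): bits that fit into n_coeffs coefficients."""
    return 4 * m * n_coeffs / combined_block_size(m, overlap)


def largest_feasible_m(n_coeffs, message_bits, overlap, candidates=range(3, 11)):
    """Largest candidate m that satisfies (8), or None (selection rule is an assumption)."""
    feasible = [
        m for m in candidates
        if overlap < 2 ** m - 1
        and capacity_bits(m, overlap, n_coeffs) >= message_bits
    ]
    return max(feasible) if feasible else None


# The example of Figure 1: m = 4, |I| = 5, so 16 bits go into 25 coefficients.
rate_overlapped = 16 / combined_block_size(4, 5)   # bits per coefficient with overlap
rate_plain = 16 / combined_block_size(4, 0)        # two nonoverlapping blocks (30 coefficients)

m_choice = largest_feasible_m(10_000, 2_500, overlap=5)
```

With 10,000 nonzero coefficients and a 2,500-bit message, m = 5 still satisfies (8) for |I| = 5, whereas m = 6 does not.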

3.1. Data hiding algorithm

The proposed method requires finding the solution for two blocks n₁ and n₂ for hiding two messages m₁ and m₂ together such that

\begin{cases} m_{1} = R_{1} \cdot H^{T} \\ m_{2} = R_{2} \cdot H^{T} \end{cases}

(9)

where R₁ and R₂ are the modified streams of the binary coefficients obtained from n₁ and n₂ (see Figure 1); H is a parity-check matrix from Equation (1).

Note that hiding message m₁ in block n₁ may modify the block n₂, and vice versa, because of the intersected part. Hence, we need to find proper flip positions by solving Equation (9) so that both messages are decoded correctly.

Among all possible solutions, the proposed method first unifies the solutions for blocks n₁ and n₂ whose flip positions cover only the nonintersected area of each block (i.e., $J_{s}^{I} = J^{I} \notin I$ and $J_{s}^{II} = J^{II} \notin I$ for blocks n₁ and n₂). In other words, it is desirable to hide data into the block n₁ using the solutions from $J_{s}^{I}$ that do not affect the block n₂, and vice versa. Accordingly, $J_{s}^{I}$ and $J_{s}^{II}$ collect the specified solutions for the blocks n₁ and n₂, respectively. Here, the superscripts in X^I and X^II mark quantities that belong to blocks n₁ and n₂, respectively.

However, some flip positions j from the block n₁ may belong to the intersected area I. In that case, we can take the effect of those j into account to get new solutions for the block n₂, and vice versa.

For this purpose, Equation (9) can be rewritten as follows:

\begin{matrix} P_{1}^{II} = S_{1}^{II} + β_{1} + \dots + β_{l} \\ P_{2}^{II} = S_{2}^{II} + β_{1}^{3} + \dots + β_{l}^{3} \end{matrix}

(10)

where $S^{II} = \{S_{1}^{II} S_{2}^{II}\}$ is the syndrome for block n₂; $S_{new}^{II} = \{P_{1}^{II} P_{2}^{II}\}$ is the new syndrome for block n₂ after hiding data in block n₁; l is the number of flip positions (j₁, ..., j_l) from the block n₁ that belong to the intersected area I (i.e., j₁, ..., j_l = J^I(S^I) ∈ I); and the values β₁, ..., β_l are computed using Equation (13) for the flipping positions $(j_{1}^{'}, \dots, j_{l}^{'}) = F (j_{1}, \dots, j_{l})$ from the intersected area I for the block n₂. Function F converts the indexes (j₁, ..., j_l) of the intersected area from the block n₁ to the corresponding indexes $(j_{1}^{'}, \dots, j_{l}^{'})$ from the block n₂. For example, the solution for the block n₁ illustrated in Figure 1 is $J^{I} (S^{I}) = [\begin{matrix} 3 & 11 \end{matrix}]$. Here j₁ = 11 ∈ I, where the index 1 means the first coefficient from the intersected area I. Coefficient j₁ = 11 is located in the 11th position of the combined block. However, the 11th coefficient in the combined block is the 15th coefficient in the block n₂ (i.e., $F (j_{1}) = F (11) = j_{1}^{'} = 15$; see Figure 1). Thus, even though the flip positions for blocks n₁ and n₂ are different (i.e., j₁ = 11 and $j_{1}^{'} = 15$), those coefficients have the same location in the combined block.

Finally, the solution for the block n₂ can be obtained as $\{j_{1}^{'}, \dots, j_{l}^{'}, J_{s}^{II} (S_{new}^{II})\}$. This solution suffices to hide message m₂ in the block n₂.

The joint solution hides both messages m₁ and m₂ into the combined block. The joint solution $\{J^{I} (S^{I}), J_{s}^{II} (S_{new}^{II})\}$ unifies the solutions for the blocks n₁ and n₂. In this example, the flipping positions from the intersected area are part of J^I(S^I).

Similarly, we can get a joint solution by using the current solution for block n₂ (i.e., J^II(S^II)). For this purpose, Equation (9) can be rewritten again as follows:

\begin{matrix} P_{1}^{I} = S_{1}^{I} + β_{1} + \dots + β_{l} \\ P_{2}^{I} = S_{2}^{I} + β_{1}^{3} + \dots + β_{l}^{3} \end{matrix}

(11)

where $S^{I} = \{\begin{matrix} S_{1}^{I} & S_{2}^{I} \end{matrix}\}$ is the syndrome for block n₁; $S_{new}^{I} = \{\begin{matrix} P_{1}^{I} & P_{2}^{I} \end{matrix}\}$ is the new syndrome of the block n₁ after hiding data in block n₂; l is the number of flip positions $(j_{1}^{'}, \dots, j_{l}^{'})$ for the block n₂ that belong to the intersected area I (i.e., $(j_{1}^{'}, \dots, j_{l}^{'}) = J^{II} (S^{II}) \in I$); β₁, ..., β_l are computed using Equation (13) for the flipping positions $(j_{1}, \dots, j_{l}) = F^{- 1} (j_{1}^{'}, \dots, j_{l}^{'})$ from the intersected area I for the block n₁; function F^-1 (i.e., the inverse function of F) converts the indexes of the coefficients of the intersected area $(j_{1}^{'}, \dots, j_{l}^{'})$ from the block n₂ to the corresponding indexes (j₁, ..., j_l) from the block n₁. For example, if $J^{II} (S^{II}) = [\begin{matrix} 1 & 15 \end{matrix}]$, then $j_{1}^{'} = 15 \in I$ and $j_{1} = F^{- 1} (j_{1}^{'}) = F^{- 1} (15) = 11$ (see Figure 1).

The solution for the block n₁ can be obtained as $\{(j_{1}, \dots, j_{l}), J_{s}^{I} (S_{new}^{I})\}$. This solution suffices to hide message m₁ in the block n₁.

The joint solution for hiding both messages m₁ and m₂ is $\{J_{s}^{I} (S_{new}^{I}), J^{II} (S^{II})\}$. Here, the flipping positions from the intersected area $(j_{1}^{'}, \dots, j_{l}^{'})$ are part of J^II(S^II). The corresponding flipping positions $(j_{1}, \dots, j_{l}) = F^{- 1} (j_{1}^{'}, \dots, j_{l}^{'})$ are part of the solution for the block n₁.

Note that there are several solutions in J^I and J^II for the syndromes S^I and S^II, respectively. The presented method may generate joint solutions from each solution in J^I(S^I) and from each solution in J^II(S^II).

The proposed method requires finding the values β from the flip positions (j₁, ..., j_l) or $(j_{1}^{'}, \dots, j_{l}^{'})$. The relationship between β and flip position j, where log denotes the discrete logarithm to the base α in GF(2^m), is as follows:

j = \log (β)

(12)

or

β = \log^{- 1} (j)

(13)
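Equations (10), (12), and (13) can be illustrated as follows. The sketch assumes GF(2⁴) built from x⁴ + x + 1, 0-based positions (so that position j corresponds to αʲ), and hypothetical helper names `gf_log`, `gf_antilog`, and `updated_syndrome`. It updates the syndrome of block n₂ after one shared coefficient has been flipped by the solution for n₁, and compares the result with a direct recomputation.

```python
M = 4
N = (1 << M) - 1            # 15
PRIM_POLY = 0b10011         # x^4 + x + 1, an assumed primitive polynomial

EXP, LOG = [], {}
x = 1
for i in range(N):
    EXP.append(x)
    LOG[x] = i
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY


def gf_antilog(j):
    """Equation (13): beta = log^-1(j) = alpha^j."""
    return EXP[j % N]


def gf_log(beta):
    """Equation (12): j = log(beta)."""
    return LOG[beta]


def syndrome(bits):
    s1 = s2 = 0
    for i, b in enumerate(bits):
        if b:
            s1 ^= EXP[i]
            s2 ^= EXP[(3 * i) % N]
    return s1, s2


def updated_syndrome(s, shared_flips):
    """Equation (10): P1 = S1 + sum(beta), P2 = S2 + sum(beta^3)."""
    p1, p2 = s
    for j in shared_flips:
        beta = gf_antilog(j)
        p1 ^= beta
        p2 ^= gf_antilog(3 * gf_log(beta))   # beta^3
    return p1, p2


v2 = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1]   # bits of block n2
m2 = (0b1010, 0b0110)                                # 8 message bits as two field symbols
s_cover = syndrome(v2)
S = (m2[0] ^ s_cover[0], m2[1] ^ s_cover[1])         # Equation (6)

shared = [14]                   # last coefficient of n2, flipped by the solution for n1
P = updated_syndrome(S, shared)

v2_flipped = list(v2)
v2_flipped[14] ^= 1
s_flipped = syndrome(v2_flipped)
direct = (m2[0] ^ s_flipped[0], m2[1] ^ s_flipped[1])
```

Here `P` and `direct` coincide, so the updated syndrome can be obtained from the shared flip positions alone, without recomputing the syndrome of the whole block.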

The complete procedure for getting all possible joint solutions for any syndromes is presented as follows:

For a given combined block of binary coefficients a and two messages m₁ and m₂, the process is as follows:

(a) Define two blocks of the DCT coefficients n₁ and n₂ (see Figure 1). Compute the syndromes S^I and S^II using the corresponding binary streams v' and v".
(b) Find all possible solutions j^I = J^I(S^I) and j^II = J^II(S^II) for blocks n₁ and n₂ by using the syndromes S^I and S^II.
(c) For each solution j^I(p) (p = 1, 2, 3, ..., k, where k is the number of solutions), proceed as follows:
i. Define the flip positions j₁, ..., j_l from the intersected area I.
ii. Convert j₁, ..., j_l to $j_{1}^{'}, \dots, j_{l}^{'}$ (the corresponding flip positions from the block n₂). Compute the corresponding β using Equation 13. Compute the new syndrome $S_{new}^{II}$ using Equation 10.
iii. Using the new syndrome $S_{new}^{II}$, get the new flip solutions as $j_{new}^{II} = J_{s}^{II} (S_{new}^{II})$.
iv. For each solution $j_{new}^{II} (q)$ (q = 1, 2, 3, ..., z, where z is the number of solutions), store the joint solution $\{j^{I} (p), j_{new}^{II} (q)\}$.
(d) For each solution j^II(p) (p = 1, 2, 3, ..., k), proceed as follows:
i. Define the flip positions $j_{1}^{'}, \dots, j_{l}^{'}$ from the intersected area I for block n₂.
ii. Convert $j_{1}^{'}, \dots, j_{l}^{'}$ to j₁, ..., j_l. Compute the corresponding β. Compute the new syndrome $S_{new}^{I}$ using Equation 11.
iii. Using the new syndrome $S_{new}^{I}$, get the new flip solutions as $j_{new}^{I} = J_{s}^{I} (S_{new}^{I})$.
iv. For each solution $j_{new}^{I} (q)$ (q = 1, 2, 3, ..., z, where z is the number of solutions), store the joint solution $\{j_{new}^{I} (q), j^{II} (p)\}$.

The stored joint solutions are then used to hide data with better performance. Note that the proposed method needs to search for the best solution among k·z possible candidates for each block (see steps c and d). Thus, the computational complexity of the proposed search algorithm is O(n²).
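Steps (a)-(d) can be sketched in code for the 25-coefficient example. This is a simplified illustration, not the authors' implementation: the lookup tables are replaced by brute-force search, at most three flips per block are tried, positions are 0-based, and block n₂ is assumed to start at combined index 10 so that F(j) = j - 10 (the index mapping of Figure 1 may differ). All function names are hypothetical.

```python
from itertools import combinations

M, N, OVERLAP = 4, 15, 5
OFFSET = N - OVERLAP            # assumed layout: n1 = a[0:15], n2 = a[10:25]
PRIM_POLY = 0b10011             # x^4 + x + 1, an assumed primitive polynomial

EXP, x = [], 1
for _ in range(N):
    EXP.append(x)
    x <<= 1
    if x & (1 << M):
        x ^= PRIM_POLY


def syn(positions):
    """Syndrome (S1, S2) of a set of block-local flip positions."""
    s1 = s2 = 0
    for j in positions:
        s1 ^= EXP[j]
        s2 ^= EXP[(3 * j) % N]
    return s1, s2


def add(a, b):
    return a[0] ^ b[0], a[1] ^ b[1]


def solve(target, allowed, max_flips):
    """Brute-force stand-in for the lookup tables, restricted to `allowed` positions."""
    found = [()] if target == (0, 0) else []
    for w in range(1, max_flips + 1):
        found += [p for p in combinations(allowed, w) if syn(p) == target]
    return found


def block_syndromes(bits):
    ones = lambda v: [i for i, b in enumerate(v) if b]
    return syn(ones(bits[:N])), syn(ones(bits[OFFSET:]))


def joint_solutions(bits, msg1, msg2, max_flips=3):
    """Return a set of frozensets of combined-block positions to flip."""
    cur1, cur2 = block_syndromes(bits)
    s1, s2 = add(msg1, cur1), add(msg2, cur2)          # Equation (6) for both blocks
    joint = set()
    # Step (c): any solution for n1, then a specified solution for n2.
    for j1 in solve(s1, range(N), max_flips):
        shared = [j - OFFSET for j in j1 if j >= OFFSET]           # F: n1 -> n2 index
        for j2 in solve(add(s2, syn(shared)), range(OVERLAP, N), max_flips):
            joint.add(frozenset(j1) | {j + OFFSET for j in j2})
    # Step (d): any solution for n2, then a specified solution for n1.
    for j2 in solve(s2, range(N), max_flips):
        shared = [j + OFFSET for j in j2 if j < OVERLAP]           # F^-1: n2 -> n1 index
        for j1 in solve(add(s1, syn(shared)), range(OFFSET), max_flips):
            joint.add(frozenset(j1) | {j + OFFSET for j in j2})
    return joint


cover = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1]
target = list(cover)
for i in (3, 11, 20):           # build messages that are known to be reachable
    target[i] ^= 1
msg1, msg2 = block_syndromes(target)

joint = joint_solutions(cover, msg1, msg2)
best = min(joint, key=len)      # flip count used here as a stand-in for distortion
```

In the article the best candidate is selected by distortion (Equation 20) rather than by the number of flips; the flip count is used here only to keep the sketch short. A brute-force search restricted to the nonintersected area may also return no candidate for some syndromes, which is why both directions (c) and (d) are tried.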

3.2. Two-stage embedding technique

In order to enhance the performance of the blockwise methods (i.e., ME, MME, BCH-based data hiding, etc.), we utilize almost all the DCT coefficients for data hiding. The proposed method uses two different embedding schemes together. Two schemes use the different block sizes $n_{1}^{p}$ and $n_{2}^{p}$ , and have different payloads $m_{1}^{p}$ and $m_{2}^{p}$ .

This method divides the stream of the DCT coefficients (c₁, c₂, ..., c_N) and the message M into two parts and hides data into each part separately. The optimal numbers of blocks (k₁ and k₂) for schemes 1 and 2 are related as follows:

\begin{cases} n_{1}^{p} \cdot k_{1}^{'} + n_{2}^{p} \cdot k_{2}^{'} = N \\ m_{1}^{p} \cdot k_{1}^{'} + m_{2}^{p} \cdot k_{2}^{'} = |M| \end{cases}

(14)

where N is the number of DCT coefficients.

The computed $k_{1}^{'}$ and $k_{2}^{'}$ are generally noninteger numbers. Thus, we have to choose nearby integers $k_{1} = ⌈ k_{1}^{'} ⌉ \pm 1$ and $k_{2} = ⌈ k_{2}^{'} ⌉ \pm 1$ such that:

\begin{cases} n_{1}^{p} \cdot k_{1} + n_{2}^{p} \cdot k_{2} \leq N \\ m_{1}^{p} \cdot k_{1} + m_{2}^{p} \cdot k_{2} \geq |M| \end{cases}

(15)
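Equations (14) and (15) amount to solving a 2 × 2 linear system and then searching the neighbouring integers. The sketch below is a minimal illustration with a hypothetical function name `block_counts`; the example parameters (m = 4 and m + 1 = 5, both with |I| = 5, N = 10,000 coefficients, and a 5,000-bit message) are chosen for the illustration and are not taken from the experiments.

```python
from math import ceil


def block_counts(n1, m1, n2, m2, n_coeffs, msg_bits):
    """Solve (14) by Cramer's rule, then list integer pairs near it that satisfy (15)."""
    det = n1 * m2 - n2 * m1
    if det == 0:
        raise ValueError("the two schemes must have different embedding rates")
    k1_real = (n_coeffs * m2 - n2 * msg_bits) / det
    k2_real = (n1 * msg_bits - n_coeffs * m1) / det
    feasible = [
        (k1, k2)
        for k1 in range(ceil(k1_real) - 1, ceil(k1_real) + 2)
        for k2 in range(ceil(k2_real) - 1, ceil(k2_real) + 2)
        if k1 >= 0 and k2 >= 0
        and n1 * k1 + n2 * k2 <= n_coeffs       # enough coefficients
        and m1 * k1 + m2 * k2 >= msg_bits       # enough capacity
    ]
    return (k1_real, k2_real), feasible


# Scheme 1: m = 4, |I| = 5 -> block of 25 coefficients carrying 16 bits.
# Scheme 2: m = 5, |I| = 5 -> block of 57 coefficients carrying 20 bits.
(k1_real, k2_real), feasible = block_counts(25, 16, 57, 20, 10_000, 5_000)
```

For these parameters the real-valued solution is roughly (206.3, 84.95), and the only neighbouring integer pair that satisfies both inequalities is (208, 84). For other parameters the list may be empty or contain several pairs, in which case a selection rule is needed.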

The presented two-scheme embedding method improves the performance of data hiding by properly distributing the available DCT coefficients between two different modified BCH schemes. The first scheme uses $m_{1}^{p} = 4 \cdot m$, with m obtained from inequality (8); the second scheme uses $m_{2}^{p} = 4 \cdot (m + 1)$. Note that the second scheme has higher embedding efficiency. The efficiency of the two-scheme embedding depends on the ratio between the numbers of blocks k₁ and k₂ for schemes 1 and 2, respectively. The larger the value k₂ (i.e., the smaller the ratio k₁/k₂), the higher the efficiency of the proposed two-scheme embedding for the same m.

The two-scheme embedding method allows different sizes of the intersected area, I_{sh 1} and I_{sh 2}, to be used for schemes 1 and 2, respectively (see Tables 1 and 2). We test several sizes of the intersected areas and several payloads. In the experiments, we hide data into a set of 4,000 natural images and compute the performance against the steganalysis [20, 25] for different sizes of the intersected areas and payloads. Results are presented in Tables 1, 2, and 3.

Table 1 Accuracy of the steganalysis [20] for different sizes of the intersected areas and payloads

Table 2 Accuracy of the steganalysis [25] for different sizes of the intersected areas and payloads

Table 3 The most appropriate intersected area size versus payload

Boldface numbers in Tables 1 and 2 mark the lowest steganalysis accuracy and show the most appropriate intersected area size for each tested payload. Data hiding with the most appropriate intersected area always shows better results. Tables 1 and 2 also indicate the difference between the proposed method and the original BCH-based steganography method [16] in terms of the performance of the steganalysis [20, 25]. The most appropriate intersected area sizes presented in Table 3 were used later for the other experiments.

4. Inserting-removing strategy

The performance of the proposed method can be increased significantly by using the inserting-removing strategy. The proposed strategy is based on the fact that the block of 2^m - 1 DCT coefficients can be modified before data hiding by inserting or removing coefficients 1 and -1. Hiding data in the modified stream of DCT coefficients may result in lower distortion and, as a result, lower detectability by steganalysis. Such a modification has to be carried out carefully in order to reduce distortion.

The proposed inserting-removing strategy uses the stream of nonrounded quantized DCT coefficients a_q computed as follows:

a^{'} = \mathrm{DCT} (B), \quad a_{q} = \frac{a^{'}}{Q}, \quad a_{r} = \mathrm{round} (a_{q})

(16)

where B is the 8 × 8 block of the image pixels; a' is the block of original DCT coefficients; a_q is the block of DCT coefficients divided by the corresponding coefficients of the quantization matrix Q, which is determined by the quality factor Q_f; and a_r is the block of quantized DCT coefficients.

Each nonzero integer DCT coefficient has a corresponding informative bit computed as follows:

b = \begin{cases} a_{r} \bmod 2, & \text{if } a_{r} > 0, \\ (a_{r} - 1) \bmod 2, & \text{if } a_{r} < 0 \end{cases}

(17)

According to the proposed inserting-removing strategy, the stream a of nonrounded DCT coefficients obtained from the blocks a_q is divided into three sets: modifiable c_m = a ∈ (-∞; -1.5) ∪ (1.5;∞), removable c_R = a ∈ [-1.5; -0.5) ∪ (0.5;1.5], and insertable c_Ins = a ∈ [-0.5; -0.25) ∪ (0.25;0.5]. Set c unifies modifiable, insertable, and removable sets (i.e., c = c_m ∪ c_R ∪ c_Ins). The set C = c_m ∪ c_R contains all nonzero rounded DCT coefficients. According to Equation (17), only the nonzero DCT coefficients (i.e., set C) have the corresponding informative coefficients and can be used for hiding data.
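The informative bit of Equation (17) and the three sets defined above can be written down directly. The sketch below is a minimal illustration; the function names `informative_bit` and `classify`, the string labels, and the label "unused" for coefficients outside all three sets are assumptions made for the example.

```python
def informative_bit(a_r):
    """Equation (17): informative bit of a nonzero rounded DCT coefficient."""
    if a_r == 0:
        raise ValueError("zero coefficients carry no informative bit")
    # Python's % always returns a non-negative result, also for negative operands.
    return a_r % 2 if a_r > 0 else (a_r - 1) % 2


def classify(a_q):
    """Assign a nonrounded quantized coefficient to one of the sets c_m, c_R, c_Ins."""
    mag = abs(a_q)
    if mag > 1.5:
        return "modifiable"     # (-inf; -1.5) U (1.5; inf)
    if mag > 0.5:
        return "removable"      # [-1.5; -0.5) U (0.5; 1.5]
    if mag > 0.25:
        return "insertable"     # [-0.5; -0.25) U (0.25; 0.5]
    return "unused"             # too close to zero to be inserted


bits = [informative_bit(a) for a in (1, 2, 3, -1, -2, -3)]
labels = [classify(a) for a in (2.3, -1.5, 0.7, -0.5, 0.3, 0.1)]
```

Note that 1 and -1 carry different informative bits (1 and 0, respectively), so positive and negative coefficients of equal magnitude encode opposite bits.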

The proposed steganographic method uses the stream of n_p nonzero DCT coefficients from the set C for data hiding. In general, set C is a subset of the unified set c. Thus, each block unifies the n_p coefficients from set C and some insertable coefficients from the set c (i.e., $c_{b} = c_{m}^{'} \cup c_{R}^{'} \cup c_{Ins}^{'}$, where $C^{'} = c_{m}^{'} \cup c_{R}^{'}$ is the block of n_p nonzero DCT coefficients from the set C). Inserting or removing any coefficient from $c_{Ins}^{'}$ or $c_{R}^{'}$ produces a new block C' with a new solution for data hiding. As a result, the inserting-removing strategy significantly increases the number of possible solutions and helps to find the most appropriate solution with the lowest distortion.

In the proposed method, we use the same distortion measure as MME [14]. The distortion for each DCT coefficient is computed as follows:

D = E^{2} \cdot Q^{2}

(18)

where

E = \begin{cases} 0.5 - |C - ⌊ C ⌋|, & \text{if } C \in c_{m} \\ 1.5 - |C|, & \text{if } C \in c_{R} \end{cases}

The distortion due to inserting or removing D_IR is computed as follows:

D_{IR} = {|0.5 - |C||}^{2} \cdot Q^{2}, \quad \text{if } C \in c_{R} \cup c_{Ins}

(19)

where Q is the corresponding quantization coefficient of the quantization table.

The resulted distortion for the combined block of DCT coefficients is computed as follows:

D_{b} = \sum_{i = 1}^{l} D_{i} + D_{I R}

(20)

where l is the number of flipped coefficients.

Flipped coefficients are computed as follows:

A_{r} = \begin{cases} 2, & \text{if } a_{r} = 1, \\ - 2, & \text{if } a_{r} = - 1, \\ a_{r} + 1, & \text{if } a_{q} > a_{r}, \\ a_{r} - 1, & \text{if } a_{q} < a_{r} \end{cases}

(21)
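Equations (18)-(21) can be sketched as a few small functions. The names below are hypothetical, and the code follows the formulas as printed. The tie case a_q = a_r is not defined in Equation (21); the sketch lets it fall into the last branch, which is an assumption.

```python
from math import floor


def flip_error(c):
    """E of Equation (18) for a nonrounded coefficient c from c_m or c_R."""
    mag = abs(c)
    if mag > 1.5:                          # modifiable set c_m
        return 0.5 - abs(c - floor(c))
    if mag > 0.5:                          # removable set c_R
        return 1.5 - mag
    raise ValueError("coefficient rounds to zero and cannot be flipped")


def flip_distortion(c, q):
    """D = E^2 * Q^2, Equation (18)."""
    return flip_error(c) ** 2 * q ** 2


def insert_remove_distortion(c, q):
    """D_IR of Equation (19) for a coefficient from c_R or c_Ins."""
    if not 0.25 < abs(c) <= 1.5:
        raise ValueError("coefficient is neither removable nor insertable")
    return abs(0.5 - abs(c)) ** 2 * q ** 2


def block_distortion(flips, inserted_or_removed=None):
    """D_b of Equation (20); both arguments hold (coefficient, quantization step) pairs."""
    total = sum(flip_distortion(c, q) for c, q in flips)
    if inserted_or_removed is not None:
        total += insert_remove_distortion(*inserted_or_removed)
    return total


def flipped_value(a_q, a_r):
    """A_r of Equation (21); the tie a_q == a_r is treated like a_q < a_r (assumption)."""
    if a_r == 1:
        return 2
    if a_r == -1:
        return -2
    return a_r + 1 if a_q > a_r else a_r - 1


d_b = block_distortion([(2.3, 10), (1.2, 10)], inserted_or_removed=(0.4, 10))
new_values = [flipped_value(2.3, 2), flipped_value(-2.7, -3), flipped_value(0.8, 1)]
```

In this example the two flips contribute about 4 and 9, and the insertion contributes about 1, so `d_b` is about 14. A flipped coefficient moves towards the side on which its nonrounded value already lies, which keeps the added rounding error small.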

5. Encoder and decoder

The encoder of the proposed steganographic method based on modified BCH data hiding scheme and inserting-removing strategy is organized as follows:

For a given bitmap image I_m, payload P, quality factor Q_f, and secret key K process follows:

1. Divide the image I_m into nonoverlapping 8 × 8 blocks of pixels and apply the DCT, quantization, and rounding as presented in (16). Remove the DC coefficients. Obtain a', a_q, a_r, and the stream of DCT coefficients a. Permute the stream a using K and any pseudo-random generator. From the permuted stream a, obtain the stream c = a ∈ (-∞; -0.25) ∪ (0.25; ∞).
2. Define the sets of modifiable (c_m), insertable (c_Ins), and removable (c_R) coefficients.
3. Define the parameters for schemes 1 and 2 and the numbers of blocks k₁ and k₂ using (14) and (15). Divide the message M into two parts: $M_{1} = m_{1}^{p} \cdot k_{1}$ and $M_{2} = m_{2}^{p} \cdot k_{2}$.
4. Start from the first block, i = 1. Define the i-th block of the DCT coefficients $c_{b_{i}} = c_{m_{i}}^{'} \cup c_{R_{i}}^{'} \cup c_{{I n s}_{i}}^{'}$, where $c_{m_{i}}^{'}$, $c_{R_{i}}^{'}$, and $c_{{I n s}_{i}}^{'}$ are the modifiable, removable, and insertable subsets for the current block. If i = k₁ + 1, switch to scheme 2.
5. Define the block of nonzero rounded DCT coefficients $C_{i}^{'} = c_{m_{i}}^{'} \cup c_{R_{i}}^{'}$.
6. Get the solutions for the block $C_{i}^{'}$ using the modified BCH data hiding scheme (see the algorithm in Section 3). Compute the distortion D for each solution using Equation (20). Choose the solution J_m with the lowest distortion D_m and store it.
7. Modify the block $C_{i}^{'}$ by removing a coefficient from the subset $c_{R_{i}}^{'}$ or inserting a coefficient from the subset $c_{{I n s}_{i}}^{'}$. Obtain a new block: (i) after removing, $C_{i}^{'} = c_{m_{i}}^{'} \cup c_{R_{i}}^{″}$, where $c_{R_{i}}^{″} = c_{R_{i}}^{'} - c_{R_{i}}^{'} (p)$ is the modified removable set and $c_{R_{i}}^{'} (p)$ is the removed coefficient; (ii) after inserting, $C_{i}^{'} = c_{m_{i}}^{'} \cup c_{R_{i}}^{'} \cup c_{{I n s}_{i}}^{'} (q)$, where $c_{{I n s}_{i}}^{'} (q) = \pm 1$ is the inserted coefficient. Here, p and q are the current positions for removal and insertion, respectively.
8. Repeat steps 5-6 for all removable and insertable coefficients from $c_{R_{i}}^{'}$ and $c_{{I n s}_{i}}^{'}$.
9. Among all stored solutions J_m, choose the one with the lowest distortion D_m. Modify one, two, or three coefficients according to the best solution (see the explanation in Section 2) and, if necessary, insert or remove a coefficient in the block $c_{b_{i}}$.
10. Process all k₁ + k₂ blocks using steps 4-9. Obtain the modified stream $c^{'} = \{c_{b_{1}}, c_{b_{2}}, \dots, c_{b_{k_{1} + k_{2}}}\}$.
11. Recover the original order of the DCT coefficients a from the modified stream c' using the secret key K and the same pseudo-random generator. Add the DC coefficients, round the coefficients a', and obtain the modified JPEG image $I_{m}^{'}$.
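The search over block variants in steps 5-9 can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the authors' implementation: the names `enumerate_variants`, `best_candidate`, and `toy_solve` are hypothetical; `solve` stands in for the modified BCH solver of Section 3 combined with the distortion of Equation (20), neither of which is reproduced here; and coefficients are simply concatenated instead of keeping their positions in the permuted stream.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Candidate = Tuple[float, str, List[int]]  # (distortion, label, block variant)


def enumerate_variants(
    modifiable: Sequence[int],
    removable: Sequence[int],
    insertable: Sequence[int],
) -> List[Tuple[str, List[int]]]:
    """List the block variants tried in steps 5-8.

    Simplification: a block is modelled as a plain list, so stream
    positions are ignored.
    """
    variants = [("baseline", list(modifiable) + list(removable))]
    # (i) remove one removable coefficient at position p
    for p in range(len(removable)):
        kept = [c for j, c in enumerate(removable) if j != p]
        variants.append((f"remove[{p}]", list(modifiable) + kept))
    # (ii) insert one insertable coefficient (+1 or -1) at position q
    for q, value in enumerate(insertable):
        variants.append(
            (f"insert[{q}]", list(modifiable) + list(removable) + [value])
        )
    return variants


def best_candidate(
    variants: Sequence[Tuple[str, List[int]]],
    solve: Callable[[List[int]], Optional[float]],
) -> Optional[Candidate]:
    """Keep the variant whose best solution has the lowest distortion.

    `solve(block)` is a placeholder: it returns the lowest distortion
    found for the block, or None when the block has no solution.
    """
    best: Optional[Candidate] = None
    for label, block in variants:
        d = solve(block)
        if d is not None and (best is None or d < best[0]):
            best = (d, label, block)
    return best


# Toy usage with a made-up cost function (NOT Equation (20)).
def toy_solve(block: List[int]) -> Optional[float]:
    return float(sum(abs(c) for c in block))


variants = enumerate_variants([3, -2], [1, -1], [1])
best = best_candidate(variants, toy_solve)
```

With one baseline block, |c_R| removals, and |c_Ins| insertions, the solver is called 1 + |c_R| + |c_Ins| times per block, which is the price paid for the larger set of candidate solutions.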

The decoder of the proposed steganographic method is organized as follows.

Given the modified JPEG image $I_{m}^{'}$, quality factor Q_f, secret key K, and payload size p = |P|, the decoder proceeds as follows:

1. Read the DCT coefficients from the JPEG file. Permute them using the secret key K and the same pseudo-random generator as the encoder. Remove the DC coefficients. Obtain the stream of nonzero DCT coefficients C.
2. Using Equations (14) and (15), define the parameters of schemes 1 and 2 and the numbers of blocks k₁ and k₂. Here, N = |C|.
3. Divide C into blocks according to k₁ and k₂.
4. Decode the data from each block using (9).
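Both the encoder (steps 1 and 11) and the decoder (step 1) rely on a permutation that is reproducible from the secret key K. A minimal sketch is given below, assuming Python's standard `random.Random` as the pseudo-random generator and an integer key; the function names are illustrative, and the article itself allows any pseudo-random generator. Note that `random` is not cryptographically secure, so a real system would likely use a keyed generator with stronger guarantees.

```python
import random
from typing import List, Sequence


def keyed_permutation(n: int, key: int) -> List[int]:
    """Return a permutation of range(n) that depends only on `key`."""
    order = list(range(n))
    random.Random(key).shuffle(order)
    return order


def permute(stream: Sequence[float], key: int) -> List[float]:
    """Reorder the coefficient stream with the keyed permutation."""
    order = keyed_permutation(len(stream), key)
    return [stream[i] for i in order]


def unpermute(permuted: Sequence[float], key: int) -> List[float]:
    """Invert `permute`: put every coefficient back at its original index."""
    order = keyed_permutation(len(permuted), key)
    restored = [0.0] * len(permuted)
    for dst, src in enumerate(order):
        restored[src] = permuted[dst]
    return restored


# Example stream of AC coefficients (values are made up).
a = [0.0, 2.0, -1.0, 0.0, 3.0, 1.0]
shuffled = permute(a, key=1234)
recovered = unpermute(shuffled, key=1234)
```

In this sketch the permutation is drawn over all coefficient positions, zeros included, so that turning a coefficient into 0 or ±1 would change the content of the nonzero stream but not the ordering that encoder and decoder share.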

The steganographic method based only on the modified BCH data hiding scheme skips steps 7 and 8 of the encoder.

6. Experimental results

In these experiments, we hide different amounts of data in a set of uncompressed images using the proposed BCH-based data hiding scheme with and without the inserting-removing strategy. The sets of modified and original compressed images are analyzed by two powerful steganalysis algorithms proposed by Pevny and Fridrich [20] and by Kodovsky and Fridrich [25]. Those methods use 274 and 548 different features of the DCT coefficients, respectively. The union of the 274 or 548 features from the unmodified and modified images is used to build the models for the support vector machine (SVM) with parameter C = 10⁴ and kernel width γ = 10⁻⁴. A set of 4,000 natural uncompressed images (768 × 512) downloaded from Corel Draw and obtained from several digital cameras is used in our experiments. The proposed method needs 1-5 min to hide data in each image. Experiments are carried out for seven different payloads (0.05, 0.1, 0.15, 0.17, 0.20, 0.22, and 0.25 bits per nonzero coefficient, bpc) and quality factor 75. For each of the seven payload sizes, the SVM is trained on a set of 3,000 images (1,500 original and 1,500 stego images), and the resulting model is tested on a set of 1,000 images (500 original and 500 stego images). The results show the error probabilities of the steganalysis for each tested payload (see Figures 2 and 3).

The error probability is computed as follows:

$$e = \frac{1}{2} (P_{a} + P_{b}), \qquad (28)$$

where P_a is the probability of a false alarm (i.e., an unmodified image is classified as modified) and P_b is the probability of a missed detection (i.e., a modified image is classified as unmodified).
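As a worked example of Equation (28), the short Python sketch below computes e from the classifier's mistakes on a test set. The function name and the error counts are hypothetical and chosen only for illustration; the test-set sizes of 500 original and 500 stego images follow the experimental setup described above.

```python
def error_probability(
    cover_flagged: int, n_cover: int, stego_missed: int, n_stego: int
) -> float:
    """Average error of Equation (28).

    cover_flagged: unmodified images classified as modified (gives P_a)
    stego_missed:  modified images classified as unmodified (gives P_b)
    """
    if n_cover <= 0 or n_stego <= 0:
        raise ValueError("both test sets must be non-empty")
    p_a = cover_flagged / n_cover
    p_b = stego_missed / n_stego
    return 0.5 * (p_a + p_b)


# Hypothetical counts on a 500 + 500 test set (not taken from the article).
e_example = error_probability(150, 500, 180, 500)  # P_a = 0.30, P_b = 0.36

# A classifier that is wrong half of the time on both classes is no better
# than guessing, which corresponds to e = 0.5.
e_guessing = error_probability(250, 500, 250, 500)
```

Because e averages the two error types, a steganalyzer that labels every image as unmodified still scores e = 0.5, so values near 0.5 indicate that the stego images are practically undetectable.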

In our experiments, we test two methods: (1) the modified BCH-based data hiding scheme alone; and (2) the modified BCH-based data hiding scheme combined with the proposed inserting-removing strategy. Both methods achieve high error probabilities for all the tested payloads. For payloads up to 0.1 bpc, the detection accuracy for both methods is close to 50%, which is almost the same as tossing a coin: the steganalysis cannot distinguish the unmodified images from the modified ones. For higher payloads, around 0.15 and 0.2 bpc, the proposed methods perform much better than the MME. This improvement over the MME comes from using schemes with larger embedding efficiency (i.e., BCH-based schemes with large m). The proposed methods also give better results than the methods based on the original BCH-based schemes. Moreover, the method with the inserting-removing strategy improves on the method with the modified BCH-based data hiding scheme alone by 0.0363, 0.0414, and 0.0392 in terms of error probability for payloads of 0.15, 0.2, and 0.25 bpc, respectively. For a payload of 0.25 bpc, the two methods reach error probabilities of 0.2961 and 0.3353, respectively. These error probabilities are better than those of the MME [14], the original BCH-based scheme [16], the heuristic BCH-based scheme [17], and the syndrome-trellis codes (STC) [22] proposed by Filler et al. The improvement is achieved by the modified BCH-based data hiding scheme and the inserting-removing strategy.

7. Conclusion

In this article, an efficient data hiding technique for steganography is presented. The proposed BCH-based data hiding scheme uses two blocks to form a single combined block. A new data hiding strategy makes it possible to obtain a joint solution for two blocks with intersected coefficients. Owing to the intersection, the proposed method requires a smaller number of coefficients for hiding the same amount of data than the original nonoverlapping blockwise approaches. As a result, the proposed method can use BCH-based schemes with large m (i.e., larger capacity). Even when the proposed method has to use the same BCH-based scheme (for 0.17 and 0.2 bpc), the efficiency of data hiding is still high because the proposed two-scheme embedding has a lower ratio k₁/k₂ than the original BCH-based scheme. The proposed BCH-based data hiding scheme significantly outperforms the MME and the original BCH-based steganography in terms of the error probability of the steganalysis. The proposed two-scheme embedding technique (see Equations 14 and 15) makes it possible to use almost all the available DCT coefficients. The proposed strategy based on inserting and removing coefficients 1 or -1 increases the number of possible solutions and significantly decreases the total distortion. The experimental results show that the inserting-removing strategy significantly improves the performance of the proposed method. The combination of the modified BCH-based scheme and the inserting-removing strategy yields higher error probabilities, i.e., lower detection accuracy, for powerful steganalysis.

References

1. Provos N: Defending against statistical steganalysis. In Proc of 10th USENIX Security Symposium. Washington, DC; 2001:24-24.
2. Eggers J, Bauml R, Girod B: A communications approach to steganography. In Proc of EI SPIE. Volume 4675. San Jose, CA; 2002:26-37.
3. Noda H, Niimi M, Kawaguchi E: Application of QIM with dead zone for histogram preserving JPEG steganography. In Proc of ICIP. Genoa, Italy; 2005.
4. Solanki K, Sarkar A, Manjunath BS: YASS: Yet another steganographic scheme that resists blind steganalysis. Lect Notes Comput Sci 2007, 2939: 154-167.
5. Westfeld A: High capacity despite better steganalysis (F5--a steganographic algorithm). Lect Notes Comput Sci 2001, 2137: 289-302.
6. Fridrich J: Minimizing the embedding impact in steganography. In Proc of ACM Multimedia and Security Workshop. Geneva, Switzerland; 2006:2-10.
7. Fridrich J: Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes. Lect Notes Comput Sci 2005, 3200: 67-81.
8. Fridrich J, Filler T: Practical methods for minimizing embedding impact in steganography. In Proc EI SPIE. Volume 6505. San Jose, CA; 2007:2-3.
9. Fridrich J, Goljan M, Soukal D: Perturbed quantization steganography using wet paper codes. In Proc of ACM Workshop on Multimedia and Security. Magdeburg, Germany; 2004:4-15.
10. Fridrich J, Goljan M, Soukal D: Perturbed quantization steganography. ACM Multimedia Secur J 2005, 11(2):98-107.
11. Fridrich J, Pevny T, Kodovsky J: Statistically undetectable JPEG steganography: dead ends, challenges, and opportunities. In Proc of ACM Workshop on Multimedia and Security. Dallas, TX; 2007:3-15.
12. Fridrich J, Goljan M, Soukal D: Perturbed quantization steganography. ACM Multimedia Secur J 2005, 11(2):98-107.
13. Fridrich J, Goljan M, Soukal D: Wet paper coding with improved embedding efficiency. IEEE Trans Inf Secur Forensics 2005, 1(1):102-110.
14. Kim YH, Duric Z, Richards D: Modified matrix encoding technique for minimal distortion steganography. Lect Notes Comput Sci 2006, 4437: 314-327.
15. Schönfeld D, Winkler A: Reducing the complexity of syndrome coding for embedding. Lect Notes Comput Sci 2008, 4567: 145-158.
16. Zhang R, Sachnev V, Kim HJ: Fast BCH syndrome coding for steganography. Lect Notes Comput Sci 2009, 5806: 48-58.
17. Sachnev V, Kim HJ, Zhang R: Less detectable JPEG steganography method based on heuristic optimization and BCH syndrome coding. In Proc of ACM Workshop on Multimedia and Security. Princeton, NJ; 2009:131-139.
18. Filler T, Fridrich J: Steganography using Gibbs random fields. In Proceedings of ACM Multimedia and Security Workshop. Rome, Italy; 2010:199-212.
19. Upham D: http://www.funet.fi/pub/crypt/stegangraphy/jpeg-jsteg-v4.diff.gz
20. Pevny T, Fridrich J: Merging Markov and DCT features for multi-class JPEG steganalysis. In Proc of SPIE. Volume 6505. San Jose, CA; 2007:3-4.
21. Shi YQ, Chen C, Chen W: Markov process based approach to effective attacking JPEG steganography. Lect Notes Comput Sci 2006, 4437: 249-264.
22. Filler T, Judas J, Fridrich J: Minimizing embedding impact in steganography using trellis-coded quantization. IEEE Trans Inf Secur Forensics 2011, 6(3):920-935.
23. Rifa-Pous H, Rifa J: Product perfect codes and steganography. Digital Signal Process 2009, 19: 764-769.
24. Zhao Z, Wu F, Yu S, Zhou J: A lookup table based fast algorithm for finding roots of quadratic or cubic polynomials in the GF(2^m). J Huazhong Univ Sci Technol (Nat Sci Ed.) 2005, 33(1):70-73.
25. Kodovsky J, Fridrich J: Calibration revisited. In Proceedings of the 11th ACM Multimedia & Security Workshop. Edited by: Dittmann J, Craver S, Fridrich J. Princeton, NJ; 2009.

Acknowledgements

This study was supported by the Catholic University of Korea, National Research Foundation of Korea (grant 2011-0013695), ITRC and BK21 Project, Korea University and IT R&D program (Development of anonymity-based u-knowledge security technology, 2007-S001-01).

Author information

Authors and Affiliations

School of Information, Communications, and Electronic Engineering, The Catholic University of Korea, Bucheon, 420-743, Republic of Korea
Vasily Sachnev
CIST, Korea University, Seoul, 136-701, Republic of Korea
Hyoung Joong Kim

Corresponding author

Correspondence to Hyoung Joong Kim.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Sachnev, V., Kim, H.J. Modified BCH data hiding scheme for JPEG steganography. EURASIP J. Adv. Signal Process. 2012, 89 (2012). https://doi.org/10.1186/1687-6180-2012-89

Received: 14 June 2011
Accepted: 26 April 2012
Published: 26 April 2012
DOI: https://doi.org/10.1186/1687-6180-2012-89
