
Modified BCH data hiding scheme for JPEG steganography

Abstract

In this article, a new Bose-Chaudhuri-Hocquenghem (BCH)-based data hiding scheme for JPEG steganography is presented. Traditional data hiding approaches hide data in nonoverlapping blocks. In the proposed method, two consecutive blocks overlap to form a combined block that is larger than a single block but smaller than two consecutive nonoverlapping blocks. To embed more data into the combined block than into a single block, the BCH-based data hiding scheme has to be redesigned. We propose a way to obtain a joint solution for hiding data into two blocks with intersected coefficients such that a modification in the intersected area does not disturb the data hidden in either block. Because the intersected area carries data for both blocks, embedding capacity is increased. In addition, the stream of nonzero DCT coefficients is modified to improve resistance to steganalysis and to reduce the distortion caused by data hiding. This approach carefully inserts coefficients 1 or -1 into the DCT coefficient stream, or removes them from it, according to the rule proposed in this article. Experimental results show that the proposed algorithms yield higher steganalysis error probabilities than MME and earlier BCH-based schemes.

1. Introduction

One of the first steganography methods for JPEG images embeds data by changing the least-significant bit values of the quantized discrete cosine transform (DCT) coefficients. However, this method can easily be detected by statistical analysis. Thus, for a good while, evading statistical analysis has been a major concern. Provos [1] divides the DCT coefficients into two disjoint subsets, hides data in the first subset, and compensates for the distorted histogram by modifying the second subset. Other methods in [2, 3] use a similar approach. On the other hand, Solanki et al. [4] utilize a robust watermarking scheme for steganography purposes. They embed data into the image in the spatial domain by using a technique robust against JPEG compression. Their scheme causes less degradation of the features of the DCT coefficients and, as a result, its detectability was low against older versions of statistical steganalysis.

Another way to survive steganalysis is to reduce the number of modified coefficients. Traditionally, each nonzero DCT coefficient has been modified. As a result, the embedding capacity is as large as the number of nonzero DCT coefficients. However, embedding at the maximum possible capacity also maximizes detectability. Westfeld [5] has used a matrix encoding (ME) technique to lower detectability by sacrificing embedding capacity. The ME technique exploits the Hamming code, which is designed for error correction. His scheme hides several bits by flipping at most one coefficient in each block. This approach was the first instance of using an error correcting code for data hiding.

Fridrich et al. [6–13] use the concept of the "minimal distortion" to enhance the security (i.e., by reducing distortion). The perturbed quantization steganography utilizes the wet paper coding.

Later, Kim et al. [14] improved the performance of ME by reducing the distortion impact. Their modified matrix encoding (MME) method may change more coefficients than ME. However, they show that the distortion impact of modifying one coefficient may be larger than that of modifying two coefficients. Thus, choosing between modifying one coefficient or two per block can give less distortion and lower detectability against steganalysis. Note that MME requires the original uncompressed image for data hiding, but not for decoding.

Schönfeld and Winkler [15] have proposed a new way to hide data using a more powerful error correction code. They use a structured Bose-Chaudhuri-Hocquenghem (BCH) code [2]. Zhang et al. [16] have significantly improved the original BCH-based data hiding scheme. Their improved method finds the flip positions easily and resists steganalysis better than the existing methods. Later, Sachnev et al. [17] applied a heuristic optimization technique to the data hiding scheme over the BCH coding and modified the stream of input DCT coefficients to reduce the distortion. Their method considerably outperforms the steganography method proposed by Zhang et al. [16].

Recently, Filler and Fridrich [18] have proposed a remarkable framework which minimizes a distortion measure as a weighted norm of the difference between cover and stego feature vectors. In their approach, the distortion is not necessarily an additive function over the pixels because the features may contain higher-order statistics such as sample transition probability matrices of pixels or DCT coefficients modeled as Markov chains [19–21]. When the distortion measure is defined as a sum of local potentials, practical near-optimal embedding methods can be implemented with syndrome-trellis codes [22].

Most of the above-mentioned steganographic methods use nonoverlapping blocks of DCT coefficients for hiding the secret message. Such a blockwise embedding scheme divides both the stream of DCT coefficients and the hidden message into separate blocks and solves the data hiding equations for each block individually. Recent methods such as MME [14] and the BCH-based steganography methods [15–17] may produce several alternative solutions, so the data hiding method can choose the solution with the lowest distortion impact. Past investigation of the BCH data hiding scheme finds that BCH usually yields more candidate solutions than are needed. This means that a solution with acceptable distortion impact can still be found in a reduced set of possible solutions. Hence, the embedding efficiency of BCH steganographic methods can be increased by reducing the number of possible solutions while keeping a distortion impact similar to that of the original approach.

In the proposed method, two blocks of DCT coefficients form a combined block that shares common coefficients in the intersected part between the two consecutive blocks. Such a design achieves high embedding efficiency by hiding data twice into the intersected area. The number of possible joint solutions for both blocks (i.e., solutions which are valid for both blocks) is always smaller than the number of all possible solutions for two independent blocks. The reduced number of possible solutions can increase distortion, but not significantly. Besides, the number of possible solutions can easily be controlled by changing the size of the intersected area. The smaller the intersected area, the larger the number of possible joint solutions. A similar approach has been tested for the Hamming code in [23].

Conversely, the larger the intersected area, the higher the embedding efficiency of the proposed method. In addition, the block of DCT coefficients can be modified by inserting new nonzero coefficients 1 or -1, or by removing coefficients 1 or -1. Such modification is carried out carefully in order to reduce the distortion caused by excessive hiding.

The rest of the article is organized as follows. Section 2 explains the details of the BCH coding. Section 3 presents the BCH-based modified data hiding scheme. In Section 4, we propose the inserting-removing strategy. The encoder and decoder are presented in Section 5. Section 6 provides the experimental results. Finally, Section 7 concludes the article.

2. BCH syndrome coding

BCH codes are a well-known and widely used family of error correction codes. A BCH code (n, k, t) can correct up to t bit errors by appending n - k additional bits to the original k-bit message such that the syndrome of the resulting n bits is equal to 0. BCH codes were invented for error correction and cannot be used directly for data hiding. An efficient method of using powerful BCH codes for data hiding has been presented in [15–17].

2.1. BCH syndrome coding

The generalized parity-check matrix H for BCH coding is presented as follows:

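$$
H = \begin{bmatrix}
1 & \alpha & \alpha^{2} & \cdots & \alpha^{n-1} \\
1 & \alpha^{3} & (\alpha^{3})^{2} & \cdots & (\alpha^{3})^{n-1} \\
\vdots & & & & \vdots \\
1 & \alpha^{2t-1} & (\alpha^{2t-1})^{2} & \cdots & (\alpha^{2t-1})^{n-1}
\end{bmatrix} \tag{1}
$$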

Let t be 2. Then, the parity-check matrix is expressed as follows:

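$$
H = \begin{bmatrix}
1 & \alpha & \alpha^{2} & \cdots & \alpha^{n-1} \\
1 & \alpha^{3} & (\alpha^{3})^{2} & \cdots & (\alpha^{3})^{n-1}
\end{bmatrix} \tag{2}
$$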

Assume that the original stream of binary data is $V = \{v_0, v_1, v_2, \ldots, v_{n-1}\}$, and the modified stream of binary data after data hiding is $R = \{r_0, r_1, r_2, \ldots, r_{n-1}\}$. The streams V and R over GF($2^m$) can be represented as $V(x) = v_0 + v_1 x + v_2 x^2 + v_3 x^3 + \cdots + v_{n-1} x^{n-1}$ and $R(x) = r_0 + r_1 x + r_2 x^2 + r_3 x^3 + \cdots + r_{n-1} x^{n-1}$, respectively.

The embedded message m can be computed as follows:

$$
m = R \cdot H^{T} \tag{3}
$$

Thus, hiding the message m in V requires finding R such that

$$
R \cdot H^{T} = m \tag{4}
$$

The difference between V and R shows the number and location of the elements in V to be flipped.

$$
R = V + E \tag{5}
$$

where

$$
E = x^{u_1} + x^{u_2} + x^{u_3} + \cdots + x^{u_l},
$$

and $u = \{u_1, u_2, \ldots, u_l\}$ are the positions of the elements in V to be flipped in order to get R.

Using Equations (3) and (4), the syndrome S can be computed as follows:

$$
S = m - V \cdot H^{T} = E \cdot H^{T}. \tag{6}
$$

If t is 2, then

$$
S = [S_1 \; S_2]^{T} = E \cdot H^{T}. \tag{7}
$$
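The syndrome relations in Equations (3) to (7) can be checked numerically. The following is a minimal sketch, assuming m = 4 (n = 15), t = 2, and the primitive polynomial x^4 + x + 1 for GF(2^4). The field polynomial and the helper names `build_exp` and `syndrome` are illustrative assumptions, not taken from the article. Addition and subtraction in GF(2^m) are both bitwise XOR, so the difference in Equation (6) becomes an XOR of two syndromes.

```python
# Sketch: BCH (t = 2) syndromes over GF(2^4). The field polynomial is an assumption.

def build_exp():
    """Powers alpha^0 .. alpha^14 of a primitive element of GF(16)."""
    exp, x = [], 1
    for _ in range(15):
        exp.append(x)
        x <<= 1
        if x & 0x10:        # reduce modulo x^4 + x + 1 (binary 10011)
            x ^= 0x13
    return exp

EXP = build_exp()

def syndrome(bits):
    """Return (S1, S2) = bits * H^T for the two-row matrix of Equation (2)."""
    s1 = s2 = 0
    for i, b in enumerate(bits):
        if b:
            s1 ^= EXP[i % 15]          # row 1: alpha^i
            s2 ^= EXP[(3 * i) % 15]    # row 2: (alpha^3)^i
    return s1, s2

V = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 1]   # cover bits v0..v14
flips = [2, 9]                                        # flip positions u
R = [v ^ (i in flips) for i, v in enumerate(V)]       # R = V + E, Equation (5)
E = [int(i in flips) for i in range(15)]

m = syndrome(R)                                       # Equation (3)
S = tuple(a ^ b for a, b in zip(m, syndrome(V)))      # Equation (6)
print(S, syndrome(E))                                 # the two pairs agree
```

The check confirms that the syndrome depends only on the flip pattern E, which is what allows flip positions to be looked up from S alone.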

2.2. Lookup tables

In this article, we utilized the method of Zhao et al. [24], which is based on fast lookup tables for finding the roots of the quadratic and cubic polynomials σ(x). A similar approach has been used in [16, 17].

2.3. Solutions

Hiding a message m in the binary stream V requires finding the positions of the coefficients to be flipped. In this article, we used the method presented in [16, 17] to get one-, two-, three-, or four-flip solutions. The sets of all possible solutions with one, two, three, or four flips are stored in the lookup tables $J_1$, $J_2$, $J_3$, and $J_4$, respectively. The notation $J_3(S)$ returns all three-flip solutions for the syndrome $S = \{S_1, S_2\}$. Similarly, we can get all possible solutions for block $n_1$ with syndrome $S^{I}$ as $J^{I} = \{J_1(S^{I}), J_2(S^{I}), J_3(S^{I}), J_4(S^{I})\}$, and for block $n_2$ with syndrome $S^{II}$ as $J^{II} = \{J_1(S^{II}), J_2(S^{II}), J_3(S^{II}), J_4(S^{II})\}$. The size of the lookup tables is $(2^{2m} - 1) \times n_S$, where $n_S$ is the number of stored solutions.
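To make the role of the tables concrete, the sketch below builds them by brute-force enumeration for m = 4 instead of the fast root-finding method of [24]. It assumes GF(2^4) with the primitive polynomial x^4 + x + 1 and 0-based flip positions. The names `build_tables` and `J` are hypothetical, and only up to three flips are tabulated to keep the example small.

```python
from itertools import combinations

def build_exp():
    """Powers alpha^0 .. alpha^14 in GF(16); x^4 + x + 1 is an assumed polynomial."""
    exp, x = [], 1
    for _ in range(15):
        exp.append(x)
        x <<= 1
        if x & 0x10:
            x ^= 0x13
    return exp

EXP = build_exp()

def flip_syndrome(positions):
    """Syndrome (S1, S2) produced by flipping the given 0-based positions."""
    s1 = s2 = 0
    for u in positions:
        s1 ^= EXP[u % 15]
        s2 ^= EXP[(3 * u) % 15]
    return s1, s2

def build_tables(max_flips=3, n=15):
    """Map each syndrome to {number of flips: [tuples of flip positions]}."""
    tables = {}
    for l in range(1, max_flips + 1):
        for pos in combinations(range(n), l):
            by_count = tables.setdefault(flip_syndrome(pos), {})
            by_count.setdefault(l, []).append(pos)
    return tables

TABLES = build_tables()

def J(l, S):
    """All l-flip solutions for syndrome S (empty list if there are none)."""
    return TABLES.get(S, {}).get(l, [])

S = flip_syndrome((2, 9))
print(J(1, S), J(2, S), len(J(3, S)))
```

Because a BCH code with t = 2 has minimum distance 5, a syndrome has at most one solution with two or fewer flips; the alternatives that make a lowest-distortion choice possible only appear among the three- and four-flip solutions.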

3. Proposed data hiding scheme

In the proposed BCH data hiding scheme, we combine two BCH blocks of $2^m - 1$ DCT coefficients into one, such that the BCH blocks intersect each other. Figure 1 shows the block diagram of coefficients for the proposed scheme. In the presented example, $(a_1, a_2, a_3, \ldots, a_{25})$ is the combined block of DCT coefficients; $(v'_1, v'_2, v'_3, \ldots, v'_{15})$ and $(v''_1, v''_2, v''_3, \ldots, v''_{15})$ are the corresponding binary coefficients for the BCH blocks $n_1$ and $n_2$, respectively. The intersected area I covers five coefficients $a_{11}$, $a_{12}$, $a_{13}$, $a_{14}$, and $a_{15}$ in this example. Such a scheme can hide more data by exploiting the intersected area, and it can be used with any kind of coding scheme.

Figure 1. Two intersected blocks of the modified BCH data hiding scheme.

One of the two main contributions of this article is a systematic algorithm for finding joint solutions. The proposed BCH-based data hiding scheme requires finding a joint solution for both blocks $n_1$ and $n_2$ using the guidelines from Section 2.1 such that the intersected area does not affect the result. For example, let 8 bits be hidden into 15 coefficients from $a_1$ to $a_{15}$ using the BCH-based steganography. Then, another 8 bits can be hidden into the next block having another 15 coefficients from $a_{16}$ to $a_{30}$. This is the traditional approach. As a result, 16 bits can be hidden into 30 coefficients. However, our new approach hides the same amount of data into the 25 coefficients $a_1$ to $a_{25}$. Eight bits are hidden into the coefficients from $a_1$ to $a_{15}$, and another eight bits into the coefficients from $a_{11}$ to $a_{25}$. The data hiding algorithm requires finding the syndromes $S^{I}$ and $S^{II}$ (Equation 6) for the blocks $n_1$ and $n_2$, respectively.

There are two possible ways of hiding data into the combined block: hiding data into the block $n_1$ first, or into the block $n_2$ first. The proposed algorithm for getting a joint solution is designed as follows:

  1. Hiding data into the block $n_1$ first.

    (a) Some solutions for hiding data into the block $n_1$ do not modify the coefficients in the intersected area. In this case, solutions for the block $n_2$ have to be obtained using the original syndrome $S^{II}$. Those solutions for $n_2$ that do not modify the coefficients in the intersected area are valid. These solutions are called specified solutions.

    (b) Some solutions for the block $n_1$ modify the coefficients in the intersected area. These modifications in the intersected area affect the syndrome for the block $n_2$. Thus, the new syndrome for the block $n_2$ is obtained as $S_{new}^{II}$. Those new solutions that do not modify the coefficients already modified by $n_1$ in the intersected area are valid.

    Among all possible solutions for the block $n_2$ and the new syndrome $S_{new}^{II}$ (in case of 1(a), $S_{new}^{II} = S^{II}$), choose the solutions which do not have flipping positions in the intersected area (i.e., valid or specified solutions). Thus, the joint solutions for a combined block unify the solutions for the block $n_1$ and its syndrome $S^{I}$ with the specified solutions for the block $n_2$ and its syndrome $S_{new}^{II}$.

  2. Hiding data into the block $n_2$ first.

    (a) Some solutions for hiding data into the block $n_2$ do not modify the coefficients in the intersected area. In this case, solutions for the block $n_1$ have to be obtained using the original syndrome $S^{I}$. Those solutions for $n_1$ that do not modify the coefficients in the intersected area are valid.

    (b) Some solutions for the block $n_2$ modify the coefficients in the intersected area. These modifications in the intersected area affect the syndrome for the block $n_1$. Thus, the new syndrome for the block $n_1$ is obtained as $S_{new}^{I}$. Those new solutions that do not modify the coefficients already modified by $n_2$ in the intersected area are valid.

    The joint solutions for a combined block unify the solutions for the block $n_2$ and its syndrome $S^{II}$ with the specified solutions for the block $n_1$ and its syndrome $S_{new}^{I}$ (in case of 2(a), $S_{new}^{I} = S^{I}$).

In general, the proposed modified BCH data hiding scheme hides $4 \cdot m$ bits of data into a block of $2 \cdot (2^m - 1) - |I|$ coefficients by using the BCH scheme $(2^m - 1, k, 2)$ for the blocks $n_1$ and $n_2$.

The proper BCH-based data hiding scheme needs a suitable parameter m for hiding message M into the stream of N nonzero DCT coefficients. The parameter m can be obtained as follows:

$$
\frac{4 \cdot m \cdot N}{2 \cdot (2^{m} - 1) - |I|} \geq M, \tag{8}
$$

where m defines the proper BCH-based scheme for the proposed method, N is the number of nonzero DCT coefficients, M is the size of the hidden message in bits, $n_p = 2 \cdot (2^m - 1) - |I|$ is the size of the combined block, and $4 \cdot m$ is the capacity of the combined block.
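As a small worked illustration of inequality (8), the sketch below evaluates the combined block size and picks a parameter m. It assumes that the largest m satisfying (8) is preferred, since larger blocks change fewer coefficients per hidden bit, and that |I| is supplied by the caller; the function names and the search range for m are hypothetical.

```python
def combined_block_size(m, inter):
    """n_p = 2 * (2^m - 1) - |I| coefficients in one combined block."""
    return 2 * (2 ** m - 1) - inter

def satisfies_eq8(m, N, M, inter):
    """Inequality (8), written without division: 4*m*N >= M * n_p."""
    n_p = combined_block_size(m, inter)
    return n_p > 0 and 4 * m * N >= M * n_p

def choose_m(N, M, inter, m_values=range(3, 11)):
    """Largest m in m_values that still satisfies (8) (an assumed selection rule)."""
    feasible = [m for m in m_values if satisfies_eq8(m, N, M, inter)]
    return max(feasible) if feasible else None

# Example from the text: m = 4 and |I| = 5 give a 25-coefficient block carrying 16 bits.
print(combined_block_size(4, 5), 4 * 4)

# Hypothetical image with N = 10000 nonzero coefficients and a 2000-bit message.
print(choose_m(N=10000, M=2000, inter=5))
```

For the hypothetical numbers above, m = 5 is selected: a combined block of 57 coefficients carries 20 bits, whereas m = 6 would fall just short of the required 2000 bits.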

3.1. Data hiding algorithm

The proposed method requires finding a solution for the two blocks $n_1$ and $n_2$ that hides the two messages $m_1$ and $m_2$ together such that

$$
\begin{aligned}
m_1 &= H \cdot R_1 \\
m_2 &= H \cdot R_2
\end{aligned} \tag{9}
$$

where $R_1$ and $R_2$ are the modified streams of the binary coefficients obtained from $n_1$ and $n_2$ (see Figure 1), and H is the parity-check matrix from Equation (1).

Note that hiding the message $m_1$ in block $n_1$ may modify the block $n_2$ and vice versa, due to the intersected part. Hence, we need to find proper flip positions by solving Equation (9) so that decoding is correct.

Among all possible solutions, the proposed method unifies the solutions for blocks $n_1$ and $n_2$ such that the flip positions cover only the nonintersected area of both blocks (i.e., $J_s^{I} = \{J^{I} \notin I\}$ and $J_s^{II} = \{J^{II} \notin I\}$ for blocks $n_1$ and $n_2$). In other words, it is desirable to hide data into the block $n_1$ using the solutions from $J_s^{I}$ that do not affect the block $n_2$, and vice versa. According to the above explanation, $J_s^{I}$ and $J_s^{II}$ unify the specified solutions for the blocks $n_1$ and $n_2$, respectively. Here, the superscripts in $X^{I}$ and $X^{II}$ denote quantities that belong to blocks $n_1$ and $n_2$, respectively.

However, some flip positions j of the block $n_1$ may belong to the intersected area I. In that case, we can account for the effect of those positions to get new solutions for the block $n_2$, and vice versa.

For this purpose, Equation (9) can be rewritten as follows:

$$
\begin{aligned}
P_1^{II} &= S_1^{II} + \beta_1 + \cdots + \beta_l \\
P_2^{II} &= S_2^{II} + \beta_1^{3} + \cdots + \beta_l^{3}
\end{aligned} \tag{10}
$$

where $S^{II} = [S_1^{II} \; S_2^{II}]$ is the syndrome for block $n_2$; $S_{new}^{II} = [P_1^{II} \; P_2^{II}]$ is the new syndrome for block $n_2$ after hiding data in block $n_1$; l is the number of flip positions $(j_1, \ldots, j_l)$ of the block $n_1$ that belong to the intersected area I (i.e., $j_1, \ldots, j_l = J^{I}(S^{I}) \in I$); and the values $\beta_1, \ldots, \beta_l$ are computed using Equation (13) for the flipping positions $(j'_1, \ldots, j'_l) = F(j_1, \ldots, j_l)$ from the intersected area I for the block $n_2$. The function F converts the indexes $(j_1, \ldots, j_l)$ of the intersected area in the block $n_1$ to the corresponding indexes $(j'_1, \ldots, j'_l)$ in the block $n_2$. For example, suppose the solution for the block $n_1$ illustrated in Figure 1 is $J^{I}(S^{I}) = \{3, 11\}$. Then $j_1 = 11 \in I$, where the index 1 means the first coefficient from the intersected area I. The coefficient $j_1 = 11$ is located in the 11th position of the combined block. However, the 11th coefficient in the combined block is the 15th coefficient in the block $n_2$ (i.e., $F(j_1) = F(11) = j'_1 = 15$; see Figure 1). Thus, even though the flip positions for blocks $n_1$ and $n_2$ are different (i.e., $j_1 = 11$ and $j'_1 = 15$), those coefficients have the same location in the combined block.

Finally, the solution for the block $n_2$ can be obtained as $\{j'_1, \ldots, j'_l, J_s^{II}(S_{new}^{II})\}$. This solution is sufficient to hide the message $m_2$ in the block $n_2$.

The joint solution hides both messages $m_1$ and $m_2$ in the combined block. The joint solution $\{J^{I}(S^{I}), J_s^{II}(S_{new}^{II})\}$ unifies the solutions for the blocks $n_1$ and $n_2$. In this example, the flipping positions from the intersected area are part of $J^{I}(S^{I})$.

Similarly, we can get a joint solution by using the current solution for block $n_2$ (i.e., $J^{II}(S^{II})$). For this purpose, Equation (9) can be rewritten again as follows:

$$
\begin{aligned}
P_1^{I} &= S_1^{I} + \beta_1 + \cdots + \beta_l \\
P_2^{I} &= S_2^{I} + \beta_1^{3} + \cdots + \beta_l^{3}
\end{aligned} \tag{11}
$$

where $S^{I} = [S_1^{I} \; S_2^{I}]$ is the syndrome for block $n_1$; $S_{new}^{I} = [P_1^{I} \; P_2^{I}]$ is the new syndrome of the block $n_1$ after hiding data in block $n_2$; l is the number of flip positions $(j'_1, \ldots, j'_l)$ of the block $n_2$ that belong to the intersected area I (i.e., $(j'_1, \ldots, j'_l) = J^{II}(S^{II}) \in I$); $\beta_1, \ldots, \beta_l$ are computed using Equation (13) for the flipping positions $(j_1, \ldots, j_l) = F^{-1}(j'_1, \ldots, j'_l)$ from the intersected area I for the block $n_1$; and the function $F^{-1}$ (i.e., the inverse function of F) converts the indexes $(j'_1, \ldots, j'_l)$ of the coefficients of the intersected area in the block $n_2$ to the corresponding indexes $(j_1, \ldots, j_l)$ in the block $n_1$. For example, if $J^{II}(S^{II}) = \{1, 15\}$, then $j'_1 = 15 \in I$ and $j_1 = F^{-1}(j'_1) = F^{-1}(15) = 11$ (see Figure 1).

The solution for the block $n_1$ can be obtained as $\{j_1, \ldots, j_l, J_s^{I}(S_{new}^{I})\}$. This solution is sufficient to hide the message $m_1$ in the block $n_1$.

The joint solution for hiding both messages $m_1$ and $m_2$ is $\{J_s^{I}(S_{new}^{I}), J^{II}(S^{II})\}$. Here, the flipping positions from the intersected area $(j'_1, \ldots, j'_l)$ are part of $J^{II}(S^{II})$. The corresponding flipping positions $(j_1, \ldots, j_l) = F^{-1}(j'_1, \ldots, j'_l)$ are part of the solution for the block $n_1$.

Note that there are several solutions in $J^{I}$ and $J^{II}$ for the syndromes $S^{I}$ and $S^{II}$, respectively. The presented method may generate one joint solution for each solution from $J^{I}(S^{I})$ and $J^{II}(S^{II})$.

The proposed method requires finding the values β from the flip positions $(j_1, \ldots, j_l)$ or $(j'_1, \ldots, j'_l)$. The relationship between β and the flip position j is as follows:

$$
j = \log(\beta) \tag{12}
$$

or

$$
\beta = \log^{-1}(j) \tag{13}
$$

The complete procedure for getting all possible joint solutions for any syndromes is as follows.

For a given combined block of binary coefficients a and two messages $m_1$ and $m_2$, proceed as follows:

  (a) Define two blocks of the DCT coefficients $n_1$ and $n_2$ (see Figure 1). Compute the syndromes $S^{I}$ and $S^{II}$ using the corresponding binary streams v' and v''.

  (b) Find all possible solutions $j^{I} = J^{I}(S^{I})$ and $j^{II} = J^{II}(S^{II})$ for blocks $n_1$ and $n_2$ by using the syndromes $S^{I}$ and $S^{II}$.

  (c) For each solution $j^{I}(p)$ (p = 1, 2, 3, ..., k, where k is the number of solutions), proceed as follows:

    i. Define the flip positions $j_1, \ldots, j_l$ from the intersected area I.

    ii. Convert $j_1, \ldots, j_l$ to $j'_1, \ldots, j'_l$ (the corresponding flip positions in the block $n_2$). Compute the corresponding β using Equation (13). Compute the new syndrome $S_{new}^{II}$ using Equation (10).

    iii. Using the new syndrome $S_{new}^{II}$, get the new flip solutions as $j_{new}^{II} = J_s^{II}(S_{new}^{II})$.

    iv. For each solution $j_{new}^{II}(q)$ (q = 1, 2, 3, ..., z, where z is the number of solutions), store the joint solution $\{j^{I}(p), j_{new}^{II}(q)\}$.

  (d) For each solution $j^{II}(p)$ (p = 1, 2, 3, ..., k), proceed as follows:

    i. Define the flip positions $j'_1, \ldots, j'_l$ from the intersected area I for block $n_2$.

    ii. Convert $j'_1, \ldots, j'_l$ to $j_1, \ldots, j_l$. Compute the corresponding β. Compute the new syndrome $S_{new}^{I}$ using Equation (11).

    iii. Using the new syndrome $S_{new}^{I}$, get the new flip solutions as $j_{new}^{I} = J_s^{I}(S_{new}^{I})$.

    iv. For each solution $j_{new}^{I}(q)$ (q = 1, 2, 3, ..., z, where z is the number of solutions), store the joint solution $\{j_{new}^{I}(q), j^{II}(p)\}$.

The stored joint solutions are used further to hide data with better performance. Note that the proposed method needs to search for the best solution among up to k·z possible candidates for each block (see steps c and d). Thus, the computational complexity of the proposed search algorithm is $O(n^2)$.
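Steps (a) to (d) can be sketched for the example of Figure 1 (m = 4, |I| = 5). This is a minimal illustration under several assumptions that are not taken from the article: GF(2^4) is built with the polynomial x^4 + x + 1, "log" in Equations (12) and (13) is read as the discrete logarithm to base α (so β = α^j), the mapping F is taken as F(p) = 26 - p (chosen only because it reproduces F(11) = 15), solutions are found by brute force rather than lookup tables, and the joint solution with the fewest flips stands in for the lowest-distortion choice.

```python
from itertools import combinations

def build_exp():
    exp, x = [], 1
    for _ in range(15):
        exp.append(x)
        x <<= 1
        if x & 0x10:
            x ^= 0x13              # assumed field polynomial x^4 + x + 1
    return exp

EXP = build_exp()
INTER = set(range(11, 16))         # intersected area: a11..a15 (1-based)

def F(p):
    """Combined position -> index in n2. ASSUMED reversed indexing, F(11) = 15."""
    return 26 - p

F_inv = F                          # the assumed mapping is its own inverse

def synd(local_positions):
    s1 = s2 = 0
    for j in local_positions:
        s1 ^= EXP[j % 15]          # beta = alpha^j, Equation (13)
        s2 ^= EXP[(3 * j) % 15]    # beta^3
    return s1, s2

def xor(a, b):
    return a[0] ^ b[0], a[1] ^ b[1]

def flips_for(S, allowed, max_flips):
    """All flip sets of size <= max_flips inside `allowed` with syndrome S."""
    allowed = list(allowed)
    return [pos for l in range(max_flips + 1)
            for pos in combinations(allowed, l) if synd(pos) == S]

def joint_solutions(S1, S2, max_flips=2):
    joint = set()
    # Step (c): solve n1 freely, then n2 outside the intersected area.
    for jI in flips_for(S1, range(1, 16), max_flips):
        shared = [F(p) for p in jI if p in INTER]
        S2_new = xor(S2, synd(shared))                      # Equation (10)
        for jII in flips_for(S2_new, range(1, 11), max_flips):
            joint.add(frozenset(jI) | frozenset(F_inv(j) for j in jII))
    # Step (d): solve n2 freely, then n1 outside the intersected area.
    for jII in flips_for(S2, range(1, 16), max_flips):
        comb = [F_inv(j) for j in jII]
        shared = [p for p in comb if p in INTER]
        S1_new = xor(S1, synd(shared))                      # Equation (11)
        for jI in flips_for(S1_new, range(1, 11), max_flips):
            joint.add(frozenset(comb) | frozenset(jI))
    return joint

def decode(bits):
    """Equation (9): read both messages from a 1-based combined block."""
    m1 = synd(p for p in range(1, 16) if bits[p])
    m2 = synd(F(p) for p in range(11, 26) if bits[p])
    return m1, m2

# Hypothetical cover block (index 0 unused) and messages reachable by 3 flips.
cover = [0] + [1,0,1,1,0, 0,1,0,1,0, 0,1,0,1,1, 0,0,1,0,1, 1,0,1,0,0]
target = cover[:]
for p in (3, 11, 20):
    target[p] ^= 1
m1, m2 = decode(target)
c1, c2 = decode(cover)
solutions = joint_solutions(xor(m1, c1), xor(m2, c2))
best = min(solutions, key=len)
stego = cover[:]
for p in best:
    stego[p] ^= 1
print(sorted(best), decode(stego) == (m1, m2))
```

In the usage example, the flip at position 11 lies in the intersected area, so the search must correct the syndrome of n2 via Equation (10) before looking for flips in a16 to a25. Every stored joint solution decodes to both messages.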

3.2. Two-stage embedding technique

In order to enhance the performance of the blockwise methods (i.e., ME, MME, BCH-based data hiding, etc.), we utilize almost all the DCT coefficients for data hiding. The proposed method uses two different embedding schemes together. The two schemes use different block sizes $n_1^p$ and $n_2^p$, and have different payloads $m_1^p$ and $m_2^p$.

This method divides the stream of DCT coefficients $(c_1, c_2, \ldots, c_N)$ and the message M into two parts and hides data into each part separately. The optimal numbers of blocks ($k_1$ and $k_2$) for the two schemes are computed in two steps. First, the numbers of blocks for schemes 1 and 2 are related as follows:

$$
\begin{aligned}
n_1^p \cdot k'_1 + n_2^p \cdot k'_2 &= N \\
m_1^p \cdot k'_1 + m_2^p \cdot k'_2 &= M
\end{aligned} \tag{14}
$$

where N is the number of DCT coefficients.

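The computed $k'_1$ and $k'_2$ are generally noninteger numbers. Thus, we have to choose the nearest integers $k_1 = \lceil k'_1 \rceil \pm 1$ and $k_2 = \lceil k'_2 \rceil \pm 1$ such that:

$$
\begin{aligned}
n_1^p \cdot k_1 + n_2^p \cdot k_2 &\leq N \\
m_1^p \cdot k_1 + m_2^p \cdot k_2 &\geq M
\end{aligned} \tag{15}
$$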

The presented two-scheme embedding method improves the performance of data hiding by properly distributing the available DCT coefficients between two different modified BCH schemes. The first scheme uses $m_1^p = 4 \cdot m$ with m obtained from inequality (8); the second scheme uses $m_2^p = 4 \cdot (m + 1)$. Note that the second scheme has the higher embedding efficiency. The efficiency of the two-scheme embedding depends on the ratio between the numbers of blocks $k_1$ and $k_2$ for schemes 1 and 2, respectively. The larger the value $k_2$ (i.e., the smaller the ratio $k_1/k_2$), the higher the efficiency of the proposed two-scheme embedding for the same m.
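The block split described by Equations (14) and (15) amounts to solving a 2 × 2 linear system and then searching the neighbouring integers. The sketch below is one possible reading: it solves (14) by Cramer's rule, tries every pair within one of the rounded-up values, and, as an assumed tie-break, prefers the feasible pair with the most scheme-2 blocks. The function name and the example numbers are hypothetical.

```python
import math

def split_blocks(N, M, n1, m1, n2, m2):
    """Return integers (k1, k2) satisfying (15), or None if no nearby pair fits."""
    det = n1 * m2 - n2 * m1
    if det == 0:
        raise ValueError("the two schemes have the same payload-to-size ratio")
    # Real-valued solution of Equation (14) by Cramer's rule.
    k1_real = (N * m2 - n2 * M) / det
    k2_real = (n1 * M - m1 * N) / det
    feasible = []
    for k1 in range(math.ceil(k1_real) - 1, math.ceil(k1_real) + 2):
        for k2 in range(math.ceil(k2_real) - 1, math.ceil(k2_real) + 2):
            fits = n1 * k1 + n2 * k2 <= N          # enough coefficients
            carries = m1 * k1 + m2 * k2 >= M       # enough payload
            if k1 >= 0 and k2 >= 0 and fits and carries:
                feasible.append((k1, k2))
    # Assumed preference: as many scheme-2 blocks as possible.
    return max(feasible, key=lambda k: k[1]) if feasible else None

# Hypothetical case: m = 4 and m + 1 = 5, both with |I| = 5.
# Scheme 1: 25 coefficients carry 16 bits; scheme 2: 57 coefficients carry 20 bits.
print(split_blocks(N=10000, M=5000, n1=25, m1=16, n2=57, m2=20))
```

With these numbers the real-valued solution is roughly (206.3, 84.9), and the only nearby integer pair that satisfies both constraints of (15) is (208, 84), which uses 9988 coefficients and carries 5008 bits.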

The two-scheme embedding method allows different sizes of the intersected area to be used for the two schemes, $I_{sh1}$ and $I_{sh2}$, respectively (see Tables 1 and 2). We test several sizes of the intersected area and several payloads. In the experiments, we hide data into a set of 4,000 natural images and compute the performance against the steganalysis methods [20, 25] for different sizes of the intersected area and different payloads. Results are presented in Tables 1, 2, and 3.

Table 1 Accuracy of the steganalysis [20] for different sizes of the intersected areas and payloads
Table 2 Accuracy of the steganalysis [25] for different sizes of the intersected areas and payloads
Table 3 The most appropriate intersected area size versus payload

Boldface numbers in Tables 1 and 2 mark the lowest steganalysis accuracy and thus the most appropriate intersected area size for each tested payload. Data hiding with the most appropriate intersected area always shows better results. Tables 1 and 2 also show the difference between the proposed method and the original BCH-based steganography method [16] in terms of the accuracy of the steganalysis [20, 25]. The most appropriate intersected area sizes presented in Table 3 were used for the other experiments.

4. Inserting-removing strategy

The performance of the proposed method can be increased significantly by using the inserting-removing strategy. The proposed strategy is based on the fact that the block of $2^m - 1$ DCT coefficients can be modified before data hiding by inserting or removing coefficients 1 and -1. Hiding data in the modified stream of DCT coefficients may result in lower distortion and, as a result, lower detectability by steganalysis. Such a modification has to be carried out carefully in order to reduce distortion.

The proposed inserting-removing strategy uses the stream of nonrounded quantized DCT coefficients $a_q$ computed as follows:

$$
a' = \mathrm{DCT}(B), \quad a_q = \frac{a'}{Q}, \quad a_r = \mathrm{round}(a_q) \tag{16}
$$

where B is the 8 × 8 block of image pixels; a' is the block of original DCT coefficients; $a_q$ is the block of DCT coefficients divided by the corresponding coefficients of the quantization matrix Q; $a_r$ is the block of quantized DCT coefficients; and $Q_f$ is the quality factor.

Each nonzero integer DCT coefficient has a corresponding informative bit computed as follows:

$$
b = \begin{cases}
a_r \bmod 2, & \text{if } a_r > 0, \\
(a_r - 1) \bmod 2, & \text{if } a_r < 0
\end{cases} \tag{17}
$$

According to the proposed inserting-removing strategy, the stream a of nonrounded DCT coefficients obtained from the blocks $a_q$ is divided into three sets: modifiable $c_m = \{a \in (-\infty; -1.5) \cup (1.5; \infty)\}$, removable $c_R = \{a \in [-1.5; -0.5) \cup (0.5; 1.5]\}$, and insertable $c_{Ins} = \{a \in [-0.5; -0.25) \cup (0.25; 0.5]\}$. The set c unifies the modifiable, insertable, and removable sets (i.e., $c = c_m \cup c_R \cup c_{Ins}$). The set $C = c_m \cup c_R$ contains all nonzero rounded DCT coefficients. According to Equation (17), only the nonzero DCT coefficients (i.e., the set C) have corresponding informative bits and can be used for hiding data.
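The informative bit of Equation (17) and the three coefficient sets translate directly into code. The following is a minimal sketch; the function names and the label "unused" for coefficients with magnitude at most 0.25 are illustrative choices rather than terms from the article.

```python
def informative_bit(a_r):
    """Equation (17): informative bit of a nonzero rounded DCT coefficient."""
    if a_r > 0:
        return a_r % 2
    if a_r < 0:
        return (a_r - 1) % 2    # Python's % is non-negative for a positive modulus
    raise ValueError("zero coefficients carry no informative bit")

def classify(a):
    """Assign a nonrounded quantized coefficient to c_m, c_R, or c_Ins."""
    mag = abs(a)
    if mag > 1.5:
        return "modifiable"     # (-inf; -1.5) U (1.5; inf)
    if mag > 0.5:
        return "removable"      # [-1.5; -0.5) U (0.5; 1.5], rounds to +/-1
    if mag > 0.25:
        return "insertable"     # [-0.5; -0.25) U (0.25; 0.5], rounds to 0
    return "unused"

for a in (2.7, -1.5, 0.5, -0.3, 0.1):
    print(a, classify(a))
print([informative_bit(v) for v in (3, 2, -1, -2)])
```

Removable coefficients round to 1 or -1 and insertable coefficients round to 0, which is why moving a coefficient between these two sets changes the set C that carries the hidden bits.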

The proposed steganographic method uses the stream of $n_p$ nonzero DCT coefficients from the set C for data hiding. In general, the set C is a subset of the unified set c. Thus, each block unifies the $n_p$ coefficients from the set C and some insertable coefficients from the set c (i.e., $c_b = c'_m \cup c'_R \cup c'_{Ins}$, where $C' = c'_m \cup c'_R$ is the block of $n_p$ nonzero DCT coefficients from the set C). Inserting or removing any coefficient from $c'_{Ins}$ or $c'_R$ produces a new block C' with new solutions for data hiding. As a result, the inserting-removing strategy significantly increases the number of possible solutions and helps to find the most appropriate solution with the lowest distortion.

In the proposed improved matrix encoding, we use the same distortion measure as MME [14]. The distortion for each DCT coefficient is computed as follows:

$$
D = E^{2} \cdot Q^{2}, \qquad
E = \begin{cases}
0.5 - |C - \lfloor C \rfloor|, & \text{if } C \in c_m, \\
1.5 - |C|, & \text{if } C \in c_R
\end{cases} \tag{18}
$$

The distortion due to inserting or removing, $D_{IR}$, is computed as follows:

$$
D_{IR} = (0.5 - |C|)^{2} \cdot Q^{2}, \quad \text{if } C \in c_R \cup c_{Ins} \tag{19}
$$

where Q is the corresponding quantization coefficient of the quantization table.

The resulting distortion for the combined block of DCT coefficients is computed as follows:

$$
D_b = \sum_{i=1}^{l} D_i + D_{IR} \tag{20}
$$

where l is the number of flipped coefficients.

Flipped coefficients are computed as follows:

$$
A_r = \begin{cases}
2, & \text{if } a_r = 1, \\
-2, & \text{if } a_r = -1, \\
a_r + 1, & \text{if } a_q > a_r, \\
a_r - 1, & \text{if } a_q < a_r
\end{cases} \tag{21}
$$
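The distortion measures and the flipping rule can be put together in a short sketch. It assumes that the magnitudes in Equations (18) and (19) are absolute values, which makes the measure symmetric for positive and negative coefficients, and it treats the case $a_q = a_r$, which Equation (21) does not cover, like $a_q < a_r$. The function names are hypothetical.

```python
import math

def flipped_value(a_q, a_r):
    """Equation (21): new rounded value after flipping the informative bit."""
    if a_r == 1:
        return 2
    if a_r == -1:
        return -2
    # a_q == a_r is not covered by Equation (21); it falls into the last branch here.
    return a_r + 1 if a_q > a_r else a_r - 1

def flip_distortion(C, Q):
    """Equation (18) for a nonrounded coefficient C with quantization step Q."""
    mag = abs(C)
    if mag > 1.5:                                   # C in c_m
        E = 0.5 - abs(C - math.floor(C))
    elif mag > 0.5:                                 # C in c_R
        E = 1.5 - mag
    else:
        raise ValueError("coefficient rounds to zero and carries no bit")
    return E ** 2 * Q ** 2

def insert_remove_distortion(C, Q):
    """Equation (19) for a coefficient from c_R (removed) or c_Ins (inserted)."""
    if not 0.25 < abs(C) <= 1.5:
        raise ValueError("coefficient is neither removable nor insertable")
    return (0.5 - abs(C)) ** 2 * Q ** 2

def block_distortion(flipped, moved=None):
    """Equation (20): flipped is a list of (C, Q); moved is an optional (C, Q)."""
    total = sum(flip_distortion(C, Q) for C, Q in flipped)
    if moved is not None:
        total += insert_remove_distortion(*moved)
    return total

print(flipped_value(2.6, 3), flipped_value(1.2, 1), flipped_value(-2.6, -3))
print(block_distortion([(2.6, 10), (1.4, 10)], moved=(0.4, 10)))
```

In each of the three example coefficients the nonrounded value lies 0.1 away from a rounding boundary, so each contributes the same small distortion; coefficients close to a boundary are the cheapest to flip, insert, or remove.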

5. Encoder and decoder

The encoder of the proposed steganographic method based on the modified BCH data hiding scheme and the inserting-removing strategy is organized as follows.

For a given bitmap image $I_m$, payload P, quality factor $Q_f$, and secret key K, proceed as follows:

  1. Divide the image $I_m$ into nonoverlapping 8 × 8 blocks of pixels and apply the DCT, quantization, and rounding as presented in (16). Remove the DC coefficients. Obtain a', $a_q$, $a_r$, and the stream of DCT coefficients a. Permute the stream a using K and any pseudo-random generator. Obtain the stream $c = \{a \in (-\infty; -0.25) \cup (0.25; \infty)\}$ from the permuted stream a.

  2. Define the sets: modifiable $c_m$, insertable $c_{Ins}$, and removable $c_R$.

  3. Define the parameters for schemes 1 and 2, and the numbers of blocks $k_1$ and $k_2$ using (14) and (15). Divide the message M into two parts: $M_1 = m_1^p \cdot k_1$ and $M_2 = m_2^p \cdot k_2$.

  4. Start from the first block i = 1. Define the ith block of DCT coefficients $c_b^i = c_m^{i\prime} \cup c_R^{i\prime} \cup c_{Ins}^{i\prime}$, where $c_m^{i\prime}$, $c_R^{i\prime}$, and $c_{Ins}^{i\prime}$ are the modifiable, removable, and insertable subsets for the current block. If $i = k_1 + 1$, switch to scheme 2.

  5. Define the block of nonzero rounded DCT coefficients $C^{i\prime} = c_m^{i\prime} \cup c_R^{i\prime}$.

  6. Get the solutions for the block $C^{i\prime}$ using the modified BCH data hiding scheme (see the algorithm in Section 3). Compute the distortion D for each solution using Equation (20). Choose the solution $J_m$ with the lowest distortion $D_m$ and store it.

  7. Modify the block $C^{i\prime}$ by inserting or removing coefficients from the subsets $c_R^{i\prime}$ and $c_{Ins}^{i\prime}$. Obtain a new block: (i) after removing, $C^{i\prime} = c_m^{i\prime} \cup c_R^{i\prime\prime}$, where $c_R^{i\prime\prime} = c_R^{i\prime} - c_R^{i\prime}(p)$ is the modified removable set and $c_R^{i\prime}(p)$ is the removed coefficient; (ii) after inserting, $C^{i\prime} = c_m^{i\prime} \cup c_R^{i\prime} \cup c_{Ins}^{i\prime}(q)$, where $c_{Ins}^{i\prime}(q) = \pm 1$ is the inserted coefficient. Here p and q are the current positions for removing and inserting, respectively.

  8. Repeat steps 5-6 for all insertable and removable coefficients from $c_R^{i\prime}$ and $c_{Ins}^{i\prime}$.

  9. Among all stored solutions $J_m$, choose the solution with the lowest distortion $D_m$. Modify one, two, or three coefficients according to the best solution (see the explanation in Section 2) and, if necessary, insert or remove a coefficient in the block $c_b^i$.

  10. Process all $k_1 + k_2$ blocks using steps 4-9. Obtain the modified stream $c' = \{c_b^1, c_b^2, \ldots, c_b^{k_1 + k_2}\}$.

  11. Recover the original sequence order of the DCT coefficients a from the modified stream c' using the secret key K and the same pseudo-random generator. Add the DC coefficients, round the coefficients a', and obtain the modified JPEG image $I'_m$.
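The key-dependent permutation in step 1 and its reversal in step 11 can be sketched as follows. The article leaves the pseudo-random generator open, so Python's `random.Random` seeded with the key is used here purely as a stand-in; it is not cryptographically secure, and the function names are hypothetical.

```python
import random

def keyed_order(n, key):
    """Index order derived from the secret key (same key gives the same order)."""
    order = list(range(n))
    random.Random(key).shuffle(order)
    return order

def permute(stream, key):
    """Step 1: reorder the coefficient stream with the key."""
    order = keyed_order(len(stream), key)
    return [stream[i] for i in order]

def unpermute(stream, key):
    """Step 11: put every coefficient back at its original position."""
    order = keyed_order(len(stream), key)
    original = [None] * len(stream)
    for dst, src in enumerate(order):
        original[src] = stream[dst]
    return original

coeffs = [3.2, -1.4, 0.4, 7.9, -0.3, 1.1, -2.6, 0.8]   # hypothetical stream a
mixed = permute(coeffs, key="secret K")
print(unpermute(mixed, key="secret K") == coeffs)
```

The decoder applies the same keyed order to the coefficients it reads from the JPEG file, so encoder and decoder only need to share the key and agree on the generator.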

The decoder of the proposed steganographic method is organized as follows.

For the given modified JPEG image $I'_m$, quality factor $Q_f$, secret key K, and size of the payload p = |P|, proceed as follows:

  1. Read the DCT coefficients from the JPEG file. Permute them using the secret key K and the same pseudo-random generator. Remove the DC coefficients. Obtain the stream of nonzero DCT coefficients C.

  2. Using Equations (14) and (15), define the parameters of schemes 1 and 2, and the numbers of blocks $k_1$ and $k_2$. Here, N = |C|.

  3. Divide C into blocks according to $k_1$ and $k_2$.

  4. Decode the data from each block using (9).

The steganographic method based only on the modified BCH data hiding scheme skips steps 7 and 8 of the encoder.

6. Experimental results

In these experiments, we hide different amounts of data into a set of uncompressed images using the proposed BCH-based data hiding scheme with and without the inserting-removing strategy. The set of modified and original compressed images is analyzed by two powerful steganalysis algorithms proposed by Pevny and Fridrich [20] and Kodovsky and Fridrich [25]. Those methods use 274 and 548 different features of the DCT coefficients, respectively. The 274 or 548 features from the unmodified and modified images are used to build the models for a support vector machine (SVM) with parameter $C = 10^4$ and kernel width $\gamma = 10^{-4}$. A set of 4,000 natural uncompressed images (768 × 512) downloaded from Corel Draw and obtained from several digital cameras is used in our experiments. The proposed method needs 1-5 min to hide data in each image. Experiments are carried out for seven different payloads (0.05, 0.1, 0.15, 0.17, 0.20, 0.22, and 0.25 bits per nonzero coefficient, bpc) and quality factor 75. For each of the seven payload sizes, the SVM is trained on a set of 3,000 images (1,500 original and 1,500 stego images), and the resulting model is tested on a set of 1,000 images (500 original and 500 stego). The result shows the error probabilities of the steganalysis for each tested payload (see Figures 2 and 3).

Figure 2. Error probability versus payload (bpc) for quality factor 75 using steganalysis [20].

Figure 3. Error probability versus payload (bpc) for quality factor 75 using steganalysis [25].

The error probability is computed as follows:

$$e = \frac{1}{2}\left(P_a + P_b\right), \tag{28}$$

where P_a is the probability of a false alarm (i.e., the unmodified image is classified as modified) and P_b is the probability of a missed detection (i.e., the modified image is classified as unmodified).
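The error probability of Equation (28) can be computed from a classifier's decisions on a test set, as in the minimal sketch below. The label convention (0 for an unmodified image, 1 for a modified one) and the function name `error_probability` are assumptions made for this example.

```python
def error_probability(true_labels, predicted_labels):
    """Average of the two error rates, e = (P_a + P_b) / 2.

    Labels: 0 = unmodified (cover) image, 1 = modified (stego) image.
    """
    cover_predictions = [p for t, p in zip(true_labels, predicted_labels) if t == 0]
    stego_predictions = [p for t, p in zip(true_labels, predicted_labels) if t == 1]

    # P_a: fraction of unmodified images classified as modified.
    p_a = sum(1 for p in cover_predictions if p == 1) / len(cover_predictions)
    # P_b: fraction of modified images classified as unmodified.
    p_b = sum(1 for p in stego_predictions if p == 0) / len(stego_predictions)

    return 0.5 * (p_a + p_b)
```

A classifier that guesses at random on a balanced test set gives e close to 0.5, while a perfect classifier gives e = 0. A higher e therefore means a less detectable embedding.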

In our experiments, we test both methods: (1) the method based only on the modified BCH-based data hiding scheme; and (2) the modified BCH-based data hiding scheme combined with the proposed inserting-removing strategy. The proposed methods achieve a high error probability for all the tested payloads. For payloads up to 0.1 bpc, both methods yield a detection accuracy close to 50%, meaning that the steganalysis cannot distinguish the unmodified images from the modified ones; its decisions are almost equivalent to a coin toss. For higher payloads around 0.15 and 0.2 bpc, the proposed methods perform much better than the MME. The significant improvement over the MME is explained by the use of schemes with higher embedding efficiency (i.e., the BCH-based schemes with large m). The proposed method also shows better results than the methods based on the original BCH-based schemes. Moreover, the proposed method with the inserting-removing strategy improves on the method using only the modified BCH-based data hiding scheme by 0.0363, 0.0414, and 0.0392 in error probability for payloads of 0.15, 0.2, and 0.25 bpc, respectively. For a payload of 0.25 bpc, the two methods achieve error probabilities of 0.2961 and 0.3353, respectively. These error probabilities are better than those of the MME [14], the original BCH-based scheme [16], the heuristic BCH-based scheme [17], and the syndrome-trellis codes (STC) [22] proposed by Filler et al. This improvement is achieved by combining the modified BCH-based data hiding with the inserting-removing strategy.

7. Conclusion

In this article, an efficient data hiding technique for steganography is presented. The proposed BCH-based data hiding scheme uses two blocks to form a single combined block. A new data hiding strategy makes it possible to obtain a joint solution for two blocks with intersected coefficients. Owing to the intersection, the proposed method requires a smaller number of coefficients for hiding the same amount of data than the original nonoverlapping blockwise approaches. As a result, the proposed method can use the BCH-based schemes with large m (i.e., larger capacity). Even when the proposed method has to use the same BCH-based scheme (for 0.17 and 0.2 bpc), the efficiency of data hiding is still high because the proposed two-scheme embedding has a lower ratio k_1/k_2 than the original BCH-based scheme. The proposed BCH-based data hiding scheme significantly outperforms the MME and the original BCH-based steganography in terms of the error probability and accuracy of the steganalysis. The proposed two-scheme embedding technique (see Equations 14 and 15) makes it possible to use almost all the available DCT coefficients. The proposed strategy based on inserting and removing coefficients 1 or -1 increases the number of possible solutions and significantly decreases the total distortion. The experimental results show that the inserting-removing strategy significantly improves the performance of the proposed method. The combination of the modified BCH-based scheme and the inserting-removing strategy achieves higher error probabilities, i.e., lower detection accuracy, against the powerful steganalysis.

References

  1. Provos N: Defending against statistical steganalysis. In Proc of 10th USENIX Security Symposium. Washington, DC; 2001:24-24.

  2. Eggers J, Bauml R, Girod B: A communications approach to steganography. In Proc of EI SPIE. vol. 4675. San Jose, CA; 2002:26-37.

  3. Noda H, Niimi M, Kawaguchi E: Application of QIM with dead zone for histogram preserving JPEG steganography. In Proc of ICIP. Genoa, Italy; 2005.

  4. Solanki K, Sarkar A, Manjunath BS: YASS: Yet another steganographic scheme that resists blind steganalysis. Lect Notes Comput Sci 2007, 2939: 154-167.

  5. Westfeld A: High capacity despite better steganalysis (F5--a steganographic algorithm). Lect Notes Comput Sci 2001, 2137: 289-302.

  6. Fridrich J: Minimizing the embedding impact in steganography. In Proc of ACM Multimedia and Security Workshop. Geneva, Switzerland; 2006:2-10.

  7. Fridrich J: Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes. Lect Notes Comput Sci 2005, 3200: 67-81.

  8. Fridrich J, Filler T: Practical methods for minimizing embedding impact in steganography. In Proc EI SPIE. vol. 6505. San Jose, CA; 2007:2-3.

  9. Fridrich J, Goljan M, Soukal D: Perturbed quantization steganography using wet paper codes. In Proc of ACM Workshop on Multimedia and Security. Magdeburg, Germany; 2004:4-15.

  10. Fridrich J, Goljan M, Soukal D: Perturbed quantization steganography. ACM Multimedia Secur J 2005, 11(2):98-107.

  11. Fridrich J, Pevny T, Kodovsky J: Statistically undetectable JPEG steganography: dead ends, challenges, and opportunities. In Proc of ACM Workshop on Multimedia and Security. Dallas, TX; 2007:3-15.

  12. Fridrich J, Goljan M, Soukal D: Perturbed quantization steganography. ACM Multimedia Secur J 2005, 11(2):98-107.

  13. Fridrich J, Goljan M, Soukal D: Wet paper coding with improved embedding efficiency. IEEE Trans Inf Forensics Secur 2005, 1(1):102-110.

  14. Kim YH, Duric Z, Richards D: Modified matrix encoding technique for minimal distortion steganography. Lect Notes Comput Sci 2006, 4437: 314-327.

  15. Schönfeld D, Winkler A: Reducing the complexity of syndrome coding for embedding. Lect Notes Comput Sci 2008, 4567: 145-158.

  16. Zhang R, Sachnev V, Kim HJ: Fast BCH syndrome coding for steganography. Lect Notes Comput Sci 2009, 5806: 48-58.

  17. Sachnev V, Kim HJ, Zhang R: Less detectable JPEG steganography method based on heuristic optimization and BCH syndrome coding. In Proc of ACM Workshop on Multimedia and Security. Princeton, NJ; 2009:131-139.

  18. Filler T, Fridrich J: Steganography using Gibbs random fields. In Proceedings of ACM Multimedia and Security Workshop. Rome, Italy; 2010:199-212.

  19. Upham D: [http://www.funet.fi/pub/crypt/stegangraphy/jpeg-jsteg-v4.diff.gz]

  20. Pevny T, Fridrich J: Merging Markov and DCT features for multi-class JPEG steganalysis. In Proc of SPIE. vol. 6505. San Jose, CA; 2007:3-4.

  21. Shi YQ, Chen C, Chen W: Markov process based approach to effective attacking JPEG steganography. Lect Notes Comput Sci 2006, 4437: 249-264.

  22. Filler T, Judas J, Fridrich J: Minimizing embedding impact in steganography using trellis-coded quantization. IEEE Trans Inf Forensics Secur 2011, 6(3):920-935.

  23. Rifa-Pous H, Rifa J: Product perfect codes and steganography. Digital Signal Process 2009, 19: 764-769.

  24. Zhao Z, Wu F, Yu S, Zhou J: A lookup table based fast algorithm for finding roots of quadratic or cubic polynomials in the GF(2^m). J Huazhong Univ Sci Technol (Nat Sci Ed.) 2005, 33(1):70-73.

  25. Kodovsky J, Fridrich J: Calibration revisited. In Proceedings of the 11th ACM Multimedia & Security Workshop. Edited by: Dittmann J, Craver S, Fridrich J. Princeton, NJ; 2009.

Acknowledgements

This study was supported by the Catholic University of Korea, National Research Foundation of Korea (grant 2011-0013695), ITRC and BK21 Project, Korea University and IT R&D program (Development of anonymity-based u-knowledge security technology, 2007-S001-01).

Author information

Corresponding author

Correspondence to Hyoung Joong Kim.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Sachnev, V., Kim, H.J. Modified BCH data hiding scheme for JPEG steganography. EURASIP J. Adv. Signal Process. 2012, 89 (2012). https://doi.org/10.1186/1687-6180-2012-89

  • DOI: https://doi.org/10.1186/1687-6180-2012-89

Keywords