Progressive Exponential Clustering-Based Steganography

Cluster indexing-based steganography is an important branch of data-hiding techniques. Such schemes normally achieve a good balance between high embedding capacity and low embedding distortion. However, most cluster indexing-based steganographic schemes utilise less efficient clustering algorithms for embedding data, which causes redundancy and leaves room for increasing the embedding capacity further. In this paper, a new clustering algorithm, called progressive exponential clustering (PEC), is applied to increase the embedding capacity by avoiding redundancy. Meanwhile, a cluster expansion algorithm is also developed in order to further increase the capacity without sacrificing imperceptibility.


Introduction
Many steganographic schemes have been developed for hiding data in vector-quantisation (VQ) compressed colour images (also called palette images) [1][2][3][4][5][6][7][8][9][10][11][12][13][14]. Although there are variations among them, a common feature of these methods is that they partition the codebook into a number of groups or clusters and then embed the secret message by replacing the codeword indices of the compressed image with those of codewords in the same group/cluster, selected according to the corresponding secret data bits. For example, with a cluster of 8 (= 2^3) codewords, each codeword can embed 3 bits of the secret message. If the secret data bits are 010_2 (or 110_2), the second (or sixth) codeword is used to replace the original codeword. The receiving end of the stego-image needs to have the same clustering of the same codebook. The secret message is extracted by concatenating the positions/indices (in binary form) of the received codewords in their groups/clusters. Therefore, the larger the cluster, the greater the embedding capacity of each codeword of the cluster [1,4,12]. The size of a cluster is determined by the distance allowed between each codeword and the cluster's centroid: the greater the allowed distance, the larger the cluster. However, the larger a cluster is, the greater the variance among its codewords becomes, meaning that the average embedding distortion is greater because a codeword is more likely to be replaced with a more distant codeword [4,12]. Striking a good balance between embedding capacity and embedding distortion is therefore important, but unfortunately not trivial. The feasibility resides in the optimality of the codebook clustering algorithm [4,12]. Because of the indexing characteristic of this type of scheme, we describe them as cluster indexing-based steganography in this paper.
In Du and Hsu's work [2], the clustering algorithm treats the secret message as a clustering parameter. The value of the secret message and the size of the groups from each clustering step are combined as the threshold of the clustering algorithm. As a result, when embedding different secret messages, the codebook must be reclustered. Furthermore, the performance of the method depends on the secret message, making the performance evaluation of the algorithm difficult. Another drawback of some schemes of this category (e.g., [1]) is that the size of each cluster is not a power of 2, making some colours in the same cluster redundant and thus reducing the overall embedding capacity. For example, if the size |C| of a cluster C is not a power of 2, then |C| − 2^⌊log_2 |C|⌋ colours are redundant and do not contribute to the embedding capacity.
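The index substitution and the redundancy count described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the sketch assumes zero-based indexing within a cluster (so the bits 010 select the member at index 2); the cited schemes may number the members differently.

```python
def bits_per_codeword(cluster_size: int) -> int:
    """Number of secret bits one codeword can carry: floor(log2(size))."""
    return cluster_size.bit_length() - 1


def redundant_count(cluster_size: int) -> int:
    """Members that carry no extra capacity: |C| - 2^floor(log2 |C|)."""
    usable = 1 << bits_per_codeword(cluster_size)
    return cluster_size - usable


def embed_in_codeword(cluster, secret_bits: str):
    """Return the cluster member whose (zero-based) index equals the bits."""
    if len(secret_bits) != bits_per_codeword(len(cluster)):
        raise ValueError("bit string does not match the cluster capacity")
    return cluster[int(secret_bits, 2)]


def extract_from_codeword(cluster, received) -> str:
    """Recover the bits as the binary index of the received member."""
    width = bits_per_codeword(len(cluster))
    if width == 0:
        return ""  # a singleton cluster carries no data
    return format(cluster.index(received), f"0{width}b")


# A cluster of 8 codewords carries 3 bits per codeword.
cluster = ["A", "B", "C", "D", "E", "F", "G", "H"]
stego_codeword = embed_in_codeword(cluster, "010")
recovered = extract_from_codeword(cluster, stego_codeword)
```

A cluster of 10 members still carries only 3 bits, which is why `redundant_count(10)` reports two unused members.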
EURASIP Journal on Advances in Signal Processing
A less studied steganographic technique is the application of the aforementioned cluster indexing idea to hiding data in true colour images [15,16]. It is less studied because the size of the palette (codebook or colour table) is 2^24, which is 2^16 times greater than that of VQ-compressed images and complicates the clustering of colours. In [16], instead of using the entire colour space with 2^24 colours as the codebook, a colour table containing only the colours present in the original image is first created. A k-means clustering algorithm is then applied to cluster the colours in the colour table into N clusters/groups. Thirdly, the entire colour space with 2^24 colours is partitioned into N cubes, with each cube centred at the centroid of one of the N clusters formed by the k-means step. Subsequently, each cube is reduced to a 3-dimensional sphere containing 2^h colours, where h is a prespecified parameter which determines the capacity and distortion of the scheme. Finally, to embed data bits with a value equal to d into a pixel, the algorithm identifies the cluster which contains the colour of the pixel and then replaces the colour with the dth colour in the sphere centred at the centroid of the identified cluster. To extract the secret data from the received image, an N-element colour palette (i.e., the centroids of the same N colour clusters) has to be transmitted to the recipient so that the same colour clusters can be re-established. From each pixel, the index of the pixel's colour in its cluster is taken as the secret data bits carried by the pixel. By concatenating those colour indices, the complete secret message can be formed if the stego-image has not been manipulated. Although a high embedding capacity has been reported for this scheme, the high distortion due to the low density of the colour clusters makes it unacceptable.
Based on [16], Brisbane et al. proposed another scheme [15], aiming at trading embedding capacity for lower embedding distortion. Although the objective of reducing embedding distortion is achieved, the same requirement of communicating the extra palette to the receiving side stands as the main limitation on the feasibility of the scheme. For example, if N = 1000, then approximately 3,000 bytes of data have to be transmitted to the recipient. This extra communication not only requires extra resources but, more seriously, presents a security gap for potential attacks. Moreover, a common limitation of the aforementioned methods is that they are not immune to histogram analysis, as we will discuss in Section 4.1.
In this work, we propose a Progressive Exponential Clustering-(PEC-) based steganographic scheme, aiming at striking a good balance between high embedding capacity and low embedding distortion. Meanwhile, the proposed scheme does not have to transmit the palette from the embedding side to the recipient, hence strengthening security. Moreover, unlike the schemes in [1,15,16], the proposed scheme is immune to histogram analysis.

Progressive Exponential Clustering-(PEC-) Based Steganography
In this section, we propose a Progressive Exponential Clustering-(PEC-) based steganographic scheme for hiding secret data in true colour images. Figure 1 illustrates the main idea of the proposed scheme.
In each iteration of the PEC algorithm, the colours/clusters are paired up under the constraint of a distance threshold T until no more matches can be made. The centroid/average of each cluster pair is calculated, and then, with the leftover clusters excluded, the entire new set of paired clusters is subjected to the same pairing operation under the constraint of the same threshold T in the next iteration. The pairing/clustering operation iterates progressively until no match is made throughout an entire iteration. The final clustering is achieved by concatenating the leftover clusters of all iterations. The reason we want the size of each cluster to be equal to a power of 2 is to eliminate redundancy so as to increase the overall embedding capacity.
The most accurate way of pairing up colours/clusters is to conduct an exhaustive search and pair up the closest ones. However, we observed that the size of the colour table of 1024 × 1024-pixel images normally varies from 100,000 to 700,000. Exhaustive search in such an enormous colour table is computationally infeasible. So we employ a suboptimal, yet more efficient, search method called incremental clustering, based on the idea proposed by Jain et al. [17]. The idea is that patterns are sorted in a random order, each treated as a singleton cluster, and then incrementally merged to form larger clusters. When growing a cluster, the algorithm searches for the first pattern or cluster outside the growing cluster whose distance to the growing cluster's centroid is shorter than a threshold, and merges it. According to [17], the time complexity of a k-means clustering algorithm is O(nlk), where n is the number of colours to be clustered, k is the number of clusters in the final cluster configuration, and l is the number of iterations taken by the algorithm before convergence. In comparison, the time complexity of the incremental clustering algorithm is only O(n log n).
Let Λ be the colour table, C_i the set of clusters in the ith iteration of the clustering process, C_i(x) the xth cluster in C_i, u_i(x) the centroid of C_i(x), D(u_i(x), u_i(y)) the Euclidean distance between the centroids u_i(x) and u_i(y) of clusters C_i(x) and C_i(y), T a Euclidean distance threshold, and C the final clustering configuration of the colour table Λ. The proposed PEC algorithm is described in Algorithm 1. Figure 2 illustrates an example of the working of the PEC algorithm. Each circle represents a colour in the colour table Λ to be clustered. At the beginning of Iteration 0 in Figure 2(a), every individual colour is considered as a singleton cluster, and all of the circles/colours are sorted, with their indices determined by a secret key K. At the end of Iteration 0, as demonstrated in Figure 2(b), each circle/colour is either paired up with another one at a distance shorter than T to form a new cluster {C_0(m), C_0(n)} of C_1, or left in C_0. At the beginning of Iteration 1, as demonstrated in Figure 2(c), the centroids (represented as dots) of the successfully paired clusters in C_1 are calculated. Based on the centroids of the successfully paired groups, the clustering continues in Iteration 1 (see Figure 2(d)). After Iteration 1, there is only one cluster left, which satisfies |C_i| ≤ 1 in step (4.4) of Algorithm 1. Therefore, the PEC algorithm stops.
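The progressive pairing described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation. It assumes that the key-controlled ordering is a seeded shuffle of the sorted colour table, that a pair is formed with the first later cluster whose centroid lies strictly closer than T, and that the members of a merged cluster keep their concatenation order. The names `pec` and `centroid` are hypothetical.

```python
import math
import random


def centroid(cluster):
    """Component-wise mean of the colours in a cluster."""
    n = len(cluster)
    return tuple(sum(c[k] for c in cluster) / n for k in range(len(cluster[0])))


def pec(colours, T, key):
    """Return clusters whose sizes are all powers of 2."""
    # C_0: every colour is a singleton cluster, ordered by the secret key.
    current = [[c] for c in sorted(set(colours))]
    random.Random(key).shuffle(current)
    final = []
    while current:
        cents = [centroid(c) for c in current]
        used = [False] * len(current)
        paired = []
        for a in range(len(current)):
            if used[a]:
                continue
            used[a] = True
            # Take the first unpaired cluster within T, not the closest one.
            for b in range(a + 1, len(current)):
                if not used[b] and math.dist(cents[a], cents[b]) < T:
                    used[b] = True
                    paired.append(current[a] + current[b])
                    break
            else:
                final.append(current[a])  # leftover of this iteration
        current = paired  # only paired clusters enter the next iteration
    return final


# Scalar "colours" written as 1-tuples, with threshold T = 5.
table = [(1,), (2,), (3,), (4,), (20,), (21,), (40,)]
clusters = pec(table, T=5, key=2024)
```

Because each merge joins two clusters from the same iteration, a cluster left over in iteration i always contains 2^i colours. In the small table above the value 40 has no neighbour within the threshold, so it ends up as a singleton whatever the key is.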
In order to demonstrate the idea of the PEC algorithm in a simple manner, let us use a scalar number to represent a colour of Λ in the following example. Given a colour table Λ after randomisation under the control of a secret key K, and the threshold T = 5, the clustering process proceeds through Iteration 0 and Iteration 1. Since |C_2| = 1, according to steps (4.4) and (5) of Algorithm 1, the clustering stops and the final clustering C is obtained.
C is a clustering configuration consisting of clusters of various sizes, and the size of a cluster C_i(x) is |C_i(x)| = 2^i. Therefore, the embedding capacity for a pixel whose colour belongs to C_i(x) is i bits. Meanwhile, every colour in cluster C_i(x) may be used for substitution during the secret data hiding process, hence there is no redundancy in C. In addition, although a cluster in C_0 contains only one colour and is not embeddable, such a cluster can be expanded to a larger cluster by the expansion algorithm introduced in Section 2.3, and thus C_0 is included in C. In step (4.3) of Algorithm 1, for one colour to be paired, the algorithm selects the first colour in the colour table whose distance is smaller than the threshold T, rather than the closest colour.
Input: colour table Λ, Euclidean distance threshold T, secret key K
Output: cluster configuration C
PEC algorithm
(1) C_0 = the set of colours in Λ with their order randomised according to the secret key K (each individual colour is seen as a cluster);
(2) i = 0;
(3) clustering_completed := FALSE;
(4) While clustering_completed = FALSE
(4.1) Calculate the centroid u_i(x) of each cluster
This selection policy has two advantages. Firstly, it avoids exhaustive search and greatly reduces the computation load [17]. Secondly, it increases the security of the PEC-based steganographic scheme because the clustering result is sensitive to the order of the colours in the colour table.
However, despite the increased capacity, the cluster expansion may also lead to higher embedding distortion if it is not done in an adaptive manner with regard to the details of the content in the image. Therefore, the following two factors need to be taken into account when expanding clusters.
(i) We observed that, for most images without larger homogeneous areas, higher embedding capacity can be achieved without incurring further distortion in terms of both human perception and PSNRs if the added virtual colours are within one standard deviation of the cluster centroid.However, if there are larger homogeneous areas in the image, although the PSNR remains near constant, visual distortion may become more noticeable when too many virtual colours are added to the homogeneous areas.Therefore, the number of the virtual colours should be restricted in the expansion process.
(ii) Another factor worth noting is that overexpansion cannot guarantee the presence of every virtual colour in the stego-image.As a result, the recipient of the stego-image cannot obtain the same colour table as used at the embedding side; consequently he/she is unable to make the correct extraction.Therefore, measures for tackling this situation are necessary.
Let C′_i(x) be the expanded counterpart of C_i(x), 2^{h′} the size of C′_i(x) (i.e., 2^{h′} = |C′_i(x)|), and p the number of pixels whose colours belong to cluster C_i(x). One necessary (but not sufficient) condition for all the colours in cluster C′_i(x) to appear in the stego-image is p ≥ 2^{h′}. Without loss of generality, we assume that the secret data is uniformly distributed, that is, the probability of the occurrence of every h′-bit secret data segment in the entire secret data stream is 1/2^{h′}. For example, in a uniformly distributed secret data sequence, the probability of the occurrence of every 2-bit secret data segment ("00", "01", "11" and "10") is 1/4. According to the data hiding algorithm (described in Section 2.4), the colour whose index equals the secret data bits will be substituted in order to carry those bits. Therefore, the probability for one colour to be substituted is 1/2^{h′}. Consequently, the probability of one colour not being substituted is 1 − 1/2^{h′}. Performing data embedding on p pixels whose colours are in the same cluster of size 2^{h′}, the probability λ of one colour in this cluster not being substituted is
$$\lambda = \left(1 - \frac{1}{2^{h'}}\right)^{p}.$$
Therefore, the expansion should not result in a value of λ greater than a given upper bound τ (i.e., λ ≤ τ). That is to say that the following inequality must be satisfied:
$$\left(1 - \frac{1}{2^{h'}}\right)^{p} \le \tau \quad\Longleftrightarrow\quad 2^{h'} \le \frac{1}{1 - \sqrt[p]{\tau}}. \qquad (5)$$
Additionally, the radius of the expanded cluster should also be smaller than the Euclidean distance threshold T:
$$2^{h'} \le \frac{4}{3}\pi T^{3}, \qquad (6)$$
where the right-hand side of Inequality (6) is the volume of a 3-dimensional sphere with a radius equal to T. We also require that the size of the expanded cluster still equals a power of 2, in order to avoid redundancy. Therefore, taking the aforementioned factors into consideration, the size of the expanded cluster, 2^{h′}, should satisfy
$$2^{h'} = 2^{\left\lfloor \log_2 \min\left( \frac{1}{1 - \sqrt[p]{\tau}},\; \frac{4}{3}\pi T^{3} \right) \right\rfloor}, \qquad (7)$$
where ⌊•⌋ is the floor function. The cluster expansion algorithm is presented in Algorithm 2.
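The two constraints on the expanded cluster size can be checked numerically with a short Python sketch. It assumes that Inequality (5) bounds the size by 1/(1 − τ^(1/p)), that Inequality (6) bounds it by the sphere volume (4/3)πT³, and that the exponent is the floor of the base-2 logarithm of the smaller bound. The function name `expansion_exponent` is hypothetical.

```python
import math


def expansion_exponent(p: int, tau: float, T: float) -> int:
    """Largest exponent h' allowed by the presence and radius constraints.

    p   -- number of pixels whose colours belong to the cluster (p >= 1)
    tau -- upper bound on the probability that a colour is never used (0 < tau < 1)
    T   -- Euclidean distance threshold
    """
    presence_bound = 1.0 / (1.0 - tau ** (1.0 / p))  # assumed form of (5)
    volume_bound = 4.0 * math.pi * T ** 3 / 3.0       # assumed form of (6)
    return math.floor(math.log2(min(presence_bound, volume_bound)))


def miss_probability(h: int, p: int) -> float:
    """Probability that one colour of a 2^h cluster is never selected."""
    return (1.0 - 1.0 / 2 ** h) ** p


h_new = expansion_exponent(p=1000, tau=0.001, T=4)
```

With p = 1000, τ = 0.001 and T = 4, the presence bound is the tighter one. A cluster of 2^7 colours keeps the miss probability below τ, whereas 2^8 colours would exceed it.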
After applying the cluster expansion to all clusters in the original colour table Λ, the result is an expanded colour table Λ′.
To ensure the appearance of every virtual colour in the stego-image and subsequently allow the recipient to reconstruct the same colour table Λ′, some pixels have to be chosen to have their original (physical) colours replaced with the virtual colours before data hiding takes place (step (4) of Figure 1). To assign a virtual colour, we identify the first pixel in the image with a physical colour in the same cluster as the virtual colour. Assigning each virtual colour to only one pixel keeps the distortion low. Note that this "expansion" distortion is insignificant when compared to the actual embedding distortion, as discussed in Section 3 and demonstrated in Table 4 and Figures 7 and 8.
Because the virtual colours are added after the preliminary PEC clustering, there is no guarantee that the recipient can reconstruct the same cluster configuration from the expanded colour table. Therefore, after cluster expansion and step (4) of Figure 1, a second round of PEC clustering (step (5) of Figure 1) is applied to partition the expanded colour table Λ′ so as to create a new cluster configuration C′ before data hiding.
Input: cluster C_i(x), probability upper bound τ and Euclidean distance threshold T
Output: expanded cluster C′_i(x)
Cluster expansion algorithm
(1) Compute p (the number of pixels with their colours belonging to C_i(x));
(2) Compute the centroid u_i(x) of cluster C_i(x);
(3) Compute 1/(1 − τ^{1/p}) of Inequality (5);
(4) Compute h′ according to (7);
(5) Let C′_i(x) = C_i(x) ∪ {2^{h′} − 2^h virtual colours with their distances closest to C_i(x)}.
Note that by "virtual colours" we mean the colours that are not present in C_i(x), and h = log_2 |C_i(x)|.

Data Hiding and Extraction.
The data hiding process is straightforward and is presented in Algorithm 3. At the receiving side, the recipient has to construct the same colour table Λ′ from the received image I_E, and then apply the same PEC algorithm to Λ′ in order to obtain the same clustering configuration C′. With the same Λ′ and C′, the secret data can be extracted correctly. The extraction algorithm performs as follows. If the colour of a pixel is the only member of its cluster C′_i(x), then this pixel carries no information; otherwise, the index of the colour in the cluster represents the secret data bits. After visiting all pixels, by concatenating those extracted bits, the complete secret data stream can be revealed. The data extraction algorithm is presented in Algorithm 4.
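The hiding and extraction steps described above can be sketched in Python as follows. This is an illustration built from the prose only, not a transcription of Algorithms 3 and 4. It assumes that the cluster configuration is passed in directly instead of being rebuilt from the stego-image, that indices within a cluster are zero-based, and that the message is padded with zeros once it runs out. The names `embed` and `extract` are hypothetical.

```python
def _lookup(clusters):
    """Map each colour to (cluster number, index within that cluster)."""
    return {c: (k, j) for k, cl in enumerate(clusters) for j, c in enumerate(cl)}


def embed(pixels, clusters, bits: str):
    """Replace each pixel colour by the cluster member indexed by the next bits."""
    where = _lookup(clusters)
    stego, pos = [], 0
    for px in pixels:
        cluster = clusters[where[px][0]]
        h = len(cluster).bit_length() - 1      # capacity of this pixel in bits
        if h == 0:
            stego.append(px)                   # singleton cluster: no data
            continue
        chunk = bits[pos:pos + h].ljust(h, "0")  # zero padding is an assumption
        pos += h
        stego.append(cluster[int(chunk, 2)])
    return stego


def extract(stego, clusters) -> str:
    """Concatenate the binary index of each pixel colour within its cluster."""
    where = _lookup(clusters)
    out = []
    for px in stego:
        k, j = where[px]
        h = len(clusters[k]).bit_length() - 1
        if h:
            out.append(format(j, f"0{h}b"))
    return "".join(out)


# Scalar colours; cluster sizes 4, 2 and 1 carry 2, 1 and 0 bits per pixel.
clusters = [[10, 11, 12, 13], [50, 51], [99]]
pixels = [10, 50, 99, 13, 51]
stego = embed(pixels, clusters, "101010")
message = extract(stego, clusters)
```

Since the extractor reads every embeddable pixel, the recipient would need to know the message length, or a terminator convention, to discard the padding. The source does not specify how this is handled.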

Experimental Results
In Figures 3 to 6, we show 4 original images and their corresponding stego-images with different values of the Euclidean distance threshold T. In these figures, we can see that when T ≤ 4, the distortion is imperceptible to human eyes. However, as T increases further, the distortion becomes more noticeable. The following experiments are consistent with this result (see Table 4). The greater T is, the higher the embedding capacity becomes because there is more room for cluster expansion, but the distortion also increases. Conversely, the smaller T is, the lower the distortion becomes, but the embedding capacity decreases because the room for cluster expansion becomes more limited. Therefore, in our experiments, we set T to 3 and 4. We also set the upper bound τ of Inequality (5) to 0.001.
The size of the host image is an important factor in the embedding capacity because a large image normally contains more colours than a smaller one, which results in larger clusters with higher embedding capacity. Meanwhile, since the homogeneous background areas in an image only contain a limited number of colours, images with large homogeneous areas tend to have a smaller colour table, but more room for cluster expansion. We use two 1024 × 1024-pixel images as examples, one with a large homogeneous area (Figure 7(a)) and thus a smaller colour table, and the other without a homogeneous background (Figure 8(a)) and thus a larger colour table. Whether an image has a large homogeneous area can be determined from the size of the colour table. A large colour table means that every colour appears only a few times, indicating that there is no large homogeneous area in the image, and vice versa. Given images of the same size, the ratio between the total number of pixels and the size of the colour table gives a good indication as to whether there are large homogeneous areas in the images. For example, for the "Hill" image, this ratio is 7.1 : 1, while the ratio for "Mandrill" is 1.6 : 1. Each colour appears about 7 times on average in the "Hill" image, but only 1.6 times on average in the "Mandrill" image. Therefore, we can conclude that there are larger homogeneous areas in "Hill" than in "Mandrill".
Table 1 lists the statistics of the cluster expansion with the distance threshold T equal to 3 and 4. We only choose the distance thresholds 3 and 4 for the experiments because when the threshold is higher than 4, the distortion is too high (the PSNR is lower than 35 dB), and when the threshold is lower than 3, the room for expansion is too small (a threshold of 2 hardly allows any expansion in the cluster expansion phase). We can see that image "Hill" in Figure 7(a), with a relatively larger homogeneous background than image "Mandrill" in Figure 8(a), has a more significant expansion rate. Because one virtual colour is assigned to only one pixel in the image, the PSNR after the expansion operation is well above 60 dB (see "Distortion due to expansion" of Table 1). Figures 7(b) and 8(b) are the slightly modified versions of Figures 7(a) and 8(a), respectively, with virtual colours added when the distance threshold T = 3. Table 2 demonstrates the changes to the size of the largest cluster of images "Hill" and "Mandrill" after cluster expansion. For image "Hill", the size of the largest cluster doubles (from 64 to 128) after the expansion when the threshold is either 3 or 4. So the pixels whose colours belong to this cluster can carry 7 (2^7 = 128) bits of secret data, rather than just 6 (2^6 = 64) bits. Note that other clusters of various sizes also underwent expansion. For the image "Mandrill", with a lower expansion rate as shown in Table 2, the largest cluster has not been expanded due to the constraints (mentioned in Section 2.3) imposed on the expansion algorithm.
Tables 3 and 4 show that the proposed scheme is capable of improving the embedding capacity in terms of bits per pixel (bpp) without inflicting further distortion on the stego-images. For image "Hill", we can see the significant capacity difference between the unexpanded and the expanded cases in Table 3, while the embedding distortion remains nearly constant, as shown in Table 4. Note that Table 4 even shows a slight reduction in distortion (i.e., an increase of PSNR from 38.22 dB to 38.82 dB) after cluster expansion on "Hill" with T = 3. Although the general perception is that high embedding capacity is usually gained at the expense of high distortion, such a distortion reduction is still possible because the virtual colours are only added around the centroids of the clusters under the constraints mentioned in Section 2.3. The least upper bound of the embedding distortion is the mean Euclidean distance of each group (determined by the triangle inequality). Adding virtual colours around the centroids of the clusters can reduce the mean Euclidean distance of each group and therefore reduce the embedding distortion. However, for "Mandrill" in Table 3, wherein the high-frequency signal prevails, the room for cluster expansion is relatively limited (see the expansion rate in Table 1), and the performance improvement gained through cluster expansion is therefore insignificant. Figures 7(c) and 8(c) are the stego-images of Figures 7(a) and 8(a), respectively, with virtual colours added when the distance threshold T = 3.
As discussed above, adding virtual colours to expand the cluster size can boost the performance of PEC. A more critical way of demonstrating the PEC algorithm's superiority to the k-means algorithm is to perform PEC without adding virtual colours (i.e., a handicapped version of PEC). We compared the CW method reported in [1] and PEC by applying them to the Lena image of 512 × 512 pixels and 256 grey levels shown in Figure 9. CW is a VQ-based method; to conduct the comparison on the same basis, we generate a codebook containing 512 codewords, each represented as a 4 × 4-pixel image block. The reader is referred to [4] for details about how the codebook is generated. Compared to the previous case with colour images as target images, the colour table is now replaced with the codebook of 512 codewords, and each image is divided into blocks of 4 × 4 pixels, each of which is replaced with the most similar codeword. Algorithm 1 is then used to cluster the codebook, but we do not apply Algorithm 2 to expand the clustered codebook. Algorithms 3 and 4 are applied to embed and extract the secret data.
Since the embedding capacity and embedding distortion cannot be evaluated alone without taking each other into account, a reasonable way of evaluating them is to fix one of the two factors and see how the algorithm performs in terms of the other factor. Figures 10(a) and 10(b) show the results of experiments with the threshold T, which is the Euclidean distance between vectors, equal to 20 and 60, respectively. We can see that at any embedding capacity, the embedding distortion of the proposed PEC algorithm is always lower (i.e., the PSNR is higher). For example, in Figure 10(b), when the embedding capacity (the horizontal axis) is 8000 bits, the PSNRs obtained with the CW and the proposed PEC algorithms are 29.81 dB and 31.82 dB, respectively. We can also see from Figure 10 that at a given value of embedding distortion along the vertical axis, the embedding capacity of the PEC algorithm is always greater than that of the CW algorithm. The reader is reminded that the CW algorithm is greedier in merging codewords. Therefore, with the same threshold, the CW algorithm should have a higher maximal capacity. That is why the two cutoff points where the maximal embedding capacities are reached are different. But this does not mean that the CW algorithm has better performance in terms of embedding capacity, because the performance should be measured in terms of both embedding capacity and distortion. As we will discuss below, the PEC algorithm can still reach the same embedding capacity with a higher threshold T while keeping the embedding distortion lower than that of the CW algorithm using a lower threshold T.
Another way to look at the performance is to compare the maximal embedding capacity across a wide range of threshold values under the constraint that the PSNR should not go below a specific lower bound. Figures 11(a) and 11(b) show the curves of embedding capacity under the constraints that the PSNR should not go below the lower bounds of 35 dB and 30 dB, respectively. At the beginning, the capacity is low and the PSNR stays above the lower bound, hence the curves keep rising as the threshold increases, until the peaks of the curves are reached. The capacity starts to drop after the peaks are reached because, as the threshold T increases, the sizes of the clusters become greater. As a result, embedding the same amount of data incurs higher distortion. That means that, with a greater T, the PSNR lower bound is reached after less data has been embedded. In Figure 11(a), the maximal capacity of the PEC algorithm is 7457 bits (appearing at threshold T = 42), while that of the CW algorithm equals 6810 bits (appearing at threshold T = 26). In Figure 11(b), the maximal capacity of the PEC algorithm is 11070 bits (appearing at threshold T = 66), while that of the CW algorithm equals 8979 bits (appearing at threshold T = 50). We can see that in both cases the performance of the PEC algorithm in terms of maximal embedding capacity is superior to that of the CW algorithm, by 9.5% and 23.3%, respectively.

Analyses of PEC
4.1. Histogram Analysis and ε-Secure.
The histogram of an image can effectively reveal the distribution of the colours or intensities, and thus any irregular distribution caused by the steganographic method can be easily detected [18]. In [15,16], the 3-dimensional RGB colour space is separated into spheres by the k-means clustering algorithm, and colours generated around the centre of each sphere are used for substitution in order to carry secret messages, while the colours outside the spheres are not used in the stego-images. Moreover, within each sphere, the colours are distributed sparsely and uniformly. As a result, a large number of gaps would appear between colours with high occurrences in the histograms of the stego-images. Such a phenomenon usually does not appear in natural images, therefore its appearance would attract the steganalyst's attention. This is why histograms and histogram characteristic functions have been extensively used in measuring the security of steganographic schemes [18][19][20]. By analysing the histogram of the stego-image, the attacker can easily detect the spheres in the form of aggregated clusters separated by large gaps in the histogram. These gaps allow the attacker to reconstruct the sphere/cluster configuration and even infer the hidden message. Compared to [15,16], the proposed PEC algorithm does not separate the 3-dimensional RGB colour space for generating colours. Instead, all the physical colours are preserved while some virtual colours are added to expand the clusters in order to increase the embedding capacity without inflicting high distortion. Hence, the attacker cannot detect separated clusters in the histogram.
Katzenbeisser and Petitcolas [21] propose the statistical concept of ε-secure as a stricter security requirement based on histogram analysis. ε-secure requires that, for a steganographic scheme to be deemed secure, the Kullback-Leibler divergence (D_KL) between the PDF (probability distribution function) of the original signal and that of the stego-signal must be less than a given threshold ε. The Kullback-Leibler divergence (D_KL) can be described as follows:
$$D_{KL}(P_o \,\|\, P_s) = \sum_{z} P_o(z) \log \frac{P_o(z)}{P_s(z)},$$
where P_o(z) and P_s(z) are the PDFs of the original image and the stego-image. In the proposed PEC algorithm, the D_KL between the images is about 0.001 to 0.02, depending on the content of the images (except for the Airplane image, a particular example in which over 80% of the image is homogeneous area, allowing the PEC algorithm to expand the clusters with a greater expansion rate). Table 5 lists the KL divergence of the images. We can see that D_KL is always lower than 0.01 when T < 13.
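The ε-secure test can be illustrated with a small Python sketch that estimates the Kullback-Leibler divergence between the colour histograms of an original image and a stego-image. It assumes the standard discrete form of the divergence with the natural logarithm. The floor `eps` used when a colour is absent from the stego-image is an assumption added to avoid division by zero, not something taken from the paper, and the function names are hypothetical.

```python
import math
from collections import Counter


def kl_divergence(original, stego, eps: float = 1e-12) -> float:
    """Estimate D_KL(P_o || P_s) from two sequences of pixel colours."""
    hist_o, hist_s = Counter(original), Counter(stego)
    n_o, n_s = len(original), len(stego)
    total = 0.0
    for colour, count in hist_o.items():
        p_o = count / n_o
        p_s = hist_s.get(colour, 0) / n_s
        total += p_o * math.log(p_o / max(p_s, eps))
    return total


def is_epsilon_secure(original, stego, epsilon: float) -> bool:
    """True when the estimated divergence is below the threshold epsilon."""
    return kl_divergence(original, stego) < epsilon


# Toy pixel sequences using scalar colours.
cover = [0, 0, 1, 1]
same_histogram = [1, 0, 1, 0]
shifted = [0, 1, 1, 1]
```

Two images with identical histograms give a divergence of zero, so any reordering of the same colours passes the test, whereas a stego-image that shifts mass between colours yields a positive value.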

Characteristics of the PEC.
There are three main advantageous aspects to the idea of the proposed PEC-based steganographic scheme.
(i) The pairwise matching operation ensures that the sizes of all clusters are always equal to a power of 2 (i.e., exponential).This exponential structure makes use of every colour in a cluster for hiding data and thus removes redundancy in the data hiding process.
(ii) The cluster expansion algorithm allows the embedding capacity to be increased without sacrificing imperceptibility.At first glance, distortion should increase proportionally as the embedding capacity gets higher through the addition of the virtual colours.However the cluster expansion algorithm can actually decrease the embedding distortion in terms of PSNR by shortening the average distance between the colours and the centroid of the cluster.According to the cluster expansion algorithm, clusters are expanded not by extending their boundaries outwards, but by increasing the population density around the centroids while leaving the boundary intact, thus reducing colour variation of each cluster.
(iii) Because all physical and virtual colours are guaranteed to be present in the stego-image, the recipient can re-establish the same cluster configuration using the shared secret key K and distance threshold T. Sharing the large colour palette and the centroids of the clusters (as required in [15,16]) is no longer necessary. The reason the security of the PEC-based steganographic scheme is strengthened is twofold.
(a) Because there is no need to transmit extra information, the chance of compromising the security of the scheme is reduced.
(b) Because of the fragility of the PEC-based steganographic scheme, modifications to the stego-image will inevitably change the colours of the pixels and thus lead to a different colour table. It is noted that the PEC algorithm is very sensitive to the initial colour table. With a different colour table, the PEC algorithm will produce a greatly different clustering configuration and a wrong ordering of the colours within each cluster, which will prevent the hidden data from being extracted by the attacker.

Conclusions
After studying some clustering-based steganographic schemes for colour images, we observed that the performance of a scheme depends on the efficiency of its clustering algorithm. An efficient clustering algorithm should take the two conflicting factors of high embedding capacity and low embedding distortion into account simultaneously. In this paper, we have proposed a novel steganographic scheme using the Progressive Exponential Clustering (PEC) algorithm. This algorithm overcomes the limitation of traditional clustering-based steganographic schemes by seeking a balance between high embedding capacity and low embedding distortion. In addition, the scheme employs a cluster expansion method, which further increases the capacity without sacrificing imperceptibility.
Our experiments have shown that adaptively adding virtual colours to the colour table of images with significant low-frequency components can significantly increase the embedding capacity without inflicting distortion on the stego-images.

Figure 1: The framework of the PEC-based steganographic scheme.

Figure 4: The original Mandrill image and the stego-images with different distance thresholds.

Figure 6: The original peppers image and the stego-images with different distance thresholds.

Figure 10: Distortion comparison between the PEC and CW algorithms when applied to Figure 9 with the threshold T equal to (a) 20 and (b) 60.

Figure 11: Embedding capacity comparison between the PEC and CW algorithms when applied to Figure 9 with the threshold PSNR set to (a) 35 dB and (b) 30 dB.
The first step of this scheme is to create a colour table Λ, which covers all colours present in the original image $I_O$. In the second step, the colour table is partitioned into a number of groups using the Progressive Exponential Clustering (PEC) algorithm (to be described in Section 2.2). The result of this grouping is a set of clusters C of various sizes, each containing $2^h$ colours, where $h$ is a positive integer that depends on the size of the cluster. In Step 3, each cluster is expanded by adding a number of virtual colours, which are not present in the original image and lie near the centroid of the cluster, in order to form an expanded cluster with $2^{h'}$ colours ($h' > h$). The cluster expansion operation results in an expanded colour table Λ′. Throughout the rest of the paper, we use the term physical colours for the colours present in the original image $I_O$ and virtual colours for the added colours that are absent from the original image. The purpose of expanding the clusters is to increase the embedding capacity, and the reason for selecting the virtual colours near the centroids is to minimise the embedding distortion.

... table. For example, in Iteration 0 of the above demonstration, 17 is closer to 15. However, 15 is paired with 11, since 11 appears before 17 in the colour table and the distance between 11 and 15 is 4 < T (= 5). By randomising the order of the colours in the colour table in Step 1 of the PEC algorithm according to the secret key K, we can ensure the security of the proposed scheme. Without this secret key, even if a potential attacker managed to establish the same colour table, he/she would still be unable to obtain the same clustering configuration C. Without the same clustering configuration C, the attacker is unable to extract the correct secret data.
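The order dependence in the pairing example can be sketched in a few lines. The code below is only an illustration under stated assumptions, not the PEC algorithm itself: colours are scalars, `first_fit_pairs` performs a single pairing pass in which each unpaired colour takes the first later unpaired colour closer than T, and `keyed_order` (a hypothetical helper) stands in for the key-driven randomisation of Step 1 using Python's `random` module, which is not a cryptographically strong generator.

```python
import random

def keyed_order(colours, key):
    """Return the colours in an order determined by the secret key (illustrative only)."""
    table = sorted(colours)
    random.Random(key).shuffle(table)   # the same key always yields the same order
    return table

def first_fit_pairs(table, threshold):
    """One pairing pass: each unpaired colour is matched with the first
    later unpaired colour whose distance is below the threshold."""
    paired = [False] * len(table)
    pairs = []
    for i, a in enumerate(table):
        if paired[i]:
            continue
        for j in range(i + 1, len(table)):
            if not paired[j] and abs(a - table[j]) < threshold:
                paired[i] = paired[j] = True
                pairs.append((a, table[j]))
                break
    return pairs

# Hypothetical table in which 11 precedes 17, as in the example in the text.
table = [15, 11, 17, 30]
pairs = first_fit_pairs(table, threshold=5)
print(pairs)   # 15 takes 11 (distance 4), although 17 is closer (distance 2)
```

Because the outcome depends on the table order, a recipient who derives the order from the same key reproduces the same pairs, whereas a different order can lead to a different configuration.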
2.3. Cluster Expansion. If the colour of a pixel belongs to a cluster with $2^h$ colours, then this pixel can carry $h$ bits of the secret message. Therefore, by expanding clusters exponentially, a higher embedding capacity can be achieved. For example, a cluster with 4 (= $2^2$) colours has an embedding capacity of 2 bits. If 4 more distinct colours are inserted into this cluster, the embedding capacity of the colours in this cluster increases to 3 bits (8 = $2^3$).
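The relation between cluster size and per-pixel capacity can be made concrete with a small index-based sketch. It assumes scalar colours, a fixed ordering of colours inside each cluster that sender and recipient share, and zero-based indexing; the function names `embed` and `extract` and all values are hypothetical and serve only to illustrate the idea.

```python
def capacity_bits(cluster):
    """Number of bits h carried by a colour in a cluster of 2^h colours (h >= 1)."""
    h = len(cluster).bit_length() - 1
    if h < 1 or len(cluster) != 1 << h:
        raise ValueError("cluster size must be a power of two, at least 2")
    return h

def embed(cluster, bits):
    """Return the cluster member whose index encodes the bit string."""
    if len(bits) != capacity_bits(cluster):
        raise ValueError("bit string length must equal the cluster capacity")
    return cluster[int(bits, 2)]

def extract(cluster, colour):
    """Recover the bit string from the position of the colour in its cluster."""
    h = capacity_bits(cluster)
    return format(cluster.index(colour), f"0{h}b")

cluster4 = [10, 14, 18, 22]                 # 2^2 colours: 2 bits per pixel
cluster8 = cluster4 + [13, 15, 17, 19]      # 4 more colours: 3 bits per pixel

stego4 = embed(cluster4, "10")
stego8 = embed(cluster8, "110")
print(stego4, extract(cluster4, stego4))
print(stego8, extract(cluster8, stego8))
```

Any pixel whose colour lies in the cluster can be replaced by the selected member, so doubling the cluster size adds exactly one bit per such pixel.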

Table 1: The number of colours within the original and expanded colour tables with different threshold T. Note that "Distortion due to expansion" is the distortion incurred after one virtual colour is assigned to one pixel in the original image.

Table 2: The size of the largest clusters before and after cluster expansion.

Table 3: Embedding capacity in terms of bits per pixel (bpp) before and after clustering.

Table 4: Embedding distortion in terms of PSNR (dB) before and after clustering.

Table 5: KL divergence ($D_{KL}$) of the images.