 Research Article
 Open Access
Progressive Exponential Clustering-Based Steganography
 Chang-Tsun Li^{1} and
 Yue Li^{2}
https://doi.org/10.1155/2010/212517
© Chang-Tsun Li and Yue Li. 2010
 Received: 26 April 2010
 Accepted: 5 October 2010
 Published: 12 October 2010
Abstract
Cluster indexing-based steganography is an important branch of data-hiding techniques. Such schemes normally achieve a good balance between high embedding capacity and low embedding distortion. However, most cluster indexing-based steganographic schemes utilise less efficient clustering algorithms for embedding data, which causes redundancy and leaves room for increasing the embedding capacity further. In this paper, a new clustering algorithm, called progressive exponential clustering (PEC), is applied to increase the embedding capacity by avoiding redundancy. Meanwhile, a cluster expansion algorithm is also developed in order to further increase the capacity without sacrificing imperceptibility.
Keywords
 Secret Message
 Distance Threshold
 Secret Data
 Cluster Expansion
 Cluster Configuration
1. Introduction
Many steganographic schemes have been developed for hiding data in vector-quantisation (VQ) compressed colour images (also called palette images) [1–14]. Although there are variations among them, a common feature of these methods is that they partition the codebook into a number of groups or clusters and then embed the secret message by replacing the codeword indices of the compressed image with those of the same group/cluster selected according to the corresponding secret data bits. For example, with a cluster of 8 ($2^3$) codewords, each codeword can embed 3 bits of the secret message. If the binary secret data bits are 001 (or 101), the second (or sixth) codeword is used to replace the original codeword. The receiving end of the stego-image needs to have the same clustering of the same codebook. The secret message is extracted by concatenating the position/index (in binary form) of the received codewords in their groups/clusters. Therefore, the larger the cluster, the greater the embedding capacity of each codeword of the cluster [1, 4, 12]. The size of a cluster is determined by the distance allowed between each codeword and the cluster's centroid: the greater the allowed distance, the larger the cluster. However, the larger a cluster is, the greater the variance among its codewords becomes, meaning that the average embedding distortion is greater because a codeword is more likely to be replaced with a more distant codeword [4, 12]. So striking a good balance between embedding capacity and embedding distortion is important, but unfortunately not trivial. The feasibility resides in the optimality of the codebook clustering algorithm [4, 12]. Because of the indexing characteristic of this type of scheme, we describe them as cluster indexing-based steganography in this paper.
In Du and Hsu's work [2], the clustering algorithm treats the secret message as a clustering parameter. The value of the secret message and the size of the groups from each clustering step are combined as the threshold of the clustering algorithm. As a result, when embedding different secret messages, the codebook must be reclustered. Furthermore, the performance of the method depends on the secret message, making the performance evaluation of the algorithm difficult. Another drawback of some schemes of this category (e.g., [1]) is that the size of each cluster is not a power of 2, making some colours in the same cluster redundant and thus reducing the overall embedding capacity. For example, if the size of a cluster is not a power of 2, then the colours in excess of the largest power of 2 not exceeding the cluster size are redundant and do not contribute to the embedding capacity.
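The indexing idea described above can be illustrated with a minimal Python sketch. It assumes zero-based positions within a cluster, so that the bits 001 select the second codeword and 101 the sixth; the function names and the placeholder codewords are hypothetical and are not taken from any of the cited schemes.

```python
def bits_per_codeword(cluster_size):
    """Number of whole bits that an index into the cluster can carry."""
    return cluster_size.bit_length() - 1

def embed_in_cluster(cluster, bits):
    """Pick the codeword whose zero-based position equals the value of `bits`."""
    return cluster[int(bits, 2)]

def extract_from_cluster(cluster, codeword):
    """Recover the bits as the zero-padded binary position of `codeword`."""
    width = bits_per_codeword(len(cluster))
    return format(cluster.index(codeword), f"0{width}b")

def redundant_members(cluster_size):
    """Members that cannot be addressed when the size is not a power of 2."""
    return cluster_size - 2 ** bits_per_codeword(cluster_size)

# Hypothetical cluster of 8 codewords (placeholders for real palette entries).
cluster = [f"codeword_{i}" for i in range(8)]
stego = embed_in_cluster(cluster, "101")        # selects the sixth codeword
recovered = extract_from_cluster(cluster, stego)
```

Under these assumptions, a cluster of 11 codewords would still carry only 3 bits per codeword, leaving 3 members unused, which is the redundancy discussed above.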
A less studied steganographic technique is the application of the aforementioned cluster indexing idea to hiding data in true colour images [15, 16]. This is because the size of the palette (codebook or colour table) is $2^{24}$, which is far greater than that of VQ-compressed images and complicates the clustering of colours. In [16], instead of using the entire colour space with $2^{24}$ colours as the codebook, a colour table containing only the colours present in the original image is first created. A $k$-means clustering algorithm is then applied to cluster the colours in the colour table into clusters/groups. Thirdly, the entire colour space is partitioned into cubes, with each cube centred at the centroid of one of the clusters. Subsequently, each cube is reduced to a 3-dimensional sphere containing a prespecified number of colours; this parameter determines the capacity and distortion of the scheme. Finally, to embed data bits into a pixel, the algorithm identifies the cluster which contains the colour of the pixel and then replaces the colour with the colour, in the sphere centred at the centroid of the identified cluster, whose index equals the value of the data bits. To extract the secret data from the received image, a colour palette (i.e., the centroids of the same colour clusters) has to be transmitted to the recipient so that the same colour clusters can be re-established. From each pixel, the index of the pixel's colour in its cluster is taken as the secret data bits carried by the pixel. By concatenating those colour indices, the complete secret message can be formed if the stego-image has not been manipulated. Although high embedding capacity has been reported for this scheme, the high distortion due to the low density of the colour clusters makes it unacceptable. Based on [16], Brisbane et al. proposed another scheme [15], which gives up some embedding capacity in exchange for lower embedding distortion.
Although the objective of reducing embedding distortion is achieved, the same requirement of communicating the extra palette to the receiving side remains the main limitation on the feasibility of the scheme. For example, the palette can amount to approximately 3,000 bytes of data that have to be transmitted to the recipient. This extra communication not only requires extra resources but, more seriously, presents a security gap for potential attacks. Moreover, a common limitation of the aforementioned methods is that they are not immune to histogram analysis, as we will discuss in Section 4.1.
In this work, we propose a Progressive Exponential Clustering (PEC)-based steganographic scheme, aiming at striking a good balance between high embedding capacity and low embedding distortion. Meanwhile, the proposed scheme does not have to transmit the palette from the embedding side to the recipient, hence strengthening security. Moreover, the proposed scheme is immune to histogram analysis [1, 15, 16].
2. Progressive Exponential Clustering (PEC)-Based Steganography
2.1. Colour Table Generation
The colour table generation in Step 1 of Figure 1 is the operation of collecting, without repetition, all the colours present in the original image in raster scan order to create a colour table. Usually, images with large homogeneous areas or low-frequency components lead to a smaller colour table than images mainly consisting of high-frequency components.
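A minimal Python sketch of this step is given below. It assumes the image is available as a list of rows of RGB tuples; the function name and the tiny sample image are hypothetical.

```python
def build_colour_table(image_rows):
    """Collect the distinct colours of an image in raster scan order.

    image_rows: list of rows, each row a list of (r, g, b) tuples
    (an assumed in-memory representation of the image).
    """
    seen = set()
    table = []
    for row in image_rows:              # top to bottom
        for colour in row:              # left to right
            if colour not in seen:      # keep only the first occurrence
                seen.add(colour)
                table.append(colour)
    return table

# Hypothetical 2 x 3 image with three distinct colours.
image = [
    [(1, 2, 3), (1, 2, 3), (9, 9, 9)],
    [(0, 0, 0), (9, 9, 9), (1, 2, 3)],
]
colour_table = build_colour_table(image)
```

Keeping the first-occurrence order matters because both sides must derive the same table from the same pixel data before the key-controlled randomisation is applied.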
2.2. Progressive Exponential Clustering (PEC)
In the preliminary clustering phase (step (2) of Figure 1), the colour table is taken as the input to the proposed PEC algorithm, presented in Algorithm 1. The PEC algorithm randomises the order of the colours in the colour table under the control of a secret key and partitions the table into a number of clusters, each with a size equal to a power of 2 ($2^h$). The main idea of this iterative algorithm is that, initially, in Iteration 0, each individual colour in the colour table is treated as a singleton cluster, and a pairing operation then matches each cluster to one of the clusters whose centroid lies within a given Euclidean distance threshold to form a new double-sized cluster. The pairing repeats until no more matches can be made. The centroid/average of each cluster pair is calculated, and then, with the leftover clusters excluded, the entire new set of paired clusters is subjected to the same pairing operation under the same threshold in the next iteration. The pairing/clustering operation iterates progressively until no match is made throughout an entire iteration. The final clustering is obtained by concatenating the leftover clusters of all iterations. The reason we want the size of each cluster to be equal to a power of 2 is to eliminate redundancy so as to increase the overall embedding capacity.
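Algorithm 1: Progressive exponential clustering (PEC) algorithm.
Input: colour table, Euclidean distance threshold, secret key
Output: cluster configuration
 (1) Initialise the current set of clusters as the set of colours in the colour table with their order randomised according to the secret key (each individual colour is seen as a cluster);
 (2) Set the iteration counter to 0;
 (3) clustering_completed := FALSE;
 (4) While clustering_completed = FALSE
   (4.1) Calculate the centroid of each cluster in the current set;
   (4.2) Initialise the set of paired clusters for the next iteration as the empty set;
   (4.3) For every cluster in the current set
     (4.3.1) For each subsequent cluster in the current set
       If the Euclidean distance between the centroids of the two clusters is smaller than the threshold
         (4.3.1.1) Add the union of the two clusters to the set of paired clusters;
         (4.3.1.2) Remove the two clusters from the current set (note "\" is the operation of Set Difference);
         (4.3.1.3) Go to step (4.3);
       Else
         Go to step (4.3.1);
   (4.4) If the set of paired clusters contains at most one cluster (at the final iteration, it contains either 0 or 1 cluster)
       clustering_completed := TRUE;
     Else
       Increment the iteration counter and take the set of paired clusters as the current set;
 (5) Final cluster configuration := the concatenation of the leftover clusters of all iterations;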
The most accurate way of pairing up colours/clusters is to conduct an exhaustive search and pair up the closest ones. However, we observed that the size of the colour table of true colour images normally varies from 100,000 to 700,000. Exhaustive search in such an enormous colour table is by no means computationally feasible. So we employ a suboptimal, yet more efficient, search method called incremental clustering, based on the idea proposed by Jain et al. [17]. The idea is that patterns are sorted in a random order, each treated as a singleton cluster, and then incrementally merged to form larger clusters. When growing a cluster, the algorithm searches for the first pattern or cluster outside the growing cluster whose distance to the growing cluster's centroid is shorter than a threshold, and merges it. According to [17], the time complexity of a $k$-means clustering algorithm is $O(nkl)$, where $n$ is the number of the colours to be clustered, $k$ is the number of clusters in the final cluster configuration, and $l$ is the number of iterations taken by the algorithm before convergence. In comparison, the time complexity of the incremental clustering algorithm is lower.
The algorithm operates on the colour table, the set of clusters in each iteration of the clustering process, the centroids of those clusters, the Euclidean distance between the centroids of any two clusters, a Euclidean distance threshold, and the final clustering configuration of the colour table. The proposed PEC algorithm is described in Algorithm 1.
In order to demonstrate the idea of the PEC algorithm in a simple manner, a scalar number can be used to represent a colour. Given a colour table of such scalars after randomisation under the control of a secret key, and a distance threshold, the clustering process follows the iterations of Algorithm 1.
The final clustering configuration consists of clusters of various sizes, and the size of each cluster is a power of 2. Therefore, the embedding capacity for a pixel whose colour belongs to a given cluster is the base-2 logarithm of the cluster size, in bits. Meanwhile, every colour in a cluster may be used for substitution during the secret data hiding process, hence there is no redundancy in the configuration. In addition, although a singleton cluster contains only one colour and is not embeddable, such a cluster can be expanded to a larger cluster by the expansion algorithm introduced in Section 2.3, and thus is included in the configuration.
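The relation between cluster size and capacity can be sketched in Python as follows. The sketch assumes that a cluster configuration is a list of clusters, each a list of colours whose length is a power of 2; the function name and the sample values are hypothetical.

```python
def capacity_bpp(pixels, clusters):
    """Average number of embeddable bits per pixel.

    pixels: flat list of pixel colours in raster order.
    clusters: list of clusters, each a list of colours whose length
    is assumed to be a power of 2.
    """
    # A pixel whose colour lies in a cluster of size 2**h can carry h bits.
    bits_for = {colour: len(cluster).bit_length() - 1
                for cluster in clusters for colour in cluster}
    return sum(bits_for[colour] for colour in pixels) / len(pixels)

# Hypothetical scalar "colours" wrapped in 1-tuples.
clusters = [[(40,)], [(15,), (11,), (17,), (19,)], [(1,), (2,)]]
pixels = [(15,), (40,), (2,), (19,)]
bpp = capacity_bpp(pixels, clusters)   # (2 + 0 + 1 + 2) / 4
```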
In step (4.3) of Algorithm 1, for one colour/cluster to be paired, the algorithm selects the first colour/cluster in the colour table whose distance is smaller than the distance threshold rather than the closest one. This selection policy has two advantages. Firstly, it avoids exhaustive search and greatly reduces the computation load [17]. Secondly, this selection policy increases the security of the PEC-based steganographic scheme because it is sensitive to the order of the colours in the colour table. For example, in Iteration 0 of the above demonstration, 17 is closer to 15. However, 15 is paired up with 11 since 11 appears before 17 in the colour table, and the distance between 11 and 15 (i.e., 4) is smaller than the threshold. By randomising the order of the colours of the colour table in Step 1 of the PEC algorithm according to the secret key, we can ensure the security of the proposed scheme. Without this secret key, even if a potential attacker managed to establish the same colour table, he/she would still be unable to obtain the same clustering configuration. Without the same clustering configuration, the attacker is unable to extract the correct secret data.
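The first-match pairing policy can be sketched in Python as below. This is a sketch under stated assumptions rather than the authors' implementation: the key-controlled randomisation is modelled with Python's `random.Random(key).shuffle`, colours are tuples, and the scalar values in the demonstration are illustrative choices that merely reproduce the situation in which 15 is paired with 11 although 17 is closer.

```python
import random
from math import dist

def centroid(cluster):
    """Component-wise mean of the colours in a cluster."""
    return tuple(sum(channel) / len(cluster) for channel in zip(*cluster))

def pec_ordered(ordered_colours, threshold):
    """Pair clusters progressively; the input order is assumed already randomised."""
    current = [[c] for c in ordered_colours]        # singleton clusters
    final = []
    while True:
        centres = [centroid(c) for c in current]
        used = [False] * len(current)
        paired = []
        for i in range(len(current)):
            if used[i]:
                continue
            for j in range(i + 1, len(current)):
                # First match below the threshold wins, not the closest one.
                if not used[j] and dist(centres[i], centres[j]) < threshold:
                    paired.append(current[i] + current[j])
                    used[i] = used[j] = True
                    break
        # Leftover clusters of this iteration go to the final configuration.
        final.extend(c for c, u in zip(current, used) if not u)
        if len(paired) <= 1:                        # nothing left to pair
            return final + paired
        current = paired

def pec(colour_table, threshold, key):
    """Shuffle with a keyed generator (an assumed randomisation), then cluster."""
    order = list(colour_table)
    random.Random(key).shuffle(order)
    return pec_ordered(order, threshold)

# Illustrative scalars as 1-tuples: 15 pairs with 11 (distance 4), not with 17.
demo = pec_ordered([(15,), (11,), (17,), (19,), (40,)], threshold=6)
```

With these illustrative values the pairs (15, 11) and (17, 19) are formed in Iteration 0, merged in Iteration 1 because their centroids 13 and 18 are closer than the threshold, and 40 remains a singleton.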
2.3. Cluster Expansion
 (i) We observed that, for most images without larger homogeneous areas, higher embedding capacity can be achieved without incurring further distortion in terms of both human perception and PSNRs if the added virtual colours are within one standard deviation of the cluster centroid. However, if there are larger homogeneous areas in the image, although the PSNR remains near constant, visual distortion may become more noticeable when too many virtual colours are added to the homogeneous areas. Therefore, the number of the virtual colours should be restricted in the expansion process.
 (ii) Another factor worth noting is that over-expansion cannot guarantee the presence of every virtual colour in the stego-image. As a result, the recipient of the stego-image cannot obtain the same colour table as used at the embedding side; consequently he/she is unable to make the correct extraction. Therefore, measures for tackling this situation are necessary.
where $\lfloor \cdot \rfloor$ is the floor function. The cluster expansion algorithm is presented in Algorithm 2. After applying the cluster expansion to all clusters in the original colour table, the result is an expanded colour table.
Algorithm 2: Cluster expansion algorithm.
Input: cluster, probability upper bound and Euclidean distance threshold
Output: expanded cluster
 (1) Compute the number of pixels with their colours belonging to the cluster;
 (2) Compute the centroid of the cluster;
 (3) Compute the quantity constrained by Inequality (5);
 (4) Compute the quantity given by (7);
 (5) Let the expanded cluster be the union of the cluster and the selected virtual colours. Note that by "virtual colours" we mean the colours that are not present in the colour table.
To ensure the appearance of every virtual colour in the stego-image and subsequently allow the recipient to reconstruct the same colour table, some pixels have to be chosen to have their original (physical) colours replaced with the virtual colours before data hiding takes place (step (4) of Figure 1). To assign a virtual colour, we identify the first pixel in the image with a physical colour in the same cluster as the virtual colour. Assigning each virtual colour to only one pixel keeps the distortion low. Note that this "expansion" distortion is insignificant when compared to the actual embedding distortion, as discussed in Section 3 and demonstrated in Table 4 and Figures 7 and 8.
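The assignment of virtual colours to pixels can be sketched in Python as follows. The data structures and names are hypothetical. The sketch also adds one assumption that is not stated above: a pixel is only overwritten if its physical colour occurs elsewhere in the image, so that no physical colour disappears from the colour table.

```python
from collections import Counter

def assign_virtual_colours(pixels, cluster_of, virtual_by_cluster):
    """Give each virtual colour to the first suitable pixel of its cluster.

    pixels: flat list of colours in raster order.
    cluster_of: dict mapping each physical colour to a cluster id.
    virtual_by_cluster: dict mapping a cluster id to its virtual colours.
    """
    out = list(pixels)
    remaining = Counter(pixels)          # occurrences still carrying each colour
    pending = {cid: list(cols) for cid, cols in virtual_by_cluster.items()}
    for pos, colour in enumerate(pixels):
        queue = pending.get(cluster_of.get(colour))
        # Assumption of this sketch: never remove the last copy of a colour.
        if queue and remaining[colour] > 1:
            out[pos] = queue.pop(0)      # one pixel per virtual colour
            remaining[colour] -= 1
    return out

# Hypothetical example: one virtual colour added to cluster 0.
pixels = [(10, 10, 10), (10, 10, 10), (12, 10, 10), (200, 0, 0)]
cluster_of = {(10, 10, 10): 0, (12, 10, 10): 0, (200, 0, 0): 1}
modified = assign_virtual_colours(pixels, cluster_of, {0: [(11, 10, 10)]})
```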
Because the virtual colours are added after the preliminary PEC clustering, there is no guarantee that the recipient can reconstruct the same cluster configuration. Therefore, after cluster expansion and step (4) of Figure 1, a second round of PEC clustering (step (5) of Figure 1) is applied to partition the expanded colour table so as to create a new cluster configuration before data hiding.
2.4. Data Hiding and Extraction
The data hiding process is straightforward and is presented in Algorithm 3. At the receiving side, the recipient has to construct the same colour table from the received image and then apply the same PEC algorithm to it in order to obtain the same clustering configuration. With the same colour table and clustering configuration, the secret data can be extracted correctly. The extraction algorithm performs as follows. If the colour of a pixel is the only member of its cluster, then this pixel carries no information; otherwise, the index of the colour in the cluster represents the secret data bits. After visiting all pixels, by concatenating those extracted bits, the complete secret data stream can be revealed. The data extraction algorithm is presented in Algorithm 4.
Algorithm 3: Data hiding algorithm.
Input: modified image, cluster configuration, secret key, secret data
Output: embedded image
 (1) Convert the secret data into binary form;
 (2) While the end of the input image is not reached
   If the colour of the current pixel is the only member of its cluster (i.e., this pixel is non-embeddable)
     (2.1) Go to Step 2;
   Else
     (2.2) Take as many bits from the binary secret data as the base-2 logarithm of the cluster size;
     (2.3) Replace the current colour with the colour in the cluster whose index equals the value of those bits;
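The embedding loop of Algorithm 3 can be sketched in Python as follows. The sketch assumes that the cluster configuration is a list of clusters whose sizes are powers of 2, that the secret data are already a string of binary digits, and that embedding simply stops when the remaining bits no longer fill a pixel; the names are hypothetical.

```python
def embed(pixels, clusters, bits):
    """Hide `bits` by replacing each pixel colour with an indexed cluster member.

    pixels: flat list of colours in raster order.
    clusters: list of clusters (lists of colours, sizes assumed powers of 2).
    bits: secret data as a string of '0'/'1' characters.
    Returns the stego pixels and the number of bits actually embedded.
    """
    cluster_of = {colour: cluster for cluster in clusters for colour in cluster}
    stego = list(pixels)
    pos = 0
    for i, colour in enumerate(pixels):
        cluster = cluster_of[colour]
        width = len(cluster).bit_length() - 1     # log2 of the cluster size
        if width == 0:
            continue                              # non-embeddable pixel
        if pos + width > len(bits):
            break                                 # assumed stop: message exhausted
        stego[i] = cluster[int(bits[pos:pos + width], 2)]
        pos += width
    return stego, pos

# Hypothetical scalar "colours" wrapped in 1-tuples.
clusters = [[(40,)], [(15,), (11,), (17,), (19,)], [(1,), (2,)]]
stego, used = embed([(15,), (40,), (2,), (19,)], clusters, "10101")
```

In this example the first pixel takes the bits 10, the singleton pixel is skipped, the third pixel takes 1, and the last pixel takes 01.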
Algorithm 4: Data extraction algorithm.
Input: received image, secret key, threshold
Output: extracted secret data
 (1) Create the colour table based on the received image; (Note no cluster expansion is needed.)
 (2) Apply the PEC algorithm on the colour table to get the cluster configuration using the same threshold and the shared secret key;
 (3) While the end of the received image is not reached
   If the colour of the current pixel is the only member of its cluster (i.e., the pixel is non-embeddable)
     (3.1) Go to Step 3;
   Else
     (3.2) Convert the index of the colour in its cluster into binary form;
     (3.3) Append the binary index to the binary stream of the secret data extracted so far;
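Step (3) of Algorithm 4 can be sketched in Python as follows. For brevity the cluster configuration is passed in directly instead of being rebuilt from the received image with the shared key and threshold, as steps (1) and (2) require; the names and sample values are hypothetical.

```python
def extract(stego_pixels, clusters):
    """Read the hidden bits back as the binary index of each colour in its cluster.

    stego_pixels: flat list of colours in raster order.
    clusters: the cluster configuration, assumed identical to the one used
    for embedding (lists of colours whose sizes are powers of 2).
    """
    cluster_of = {colour: cluster for cluster in clusters for colour in cluster}
    chunks = []
    for colour in stego_pixels:
        cluster = cluster_of[colour]
        width = len(cluster).bit_length() - 1     # bits carried by this pixel
        if width == 0:
            continue                              # singleton cluster: no data
        chunks.append(format(cluster.index(colour), f"0{width}b"))
    return "".join(chunks)

# Hypothetical configuration and stego pixels.
clusters = [[(40,)], [(15,), (11,), (17,), (19,)], [(1,), (2,)]]
message = extract([(17,), (40,), (2,), (11,)], clusters)
```

Since every embeddable pixel yields bits, the recipient is assumed to know where the message ends, for example from an agreed length.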
3. Experimental Results
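Table 1: The number of colours within the original and expanded colour tables with different thresholds. Note that "Distortion due to expansion" is the distortion incurred after one virtual colour is assigned to one pixel in the original image.
Image | Number of colours before expansion | Number of colours after expansion (first threshold) | Number of colours after expansion (second threshold) | Expansion rate (first threshold) | Expansion rate (second threshold) | Distortion due to expansion (first threshold) | Distortion due to expansion (second threshold)
Hill | 149509 | 163513 | 163993 | … | … | 64.21 dB | 66.20 dB
Mandrill | 662895 | 663023 | 663023 | … | … | 62.05 dB | 66.20 dB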
Table 2: The size of the largest clusters before and after cluster expansion.
Image | Size of the largest cluster before expansion | Size of the largest cluster after expansion (first threshold) | Size of the largest cluster after expansion (second threshold)
Hill | 64 | 128 | 128
Mandrill | 64 | 64 | 64
Table 3: Embedding capacity in terms of bits per pixel (bpp) before and after cluster expansion.
Image | Capacity without expansion (first threshold) | Capacity with expansion (first threshold) | Capacity without expansion (second threshold) | Capacity with expansion (second threshold)
Hill | 3.74 | 4.25 | 4.80 | 5.02
Mandrill | 4.05 | 4.05 | 5.05 | 5.06
Table 4: Embedding distortion in terms of PSNR (dB) before and after cluster expansion.
Image | Unexpanded colour table (first threshold) | Expanded colour table (first threshold) | Unexpanded colour table (second threshold) | Expanded colour table (second threshold)
Hill | 38.22 | 38.82 | 35.69 | 35.37
Mandrill | 38.27 | 38.21 | 35.16 | 35.14
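For readers who wish to reproduce distortion figures of this kind, a minimal Python sketch of a PSNR computation is given below. It assumes 8-bit RGB pixels and a mean squared error averaged over all pixels and channels; the exact PSNR definition used for the reported figures is not stated here, so this is one common convention rather than the authors' procedure.

```python
from math import inf, log10

def psnr(original, stego, peak=255):
    """PSNR in dB between two equally sized lists of colour tuples.

    The mean squared error is averaged over every pixel and every channel
    (an assumed convention for colour images).
    """
    squared = sum((a - b) ** 2
                  for p, q in zip(original, stego)
                  for a, b in zip(p, q))
    samples = sum(len(p) for p in original)
    mse = squared / samples
    return inf if mse == 0 else 10 * log10(peak ** 2 / mse)

# Hypothetical two-pixel images differing by one level in one channel.
cover = [(10, 10, 10), (20, 20, 20)]
changed = [(11, 10, 10), (20, 20, 20)]
value = psnr(cover, changed)
```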
4. Analyses of PEC
4.1. Histogram Analysis and Security
The histogram of an image can effectively reveal the distribution of the colours or intensities, and thus any irregular distribution caused by a steganographic method can be easily detected [18]. In [15, 16], the 3-dimensional RGB colour space is separated into spheres by the $k$-means clustering algorithm, and colours generated around the centre of each sphere are used for substitution in order to carry secret messages, while the colours outside the spheres are not used in the stego-images. Moreover, within each sphere, the colours are distributed sparsely and uniformly. As a result, a large number of gaps would appear between colours with high occurrences in the histograms of the stego-images. Such a phenomenon usually does not appear in natural images, so its appearance would attract the steganalyst's attention. This is why histograms and histogram characteristic functions have been extensively used in measuring the security of steganographic schemes [18–20]. By analysing the histogram of the stego-image, the attacker can easily detect the spheres in the form of aggregated clusters separated by large gaps in the histogram. These gaps allow the attacker to reconstruct the sphere/cluster configuration and even infer the hidden message. Compared to [15, 16], the proposed PEC algorithm does not separate the 3-dimensional RGB colour space for generating colours. Instead, all the physical colours are preserved while some virtual colours are added to expand the clusters in order to increase embedding capacity without inflicting high distortion. Hence, the attacker cannot detect separated clusters in the histogram.
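Table 5: KL divergence of the images.
Image | First threshold | Second threshold | Third threshold | Fourth threshold
Lena | 0.0023 | 0.0063 | 0.0084 | 0.0115
Mandrill | … | 0.0050 | 0.0068 | 0.0083
Airplane | 0.0092 | 0.0450 | 0.0645 | 0.0914
Peppers | 0.0011 | 0.0053 | 0.0073 | 0.0093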
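A KL divergence between the histograms of a cover image and its stego-image can be computed along the lines of the Python sketch below. Several choices here are assumptions, since the exact definition behind the reported values is not given: histograms are taken over 8-bit intensity values, the logarithm is base 2, and a small constant guards against empty stego bins.

```python
from collections import Counter
from math import log2

def kl_divergence(cover_values, stego_values, levels=256, eps=1e-12):
    """KL divergence D(P || Q) between two normalised intensity histograms.

    cover_values, stego_values: flat lists of integer intensities in
    range(levels). `eps` replaces empty stego bins (an assumed safeguard).
    """
    p_counts, q_counts = Counter(cover_values), Counter(stego_values)
    n_p, n_q = len(cover_values), len(stego_values)
    total = 0.0
    for level in range(levels):
        p = p_counts[level] / n_p
        q = q_counts[level] / n_q
        if p > 0:                         # terms with p = 0 contribute nothing
            total += p * log2(p / max(q, eps))
    return total

# Hypothetical intensity samples.
divergence = kl_divergence([0, 0, 1, 1], [0, 1, 1, 1])
```

A value close to zero indicates that the stego histogram is nearly indistinguishable from the cover histogram under this measure.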
4.2. Characteristics of the PEC
 (i) The pairwise matching operation ensures that the sizes of all clusters are always equal to a power of 2 (i.e., exponential). This exponential structure makes use of every colour in a cluster for hiding data and thus removes redundancy in the data hiding process.
 (ii) The cluster expansion algorithm allows the embedding capacity to be increased without sacrificing imperceptibility. At first glance, distortion should increase proportionally as the embedding capacity gets higher through the addition of the virtual colours. However, the cluster expansion algorithm can actually decrease the embedding distortion in terms of PSNR by shortening the average distance between the colours and the centroid of the cluster. According to the cluster expansion algorithm, clusters are expanded not by extending their boundaries outwards, but by increasing the population density around the centroids while leaving the boundary intact, thus reducing the colour variation of each cluster.
 (iii) The PEC-based steganography strengthens security. By ensuring the presence of all physical and virtual colours in the stego-image, the scheme allows the recipient to re-establish the same cluster configuration using the shared secret key and distance threshold. Sharing the large colour palette and the centroids of the clusters (as required in [15, 16]) is no longer necessary. The reason the security of the PEC-based steganographic scheme is strengthened is twofold.
 (a) Because there is no need to transmit extra information, the chance of compromising the security of the scheme is reduced.
 (b) Because of the fragility of the PEC-based steganographic scheme, modifications to the stego-image will inevitably change the colours of the pixels and thus lead to a different colour table. It is noted that the PEC algorithm is very sensitive to the initial colour table. With a different colour table, the PEC algorithm will produce a greatly different clustering configuration and a wrong ordering of the colours within each cluster, which will prevent the hidden data from being extracted by the attacker.
5. Conclusions
After studying some clustering-based steganographic schemes for colour images, we observed that the performance of a scheme depends on the efficiency of the clustering algorithm. An efficient clustering algorithm should take the two conflicting factors of high embedding capacity and low embedding distortion into account simultaneously. In this paper, we have proposed a novel steganographic scheme using the Progressive Exponential Clustering (PEC) algorithm. This algorithm overcomes the limitation of the traditional clustering-based steganographic schemes by seeking a balance between high embedding capacity and low embedding distortion. Meanwhile, this steganographic scheme employs a cluster expansion method, which further increases the capacity without sacrificing imperceptibility. Our experiments have shown that adding virtual colours in an adaptive manner to the colour table of images with significant low-frequency components can significantly increase the embedding capacity without inflicting distortion on the stego-images.
References
 1. Chang CC, Wu WC: Hiding secret data adaptively in vector quantisation index tables. IEE Proceedings: Vision, Image and Signal Processing 2006, 153(5):589–597. 10.1049/ip-vis:20050153
 2. Du WC, Hsu WJ: Adaptive data hiding based on VQ compressed images. IEE Proceedings: Vision, Image and Signal Processing 2003, 150(4):233–238. 10.1049/ip-vis:20030525
 3. Fridrich J, Du R: Secure steganographic methods for palette images. Proceedings of the 3rd Information Hiding Workshop, 2000, Lecture Notes in Computer Science 1768: 47–60.
 4. Li Y, Li CT: Steganographic scheme for VQ compressed images using progressive exponential clustering. Proceedings of the IEEE International Conference on Video and Signal Based Surveillance (AVSS '06), 2006, Sydney, Australia
 5. Lin CC, Chen SC, Hsueh NL: Adaptive embedding techniques for VQ-compressed images. Information Sciences 2009, 179(1–2):140–149. 10.1016/j.ins.2008.09.001
 6. Niimi M, Noda H, Kawaguchi E, Eason RO: High capacity and secure digital steganography to palette-based images. Proceedings of the International Conference on Image Processing (ICIP '02), September 2002, Rochester, NY, USA 917–920.
 7. Niimi M, Noda H, Kawaguchi E, Eason RO: Luminance quasi-preserving color quantization for digital steganography to palette-based images. Proceedings of the International Conference on Pattern Recognition, August 2002 251–254.
 8. Raja KB, Siddaraju S, Venugopal KR, Patnaik LM: Secure steganography using colour palette decomposition. Proceedings of the International Conference on Signal Processing, Communications and Networking (ICSCN '07), February 2007 74–80.
 9. Tai WL, Chang CC: Data hiding based on VQ compressed images using hamming codes and declustering. International Journal of Innovative Computing, Information and Control 2009, 5(7):2043–2052.
 10. Wang X, Yao Z, Li CT: A palette-based image steganographic method using colour quantisation. Proceedings of the International Conference on Image Processing (ICIP '05), September 2005, Genova, Italy 2: 1090–1093.
 11. Wu MY, Ho YK, Lee JH: An iterative method of palette-based image steganography. Pattern Recognition Letters 2004, 25(3):301–309. 10.1016/j.patrec.2003.10.013
 12. Wu MN, Juang PA, Li YC: An efficient VQ-based data hiding scheme using voronoi clustering. Proceedings of the 9th International Conference on Hybrid Intelligent Systems (HIS '09), August 2009, Shenyang, China 73–77.
 13. Zhang X, Wang S: Analysis of parity assignment steganography in palette images. Proceedings of the Knowledge-Based Intelligent Information and Engineering Systems, 2005, Lecture Notes in Computer Science 3683: 1025–1031.
 14. Zhang X, Wang S, Zhou Z: Multibit assignment steganography in palette images. IEEE Signal Processing Letters 2008, 15: 553–556.
 15. Brisbane G, Safavi-Naini R, Ogunbona P: High-capacity steganography using a shared colour palette. IEE Proceedings: Vision, Image, and Signal Processing 2005, 152(6):787–792. 10.1049/ip-vis:20045047
 16. Seppannen T, Makela K, Keskinarkaus A: Hiding information in color images using small color palettes. Proceedings of the 3rd International Workshop on Information Security, 2000, Wollongong, Australia 69–81.
 17. Jain AK, Murty MN, Flynn PJ: Data clustering: a review. ACM Computing Surveys 1999, 31(3):316–323.
 18. Fridrich J, Goljan M: Practical steganalysis of digital images: state of the art. Security and Watermarking of Multimedia Contents IV, January 2002, San Jose, Calif, USA, Proceedings of SPIE 4675: 1–13.
 19. Pevný T, Fridrich J: Multiclass detector of current steganographic methods for JPEG format. IEEE Transactions on Information Forensics and Security 2008, 3(4):635–650.
 20. Xuan G, Shi YQ, Gao J, Zou D, Yang C, Zhang Z, Chai P, Chen C, Chen W: Steganalysis based on multiple features formed by statistical moments of wavelet characteristic functions. In Proceedings of the 7th Information Hiding Workshop, June 2005, Barcelona, Spain, Lecture Notes in Computer Science. Volume 3727. Springer; 262–277.
 21. Katzenbeisser S, Petitcolas FAP (Eds): Information Hiding Techniques for Steganography and Digital Watermarking. Artech House Books, Norwood, Mass, USA; 1999.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.