Efficient blind decoders for additive spread spectrum embedding based data hiding

This article investigates efficient blind watermark decoding approaches for hidden messages embedded into host images, within the framework of additive spread spectrum (SS) embedding for data hiding. We study SS embedding in both the discrete cosine transform (DCT) and the discrete Fourier transform (DFT) domains. The contributions of this article are threefold. First, we show that the conventional SS scheme cannot be applied directly to the magnitudes of the DFT coefficients; we therefore present a modified SS scheme and derive the optimal maximum likelihood (ML) decoder based on the Weibull distribution. Second, we investigate improved spread spectrum (ISS) embedding, an improvement on traditional additive SS, propose a modified ISS scheme for information hiding in the magnitudes of the DFT coefficients, and derive the optimal ML decoders for ISS embedding. We also provide a thorough theoretical error probability analysis for the aforementioned decoders. Third, sub-optimal decoders, including the local optimum decoder (LOD), the generalized maximum likelihood (GML) decoder, and the linear minimum mean square error (LMMSE) decoder, are investigated to reduce the prior information required at the receiver side, and their theoretical decoding performances are derived. Based on the decoding performances and the prior information required for decoding, we discuss the preferred host domain and the preferred decoder for additive SS-based data hiding under different situations. Extensive simulations are conducted to illustrate the decoding performances of the presented decoders.


Introduction
The growing use of Internet has enabled the users to easily access, share, manipulate, and distribute the digital media data, and digital media has profoundly changed our daily life during the past decade. This proliferation of digital media data creates a technological revolution to the entertainment and media industries, brings new experience to users, and introduces new Internet concepts. However, the massive production and use of digital media also pose new challenges to the copyright industries and raise critical issues of protecting intellectual property of digital media, since current media sharing makes unauthorized copying and illegal distribution of the digital media much easier.
One popular technology for digital rights protection is digital watermarking [1], where a specific signal (e.g., the ownership information) is embedded into the host media content without significantly degrading the perceptual quality of the original media data. In contrast to traditional encryption techniques, watermarked media data can still be used while remaining protected, and thus watermarking can provide post-delivery protection of digital media. It is worth mentioning that, despite the popularity of watermarking techniques, effective digital rights protection is extremely challenging, and there is currently no commonly accepted technical solution that is unbeatable when deployed in practical user settings. In any case, watermarking techniques should only be considered as one important component of an overall protection system.
Amongst the proposed schemes for watermark embedding, spread spectrum (SS) and quantization based methods [2,3] are the two main broad categories. In SS embedding, an additive or multiplicative watermark is added into the host signal. The quantization based schemes are implemented by quantizing the host signal to the nearest lattice point. In this article, we focus on spread spectrum embedding schemes originally proposed by Cox et al. [4]. At the receiver side, a blind detection scheme is employed, since the original image is generally not available and thus is treated as a noise source. There are two main approaches of SS embedding: the additive spread spectrum watermarking and the multiplicative spread spectrum (MSS) watermarking. In additive SS [5,6], the watermark is spread over the host signal uniformly while in MSS [7,8], the watermark spreads according to the content of the host signal. In order to reduce the noise effect of the host signal in additive SS, Malvar and Florencio [9] proposed the improved spread spectrum (ISS), a new modulation technique exploiting the side information at the encoder to reduce the effect of host signal and improve the decoding performance [10]. Recently, the authors have proposed an embedding scheme incorporating the SS and ISS schemes which employs the correlation between the host signal and the signature code to improve the decoding performance [11].
As summarized in [12], depending on different purposes, there are two main types of watermarking schemes: In one type, the embedded watermark is used to communicate a specific hidden message (e.g., binary identification numbers used for image tracking and for video distribution, or a secret hidden message represented by binary sequences) which must be extracted with sufficient decoding accuracy. In the other type of systems, the goal is only to verify whether a specific embedded watermark (e.g., representing copyright information) is presented or not, and the embedded watermark normally does not communicate a secret message that needs to be accurately decoded. It is important to emphasize that the above two problems are formulated differently and different detection (decoding) approaches are desired to serve different performance criteria. References [13][14][15][16][17][18] explicitly have pointed out this distinction in their works.
Based on the above two types of watermarking schemes, current research on watermark extraction can be categorized into two broad topics: watermark decoding [12,13,19], for the case of decoding the hidden message, and watermark detection [16][17][18][20][21][22], for the case of detecting the presence of a specific watermark. Although watermark detection and decoding problems seem similar from the hypothesis testing point of view, they serve different goals and thus use different criteria. In watermark decoding, the embedded hidden message should be decoded accurately at the receiver side. The bit error rate is therefore usually used as the performance criterion to measure the accuracy of the decoder in extracting the hidden message, and the watermark decoding problem can be formulated as minimizing the bit error rate. In watermark detection, the goal is to determine whether a specific watermark exists or not, and the detection criteria are mainly based on the Neyman-Pearson theorem (i.e., maximizing the probability of detection for a given probability of false alarm). Performance criteria such as the false alarm probability and the true detection probability are used for evaluating the watermark detector performance. To our knowledge, the majority of the current literature has focused on watermark detection, and many algorithms have been proposed. For instance, in [23] a watermark based on the host content is added and the detection is accomplished with the Neyman-Pearson criterion. In [24], a new perceptual masking is proposed and a correlation based detector is studied for watermark detection. In [25], a class of watermark detectors, including the generalized likelihood ratio, Bayesian, and Rao test detectors, is proposed. In this article, we focus on the topic of watermark decoding, since we are particularly interested in communicating hidden messages.
Since in practice the original host image is generally not available at the decoder side, we focus on blind watermark decoding.
The very first decoder used for watermark decoding in SS embedding is the traditional correlator proposed by Cox. This decoder extracts the embedded information using the correlation between the signature code and the received data. Utilizing the probability density function (PDF) of the host signal can help enhance the performance of watermark decoding. An optimum ML decoder for additive SS in the DCT domain was proposed by Hernandez et al. [26]. The optimal decoder for multiplicative SS in the DFT domain was investigated in [13]. Despite the above literature, watermark decoding is less studied than watermark detection, and a thorough analytical study of watermark decoding is still required. It is worth emphasizing that, since the watermark decoding problem is formulated as a different hypothesis testing problem from the watermark detection problem (where $H_0$ is the noise-only hypothesis), a specific watermark detector does not necessarily yield a specific watermark decoder. For instance, the local optimum (LO) test (which is based on the derivative of the likelihood) will yield different forms for the LO detector and the LOD. Also, even though the ML criterion has been used for both watermark detection and watermark decoding, it is derived differently and has different meanings in the two problems. The ML watermark decoder is a Bayesian approach that minimizes the probability of bit error under the assumption of equal prior probabilities for the bit information, which sets the threshold of the likelihood ratio to 1. For watermark detection, the ML solution is the likelihood ratio test (LRT) detector based on the Neyman-Pearson theorem, where the LRT uses the probability of false alarm to set the detection threshold.
We would also like to emphasize that, since different performance criteria are desired in watermark detection and watermark decoding, an efficient watermark detector of a specific type does not necessarily yield an efficient watermark decoder.
The common objective of communicating hidden message using watermarking is to successfully embed and decode an imperceptible watermark which can be resistant against distortions and attacks. In order to reduce the performance degradation under certain attacks such as geometric attacks and to take advantage of the properties of certain transform domain, the message embedding can be performed in different domains such as the discrete cosine transform (DCT) domain [27], the discrete Fourier transform (DFT) domain [28][29][30][31], and the discrete wavelet transform (DWT) domain [32,33].
In this article, our main purpose is to provide a rigorous watermark decoding framework for data hiding using spread spectrum embedding in the DCT domain and the DFT magnitude domain. In the literature on additive SS, there is a lack of investigation into optimal and sub-optimal decoders using additive SS in the DFT magnitude domain, and we fill this gap in this article. We show that the conventional SS scheme cannot be applied directly in the DFT magnitude domain, and we therefore propose a modified SS embedding scheme. To provide guidance on the preferred domain for information hiding using additive SS embedding, we discuss, based on the derived decoders, which domain is preferred under different circumstances. We present a theoretical framework of optimal decoders for additive SS and improved SS in the DCT and DFT magnitude domains. Embedding in the DFT domain has its own advantages, which motivates us to develop optimal watermark decoding schemes for this domain. We note that optimal decoders using ISS provide better decoding performance than those using traditional additive SS. Because the optimum ML decoder requires the distribution parameters of the host image and the watermark strength, we also investigate several sub-optimal watermark decoders. By invoking the Taylor series, the LOD is proposed, which relaxes the requirement on the watermark strength. We derive the generalized maximum likelihood (GML) decoder for information hiding in the DFT magnitude domain. Further, because of its simplicity and good performance, we employ the linear minimum mean square error (LMMSE) criterion and derive the LMMSE decoders. We derive the theoretical performance analysis of the proposed ML, LOD, and LMMSE decoders, where the theoretical performance of the ML decoder serves as the performance upper bound of watermark decoding schemes.
The main contributions of this article are summarized as follows:
• Propose modified SS and ISS embedding schemes in the DFT magnitude domain.
• Derive the ML and GML decoders for SS and ISS in the DFT magnitude domain, and derive the ML decoder for ISS embedding in the DCT domain.
• Derive the LODs for SS embedding in the DCT and the DFT magnitude domains, and derive the LODs for ISS embedding in the DFT magnitude domain.
• Derive the LMMSE decoders for SS and ISS embedding in both the DCT and the DFT magnitude domains.
• Provide the theoretical bit-error-rate performance analysis of the above decoders.
The rest of this article is organized as follows. In Section 2, the traditional additive SS and ISS embedding schemes are briefly reviewed for data hiding and communicating hidden message. Host probability distribution functions for DCT and DFT domains are described in Section 3. The optimal ML decoders are derived in Section 4 and the corresponding bit-error-rate analyses are presented. In Section 5, the sub-optimal decoders, including LOD, GML, and LMMSE decoders are presented and their theoretical performance analyses will be provided. The simulation results are demonstrated in Section 6 to validate the analysis. Finally, the discussions and concluding remarks are given in Section 7.

Additive SS and ISS embedding procedure
Suppose a host image $I \in \mathcal{M}^{m \times n}$ is to be watermarked, where $\mathcal{M}$ denotes the image alphabet, e.g., $\mathcal{M} = \{0, 1, \ldots, 255\}$ for a gray-scale image, and $m$ and $n$ represent the size of the image in the pixel (spatial) domain. For simplicity we assume $m = n$, though the results can be extended to the general case of unequal $m$ and $n$. For each block of size $p \times p$ in the transformed domain, a subset of host coefficients with length $l \le p^2$ is selected to be the carrier vector for embedding. Such selected vectors $x_i \in \mathbb{R}^l$, $i = 1, 2, \ldots, q$, are used for information embedding. A signature code $s = [s_1, s_2, \ldots, s_l]^T$ of length $l$ can be employed for one-bit embedding, and using multiple signature codes allows multiple bits to be embedded simultaneously. In decoding problems, the signature code coefficients usually take the values $+1$ and $-1$.

Additive SS embedding scheme
In additive SS, the watermark is added to the host signal $x = [x_1, x_2, \ldots, x_l]^T$. The signal model for additive SS data hiding is expressed as $r = x + sAb$ (1), where the vectors $r$, $x$, and $s$ have length $l$, $A$ denotes the bit information amplitude, and $b \in \{+1, -1\}$ denotes the bit to be embedded. The distortion $D$ due to information hiding, defined in (2), can easily be shown to equal $A^2$ in SS embedding. It should be taken into account that the choice of host domain may affect the procedure of adding the information. Using the DCT domain places no restriction on additive SS watermarking. Employing the DFT and explicitly embedding the information in the magnitudes of the DFT coefficients limits the set of coefficients that can be watermarked, since the watermarked DFT magnitudes are required to be positive. We discuss the embedding scheme in the DFT magnitude domain more precisely when the optimal decoders are explained in Section 4.
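The additive SS model can be illustrated with a minimal NumPy sketch. The function names, the Laplacian stand-in for the host coefficients, and the per-coefficient normalization of the distortion are assumptions made for illustration; only the model r = x + sAb and the result that the distortion equals A² come from the text.

```python
import numpy as np

def ss_embed(x, s, A, b):
    """Additive SS embedding of one bit b in {+1, -1}: r = x + s*A*b."""
    return np.asarray(x, dtype=float) + np.asarray(s, dtype=float) * A * b

def distortion(r, x):
    """Average squared embedding change per coefficient, ||r - x||^2 / l
    (assumed normalization, chosen so that the result equals A^2)."""
    return float(np.mean((np.asarray(r) - np.asarray(x)) ** 2))

rng = np.random.default_rng(0)
l = 64
x = rng.laplace(scale=10.0, size=l)       # stand-in host coefficients (assumption)
s = rng.choice([-1.0, 1.0], size=l)       # +/-1 signature code
A = 2.0
r = ss_embed(x, s, A, b=+1)
D = distortion(r, x)                      # numerically equal to A**2
```

Because every signature element is ±1, each coefficient is shifted by exactly ±A, which is why the distortion does not depend on the host.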

ISS embedding scheme
Traditional additive SS, where the host signal acts as a noise source, is a method without host interference rejection that does not use the host signal information in the decoder. It was shown in [9] that ISS, which reduces the interference effect of the host signal, leads to significant performance improvements. In this article, we employ the ISS proposed in [9] to achieve better performance in decoding hidden information. The signal model for ISS data hiding is defined as $r = u + sAb$ (3), where $u = (I_l - k s s^T)x$ (4), $I_l$ denotes the $l \times l$ identity matrix, and $k$ is usually obtained by maximizing the watermark to data ratio or by minimizing the probability of error. The distortion for ISS embedding is given in (5). At the receiver side the hidden information needs to be decoded. Since the optimal decoders require the distribution of the host signal, the distributions of the DCT coefficients and of the DFT magnitudes are discussed in the next section.
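The effect of the ISS term can be seen in a short sketch. It assumes the form u = (I − k s sᵀ)x implied by the surrounding text, and it uses k = 1/l purely as an illustrative value that cancels the host contribution to the correlation sᵀr; it is not the optimized k discussed in the article.

```python
import numpy as np

def iss_embed(x, s, A, b, k):
    """ISS embedding r = u + s*A*b with u = (I - k s s^T) x (assumed form)."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    u = x - k * s * (s @ x)               # (I - k s s^T) x without forming the matrix
    return u + s * A * b

rng = np.random.default_rng(1)
l = 64
x = rng.laplace(scale=10.0, size=l)       # stand-in host coefficients (assumption)
s = rng.choice([-1.0, 1.0], size=l)
A, b = 2.0, -1

r_ss = x + s * A * b                      # plain additive SS
r_iss = iss_embed(x, s, A, b, k=1.0 / l)  # illustrative k, full host rejection

corr_ss = s @ r_ss                        # = s^T x + l*A*b  (host interferes)
corr_iss = s @ r_iss                      # = (1 - k*l) s^T x + l*A*b = l*A*b here
```

For general k the correlation is (1 − kl)·sᵀx + lAb, so k trades host rejection against the distortion budget.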

Data hiding in the DCT and the DFT magnitude domains
As shown in the next section, the distribution of the DCT coefficients is needed to derive the optimal decoder. The authors in [26] suggested that the heavy-tailed behavior of the low- and mid-frequency DCT coefficients can be modeled by the zero-mean generalized Gaussian distribution (GGD) [26], in which α = βc, $\sigma_x$ denotes the standard deviation of the host signal, and $\Gamma(\cdot)$ denotes the Gamma function, defined as $\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt$. The power exponent $c$ is the shape parameter; a smaller value leads to a more impulsive shape and a heavier tail. The scale parameter $\beta$ and the shape parameter $c$ can be estimated from the host signal [34].
The DFT is another popular transform domain for image analysis, where the DFT magnitude can be used to represent the host signal. Since the magnitude of a DFT coefficient is real and positive, the Weibull distribution was suggested in [13] to model its PDF, because of its flexibility and its consistency with the DFT magnitude, as $f_X(x) = \frac{g}{h}\left(\frac{x}{h}\right)^{g-1} \exp\left(-\left(\frac{x}{h}\right)^{g}\right) u(x)$ (8), where $u(\cdot)$ denotes the unit step function, which returns one when its argument is positive and zero when its argument is negative. The parameters $h > 0$ and $g > 0$ represent the scale and the shape parameters of the Weibull distribution, respectively.
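The two host models can be written down and sanity-checked numerically. The GGD parametrization below (amplitude βc / (2Γ(1/c)) and β tied to the standard deviation through Gamma functions) is one common form and is an assumption here, since the article's own expression did not survive; the Weibull PDF follows the scale h and shape g named in the text.

```python
import math
import numpy as np

def ggd_pdf(x, sigma, c):
    """Zero-mean generalized Gaussian PDF (assumed parametrization)."""
    beta = math.sqrt(math.gamma(3.0 / c) / math.gamma(1.0 / c)) / sigma
    amp = beta * c / (2.0 * math.gamma(1.0 / c))
    return amp * np.exp(-np.abs(beta * np.asarray(x, dtype=float)) ** c)

def weibull_pdf(x, h, g):
    """Weibull PDF with scale h and shape g; zero for non-positive x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = (g / h) * (x[pos] / h) ** (g - 1.0) * np.exp(-(x[pos] / h) ** g)
    return out

# Riemann-sum checks: both PDFs integrate to one, and the GGD variance is sigma^2.
t = np.linspace(-300.0, 300.0, 600001)
dt = t[1] - t[0]
ggd_mass = float(np.sum(ggd_pdf(t, sigma=10.0, c=0.8)) * dt)
ggd_var = float(np.sum(t ** 2 * ggd_pdf(t, sigma=10.0, c=0.8)) * dt)

w = np.linspace(0.0, 100.0, 100001)
dw = w[1] - w[0]
weibull_mass = float(np.sum(weibull_pdf(w, h=10.0, g=1.8)) * dw)
```

A shape parameter c below 2 gives the heavier-than-Gaussian tails mentioned above; c = 2 recovers the Gaussian and c = 1 the Laplacian.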

The ML optimal decoders
The optimal decoder attempts to obtain an estimate $\hat{b}$ of $b$ such that the error probability $P_e = \Pr\{\hat{b} \neq b\}$ is minimized. This can be done with the maximum a posteriori (MAP) decoder, which simplifies to the maximum likelihood (ML) decoder under the assumption of equal prior probabilities for the bit information. The ML estimate $\hat{b}$ can be expressed as $\hat{b} = \arg\max_{b \in \{+1,-1\}} f_R(r \mid b, A, s)$ (9), where $f_R(r \mid b, A, s)$ represents the conditional PDF of $r$ given $b$, $A$, and $s$. The distribution of the host signal clearly plays an important role in the structure of the ML decoder, which is why the distributions of the DCT coefficients and of the DFT magnitudes were introduced in Section 3. The ML decoder for binary information hiding can be expressed using the likelihood ratio rule. In this case the ML decoder decides $\hat{b} = +1$ if $f_R(r \mid b = +1, A, s) / f_R(r \mid b = -1, A, s) > 1$ (10). As discussed in Section 3, the PDF of the host signal differs depending on the transform domain. In practice, different transform domains can be used for data hiding because of their different desired properties. Derivation and performance analysis of the ML decoder for SS embedding require the distribution of the host signal in the specific domain. In the following subsections, the ML decoders for the SS and ISS embedding schemes in the DCT and DFT domains are derived. It is worth mentioning that the ML decoder for the SS scheme in the DCT domain has already been proposed in [26].

ML decoders in the DFT domain
One possible host signal for information hiding is the DFT magnitude. However, it is important to note that the SS (1) and ISS (5) embedding schemes cannot be applied directly in this domain because of a special property of the magnitudes of the DFT coefficients: they must always be positive. To ensure that the watermarked signal remains positive, we propose a modified SS embedding scheme in the DFT magnitude domain as $r = x + sAb + e$ (11), where the insurance vector $e = [e_1, e_2, \ldots, e_l]^T$ is designed to make every $r_i$ positive. More specifically, if $x_i + s_i A b$ is positive, the corresponding element $e_i$ is set to zero; if $x_i + s_i A b$ is negative, $e_i = -s_i A b$ is chosen so that $r_i$ equals $x_i$, which is positive. In summary, $e_i$ in (11) can be formulated as $e_i = 0$ if $x_i + s_i A b > 0$ and $e_i = -s_i A b$ otherwise (12). The modified SS embedding scheme (11) and the vector $e$ defined in (12) reveal that, for those coefficients where $e_i > 0$, the watermarked signal becomes $r_i = x_i$, meaning that the coefficient $r_i$ does not convey information directly. However, because of the structure of the optimal decoder, which is derived shortly, such coefficients can still help decoding. We also note that, for the modified SS scheme (11), increasing the watermark amplitude $A$ increases the number of coefficients with $e_i > 0$ and consequently decreases the number of watermarked coefficients. In short, the goal of this modified SS embedding scheme is to make all the watermarked coefficients positive.
Having proposed the modified SS embedding scheme for information hiding in the DFT magnitude domain, we can derive the optimal decoder using the distribution of the host signal. Referring to expression (8) and assuming that the coefficients are independent and identically distributed, the joint PDF of the host data is the product of the marginal Weibull PDFs, as given in (13). Based on the ML decoder structure (10), the decision rule (14) follows. Generally, the ML decoder can be expressed as $\hat{b} = \mathrm{sign}(z)$ (15), where $z$ is the test statistic; in this case, after some manipulations, the test statistic regarding (14) is obtained as (16). The test statistic of the ML decoder in the DFT magnitude domain shows that the bit information amplitude as well as the PDF parameters must be provided at the receiver side. We now show that the decoding procedure is error free in two cases. In the first case, $b = +1$ and there is at least one coefficient with $r_i + s_i A < 0$ at the decoder side; the test statistic in (16) then goes to infinity and the decoder (15) decides $\hat{b} = +1$. More precisely, in this case $\ln(u(r_i - s_i A))$ is zero, because its argument is positive, while $\ln(u(r_i + s_i A))$ goes to minus infinity, and thus the test statistic in (16) goes to infinity. Similarly, in the second case, $b = -1$ and there is at least one coefficient with $r_i - s_i A < 0$ at the decoder side; the test statistic in (16) goes to minus infinity and the decoder (15) decides $\hat{b} = -1$.
As mentioned earlier, the coefficients that do not convey information directly can help decoding indirectly. To explain this, assume that $b = +1$ and $x_i + s_i A < 0$, so that the corresponding coefficient becomes $r_i = x_i$ at the embedding side. At the decoder side we then have $r_i + s_i A < 0$, which, based on the above discussion, leads to the decision $\hat{b} = +1$. Similarly, assume that $b = -1$ and $x_i - s_i A < 0$, so that the corresponding coefficient becomes $r_i = x_i$ at the embedding side. At the decoder side we then have $r_i - s_i A < 0$, which leads to the decision $\hat{b} = -1$. In both cases the decoder is error free. Therefore, even though some coefficients do not convey hidden information directly, they still contribute to accurate decoding indirectly.
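The insurance-vector rule can be sketched as follows. The function name and the Weibull parameters of the synthetic host are illustrative assumptions; the rule itself (keep x_i + s_i A b when it is positive, otherwise leave r_i = x_i) follows the description above.

```python
import numpy as np

def modified_ss_embed(x, s, A, b):
    """Modified SS embedding r = x + s*A*b + e for positive hosts (DFT magnitudes).

    e_i = 0 where x_i + s_i*A*b > 0; otherwise e_i = -s_i*A*b, so r_i = x_i.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(s, dtype=float) * A * b
    keep = x + w > 0
    e = np.where(keep, 0.0, -w)
    r = np.where(keep, x + w, x)          # same as x + w + e, without rounding error
    return r, e

rng = np.random.default_rng(2)
l = 256
x = 10.0 * rng.weibull(1.8, size=l)       # synthetic positive host (assumed h=10, g=1.8)
s = rng.choice([-1.0, 1.0], size=l)
r, e = modified_ss_embed(x, s, A=4.0, b=+1)
n_unmarked = int(np.count_nonzero(e))     # coefficients left equal to the host
```

Raising A increases `n_unmarked`, which matches the remark that a stronger watermark leaves fewer coefficients actually watermarked.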
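The error-free cases can be reproduced with a small sketch of a likelihood-difference decoder. The statistic below compares the Weibull log-likelihoods of r − sA and r + sA, which is an assumed reading of the test statistic (16); all function names and parameter values are illustrative.

```python
import numpy as np

def weibull_logpdf(x, h, g):
    """Weibull log-PDF; -inf for non-positive arguments (the ln u(.) term)."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, -np.inf)
    pos = x > 0
    out[pos] = np.log(g / h) + (g - 1.0) * np.log(x[pos] / h) - (x[pos] / h) ** g
    return out

def ml_statistic(r, s, A, h, g):
    """Log-likelihood of b=+1 minus log-likelihood of b=-1 (assumed form)."""
    ll_plus = np.sum(weibull_logpdf(r - s * A, h, g))
    ll_minus = np.sum(weibull_logpdf(r + s * A, h, g))
    with np.errstate(invalid="ignore"):   # -inf - (-inf) would give nan
        return ll_plus - ll_minus

def ml_decode(r, s, A, h, g):
    return 1 if ml_statistic(r, s, A, h, g) > 0 else -1   # nan falls back to -1

def embed(x, s, A, b):
    """Modified SS embedding: unmarked coefficients keep r_i = x_i."""
    w = s * A * b
    return np.where(x + w > 0, x + w, x)

rng = np.random.default_rng(3)
h, g, A, l = 10.0, 1.8, 4.0, 256
x = h * rng.weibull(g, size=l)
s = rng.choice([-1.0, 1.0], size=l)
decoded = [ml_decode(embed(x, s, A, b), s, A, h, g) for b in (+1, -1)]
z_plus = ml_statistic(embed(x, s, A, +1), s, A, h, g)     # +inf in this setting
```

In this noise-free setting a single coefficient with r_i + s_i A < 0 (for b = +1) drives the statistic to infinity, so the decision does not depend on the remaining coefficients.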
Deriving an analytic expression for the probability of error is always desirable, because it helps to analyze the behavior of the error. We first show that the test statistic used for decoding can be modeled as a Gaussian random variable. The test statistic (16) is the sum of $l$ random variables. Assuming independent host signal samples and knowledge of the signature codes, the central limit theorem implies that the test statistic can be approximated as a normal random variable if $l$ is large. Assuming that the signature code takes the values $+1$ and $-1$ with equal probability, one can show that the conditional PDFs of the test statistic are Gaussian, where $m_z$ and $\sigma_z^2$ represent the mean and the variance of the test statistic. Assuming equal prior probabilities for the information bit, i.e., $\Pr\{b = +1\} = \Pr\{b = -1\} = 1/2$, the probability of error can be expressed as in (19), where $\mathrm{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_x^{\infty} e^{-t^2}\,dt$ is the complementary error function and WIR denotes the watermark to interference ratio. The mean and variance of the test statistic (16), when $b = +1$, when all host signal coefficients satisfy $x_i > 2A$, and when the signature code is equiprobable, $\Pr\{s_i = +1\} = \Pr\{s_i = -1\} = 1/2$, are given in (20) and (21). If there is a host signal coefficient with $x_i < 2A$, then according to the earlier discussion the probability of error equals zero. Therefore, the theoretical error probability of the modified SS scheme is given by (19), with the mean and variance obtained from expressions (20) and (21).
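Under the Gaussian approximation, the error probability follows directly from the mean and variance of the test statistic. The sketch below assumes the statistic is distributed as N(m_z, σ_z²) when b = +1 and symmetrically when b = −1, so that P_e = ½ erfc(m_z / (√2 σ_z)); the exact way the article writes (19) in terms of the WIR is not reproduced here, and the numbers are illustrative.

```python
import math
import numpy as np

def bit_error_probability(m_z, sigma_z):
    """P_e for a Gaussian test statistic with mean +/- m_z and std sigma_z."""
    return 0.5 * math.erfc(m_z / (math.sqrt(2.0) * sigma_z))

# Monte Carlo check for b = +1: an error occurs when the statistic is not positive.
rng = np.random.default_rng(4)
m_z, sigma_z = 1.0, 1.0                   # illustrative values only
z = rng.normal(m_z, sigma_z, size=200_000)
empirical = float(np.mean(z <= 0))
theory = bit_error_probability(m_z, sigma_z)
```

In practice m_z and σ_z would come from expressions such as (20) and (21), or from sample moments of the test statistic over many embedded blocks.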
Having introduced information embedding using the modified SS scheme in the DFT magnitude domain, we now present the modified ISS scheme. As with the modified SS scheme, to avoid negative watermarked coefficients, we propose a modified ISS embedding scheme as $r = u + sAb + e$ (22), where the insurance vector $e$ is determined as follows. If $u_i + s_i A b$ is positive, the corresponding $e_i$ is set to zero; if $u_i + s_i A b$ is negative, then $e_i = -s_i A b + k s_i s^T x$ to ensure that $r_i$ is positive. The resulting expression for $e_i$ in the modified ISS scheme is (23). To obtain the ML decoder, the conditional distribution of the received signal $r$ is needed. It is straightforward to show that the distribution of the vector $u$ is given by (24), where $|M|$ is the determinant of $M$, $m_i$ is the $i$th row of $M^{-1}$, and $M$ is defined in (25). ML theory leads to the decision $\hat{b} = +1$ when the inequality (26) holds. With some manipulations on (26), we obtain the test statistic (27). We now investigate the error probability behavior of this scheme. It is observed from (26) that at least one of the terms $m_i(r + sA)$ and $m_i(r - sA)$ should be positive for all watermarked coefficients, but the scheme in (22) may not fulfill this requirement. For instance, assume that $b = +1$ is hidden in the host signal. With the embedding scheme (22) and the ML decoder (26), the two terms $u(m_i(r - sA))$ and $u(m_i(r + sA))$ become $u(x_i + m_i e)$ and $u(x_i + m_i e + 2A m_i s)$, respectively. Although the host signal vector and the insurance vector have positive coefficients, the elements of $m_i$ can be negative, so it cannot be guaranteed that $x_i + m_i e$ and $x_i + m_i e + 2A m_i s$ are positive for all coefficients. A similar observation holds when $b = -1$ is hidden in the host signal. Referring to (26), one can conclude that when both terms $m_i(r + sA)$ and $m_i(r - sA)$ are negative, the decoder makes random decisions.
To avoid this undesirable behavior, we improve the modified ISS embedding scheme by proposing the scheme (28), in which a vector $q = [q_1, q_2, \ldots, q_l]^T$ is added to make $m_i(r + sA)$ positive when $b = -1$ (or $m_i(r - sA)$ positive when $b = +1$). With (28), when $b = -1$, $y = M^{-1}(r + sA)$ becomes $y = x + M^{-1}e + M^{-1}q$, where $y = [y_1, y_2, \ldots, y_l]^T$. Let us split $x + M^{-1}e$ into a vector whose elements are non-negative and the vector $v_n = [v_{n1}, v_{n2}, \ldots, v_{nl}]^T$ whose elements are less than zero, so that $y$ can be written as the sum of these two vectors and $M^{-1}q$. To make all the elements of $y$ positive, it is sufficient to make $v_n + M^{-1}q = 0$, which leads to $q = -M v_n$. We apply the modified ISS scheme in (28) for embedding and use the decoder in (26) for extracting the hidden information. In summary, the modified ISS embedding scheme (28) in the DFT magnitude domain has been proposed to make all the watermarked coefficients positive and to make the decoder (26) meaningful.
The remaining point is to determine the parameter $k$ in ISS embedding, which can be done using the probability of error: $k$ should take the value that minimizes the theoretical probability of error. It can be shown that the probability of error for the modified ISS scheme is obtained from expression (19) when $x_i > 2A\left(l(k^{-1} - l)^{-1} + 1\right)$, with the corresponding mean and variance of the test statistic. It should be noted that, from (5) and the fact that the parameter $A$ is always positive, one can conclude that the parameter $k$ in the ISS embedding scheme should satisfy $0 \le k \le D / (s^T R_x s)$. This parameter can be obtained by maximizing the WIR through the constrained maximization in (35).
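The construction of q can be sketched under explicit assumptions: M = I − k s sᵀ (the definition (25) is not reproduced in this text, but this form is consistent with u = Mx), scheme (28) is taken to be r = u + sAb + e + q, and v_n holds the negative entries of x + M⁻¹e with zeros elsewhere. The sketch only checks the decoder-side argument y = M⁻¹(r − sAb); all names are illustrative.

```python
import numpy as np

def improved_modified_iss_embed(x, s, A, b, k):
    """Sketch of the improved modified ISS embedding (assumed forms, see lead-in)."""
    x = np.asarray(x, dtype=float)
    s = np.asarray(s, dtype=float)
    M = np.eye(x.size) - k * np.outer(s, s)       # assumed definition of M
    u = M @ x
    w = s * A * b
    # Insurance vector: zero where u_i + s_i*A*b > 0, else restores r_i = x_i.
    e = np.where(u + w > 0, 0.0, -w + k * s * (s @ x))
    v = x + np.linalg.solve(M, e)                 # x + M^{-1} e
    v_n = np.minimum(v, 0.0)                      # negative part, zeros elsewhere
    q = -M @ v_n                                  # solves v_n + M^{-1} q = 0
    return u + w + e + q, q, M

rng = np.random.default_rng(5)
l, A = 64, 4.0
k = 0.5 / l                                       # illustrative; k*l != 1 keeps M invertible
x = 10.0 * rng.weibull(1.8, size=l)
s = rng.choice([-1.0, 1.0], size=l)

y_min = []
for b in (+1, -1):
    r, q, M = improved_modified_iss_embed(x, s, A, b, k)
    y = np.linalg.solve(M, r - s * A * b)         # decoder-side argument for the true bit
    y_min.append(float(y.min()))
```

With this choice of q the vector y equals the non-negative part of x + M⁻¹e, so the likelihood term for the true bit never falls on the zero branch of the step function through a negative argument.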

ML decoders in the DCT domain
In contrast to the SS and ISS schemes in the DFT magnitude domain, for which no optimal decoder was previously available, the optimal decoder for the SS scheme in the DCT domain has already been derived in [26]. It has been shown that the ML decoder takes the form (15), with the test statistic given in (36). We can see that the ML decoder requires knowledge of the bit information amplitude and the shape parameter as well as the signature code. For practical implementation, the receiver should either have this prior information or estimate it. One way to avoid estimating the shape parameter is to use a general value for all images, hoping it describes the distribution of the DCT coefficients relatively well [35]. We now extend this work to obtain the ML decoder for the ISS embedding scheme in the DCT domain to achieve better decoding performance. To do so, we first derive the PDF of the vector $u$ defined in (4). To this end, the generalized Gaussian PDF is rewritten in vector form, with $R_x = \mathrm{diag}(\sigma_{x_1}^2, \sigma_{x_2}^2, \ldots, \sigma_{x_l}^2)$. In addition, we define $\|[a_1, a_2, \ldots, a_l]\|_c = [|a_1|^c, |a_2|^c, \ldots, |a_l|^c]$. For the ISS signal model (3), it is shown in [36] that the PDF of $u$ can be expressed in a corresponding vector form. The likelihood ratio for ISS (3) then determines when the decoder decides $\hat{b} = +1$. After some algebraic simplifications, the ML decoder for the ISS embedding scheme is obtained in the form of (15), with the test statistic given in (41). Having proposed the optimal decoder of the ISS scheme in the DCT domain, the error probability is obtained from (19), where the mean and variance are given in (42) and (43). As with ISS embedding in the DFT magnitude domain, the parameter $k$ can be determined using the constrained maximization (35), taking into account the mean and variance defined in (42) and (43).
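For the SS case in the DCT domain, the likelihood difference under a GGD host reduces to a sum of |·|^c terms. The sketch below assumes a common variance for all coefficients and the GGD parametrization with β tied to σ through Gamma functions; the statistic is the one obtained by taking the log-likelihood difference under that model, and its scaling may differ from the article's expression (36).

```python
import math
import numpy as np

def ggd_beta(sigma, c):
    """GGD scale parameter for standard deviation sigma (assumed parametrization)."""
    return math.sqrt(math.gamma(3.0 / c) / math.gamma(1.0 / c)) / sigma

def ml_statistic_dct(r, s, A, sigma, c):
    """ln f(r - sA) - ln f(r + sA) for an i.i.d. GGD host."""
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)
    beta = ggd_beta(sigma, c)
    return float(beta ** c * np.sum(np.abs(r + s * A) ** c - np.abs(r - s * A) ** c))

def ml_decode_dct(r, s, A, sigma, c):
    return 1 if ml_statistic_dct(r, s, A, sigma, c) > 0 else -1

rng = np.random.default_rng(6)
l, A, sigma = 2000, 2.0, 5.0
x = rng.laplace(scale=sigma / math.sqrt(2.0), size=l)   # Laplacian host, i.e. c = 1
s = rng.choice([-1.0, 1.0], size=l)
decoded = [ml_decode_dct(x + s * A * b, s, A, sigma, c=1.0) for b in (+1, -1)]

# For c = 2 (Gaussian host) the statistic collapses to a scaled correlator s^T r.
r = x + s * A
gauss_stat = ml_statistic_dct(r, s, A, sigma, c=2.0)
correlator = 4.0 * A * ggd_beta(sigma, 2.0) ** 2 * float(s @ r)
```

The c = 2 identity shows why the traditional correlator is optimal only for a Gaussian host, while smaller c down-weights large host coefficients.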

Sub-optimal decoders
As shown in Section 4, the ML decoder requires the host distribution parameters as well as the watermark amplitude. Assuming low distortion due to the watermark, we can estimate the host signal parameters from the received signal, whereas estimating the watermark amplitude is not easy because of the complex structure of the embedding scheme. Therefore, to reduce the dependency on such prior information, in this section we investigate two sub-optimal decoders [37]. In addition, since the ML decoder for embedding in the DFT magnitude domain was shown to be sensitive to the watermark amplitude, we expect that sub-optimal decoders in this domain can decrease this sensitivity and lead to good performance in the presence of additional noise.

Local optimum decoder
To make the hidden information imperceptible, the watermark amplitude should be small. This motivates us to explore the LOD idea of using Taylor expansion of the test statistic around zero. The Taylor series of f(x) around the point x = a excluding the second and higher orders can be expressed as Having introduced this approximation, we first consider the LOD for SS embedding in the DCT domain. Taking into account the point that the test statistic (36) at A = 0 equals to zero, and by taking the derivative of the test statistic and deriving the Taylor series around A = 0, the approximation of the test statistic turns to the following expression Thus, the LOD for SS embedding in the DCT domain is achieved bŷ The provided above decoder expression reveals that it is independent of the watermark amplitude and appropriate for the cases which there is no access to the watermark amplitude.
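The first-order argument can be checked numerically. Assuming an i.i.d. GGD host with a common β, differentiating the likelihood-difference statistic with respect to A at A = 0 gives a term proportional to Σ s_i sign(r_i)|r_i|^(c−1); the sketch uses that sum as the LOD statistic, which may differ from the article's expression by a positive scale factor. All names and values are illustrative.

```python
import numpy as np

def lod_statistic_dct(r, s, c):
    """First-order (local optimum) statistic for a GGD host; independent of A.
    Note: for c < 1 the term |r_i|^(c-1) is unbounded at r_i = 0."""
    r = np.asarray(r, dtype=float)
    return float(np.sum(s * np.sign(r) * np.abs(r) ** (c - 1.0)))

def lod_decode_dct(r, s, c):
    return 1 if lod_statistic_dct(r, s, c) > 0 else -1

def ml_statistic_dct(r, s, A, beta, c):
    """GGD likelihood-difference statistic, used here only for comparison."""
    return float(beta ** c * np.sum(np.abs(r + s * A) ** c - np.abs(r - s * A) ** c))

rng = np.random.default_rng(7)
l = 256
s = rng.choice([-1.0, 1.0], size=l)
r = rng.laplace(scale=5.0, size=l) + s * 0.5       # received signal with b = +1

# Slope of the ML statistic at A = 0 versus the closed-form first-order term.
c, beta, A_small = 1.5, 0.2, 1e-6
slope = ml_statistic_dct(r, s, A_small, beta, c) / A_small
predicted = 2.0 * c * beta ** c * lod_statistic_dct(r, s, c)

# For c = 2 the LOD statistic is exactly the correlator s^T r.
lod_gauss = lod_statistic_dct(r, s, 2.0)
```

Because the ML statistic vanishes at A = 0, its sign for small A is the sign of this slope, which is why the decision no longer needs the amplitude.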
Having obtained the test statistic of LOD for SS embedding, the error probability can be analyzed using (19) where it could be shown that the mean and the variance are as follow We follow the same procedure provided for LOD using the SS scheme to obtain its counterpart using the ISS scheme. To derive the LOD for ISS in the DCT domain, the test statistic (41) should be rewritten in a more tractable form. After some algebraic manipulations, we have the following form of the test statistic Taking the derivative of above expression and using the Taylor series around A = 0, we have the LOD for ISS embedding asb = sign Again, we observe that above expression, for information decoding using the ISS scheme, has relaxed us having the watermark amplitude, and it is suitable in cases which we do not have access to them. Exploiting (19) leads to the corresponding theoretical error probability of the LOD in (50), with the following mean and variance parameters Since the LODs are approximations of the ML decoders, the degraded decoding performances from the optimal ML ones are expected. LOD has the advantage that the additional information of the watermark amplitude at the decoder side is not required.
A similar procedure can be followed to obtain the LOD in the DFT magnitude domain. Referring to the test statistic (16), we obtain the LOD for SS embedding in the DFT magnitude domain, given in (53). It is worth mentioning that, since the LOD is independent of the watermark amplitude A, its decoding performance degrades relative to ML, especially for high watermark amplitudes. As discussed in Section 4, although the watermarked coefficients with $x_i - s_i A < 0$ or $x_i + s_i A < 0$ do not convey information directly, they do help accurate decoding in the ML decoder in (16). The LOD in (53), having no access to the watermark amplitude, cannot identify these coefficients and therefore exploits all the received coefficients for extracting the hidden information, even though some of them do not convey any information. This is the source of the decoding performance degradation from ML.
To reduce the sensitivity of the LOD to the watermark amplitude, we alternatively present the GML as a sub-optimal decoder in the DFT magnitude domain for SS embedding. The GML treats A, b, and e as unknown parameters to be estimated. Referring to the embedding scheme (11), the GML decoder is obtained by maximizing the likelihood of the received signal, where $f_x(\cdot)$ is the Weibull distribution defined in (8) and $y = sAb + e$. Taking the derivative of this likelihood with respect to y and setting it to zero gives the estimate $\hat{y}$. The main goal is to estimate b from $\hat{y}$. Since $\hat{y}$ estimates $y = sAb + e$, the correlator $\hat{b} = \mathrm{sign}(s^T \hat{y})$ is the solution. It should be pointed out that, since the GML decoder does not have access to the watermark amplitude information, its performance degrades from that of ML. However, when compared with the LOD, the GML yields fewer errors for high watermark amplitudes and thus provides better decoding performance.
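The GML decoder for the ISS scheme can be obtained similarly by maximizing the corresponding likelihood, where $f_U(\cdot)$ is defined in (24) and $y = sAb + e + q$. Similar to the SS case, setting the derivative with respect to y to zero gives the estimate $\hat{y}$ in (58). Therefore, the GML decoder for ISS is $\hat{b} = \mathrm{sign}(s^T \hat{y})$, where $\hat{y}$ is defined as in (58).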

LMMSE decoder
In Section 4, we focused on the optimal ML decoders, which require the PDF parameters as well as the watermark strength and the signature code. Since providing the watermark strength information to the decoder is not always possible, the LOD and GML decoders were proposed in Section 5 to make the decoders independent of this information. However, the PDF parameters are still required by these decoders. This motivated us to develop sub-optimal decoders which depend neither on the PDF parameters nor on the watermark strength information. Here, we introduce the LMMSE decoder, which requires only the signature code as prior information at the decoder side.
In signal processing, the mean square error (MSE) is a common measure of estimation quality, and the MMSE estimator minimizes the MSE in a Bayesian setting. More specifically, let θ be an unknown random variable to be estimated and let y be the measurement. The MMSE estimator is the function $\hat{\theta} = g(y)$ that minimizes the MSE $E\{(\theta - g(y))^2 \mid y\}$. It is known that, under some weak regularity assumptions, the MMSE estimator is given by $\hat{\theta}_{\mathrm{MMSE}} = E\{\theta \mid y\}$. In many cases, the MMSE estimator cannot be obtained, since the distributions f(θ|y) and f(θ, y) may be unknown or the conditional expectation may be difficult to compute. Therefore, in practice, the LMMSE estimator, which has a linear structure and can be obtained more easily [38], is usually applied. The LMMSE decoder forms a linear combination of the received coefficients $r_i$ with weights $w_i$, $z = \sum_i w_i r_i = w^T r$, where the weights are still to be determined, and the hidden information is extracted using (15). The weight vector w is obtained by minimizing the MSE, $\hat{w} = \arg\min_w E\{(b - w^T r)^2\}$. Information embedding, whether in the DCT domain or in the magnitude of the DFT domain, can be written in the general form of (28). It can be shown that this minimization leads to the conventional LMMSE form [38], $\hat{w} = R_r^{-1} s$ up to a positive scale factor, where the autocorrelation matrix $R_r = E\{r r^T\}$ can be estimated at the receiver side. Therefore, only the signature code is required at the receiver side. Although the LMMSE decoder has the same structure for information embedding in the DCT domain and in the magnitude of the DFT domain, its performance differs between these host domains. As explained earlier for the LOD, in the magnitude of the DFT domain not all the coefficients convey information. The autocorrelation matrix, however, is estimated using all the coefficients of the received signal, which degrades the decoding performance.
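The linear structure described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that all blocks of an image share the same signature code s (so that the sample autocorrelation can be formed across blocks), uses weights proportional to $R_r^{-1} s$, and tests on synthetic Gaussian host data with hypothetical variances. The function names are illustrative.

```python
import numpy as np

def lmmse_decode(received, s):
    """Decode one bit per row of `received` with an LMMSE-style linear filter.

    received: array of shape (num_blocks, L), one received vector r per row.
    s:        signature code of length L, assumed shared by all blocks.
    """
    # Sample estimate of the autocorrelation matrix R_r = E{r r^T}.
    R_r = received.T @ received / received.shape[0]
    # Weights proportional to R_r^{-1} s; a positive scale does not change the sign.
    w = np.linalg.solve(R_r, s)
    z = received @ w                      # test statistic z = w^T r per block
    return np.where(z >= 0, 1, -1)

def simulate_ber(num_blocks=4000, L=16, A=1.0, seed=0):
    """Bit error rate on synthetic data with unequal host variances."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1.0, 1.0], size=L)
    b = rng.choice([-1, 1], size=num_blocks)
    host_std = np.linspace(0.5, 3.0, L)   # hypothetical host statistics
    x = rng.normal(size=(num_blocks, L)) * host_std
    received = x + A * b[:, None] * s     # additive SS embedding r = x + s A b
    return np.mean(lmmse_decode(received, s) != b)
```

Neither the watermark amplitude nor any host PDF parameter is passed to `lmmse_decode`; the signature code and the received blocks are its only inputs.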
To obtain a closed-form expression of the error probability for the LMMSE decoder when the SS scheme is used for information hiding in the DCT domain, we assume that the test statistic z in (59) follows a Gaussian distribution. The probability of error then takes the form of (19), with the mean and variance of z as parameters. In a similar way, the theoretical error probability of the LMMSE decoder for ISS embedding in the DCT domain is obtained using (3), where $R_r = A^2 s s^T + (I_l - k s s^T) R_x (I_l - k s s^T)$.
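The expression for $R_r$ under ISS can be checked by simulation. The sketch below assumes that the received vector has the form $r = sAb + (I_l - k s s^T)x$, with a zero-mean host x of autocorrelation $R_x$ and an equiprobable bit b independent of x; this form is inferred from the expression for $R_r$ and is an assumption about (3), which is not reproduced here. The Gaussian host and all parameter values are hypothetical.

```python
import numpy as np

def iss_autocorrelation(A, k, s, R_x):
    """Closed form R_r = A^2 s s^T + (I - k s s^T) R_x (I - k s s^T)."""
    P = np.eye(len(s)) - k * np.outer(s, s)
    return A ** 2 * np.outer(s, s) + P @ R_x @ P

def empirical_autocorrelation(A, k, s, R_x, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E{r r^T} for r = s A b + (I - k s s^T) x."""
    rng = np.random.default_rng(seed)
    L = len(s)
    x = rng.multivariate_normal(np.zeros(L), R_x, size=num_samples)
    b = rng.choice([-1.0, 1.0], size=num_samples)
    P = np.eye(L) - k * np.outer(s, s)
    r = A * b[:, None] * s + x @ P.T      # one received vector per row
    return r.T @ r / num_samples
```

The cross terms between the watermark and the host vanish in expectation because b is zero-mean and independent of x, which is why only two terms remain in the closed form.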
To achieve a simpler expression of the WIR, we employ the matrix inversion lemma, which after some manipulations yields (65). Taking the derivative of the WIR in (65) with respect to the parameter k and setting it to zero gives the optimal value of k. Since all the required information is available during the encoding of the image, this optimal k can be calculated at the encoder. Similarly, the theoretical error probability of the SS scheme in the magnitude of the DFT domain is obtained, using the embedding scheme (11), in the form of (19), where $R_e = E\{e e^T\}$, and the theoretical error probability of the ISS scheme, using (22), is also in the form of (19), where $R_q = E\{(e + q)(e + q)^T\}$.

Experimental results
In this section, simulations on real images are conducted to illustrate the performance of the proposed watermark decoders for decoding the hidden message. A set of test images of size 512 × 512, as in [13], is employed for information embedding; it includes "Boat", "Peppers", "Baboon", "Lena", and "Barbara" to represent a fairly wide range of images. For information embedding in the DCT domain, the DCT coefficients of each 8 × 8 block of the image are calculated, and all coefficients except the dc one are used as the host signal to convey the hidden information. Therefore, 63 coefficients are used for conveying one bit of information. For information hiding in the DFT domain, since the coefficients should remain conjugate symmetric, 31 coefficients are employed. For the DCT-domain data hiding, determining an appropriate value of the shape parameter is important, though the details are out of the scope of this article. One approach is to use ML estimation [39,40]. In practice, to reduce the computational complexity, an alternative is to use a constant value regardless of the specific image under analysis. One such constant value was suggested in [35] as c = 0.8, and we use this value in our simulations to avoid additional estimations. Our results are based on 100 simulation runs using different signature codes, and since each 8 × 8 block is used for hiding one bit, the total number of embedded bits is $512^2/8^2 = 4096$ in each test image.
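The DCT-domain host extraction described above can be sketched as follows. This is an illustrative sketch only: it builds an orthonormal DCT-II matrix directly with NumPy, and it keeps the 63 ac coefficients of each block in row-major order, which is an assumption since the coefficient ordering is not specified. The function names are hypothetical and a random array stands in for a test image.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix C, so that C @ block @ C.T is the 2-D DCT."""
    u = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * u / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def block_dct_hosts(image, block=8):
    """Return one host vector of ac coefficients per non-overlapping block."""
    h, w = image.shape
    C = dct_matrix(block)
    blocks = image.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    coeffs = C @ blocks @ C.T                    # 2-D DCT of every block
    flat = coeffs.reshape(-1, block * block)
    return flat[:, 1:]                           # drop the dc coefficient

if __name__ == "__main__":
    image = np.random.default_rng(0).uniform(0, 255, size=(512, 512))
    hosts = block_dct_hosts(image)
    print(hosts.shape)   # (4096, 63): one row of 63 coefficients per embedded bit
```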
We first verify the theoretical error probabilities when employing the ML decoders proposed in Section 4 for both traditional SS and ISS embedding. For data hiding in the DCT domain, both the simulation results and the theoretical results are shown in Figure 1, where the bit error rate (BER) is plotted as a function of the data-to-watermark ratio (DWR), defined as $\mathrm{DWR} = 10 \log_{10}(\sigma_x^2 / D)$. The BER averaged over the five test images is reported as the final result. From Figure 1, it is clear that the theoretical analysis and the simulated BER results match closely, which verifies the theoretical analysis of the error probability. Also, comparing the performances of SS and ISS, we note that, as expected, ISS clearly outperforms the traditional SS.
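As a small worked illustration of the DWR definition, the helper below converts between the DWR in dB and the watermark amplitude. It assumes a base-10 logarithm (implied by the dB scale) and an embedding distortion of $D = A^2$ per coefficient, which holds for plain additive SS with signature chips in {+1, −1}; both the function names and this distortion model are assumptions made for illustration.

```python
import math

def dwr_db(host_variance, distortion):
    """DWR = 10 log10(sigma_x^2 / D), in dB."""
    return 10.0 * math.log10(host_variance / distortion)

def amplitude_for_dwr(host_variance, dwr):
    """Amplitude A giving the requested DWR, assuming D = A^2."""
    return math.sqrt(host_variance / 10.0 ** (dwr / 10.0))

if __name__ == "__main__":
    A = amplitude_for_dwr(host_variance=50.0, dwr=30.0)
    print(A, dwr_db(50.0, A ** 2))
```

A lower DWR therefore corresponds to a larger watermark amplitude for a fixed host variance.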
Similar to the DCT-domain results, we also investigate the performances of the ML decoders when the magnitude of the DFT domain is exploited for information hiding. The average BER performance based on the five test images and the theoretical performance derived in Section 4 are shown in Figure 2. The consistency between the simulated and theoretical results supports the correctness of the error analysis provided in Section 4. From Figures 1 and 2, we note that, at low DWR, the theoretical results for DCT-domain data hiding are more accurate than those for the DFT magnitude domain. The most likely reason is the assumed Gaussian distribution of the test statistic, which holds more accurately when more random variables are summed. The total number of available coefficients for information hiding in the DFT magnitude domain is only half of that in the DCT domain and, as explained in Section 4, more DFT coefficients fail to convey information as the DWR decreases. Consequently, the Gaussian assumption imposed on the test statistics in (36) and (41) for DCT-domain embedding is more accurate than that imposed on the test statistics in (16) and (27) for DFT magnitude domain embedding. From Figures 1 and 2, we also note that data hiding in the DFT magnitude domain yields better decoding performance than data hiding in the DCT domain. This observation can be explained by the special structure of the test statistic in (14), which is error free when $r_i + s_i A < 0$ or $r_i - s_i A < 0$ and thus leads to better decoding performance.
To gain more insight into the optimal and sub-optimal decoders derived in Sections 4 and 5, the decoding performances of the ML, LOD, and LMMSE decoders are compared in Figure 3 for the DCT-domain SS and ISS embedding schemes. The ML decoder outperforms the sub-optimal ones. The ML decoder uses the watermark amplitude information to make the decision, while the LOD and LMMSE decoders do not require this information and provide performance close to, though slightly worse than, that of ML. We also observe that the LMMSE decoder yields slightly better decoding performance than the LOD, which can be intuitively justified by the structure of the LOD. In deriving the LOD, it is assumed that the watermark amplitude is small, and the LOD performs close to ML as long as this assumption is satisfied. As seen from Figure 3, the performance gap between the ML and the LOD widens as the DWR decreases. This is because, as the DWR decreases, the watermark amplitude gets larger and the LOD, derived by truncating the second- and higher-order terms of the Taylor series, becomes a coarser approximation of the ML decoder. The LMMSE does not impose any constraint on the watermark amplitude, and this might be one reason that the LMMSE is slightly better than the LOD. In addition, the LMMSE decoder does not need to estimate the parameters of the host signal, and this simplicity makes it attractive for decoding. Therefore, we suggest that the LMMSE decoder is generally a good choice for extracting the information hidden in the DCT domain, in the sense that it needs less information than the ML decoder yet yields close decoding performance.
To verify the derived theoretical decoding performance analysis in Section 5, the BER curves of the LOD and LMMSE decoders in the DCT domain are shown in Figures 4 and 5. We observe close matches between the theoretical BER performances and the performances calculated based on the simulations.
To compare the performances of the sub-optimal decoders for DFT magnitude domain embedding, the results are reported in Figure 6. From Figure 6, we note that, for SS embedding, the BER performance of the LOD is not comparable with that of the other decoders. As discussed in Section 5, although not all DFT coefficients convey hidden information, the LOD and LMMSE decoders use all the received coefficients for decoding and thus have degraded decoding performance relative to ML. More specifically, at lower DWR, since fewer DFT coefficients can be used for conveying the information bit, the decoding performance gap is larger, as observed in Figure 6. Figure 6 also shows that the slope of the LOD performance curve becomes smaller as the DWR decreases, and the same behavior is observed for the LMMSE decoder. However, this behavior is not observed for the GML decoder, even though the GML yields worse performance than the LMMSE decoder. The fact that the GML estimates the unknown parameters and uses them to extract the hidden information might explain why the slope of its performance curve does not become smaller as the DWR decreases. From Figure 6, we note that, overall, the LMMSE decoder outperforms the other sub-optimal decoders.
To examine the performances of the proposed decoders in the presence of additional distortions/attacks, we consider a scenario in which additive Gaussian noise is added to the watermarked images. The decoders' performances are shown in Figures 7 and 8 for the DCT and the DFT magnitude domain embedding, respectively. The DWR is fixed at 30 dB and the WNR varies between 0 and 10 dB, where $\mathrm{WNR} = 10 \log_{10}(D / \sigma_n^2)$ and $\sigma_n^2$ denotes the noise variance. From Figure 7, we note that the ISS scheme has better performance than SS, and that all sub-optimal decoders yield decoding performances close to each other. An interesting observation in Figure 8 is that the LMMSE and GML decoders outperform ML. Even though, in the absence of any additional attack, ML in the DFT magnitude domain provides the best decoding performance, its performance degrades significantly in the presence of additional noise. This can be explained by the sensitivity of the ML decoder to the watermark amplitude. In the presence of additional noise, the received coefficients can be changed (e.g., the noisy coefficients could satisfy $r_i + s_i A > 0$ or $r_i - s_i A > 0$ even though they do not actually convey any information), and thus they are wrongly exploited for decoding and consequently degrade the performance of the ML decoder. On the other hand, the LMMSE and GML decoders do not depend on the watermark amplitude and can yield better performance than ML under additional noise.
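To make the noise level in this scenario concrete, the DWR and WNR definitions can be combined to express the noise variance relative to the host variance. The short derivation below assumes base-10 logarithms in both definitions, as implied by the dB scale.

```latex
\begin{align*}
\sigma_n^2 &= D \cdot 10^{-\mathrm{WNR}/10}
            = \sigma_x^2 \cdot 10^{-(\mathrm{DWR} + \mathrm{WNR})/10}, \\
\mathrm{DWR} = 30~\text{dB},\ \mathrm{WNR} = 0~\text{dB}:
  \quad \sigma_n^2 &= 10^{-3}\,\sigma_x^2, \\
\mathrm{DWR} = 30~\text{dB},\ \mathrm{WNR} = 10~\text{dB}:
  \quad \sigma_n^2 &= 10^{-4}\,\sigma_x^2 .
\end{align*}
```

At WNR = 0 dB the noise power equals the embedding distortion D, so the tested range spans noise powers from equal to the watermark power down to one tenth of it.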
In order to illustrate that the proposed modified SS and ISS schemes in the DFT magnitude domain always lead to positive watermarked coefficients, the histogram plots of the watermarked coefficients for the proposed modified schemes are provided in Figures 9 and 10. It could be seen that, as we expected, all coefficients are positive, supporting the intuitive rationale behind using the modified embedding schemes.
To further justify the benefit of the modified SS scheme in the DFT magnitude domain, we compare in Figure 11 the decoding performance of the proposed SS-based scheme with three existing SS-based methods in the DFT magnitude domain. The first approach uses the conventional correlator in SS [41,42]. The second approach is a decoder based on the Weibull distribution in SS [43], which does not take into consideration the signs of $r_i + s_i A$ and $r_i - s_i A$. The third approach, for embedding the information into the DFT magnitude domain, is based on the MSS [13,43]. It is clear that the proposed modified scheme yields superior decoding performance over the conventional ones. The modified SS scheme provides better performance than the MSS, probably because there are some error-free cases in the proposed scheme.
Since some researchers have suggested that the magnitudes of the DFT coefficients may follow a PDF other than the Weibull, we check whether the Weibull distribution is a valid assumption by estimating the PDF of the coefficients based on the Weibull distribution and reporting empirical results for 15 images. Figure 12 reveals a close match between the empirical and Weibull-based PDFs, supporting the assumption of the Weibull distribution of the coefficients. To investigate whether the decoding performance is consistent over more images, the decoding performances of the ML decoders using the modified SS and ISS schemes in the DFT magnitude domain are shown in Figure 13 based on 100 images. We can see that the decoding performances for 5 and 100 images are similar.
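A check of this kind requires fitting a Weibull PDF to the observed magnitudes. The estimation procedure is not detailed in this section, so the sketch below uses a standard maximum-likelihood fit as an assumed stand-in. It takes the parametrization $f(x) = (k/\lambda)(x/\lambda)^{k-1} e^{-(x/\lambda)^k}$, which may differ from the form in (8), and solves the shape equation by bisection. The function name is hypothetical, and the input is assumed to be strictly positive and moderately scaled (very large magnitudes should be normalized first to avoid overflow).

```python
import numpy as np

def fit_weibull(x, lo=0.05, hi=20.0, iters=60):
    """Maximum-likelihood Weibull fit; returns (shape k, scale lam)."""
    x = np.asarray(x, dtype=float)
    log_x = np.log(x)
    mean_log = log_x.mean()

    def score(k):
        # ML shape equation; it is increasing in k and crosses zero at the estimate.
        xk = x ** k
        return (xk * log_x).sum() / xk.sum() - 1.0 / k - mean_log

    for _ in range(iters):                # bisection on the shape parameter
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = 0.5 * (lo + hi)
    scale = np.mean(x ** shape) ** (1.0 / shape)
    return shape, scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    samples = 2.0 * rng.weibull(1.5, size=50_000)   # synthetic magnitudes
    print(fit_weibull(samples))
```

The fitted PDF can then be overlaid on a normalized histogram of the coefficient magnitudes to compare the empirical and model-based densities.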
In summary, several useful observations can be drawn from the experimental results. With the watermark amplitude information available at the receiver side, data hiding in the DFT magnitude domain could result in better performance when the ML decoder is employed. With no access to the watermark amplitude information, data hiding in the DCT domain is preferred, and the LMMSE decoder is the preferred decoder. When additional noise is considered, data hiding in the DCT domain with ISS is preferred over ISS embedding in the DFT magnitude domain. However, for SS embedding, the LMMSE decoder in the magnitude of the DFT domain provides performance fairly comparable to that of the LMMSE decoder in the DCT domain.

Conclusion
In this article, the optimal and sub-optimal decoders for additive spread spectrum data hiding were investigated. Overall, we presented a rigorous decoding analysis framework of additive spread spectrum and ISS data hiding when the information bit is embedded into the DCT and the magnitude of the DFT domains, respectively. Generalized Gaussian distribution and Weibull distribution were used for deriving the ML decoders in the different domains. To improve the accuracy of the extracted hidden message, we employed ISS embedding and presented the optimal ML decoders. The theoretical error analyses of SS and ISS embedding in the DCT domain and in the magnitude of the DFT domain were derived. Simulation results showed that, when the watermark amplitude is available at the decoder side, data hiding in the magnitude of the DFT domain could yield better decoding performances than that of the DCT domain.
Though theoretically the ML decoder achieves the decoding performance upper bound, it requires additional prior information such as the watermark amplitude. To relax the requirements on such prior information, the LOD and LMMSE decoders were derived for practical data hiding applications in the DCT domain. The LOD decoder is independent of the watermark amplitude, though it still requires the host signal parameters. The LMMSE decoder provides a linear decoder in terms of the received signal which is independent of the watermark amplitude. The LOD and LMMSE decoders yield performances close to that of the ML decoder, with the LMMSE being slightly better than the LOD, especially at low DWR. For the proposed sub-optimal decoders, we also provided the theoretical analysis of the bit error rate decoding performances.
The sub-optimal LOD was also proposed in the DFT magnitude domain. However, the LOD does not perform close to the ML decoder, probably because the LOD uses all the received coefficients for decoding. To address this issue, the GML decoder was proposed to provide an estimate of the watermark amplitude and the bit information. Although the GML can tackle the LOD deficiency at low DWR, its performance is much worse than that of ML in the absence of any additional attack/distortion. The LMMSE decoder in the DFT magnitude domain shows better performance than the LOD and GML decoders.
The simulation results suggest that, with no access to the watermark amplitude information at the decoder side, the sub-optimal decoders in the DCT domain are more reliable than their counterparts in the DFT magnitude domain. Among the proposed sub-optimal decoders, the LMMSE decoders are preferred overall. As expected, the ISS embedding scheme outperforms SS in both the DCT and the DFT magnitude domains, and thus is preferred. Simulations in the presence of additional noise showed that ISS embedding in the DCT domain is preferred. For data hiding in the magnitude of the DFT domain, the GML and LMMSE decoders are preferred over the ML decoder in the presence of additional noise.