Decentralized Estimation over Orthogonal Multiple-access Fading Channels in Wireless Sensor Networks - Optimal and Suboptimal Estimators

Optimal and suboptimal decentralized estimators in wireless sensor networks (WSNs) over orthogonal multiple-access fading channels are studied in this paper. Considering multiple-bit quantization before digital transmission, we develop maximum likelihood estimators (MLEs) with both known and unknown channel state information (CSI). When training symbols are available, we derive an MLE that is a special case of the MLE with unknown CSI. It implicitly uses the training symbols to estimate the channel coefficients and exploits the estimated CSI in an optimal way. To reduce the computational complexity, we propose suboptimal estimators. These estimators exploit redundant information at both the signal level and the data level to improve the estimation performance. The proposed MLEs reduce to traditional fusion-based or diversity-based estimators when communications or observations are perfect. By introducing a general message function, the proposed estimators can be applied when various analog or digital transmission schemes are used. The simulations show that the estimators using digital communications with multiple-bit quantization outperform the estimator using amplify-and-forward transmission in fading channels. When the total bandwidth and energy constraints are taken into account, the MLE using multiple-bit quantization is superior to that using binary quantization at medium and high observation signal-to-noise ratio levels.


I. INTRODUCTION
Wireless sensor networks (WSNs) consist of a number of sensors deployed in a field to collect information, e.g., to measure physical parameters such as temperature and humidity. Since the sensors are usually battery powered and have very limited processing and communication capabilities [1], the parameters are often estimated in a decentralized way. In typical WSNs for decentralized estimation, there exists a fusion center (FC). The sensors transmit their locally processed observations to the FC without inter-sensor communication, and the FC produces the final estimate based on the received signals [2].
Both observation noise and communication errors degrade the performance of decentralized estimation. Traditional fusion-based estimators minimize the mean square error (MSE) of the parameter estimate under the assumption of perfect communication links (see [3] and references therein). They reduce the observation noise by exploiting the redundant observations provided by multiple sensors. However, their performance degrades dramatically when communication errors cannot be ignored or corrected. On the other hand, wireless communication techniques designed to approach capacity or improve reliability do not necessarily minimize the MSE of the parameter estimate. For example, although diversity combining reduces the bit error rate (BER), it requires the signals transmitted from the sensors to be identical, which does not hold in WSNs because each sensor's observation is corrupted by its own noise. This motivates jointly optimizing the communication-oriented diversity combining and the fusion-oriented estimator at the FC under realistic observation and channel models, with the MSE of the parameter estimate as the performance metric.
Bandwidth and energy constraints are the two most important issues in WSNs. Under a strict bandwidth constraint, decentralized estimation in which each sensor transmits only one bit (binary quantization) per observation is studied in [4]-[9]. Among them, [4], [5] introduce the maximum likelihood estimator (MLE) and discuss the optimal quantization when the communication channels are noiseless. Also assuming noiseless channels, [6] proposes a universal and isotropic quantization rule, and [8], [9] study adaptive binary quantization methods. For noisy channels, [7] studies the MLE in additive white Gaussian noise (AWGN) channels and introduces several low-complexity suboptimal estimators. It has been found that binary quantization is sufficient for decentralized estimation at low observation signal-to-noise ratio (SNR), whereas the sensors need to transmit a few extra bits when the observation SNR is high [4]. When the energy constraint and general multilevel quantizers are considered, decentralized estimation has been studied under various channels. For error-free communications, the quantizers at the sensors are designed in [10]-[14]. The optimal trade-off between the number of active sensors and the quantization bit rate of each sensor under a total energy constraint is investigated in [15]. In binary symmetric channels (BSCs), power scheduling is proposed to reduce the estimation MSE when the FC uses the best linear unbiased estimator (BLUE) or a quasi-BLUE that accounts for the quantization noise [16], [17]. To the best of the authors' knowledge, the optimal decentralized estimator using multiple-bit quantization in fading channels is still not available. Although the MLE proposed for AWGN channels in [7] can be applied to fading channels if the channel state information (CSI) is known at the FC, it only considers binary quantization.
Besides decentralized estimation based on digital communications, estimation based on analog communications has received considerable attention because of important conclusions drawn from studies of the multi-terminal coding problem [18], [19]. The most popular scheme is amplify-and-forward (AF) transmission, which is proved to be optimal in quadratic Gaussian sensor networks with AWGN multiple-access channels (MACs) [20]. The power scheduling and energy efficiency of AF transmission over AWGN channels are studied in [21] and [22], which show that AF transmission is more energy efficient than digital communications with certain coding and modulation schemes. In fading channels, AF transmission is no longer optimal, either in orthogonal MACs [23]-[25] or in non-orthogonal MACs [26]. The outage laws of the estimation diversity with AF transmission in fading channels are studied in [24] and [25] in different asymptotic regimes. These studies, especially the results in [23], indicate that separate source-channel coding outperforms AF transmission (a simple joint source-channel coding scheme) and is actually optimal in fading channels with orthogonal multiple-access protocols.
In this paper, we develop optimal and suboptimal decentralized estimators of a deterministic parameter for digital communication systems. The observations of the sensors are quantized, coded and modulated, and then transmitted to the FC through orthogonal MACs over Rayleigh fading channels. Uniform quantization is used since it is optimal for deterministic parameters. Because binary quantization is suitable only at low observation SNR [4], [15], a general multi-bit quantizer is considered.
We aim to derive the MLE and feasible suboptimal estimators for different local processing and communication strategies. To this end, we first present a general messaging function that represents various quantization and transmission schemes. We then derive the MLE of an unknown parameter with known CSI at the FC. In typical WSNs, owing to both energy and bandwidth constraints, the sensors usually cannot transmit many training symbols for the receiver to estimate or track the channel coefficients. Therefore, we also consider the practically significant case in which the CSI is unknown at the FC and no or only a few training symbols are available. To reduce the computational complexity, we then introduce suboptimal estimators guided by the structure of the MLEs.
Our contributions are summarized as follows. We develop decentralized MLEs with known and unknown CSI at the FC over orthogonal MACs with Rayleigh fading. The MSE of these MLEs serves as a practical lower bound for decentralized estimation in orthogonal MACs. To provide feasible estimators with affordable complexity, we propose a suboptimal algorithm that can be viewed as a modified expectation-maximization (EM) algorithm [27]. By studying the special cases of error-free communications and noiseless observations, we show that the MLEs degenerate into the well-known centralized fusion estimator (the BLUE) when communications are error-free, and into the maximal ratio combiner (MRC) based estimator with known CSI or a subspace-based estimator with unknown CSI when observations are noiseless. This indicates that our estimators can exploit both the data-level and the signal-level redundancy provided by multiple sensors when both observation noise and communication errors are present. Because the general messaging function can describe various quantization and transmission schemes, the proposed decentralized estimators also apply to WSNs that use AF transmission or digital transmission with binary quantization. Therefore, our estimators bridge the gap between the estimators designed for the two extreme cases of quantization.
The rest of the paper is organized as follows. Section II describes the system model. Section III presents the MLEs with known and unknown CSI, and Section IV introduces the suboptimal estimators. In Section V, we analyze several special cases of the MLEs. In Section VI, we discuss the codebook design issue, the computational complexity, and the asymptotic performance of the presented MLEs. Simulations are provided in Section VII, and conclusions are given in Section VIII.

II. SYSTEM MODEL
We consider a typical WSN consisting of N sensors and an FC that measure an unknown deterministic parameter θ, with no communication among the sensors. The sensors transmit their quantized observations to the FC over Rayleigh fading channels. We assume that the sensors use ideal orthogonal multiple-access protocols, such as TDMA or FDMA, so that the FC can separate the signals received from different sensors without interference.
Figure 1 is a diagram of the decentralized estimation system considered. The sensors process their observations of the parameter θ before transmission. For digital communications, the processing includes quantization, channel coding, modulation, etc.; for analog communications, it may simply be amplification. A function c(x), called the messaging function, describes the local processing for both digital and analog communication systems. The transmitted signals arrive at the FC through independent Rayleigh fading channels, and the FC uses the received signals to estimate θ. In the subsequent sections, we first derive the decentralized estimators for digital communications and then extend the results to analog communications.

A. Observation Model
The observation of the unknown parameter θ provided by the i-th sensor is

$$x_i = \theta + n_{s,i}, \quad i = 1, \dots, N, \tag{1}$$

where $n_{s,i}$ is independent and identically distributed (i.i.d.) Gaussian observation noise with zero mean and variance $\sigma_s^2$, and θ is bounded within the dynamic range [−V, +V].

B. Quantization, Coding and Modulation
We use the messaging function $\mathbf{c}(x): \mathbb{R} \to \mathbb{C}^L$, which maps an observation to L transmission symbols, to represent all the processing at the sensors, including quantization, coding and modulation. To facilitate the analysis, the energy of the transmission symbols is normalized to 1, i.e.,

$$\|\mathbf{c}(x)\|_2^2 = \mathbf{c}^H(x)\,\mathbf{c}(x) = 1. \tag{2}$$

We consider uniform quantization, which is optimal for deterministic parameters or for parameters with unknown statistics. For an M-level uniform quantizer with granular region [−W, +W], all possible quantized values of the observations can be written as

$$S_m = -W + (m-1)\Delta, \quad m = 1, \dots, M, \tag{3}$$

where $\Delta = 2W/(M-1)$ is the quantization interval.
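The observations are rounded to the nearest $S_m$; therefore $\mathbf{c}(x)$ is the piecewise constant function

$$\mathbf{c}(x) = \begin{cases} \mathbf{c}_1, & x < S_1 + \Delta/2, \\ \mathbf{c}_m, & S_m - \Delta/2 \le x < S_m + \Delta/2, \quad m = 2, \dots, M-1, \\ \mathbf{c}_M, & x \ge S_M - \Delta/2, \end{cases} \tag{4}$$

where $\mathbf{c}_m = [c_{m,1}, \dots, c_{m,L}]^T$ contains the L symbols transmitted for the quantized observation $S_m$. Under the assumption that W is much larger than the dynamic range of θ, the probability that $|x_i| > W$ can be ignored, and $\mathbf{c}(x)$ simplifies to

$$\mathbf{c}(x) = \mathbf{c}_m, \quad S_m - \Delta/2 \le x < S_m + \Delta/2, \quad m = 1, \dots, M. \tag{5}$$

Define the transmission codebook as

$$\mathbf{C} = [\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_M], \tag{6}$$

which can describe any coding and modulation scheme following the M-level quantization.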
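The quantizer in (3)-(5) and a codebook of the form (6) are easy to make concrete. The following is a minimal sketch, assuming NumPy; the function names are hypothetical, and the natural-binary-plus-BPSK codebook is only one illustrative choice of $\mathbf{C}$, scaled to satisfy (2), since the paper does not fix a particular codebook here.

```python
import numpy as np

def quantization_levels(M, W):
    """Levels S_1..S_M of (3) and the interval Delta = 2W/(M-1)."""
    delta = 2.0 * W / (M - 1)
    return -W + delta * np.arange(M), delta

def quantize_index(x, M, W):
    """0-based index of the nearest level S_m; values outside [-W, W] go to the end levels, as in (4)."""
    _, delta = quantization_levels(M, W)
    m = np.rint((np.asarray(x, dtype=float) + W) / delta).astype(int)
    return np.clip(m, 0, M - 1)

def nbc_bpsk_codebook(M):
    """Hypothetical example codebook C of (6): natural binary code + BPSK.
    Column m is c_m, with L = ceil(log2 M) symbols scaled so that ||c_m||_2 = 1."""
    L = int(np.ceil(np.log2(M)))
    shifts = np.arange(L - 1, -1, -1)[:, None]          # most significant bit first
    bits = (np.arange(M)[None, :] >> shifts) & 1        # shape (L, M)
    return (1.0 - 2.0 * bits) / np.sqrt(L)              # bit 0 -> +1, bit 1 -> -1
```

For M = 4 and W = 3, the levels are −3, −1, 1, 3 with Δ = 2, and an observation x = 0.4 is mapped to index 2, i.e., to $S_3 = 1$.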
The sensors can use various codes such as natural binary codes to represent the quantized observations.Since the focus of this paper is to design the decentralized estimators, we will not optimize the transmission codebook for the parameter estimation.

C. Received Signals
Since we consider orthogonal MACs, we assume that the FC can perfectly separate and synchronize to the signals received from different sensors. We also assume block fading, i.e., the channel coefficients are invariant while a sensor transmits the L symbols representing one observation. After matched filtering and symbol-rate sampling, the L received samples corresponding to the L symbols transmitted by the i-th sensor can be expressed as

$$\mathbf{y}_i = \sqrt{E_d}\, h_i\, \mathbf{c}(x_i) + \mathbf{n}_{c,i}, \tag{7}$$

where $\mathbf{y}_i = [y_{i,1}, \dots, y_{i,L}]^T$, $h_i$ is the channel coefficient, which follows a complex Gaussian distribution with zero mean and unit variance, $\mathbf{n}_{c,i}$ is the receiver thermal noise vector, which is complex Gaussian with zero mean and covariance matrix $\sigma_c^2 \mathbf{I}$, and $E_d$ is the transmission energy per observation.
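The received-signal model (7) can be simulated directly, which is convenient for checking the estimators below. This is a minimal sketch, assuming NumPy's `default_rng` generator; the function name and the row-per-sensor array layout are illustrative choices, not taken from the paper.

```python
import numpy as np

def received_signals(c_tx, E_d, sigma_c2, rng):
    """Simulate y_i = sqrt(E_d) h_i c(x_i) + n_{c,i} over block Rayleigh fading.
    c_tx: (N, L) array whose row i is c(x_i). Returns Y (N, L) and h (N,)."""
    N, L = c_tx.shape
    # h_i ~ CN(0, 1): real and imaginary parts each have variance 1/2
    h = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
    # n_{c,i} ~ CN(0, sigma_c^2 I_L)
    n = np.sqrt(sigma_c2 / 2) * (rng.standard_normal((N, L)) + 1j * rng.standard_normal((N, L)))
    Y = np.sqrt(E_d) * h[:, None] * c_tx + n
    return Y, h
```

With unit-energy codewords, the average received energy per sample vector is $E_d + L\sigma_c^2$ divided over the L samples, which gives a quick sanity check on a simulation.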

III. OPTIMAL ESTIMATORS WITH OR WITHOUT CSI
In this section, we derive the MLEs for CSI known and unknown at the receiver of the FC. The MLE using training symbols in the transmission codebook is also studied as a special case of the MLE with unknown CSI.

A. MLE with Known CSI
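Given θ, the signals received from different sensors are statistically independent. If the CSI is known at the receiver of the FC, the log-likelihood function is

$$\log p(\mathbf{Y} \mid \mathbf{h}, \theta) = \sum_{i=1}^{N} \log \int_{-\infty}^{\infty} p(\mathbf{y}_i \mid h_i, x)\, p(x \mid \theta)\, dx, \tag{8}$$

where $\mathbf{Y} = [\mathbf{y}_1, \dots, \mathbf{y}_N]$, $\mathbf{h} = [h_1, \dots, h_N]^T$ is the channel coefficient vector, and $p(x \mid \theta)$ is the conditional probability density function (PDF) of the observation given θ. Following the observation model (1), we have

$$p(x \mid \theta) = \frac{1}{\sqrt{2\pi}\,\sigma_s} \exp\!\left(-\frac{(x-\theta)^2}{2\sigma_s^2}\right). \tag{9}$$

According to the received signal model (7), the PDF of the received signals given the CSI and the sensor observation is

$$p(\mathbf{y}_i \mid h_i, x) = \frac{1}{(\pi\sigma_c^2)^L} \exp\!\left(-\frac{\|\mathbf{y}_i - \sqrt{E_d}\, h_i\, \mathbf{c}(x)\|_2^2}{\sigma_c^2}\right), \tag{10}$$

where $\|\mathbf{z}\|_2 = (\mathbf{z}^H\mathbf{z})^{1/2}$ is the $\ell_2$ norm of the vector $\mathbf{z}$. Substituting (9) and (10) into (8), the log-likelihood function becomes

$$\log p(\mathbf{Y} \mid \mathbf{h}, \theta) = \sum_{i=1}^{N} \log \int_{-\infty}^{\infty} \exp\!\left(-\frac{\|\mathbf{y}_i - \sqrt{E_d}\, h_i\, \mathbf{c}(x)\|_2^2}{\sigma_c^2} - \frac{(x-\theta)^2}{2\sigma_s^2}\right) dx + a, \tag{11}$$

where $a = N \log\!\big(1/((\pi\sigma_c^2)^L \sqrt{2\pi}\,\sigma_s)\big)$ is a constant that does not affect the estimation. From now on, we omit such constants when writing likelihood functions. Now we consider the likelihood function for digital communications, where $\mathbf{c}(x)$ is the piecewise constant function in (5). Substituting (5) into (11), we have

$$\log p(\mathbf{Y} \mid \mathbf{h}, \theta) = \sum_{i=1}^{N} \log \sum_{m=1}^{M} p(\mathbf{y}_i \mid h_i, \mathbf{c}_m)\, p(S_m \mid \theta), \tag{12}$$

where $p(\mathbf{y}_i \mid h_i, \mathbf{c}_m)$ is the PDF of the received signals given the CSI and the transmitted symbols,

$$p(\mathbf{y}_i \mid h_i, \mathbf{c}_m) = \frac{1}{(\pi\sigma_c^2)^L} \exp\!\left(-\frac{\|\mathbf{y}_i - \sqrt{E_d}\, h_i\, \mathbf{c}_m\|_2^2}{\sigma_c^2}\right), \tag{13}$$

and $p(S_m \mid \theta)$ is the probability mass function (PMF) of the quantized observation given θ,

$$p(S_m \mid \theta) = Q\!\left(\frac{S_m - \Delta/2 - \theta}{\sigma_s}\right) - Q\!\left(\frac{S_m + \Delta/2 - \theta}{\sigma_s}\right), \tag{14}$$

where $Q(u) = \frac{1}{\sqrt{2\pi}} \int_u^{\infty} e^{-t^2/2}\, dt$. The MLE is then

$$\hat{\theta} = \arg\max_{\theta \in [-V, V]} \sum_{i=1}^{N} \log \sum_{m=1}^{M} p(\mathbf{y}_i \mid h_i, \mathbf{c}_m)\, p(S_m \mid \theta). \tag{15}$$

The log-likelihood function in (15) is non-concave and has multiple extrema, so it is difficult to find a closed-form expression for $\hat{\theta}$ or to compute $\hat{\theta}$ with highly efficient numerical methods.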
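Because (15) has no closed form, a brute-force way to evaluate it is a grid search over [−V, V]. The sketch below, assuming NumPy and Python's `math.erfc`, evaluates (12) with (13) and (14), drops constants, and uses log-sum-exp for numerical stability. As an assumption of this sketch, the end cells absorb the tails as in (4), so the PMF sums to one; all names are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def pmf_quantized(theta, levels, delta, sigma_s):
    """p(S_m | theta) from (14); the end cells absorb the tails as in (4)."""
    q = lambda u: 0.5 * erfc(u / sqrt(2.0))              # Gaussian tail Q(u)
    q_left = np.array([q((s - delta / 2 - theta) / sigma_s) for s in levels])
    q_right = np.array([q((s + delta / 2 - theta) / sigma_s) for s in levels])
    q_left[0], q_right[-1] = 1.0, 0.0
    return q_left - q_right

def loglik_known_csi(theta, Y, h, C, E_d, sigma_c2, levels, delta, sigma_s):
    """Log-likelihood (12) up to a constant. Y: (N, L), h: (N,), C: (L, M)."""
    diff = Y[:, :, None] - np.sqrt(E_d) * h[:, None, None] * C[None, :, :]
    a = -np.sum(np.abs(diff) ** 2, axis=1) / sigma_c2    # log p(y_i | h_i, c_m) + const, (N, M)
    a = a + np.log(np.maximum(pmf_quantized(theta, levels, delta, sigma_s), 1e-300))
    amax = a.max(axis=1, keepdims=True)                  # log-sum-exp over m
    return float(np.sum(amax[:, 0] + np.log(np.exp(a - amax).sum(axis=1))))

def mle_grid(Y, h, C, E_d, sigma_c2, levels, delta, sigma_s, V, num=401):
    """Approximate (15) by maximizing (12) over a uniform grid on [-V, V]."""
    grid = np.linspace(-V, V, num)
    vals = [loglik_known_csi(t, Y, h, C, E_d, sigma_c2, levels, delta, sigma_s) for t in grid]
    return float(grid[int(np.argmax(vals))])
```

A grid search sidesteps the multiple extrema at the cost of N·M·(grid size) likelihood evaluations, which is one way to see why the suboptimal estimators of Section IV are attractive.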

B. MLE with Unknown CSI
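When the CSI is unknown at the FC, the log-likelihood function is

$$\log p(\mathbf{Y} \mid \theta) = \sum_{i=1}^{N} \log \int_{-\infty}^{\infty} p(\mathbf{y}_i \mid x)\, p(x \mid \theta)\, dx, \tag{16}$$

which has a form similar to the likelihood function with known CSI in (8).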
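According to the received signal model (7), given x, $\mathbf{y}_i$ follows a zero-mean complex Gaussian distribution, i.e.,

$$p(\mathbf{y}_i \mid x) = \frac{1}{\pi^L \det(\mathbf{R}_y)} \exp\!\left(-\mathbf{y}_i^H \mathbf{R}_y^{-1} \mathbf{y}_i\right), \tag{17}$$

where $\mathbf{R}_y$ is the covariance matrix of $\mathbf{y}_i$,

$$\mathbf{R}_y = E_d\, \mathbf{c}(x)\mathbf{c}^H(x) + \sigma_c^2 \mathbf{I}_L. \tag{18}$$

By (2), one eigenvalue of $\mathbf{R}_y$ equals $E_d + \sigma_c^2$ and all other eigenvalues equal $\sigma_c^2$. Thus the determinant of $\mathbf{R}_y$ is

$$\det(\mathbf{R}_y) = (E_d + \sigma_c^2)\, \sigma_c^{2(L-1)}. \tag{19}$$

Following the matrix inversion lemma, we have

$$\mathbf{R}_y^{-1} = \frac{1}{\sigma_c^2}\left(\mathbf{I}_L - \frac{E_d}{E_d + \sigma_c^2}\, \mathbf{c}(x)\mathbf{c}^H(x)\right). \tag{20}$$

Substituting (19) and (20) into (17), $p(\mathbf{y}_i \mid x)$ becomes

$$p(\mathbf{y}_i \mid x) = \alpha \exp\!\left(\frac{E_d\, |\mathbf{y}_i^H \mathbf{c}(x)|^2}{\sigma_c^2 (E_d + \sigma_c^2)}\right), \tag{21}$$

where $\alpha = \exp(-\|\mathbf{y}_i\|_2^2/\sigma_c^2) / \big(\pi^L (E_d + \sigma_c^2)\sigma_c^{2(L-1)}\big)$ does not depend on x or θ. Upon substituting (21) and (14) into (16), the log-likelihood function becomes

$$\log p(\mathbf{Y} \mid \theta) = \sum_{i=1}^{N} \log \sum_{m=1}^{M} \exp\!\left(\frac{E_d\, |\mathbf{y}_i^H \mathbf{c}_m|^2}{\sigma_c^2 (E_d + \sigma_c^2)}\right) p(S_m \mid \theta). \tag{22}$$

Then, the MLE is obtained as

$$\hat{\theta} = \arg\max_{\theta \in [-V, V]} \log p(\mathbf{Y} \mid \theta). \tag{23}$$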
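The non-coherent log-likelihood (22) depends on the received signals only through $|\mathbf{y}_i^H \mathbf{c}_m|^2$. The sketch below, assuming NumPy and reusing the same hypothetical PMF helper as for (14), evaluates (22) up to a constant; the usage check shows that a common phase rotation (or sign flip) of the received signals leaves the value unchanged, which is why no channel phase is needed.

```python
import numpy as np
from math import erfc, sqrt

def pmf_quantized(theta, levels, delta, sigma_s):
    """p(S_m | theta) from (14); the end cells absorb the tails as in (4)."""
    q = lambda u: 0.5 * erfc(u / sqrt(2.0))
    q_left = np.array([q((s - delta / 2 - theta) / sigma_s) for s in levels])
    q_right = np.array([q((s + delta / 2 - theta) / sigma_s) for s in levels])
    q_left[0], q_right[-1] = 1.0, 0.0
    return q_left - q_right

def loglik_unknown_csi(theta, Y, C, E_d, sigma_c2, levels, delta, sigma_s):
    """Non-coherent log-likelihood (22) up to a constant. Y: (N, L), C: (L, M)."""
    beta = E_d / (sigma_c2 * (E_d + sigma_c2))
    a = beta * np.abs(Y.conj() @ C) ** 2                 # beta |y_i^H c_m|^2, shape (N, M)
    a = a + np.log(np.maximum(pmf_quantized(theta, levels, delta, sigma_s), 1e-300))
    amax = a.max(axis=1, keepdims=True)                  # log-sum-exp over m
    return float(np.sum(amax[:, 0] + np.log(np.exp(a - amax).sum(axis=1))))
```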

C. MLE with Unknown CSI using Training Symbols
In typical communication systems, the transmitted symbols may include training symbols to facilitate channel estimation. In this subsection, we analyze the MLE for such transmission schemes.
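Define $\mathbf{c}_p$ as a vector of $L_p$ training symbols. Each transmission of an observation begins with $\mathbf{c}_p$, followed by the data symbols $\mathbf{c}_d(x)$. Thus the messaging function becomes

$$\mathbf{c}(x) = \begin{bmatrix} \mathbf{c}_p \\ \mathbf{c}_d(x) \end{bmatrix}. \tag{24}$$

Substituting this expression into (22) and ignoring all terms that do not affect the estimation, we obtain the likelihood function

$$\log p(\mathbf{Y} \mid \theta) = \sum_{i=1}^{N} \log \sum_{m=1}^{M} \exp\!\Big(\beta \big(2\Re\{\mathbf{c}_p^H \mathbf{y}_{i,p}\, \mathbf{y}_{i,d}^H \mathbf{c}_{d,m}\} + |\mathbf{y}_{i,d}^H \mathbf{c}_{d,m}|^2\big)\Big) p(S_m \mid \theta), \tag{25}$$

where $\mathbf{y}_{i,p}$ and $\mathbf{y}_{i,d}$ are the received signals corresponding to the training symbols and data symbols, respectively, $\mathbf{c}_{d,m}$ is the data part of $\mathbf{c}_m$, and $\beta = E_d/(\sigma_c^2(E_d + \sigma_c^2))$ is a constant. Now we show that $\mathbf{c}_p^H \mathbf{y}_{i,p}$ in (25) can be regarded, up to a constant factor, as the minimum mean square error (MMSE) estimate of the channel coefficient $h_i$. Since both $h_i$ and the receiver thermal noise are complex Gaussian, the MMSE estimate of $h_i$ equals its linear MMSE estimate,

$$\hat{h}_i = \mathbf{R}_{h y_p} \mathbf{R}_{y_p}^{-1} \mathbf{y}_{i,p}, \tag{26}$$

where $\mathbf{R}_{h y_p} = \mathbb{E}[h_i \mathbf{y}_{i,p}^H]$ and $\mathbf{R}_{y_p}$ is the covariance matrix of $\mathbf{y}_{i,p}$.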
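According to the received signal model, we have

$$\mathbf{R}_{y_p} = E_d\, \mathbf{c}_p \mathbf{c}_p^H + \sigma_c^2 \mathbf{I}_{L_p} \tag{27}$$

and

$$\mathbf{R}_{h y_p} = \sqrt{E_d}\, \mathbf{c}_p^H. \tag{28}$$

With the matrix inversion lemma, $\mathbf{R}_{y_p}^{-1}$ is expressed as

$$\mathbf{R}_{y_p}^{-1} = \frac{1}{\sigma_c^2}\left(\mathbf{I}_{L_p} - \frac{L E_d}{L\sigma_c^2 + L_p E_d}\, \mathbf{c}_p \mathbf{c}_p^H\right), \tag{29}$$

where we assume, consistently with the error variance stated below, that every symbol carries energy 1/L, so that $\|\mathbf{c}_p\|_2^2 = L_p/L$. Substituting (28) and (29) into (26), the MMSE channel estimate becomes

$$\hat{h}_i = \frac{L\sqrt{E_d}}{L\sigma_c^2 + L_p E_d}\, \mathbf{c}_p^H \mathbf{y}_{i,p}. \tag{30}$$

Substituting (30) into (25), we obtain

$$\log p(\mathbf{Y} \mid \theta) = \sum_{i=1}^{N} \log \sum_{m=1}^{M} \exp\!\Big(\gamma\, \Re\{\hat{h}_i\, \mathbf{y}_{i,d}^H \mathbf{c}_{d,m}\} + \beta\, |\mathbf{y}_{i,d}^H \mathbf{c}_{d,m}|^2\Big) p(S_m \mid \theta), \tag{31}$$

where $\gamma = 2\beta (L\sigma_c^2 + L_p E_d)/(L\sqrt{E_d})$. In the following, we show that the MLE in this case is equivalent to a two-stage estimator. In the first stage, the FC uses (30) to obtain the MMSE estimate of $h_i$, which can be modeled as $h_i = \hat{h}_i + \epsilon_{h_i}$, where the estimation error $\epsilon_{h_i}$ is zero-mean complex Gaussian with variance $L\sigma_c^2/(L\sigma_c^2 + L_p E_d)$ and is uncorrelated with $\hat{h}_i$. In the second stage, the FC conducts the MLE using $\hat{h}_i$.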
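The training-based channel estimate (30) and its error variance can be checked by Monte Carlo simulation. This is a minimal sketch, assuming NumPy; it writes the estimate with $\|\mathbf{c}_p\|_2^2$ explicitly, i.e., $\sqrt{E_d}\,\mathbf{c}_p^H\mathbf{y}_{i,p}/(\sigma_c^2 + E_d\|\mathbf{c}_p\|_2^2)$, which coincides with (30) under the assumption $\|\mathbf{c}_p\|_2^2 = L_p/L$. The function name is hypothetical.

```python
import numpy as np

def mmse_channel_estimate(Y_p, c_p, E_d, sigma_c2):
    """MMSE estimate of h_i from the training part of each row of Y_p (shape (N, L_p)):
    sqrt(E_d) c_p^H y_p / (sigma_c^2 + E_d ||c_p||^2), equal to (30) when ||c_p||^2 = L_p / L."""
    energy = np.vdot(c_p, c_p).real                     # ||c_p||_2^2
    return np.sqrt(E_d) * (Y_p @ c_p.conj()) / (sigma_c2 + E_d * energy)
```

For example, with L = 8, L_p = 2, E_d = 4 and σ_c² = 1, the predicted error variance is 8/(8 + 8) = 0.5, and a large simulation should land close to that value.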
Substituting $h_i = \hat{h}_i + \epsilon_{h_i}$ into (7), the received data signal becomes

$$\mathbf{y}_{i,d} = \sqrt{E_d}\, \hat{h}_i\, \mathbf{c}_d(x) + \sqrt{E_d}\, \epsilon_{h_i}\, \mathbf{c}_d(x) + \mathbf{n}_{c_i,d}, \tag{32}$$

where $\mathbf{n}_{c_i,d}$ is the receiver thermal noise. Deriving the conditional PDF $p(\mathbf{y}_{i,d} \mid \hat{h}_i, x)$ from (32), we obtain a likelihood function that is exactly the same as (31). This implies that the MLE with unknown CSI implicitly exploits the available training symbols to obtain the optimal channel estimate and then uses it to obtain the optimal estimate of θ.
Note that the likelihood function in (31) differs from the likelihood function that treats the estimated CSI as the true channel coefficients, namely

$$\log p(\mathbf{Y} \mid \hat{\mathbf{h}}, \theta) = \sum_{i=1}^{N} \log \sum_{m=1}^{M} \exp\!\left(-\frac{\|\mathbf{y}_{i,d} - \sqrt{E_d}\, \hat{h}_i\, \mathbf{c}_{d,m}\|_2^2}{\sigma_c^2}\right) p(S_m \mid \theta). \tag{33}$$

This is a coherent estimator. By contrast, (31) contains both a coherent term $\Re\{\hat{h}_i \mathbf{y}_{i,d}^H \mathbf{c}_d(x)\}$ and a non-coherent term $|\mathbf{y}_{i,d}^H \mathbf{c}_d(x)|^2$. This means that the MLE in (31) uses the channel estimate as "partial" CSI, accounting for the channel estimation errors: the information about the true channel coefficients contained in the estimate yields the coherent term of the log-likelihood function, whereas the uncertainty of the estimate, i.e., the estimation error, yields the non-coherent term. We compare the performance of the two estimators through simulations in Section VII.

IV. SUBOPTIMAL ESTIMATOR
In the previous section, we developed the optimal estimators for the considered decentralized estimation systems; they are infeasible in practice because of their prohibitive computational complexity. Nevertheless, their performance serves as a practical upper bound when both observation noise and communication errors are present, and their structure hints at how to derive low-complexity suboptimal estimators. In this section, we take the suboptimal estimator with known CSI as an example; the estimator with unknown CSI can be obtained following the same principle.
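We first approximate the PMF $p(S_m \mid \theta)$. By the Lagrange mean value theorem, there exists ξ in the interval $[(S_m - \Delta/2 - \theta)/\sigma_s,\ (S_m + \Delta/2 - \theta)/\sigma_s]$ such that

$$p(S_m \mid \theta) = \frac{\Delta}{\sqrt{2\pi}\,\sigma_s} \exp\!\left(-\frac{\xi^2}{2}\right). \tag{34}$$

If the quantization interval Δ is small enough, we can let ξ equal the midpoint of the interval, i.e., $\xi = (S_m - \theta)/\sigma_s$, and obtain the approximate PMF

$$p_A(S_m \mid \theta) = \frac{\Delta}{\sqrt{2\pi}\,\sigma_s} \exp\!\left(-\frac{(S_m - \theta)^2}{2\sigma_s^2}\right). \tag{35}$$

Substituting (35) into (12) and setting its partial derivative with respect to θ to zero, the likelihood equation simplifies to

$$\theta = \frac{1}{N} \sum_{i=1}^{N} \frac{\sum_{m=1}^{M} S_m\, p(\mathbf{y}_i \mid h_i, \mathbf{c}_m)\, p_A(S_m \mid \theta)}{\sum_{m=1}^{M} p(\mathbf{y}_i \mid h_i, \mathbf{c}_m)\, p_A(S_m \mid \theta)}, \tag{36}$$

which is a necessary condition for the MLE of θ.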
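The quality of the midpoint approximation (35) depends on Δ/σ_s. The following small sketch, assuming only Python's `math` module and hypothetical function names, compares the exact interior-cell PMF (14) with (35).

```python
from math import erfc, sqrt, pi, exp

def pmf_exact(s, theta, delta, sigma_s):
    """Exact PMF (14) for an interior quantization cell centred at s."""
    q = lambda u: 0.5 * erfc(u / sqrt(2.0))              # Gaussian tail Q(u)
    return q((s - delta / 2 - theta) / sigma_s) - q((s + delta / 2 - theta) / sigma_s)

def pmf_approx(s, theta, delta, sigma_s):
    """Approximate PMF (35): mean value theorem with xi at the cell midpoint."""
    return delta / (sqrt(2 * pi) * sigma_s) * exp(-(s - theta) ** 2 / (2 * sigma_s ** 2))
```

For Δ/σ_s = 0.1 the relative error at the cell containing θ is below 0.1%, while for Δ/σ_s = 2 it grows to roughly 17%, which illustrates why (35) is stated for a small quantization interval.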
Unfortunately, we cannot obtain an explicit estimator for θ from this equation, because the right-hand side also contains θ. However, using Bayes' rule for the conditional PMF, (36) becomes

$$\theta = \frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} S_m\, p(S_m \mid \mathbf{y}_i, h_i, \theta). \tag{37}$$

If we assume that θ on the right-hand side of (37) is known, the right-hand side is the sample mean of the MMSE estimates of $S_{m_i}$, i.e., of $\hat{S}_{m_i} = \mathbb{E}[S_{m_i} \mid \mathbf{y}_i, h_i, \theta]$. This indicates that the MLE can be regarded as a two-stage estimator. In the first stage, it estimates $S_{m_i}$, $i = 1, \dots, N$, from the signals received from each sensor. In the second stage, it combines the $\hat{S}_{m_i}$ with a sample mean estimator. These two stages are consistent with the two steps of the EM algorithm [27].
The first stage is the expectation step (E-step) and the second stage is the maximization step (M-step) of the algorithm. The set of quantized observations $S_{m_i}$, which is a sufficient statistic for estimating θ, serves as the complete data of the EM algorithm.
We present a suboptimal estimator with a similar two-stage structure, which can be viewed as a modified EM algorithm. Because the likelihood function (12) has multiple extrema and (36) is only a necessary condition, the initial value of the iterative computation is critical to convergence. To obtain a good initial value, the suboptimal estimator initially estimates $S_{m_i}$ by assuming that it is uniformly distributed. Furthermore, since the quality of the first-stage estimates is available, we use the BLUE to obtain θ so as to exploit this quality information, instead of the MLE used in the M-step of the standard EM algorithm.
In the first stage of each iteration, the suboptimal algorithm estimates $S_{m_i}$ under the MMSE criterion. This estimator requires the a priori probability of $S_{m_i}$, which depends on the unknown parameter θ. The initial distribution of $S_{m_i}$ is set to be uniform. After obtaining a temporary estimate of θ, we use it to update the a priori probability of $S_{m_i}$ and estimate $S_{m_i}$ again. The MMSE estimator in the first stage is

$$\hat{S}_{m_i} = \frac{\sum_{m=1}^{M} S_m\, p(\mathbf{y}_i \mid h_i, S_m)\, p(S_m)}{\sum_{m=1}^{M} p(\mathbf{y}_i \mid h_i, S_m)\, p(S_m)}, \tag{38}$$

where $p(\mathbf{y}_i \mid h_i, S_m)$ equals $p(\mathbf{y}_i \mid h_i, \mathbf{c}_m)$ in (13), and $p(S_m)$ is the current estimate of the a priori PMF of $S_{m_i}$. Having obtained $\hat{\theta}$, we update the prior by setting $p(S_{m_i}) = p_A(S_{m_i} \mid \hat{\theta})$.
In this section, we omit θ in p(S mi ) for notational simplicity, though it depends on θ.
Now we derive the mean and variance of Ŝmi , which will be used in the BLUE of θ.
If $p(S_{m_i})$ equals its true value, the MMSE estimator in (38) is unbiased, i.e.,

$$\mathbb{E}[\hat{S}_{m_i}] = \mathbb{E}[S_{m_i}]. \tag{39}$$

However, $p(S_{m_i})$ in our algorithm is inaccurate because we use $\hat{\theta}$ instead of θ. The MMSE estimate may therefore be biased, but this bias is hard to obtain in practical systems, so we treat the MMSE estimate as unbiased in our suboptimal algorithm.
Given $h_i$ and $\mathbf{y}_i$, the variance of the MMSE estimate can be derived as

$$\sigma_{\hat{S}_{m_i}}^2 = \frac{\sum_{m=1}^{M} (S_m - \hat{S}_{m_i})^2\, p(\mathbf{y}_i \mid h_i, S_m)\, p(S_m)}{\sum_{m=1}^{M} p(\mathbf{y}_i \mid h_i, S_m)\, p(S_m)}. \tag{40}$$

Then the BLUE of θ is

$$\hat{\theta} = \frac{\sum_{i=1}^{N} \hat{S}_{m_i} / \sigma_{\hat{S}_{m_i}}^2}{\sum_{i=1}^{N} 1 / \sigma_{\hat{S}_{m_i}}^2}. \tag{41}$$

The iterative algorithm can be summarized as follows:
S1) Let $p(S_{m_i}) = 1/M$ as the initial value.
S2) Compute $\hat{S}_{m_i}$, $i = 1, \dots, N$, and its variance with (38) and (40).
S3) Substitute $\hat{S}_{m_i}$ and its variance into (41) to obtain $\hat{\theta}$.
S4) Update $p(S_{m_i})$ using $p_A(S_{m_i} \mid \hat{\theta})$.
S5) Repeat steps S2) to S4) until the algorithm converges or a predetermined number of iterations is reached.

Note that this suboptimal algorithm differs from the one proposed in [7], which applies the maximum a posteriori (MAP) criterion to detect the binary observations of the sensors and then uses the detection results as the true observations in the MLE derived for noise-free channels. Our suboptimal algorithm inherits the structure of the MLE developed for fading channels: it first produces "soft" estimates of the quantized observations and then combines them with a linear optimal estimator. By conducting these two stages iteratively, the estimation accuracy improves rapidly. Although the algorithm may converge to a local optimum because the original optimization problem is non-convex, it still performs fairly well, as shown in the simulation results. The convergence of the algorithm is studied by simulations in Section VII.
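Steps S1) to S5) translate into a short loop. The following is a minimal sketch with known CSI, assuming NumPy and hypothetical names. One deviation is labeled explicitly: as an assumption of this sketch (not the paper's (41), which uses (40) alone), σ_s² is added to each conditional variance before forming the BLUE weights, so that the weights stay finite when the posteriors become nearly deterministic at high channel SNR.

```python
import numpy as np

def suboptimal_estimator(Y, h, C, E_d, sigma_c2, levels, sigma_s, n_iter=20, tol=1e-6):
    """Sketch of steps S1)-S5) with known CSI. Y: (N, L), h: (N,), C: (L, M), levels: (M,)."""
    M = C.shape[1]
    diff = Y[:, :, None] - np.sqrt(E_d) * h[:, None, None] * C[None, :, :]
    log_lik = -np.sum(np.abs(diff) ** 2, axis=1) / sigma_c2    # log p(y_i | h_i, c_m) + const
    log_prior = np.full(M, -np.log(M))                         # S1) uniform prior 1/M
    theta_hat = np.inf
    for _ in range(n_iter):                                    # S5) iterate
        a = log_lik + log_prior[None, :]
        post = np.exp(a - a.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)                # p(S_m | y_i, h_i) under current prior
        s_hat = post @ levels                                  # S2) MMSE estimate (38)
        var = np.maximum(post @ levels ** 2 - s_hat ** 2, 0.0) # S2) conditional variance (40)
        # Assumption (not from the paper): add sigma_s^2 so the BLUE weights stay finite
        w = 1.0 / (var + sigma_s ** 2)
        new_theta = float(np.sum(w * s_hat) / np.sum(w))       # S3) BLUE (41)
        # S4) prior update with p_A(S_m | theta_hat) from (35), left unnormalized
        log_prior = -(levels - new_theta) ** 2 / (2 * sigma_s ** 2)
        converged = abs(new_theta - theta_hat) < tol
        theta_hat = new_theta
        if converged:
            break
    return theta_hat
```

The posterior is computed in the log domain and normalized per sensor, so the constant factor Δ/(√(2π)σ_s) in (35) can be dropped from the prior update.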

V. SPECIAL CASES OF THE MLES
To gain insight into the decentralized MLE, in this section we first study two special cases of the MLEs in which either the observations of the sensors or the communications are perfect. We then discuss the form of the MLE with known CSI under the two extreme cases of quantization: AF transmission (infinitesimal quantization resolution) and 1-bit quantization (the coarsest resolution). This connects the derived MLEs with well-studied optimal estimators for these special cases.

A. Ideal Observations and Ideal Communications
Using the approximate PMF (35), the likelihood equation with known CSI can be approximated as

$$\sum_{i=1}^{N} \frac{\sum_{m=1}^{M} (S_m - \theta)\, p(\mathbf{y}_i \mid h_i, \mathbf{c}_m)\, p_A(S_m \mid \theta)}{\sum_{m=1}^{M} p(\mathbf{y}_i \mid h_i, \mathbf{c}_m)\, p_A(S_m \mid \theta)} = 0. \tag{42}$$

This shows that, when the quantization interval is small enough, the MLE exploits both the signal-level information $\mathbf{y}_i$ and the data-level information $S_m$. If the observations are perfect, i.e., there is no observation noise, we show that the MLE degenerates into a signal-level optimal combiner, the MRC, followed by data demodulation and parameter reconstruction. On the other hand, if the communications are perfect, we show that the MLE reduces to a data-level optimal fusion estimator, the BLUE.
When CSI is unknown, we draw similar conclusions, except that the MLE becomes a subspace-based estimator followed by data demodulation and parameter reconstruction when there is no observation noise.
1) Noiseless Observations: When the observations of the sensors are ideal, i.e.,

$$p(x \mid \theta) = \delta(x - \theta), \tag{43}$$

where δ(x) is the Dirac delta function.
We first consider the MLE with known CSI. Substituting (43) into (8) and ignoring all terms that do not affect the estimation, the log-likelihood function simplifies to

$$\log p(\mathbf{Y} \mid \mathbf{h}, \theta) = -\sum_{i=1}^{N} \|\mathbf{y}_i - \sqrt{E_d}\, h_i\, \mathbf{c}(\theta)\|_2^2. \tag{44}$$

Since $\mathbf{c}(\theta)$ is a piecewise constant function and is not differentiable, we cannot compute the partial derivative of (44) with respect to θ. Instead, we first regard $\mathbf{c}(\theta)$ as the parameter to be estimated and obtain its MLE. The log-likelihood function (44) is concave with respect to $\mathbf{c}(\theta)$, and its unique maximum is obtained by solving $\partial \log p(\mathbf{Y} \mid \mathbf{h}, \theta)/\partial \mathbf{c}(\theta) = \mathbf{0}$, which gives

$$\hat{\mathbf{c}}(\theta) = \frac{\sum_{i=1}^{N} h_i^* \mathbf{y}_i}{\sqrt{E_d} \sum_{i=1}^{N} |h_i|^2}. \tag{45}$$

We then use it as a decision variable to detect the transmitted symbols and reconstruct θ from the detection results according to the quantization rule.
This shows that when the observations are perfect, the MLE consists of the MRC concatenated with data demodulation and parameter reconstruction. This is not surprising: in this case the signals transmitted by all sensors are identical, so the receiver at the FC can use traditional diversity techniques to reduce the communication errors, and there is no need to fuse redundant observations.
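The noiseless-observation MLE with known CSI is simply MRC followed by detection and reconstruction. A minimal sketch, assuming NumPy and a hypothetical function name, computes (45) and then picks the nearest codeword, which for unit-energy codewords is equivalent to maximizing $\Re\{\mathbf{c}_m^H \hat{\mathbf{c}}\}$.

```python
import numpy as np

def mrc_estimate(Y, h, C, E_d, levels):
    """Noiseless-observation MLE with known CSI: MRC (45), detection, reconstruction.
    Y: (N, L) received signals, h: (N,) channels, C: (L, M) codebook, levels: (M,)."""
    c_hat = (h.conj() @ Y) / (np.sqrt(E_d) * np.sum(np.abs(h) ** 2))   # (45)
    m = int(np.argmin(np.sum(np.abs(C - c_hat[:, None]) ** 2, axis=0)))  # nearest codeword
    return float(levels[m])
```

When all sensors send the same codeword and there is no receiver noise, (45) returns that codeword exactly, so the detected level is the transmitted one.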
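We then consider the MLE with unknown CSI. Substituting (43) into (16) and ignoring all terms that do not affect the estimation, the log-likelihood function becomes

$$\log p(\mathbf{Y} \mid \theta) = \sum_{i=1}^{N} |\mathbf{y}_i^H \mathbf{c}(\theta)|^2 = \mathbf{c}^H(\theta) \left(\sum_{i=1}^{N} \mathbf{y}_i \mathbf{y}_i^H\right) \mathbf{c}(\theta). \tag{46}$$

Again, we regard $\mathbf{c}(\theta)$ as the parameter to be estimated. Recall that the energy of $\mathbf{c}(\theta)$ is normalized. Then finding the $\mathbf{c}(\theta)$ that maximizes (46) is a solvable quadratically constrained quadratic program (QCQP) [28],

$$\max_{\mathbf{c}}\ \mathbf{c}^H \left(\sum_{i=1}^{N} \mathbf{y}_i \mathbf{y}_i^H\right) \mathbf{c} \quad \text{s.t.} \quad \mathbf{c}^H \mathbf{c} = 1. \tag{47}$$

Following the results on QCQP in [28], the solution of (47) is

$$\hat{\mathbf{c}}(\theta) = \mathbf{v}_{\max}\!\left(\sum_{i=1}^{N} \mathbf{y}_i \mathbf{y}_i^H\right), \tag{48}$$

where $\mathbf{v}_{\max}(\mathbf{M})$ is the eigenvector corresponding to the largest eigenvalue of the matrix $\mathbf{M}$. This shows that when the CSI is unknown at the FC and the observations are noise free, the MLE becomes a subspace-based estimator.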
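The eigenvector solution of (47) can be computed with a Hermitian eigendecomposition. This is a minimal sketch, assuming NumPy's `numpy.linalg.eigh` (which returns eigenvalues in ascending order) and a hypothetical function name. Because the principal eigenvector is defined only up to a phase, the detection step here, an illustrative choice, compares magnitudes $|\mathbf{c}_m^H \mathbf{v}|$.

```python
import numpy as np

def subspace_estimate(Y, C, levels):
    """Noiseless-observation MLE with unknown CSI: principal eigenvector of sum_i y_i y_i^H,
    then phase-insensitive detection argmax_m |c_m^H v| and reconstruction."""
    R = Y.T @ Y.conj()                          # sum_i y_i y_i^H, shape (L, L)
    _, vecs = np.linalg.eigh(R)                 # eigenvalues in ascending order
    v = vecs[:, -1]                             # v_max, known only up to a phase
    m = int(np.argmax(np.abs(C.conj().T @ v)))
    return float(levels[m])
```

With a codebook that contains both $\mathbf{c}$ and $-\mathbf{c}$, the magnitudes tie, which is exactly the phase ambiguity problem discussed for the transmission codebook.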
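2) Noiseless Communications: When $\sigma_c^2 \to 0$, we have

$$p(\mathbf{y}_i \mid h_i, \mathbf{c}_m) \to \delta(\mathbf{y}_i - \sqrt{E_d}\, h_i\, \mathbf{c}_m).$$

This means that $\mathbf{y}_i$ is determined solely by $\mathbf{c}_{m_i}$, or equivalently by $S_{m_i}$. The log-likelihood function then becomes a function of the quantized observations $S_{m_i}$.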
We first consider the MLE with known CSI.The loglikelihood function with ideal communications is, log p(Y|h, θ) → log p(S|h, θ) where Computing the derivative of (49), we have the likelihood equation, (50) Generally, this likelihood equation has no closed-form solution.Nonetheless, the closed-form solution can be obtained when the quantization noise is very small, i.e., ∆ → 0. Under this condition, S mi → x i and (50) becomes, The MLE obtained from (51) is, It is also no surprise to see that the MLE reduces to BLUE, which is often applied in centralized estimation [17], where the FC can obtain all raw observations of the sensors.
When the CSI is unknown at the FC, the receiver can recover the quantized observations of the sensors without error if a proper codebook, discussed in Section VI-A, is applied. Then the MLE with unknown CSI also degenerates into the BLUE in (52). This is reasonable since only the communication part of the estimator depends on the channel information.
The special cases of the MLEs with noiseless observations or noiseless communications are summarized in Table I.

B. AF Transmission and Binary Quantization
1) AF Transmission: Although the estimators derived so far consider digital communications, they can also be applied to AF transmission, because the messaging function introduced in Section II describes AF transmission as well.
The messaging function for AF transmission is

$$c(x) = \alpha x, \tag{53}$$

where α is the amplification gain. Since $\mathbf{c}(x)$ reduces to a scalar (L = 1), we write it as $c(x)$.
For AF transmission, we rewrite the energy normalization condition as $\mathbb{E}[|c(x)|^2] = 1$. This condition is satisfied when $\alpha = 1/\sqrt{\mathbb{E}[\theta^2] + \sigma_s^2}$. Because θ is an unknown nonrandom parameter, $\mathbb{E}[\theta^2]$ is not available. To solve this problem, we assume that θ is a random variable uniformly distributed in [−V, +V], so that $\mathbb{E}[\theta^2] = V^2/3$. The received signal at the FC is

$$y_i = \sqrt{E_d}\, h_i\, \alpha x_i + n_{c,i}. \tag{54}$$

Substituting (53) and (54) into the log-likelihood function with known CSI (11), we can obtain the MLE with AF transmission. However, this derivation is rather involved because of the cross-correlation between the real and imaginary parts of the received signals. In the following, we give a simpler alternative derivation.
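We first find a vector of sufficient statistics and then derive the log-likelihood function using this vector as the observation. When the CSI is known, $h_i^* y_i$, $i = 1, \dots, N$, can serve as such statistics. The real and imaginary parts of $h_i^* y_i$ are statistically independent and Gaussian distributed. The mean and variance of the real part are $\sqrt{E_d}\,\alpha |h_i|^2 \theta$ and $E_d \alpha^2 |h_i|^4 \sigma_s^2 + |h_i|^2\sigma_c^2/2$, and the mean and variance of the imaginary part are zero and $|h_i|^2\sigma_c^2/2$. Ignoring the constants that do not affect the MLE, we obtain the log-likelihood function

$$\log p(\mathbf{y} \mid \mathbf{h}, \theta) = -\sum_{i=1}^{N} \left[\frac{\big(\Re\{h_i^* y_i\} - \sqrt{E_d}\,\alpha|h_i|^2\theta\big)^2}{2E_d\alpha^2|h_i|^4\sigma_s^2 + |h_i|^2\sigma_c^2} + \frac{\big(\Im\{h_i^* y_i\}\big)^2}{|h_i|^2\sigma_c^2}\right]$$

and the likelihood equation

$$\sum_{i=1}^{N} \frac{\sqrt{E_d}\,\alpha |h_i|^2 \big(\Re\{h_i^* y_i\} - \sqrt{E_d}\,\alpha|h_i|^2\theta\big)}{2E_d\alpha^2|h_i|^4\sigma_s^2 + |h_i|^2\sigma_c^2} = 0,$$

where ℜ{z} and ℑ{z} denote the real and imaginary parts of z, respectively.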
The log-likelihood function has a single maximum, which can be obtained by solving the likelihood equation; the resulting estimator reduces to the optimal estimator proposed in [24] under the assumptions therein. The asymptotic performance of AF transmission in fading orthogonal MACs is analyzed in [24] and [25].
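Because the AF likelihood equation is linear in θ, its root has a closed form: a weighted combination of $\Re\{h_i^* y_i\}$ with weights set by the per-sensor mean slope and variance. The sketch below, assuming NumPy and hypothetical function names, computes the gain α under the uniform-θ assumption and solves the likelihood equation directly.

```python
import numpy as np

def af_gain(V, sigma_s2):
    """Gain with E[|c(x)|^2] = 1, assuming theta ~ U[-V, V] so that E[theta^2] = V^2/3."""
    return 1.0 / np.sqrt(V ** 2 / 3.0 + sigma_s2)

def af_estimate(y, h, E_d, alpha, sigma_s2, sigma_c2):
    """Closed-form root of the AF likelihood equation (a weighted least-squares combination)."""
    r = np.real(h.conj() * y)                       # Re{h_i^* y_i}
    g2 = np.abs(h) ** 2
    a = np.sqrt(E_d) * alpha * g2                   # mean of r_i is a_i * theta
    v = E_d * alpha ** 2 * g2 ** 2 * sigma_s2 + g2 * sigma_c2 / 2   # variance of r_i
    return float(np.sum(a * r / v) / np.sum(a ** 2 / v))
```

Setting σ_c² = 0 makes every weight equal and the estimate collapses to the sample mean of the raw observations, consistent with the noiseless-communication special case; sensors in deep fades (small |h_i|²) receive small weights when σ_c² > 0.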
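2) Binary Quantization: Given the stringent bandwidth constraint in WSNs, many contributions assume that the sensors use a binary quantizer as the local processor. Our estimators can also be applied when a binary quantizer is used. Based on the system model in Section II, the messaging function with a binary quantizer and binary phase-shift keying (BPSK) modulation is

$$c_1(x) = \begin{cases} 1, & x \ge \tau, \\ -1, & x < \tau, \end{cases}$$

where τ is the quantization threshold, which equals 0 for the uniform quantizer considered here. Substituting $c_1(x)$ into the likelihood function (11), we have

$$\log p(\mathbf{y} \mid \mathbf{h}, \theta) = \sum_{i=1}^{N} \log\Big[e^{-|y_i - \sqrt{E_d} h_i|^2/\sigma_c^2} \big(1 - F_{n_s}(\tau - \theta)\big) + e^{-|y_i + \sqrt{E_d} h_i|^2/\sigma_c^2} F_{n_s}(\tau - \theta)\Big], \tag{60}$$

where $F_{n_s}(x)$ is the cumulative distribution function (CDF) of the observation noise. Defining

$$P_i^{\pm} = e^{-|y_i \mp \sqrt{E_d} h_i|^2/\sigma_c^2},$$

(60) can be simplified as

$$\log p(\mathbf{y} \mid \mathbf{h}, \theta) = \sum_{i=1}^{N} \log\Big[P_i^{-} + \big(P_i^{+} - P_i^{-}\big)\big(1 - F_{n_s}(\tau - \theta)\big)\Big].$$

This is the same as the likelihood function in (9) of [7], except for the presence of the channel coefficients, since we consider fading channels.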
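The binary-quantization log-likelihood (60) is cheap to evaluate, which makes it a convenient baseline. This is a minimal sketch, assuming NumPy and Python's `math.erfc`, Gaussian observation noise (so $F_{n_s}$ is the Gaussian CDF), and a hypothetical function name; `np.logaddexp` combines the two hypotheses without underflow.

```python
import numpy as np
from math import erfc, sqrt

def loglik_binary(theta, y, h, E_d, sigma_c2, sigma_s, tau=0.0):
    """Log-likelihood (60) for binary quantization + BPSK with known CSI (constants dropped)."""
    F = 0.5 * erfc(-(tau - theta) / (sigma_s * sqrt(2.0)))    # F_ns(tau - theta), Gaussian CDF
    F = min(max(F, 1e-300), 1.0 - 1e-16)                       # keep both logs finite
    log_plus = -np.abs(y - np.sqrt(E_d) * h) ** 2 / sigma_c2  # sensor sent +1 (x >= tau)
    log_minus = -np.abs(y + np.sqrt(E_d) * h) ** 2 / sigma_c2 # sensor sent -1 (x < tau)
    return float(np.sum(np.logaddexp(log_plus + np.log(1.0 - F), log_minus + np.log(F))))
```

If every sensor is received cleanly as +1, the likelihood grows with θ, since larger θ makes $x_i \ge \tau$ more probable.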

A. Transmission Codebook Issues
When digital communications are used, the transmission codebooks can represent various quantization, coding and modulation schemes.Here we discuss the impact of the codebooks on the decentralized MLEs.
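We rewrite the conditional PDF with known CSI in (10) as

$$p(\mathbf{y}_i \mid h_i, x) = \frac{1}{(\pi\sigma_c^2)^L} \exp\!\left(-\frac{\|\mathbf{y}_i\|_2^2 + E_d|h_i|^2}{\sigma_c^2}\right) \exp\!\left(\frac{2\sqrt{E_d}}{\sigma_c^2}\, \Re\{h_i\, \mathbf{y}_i^H \mathbf{c}(x)\}\right). \tag{64}$$

Comparing the conditional PDF with unknown CSI, $p(\mathbf{y}_i \mid x)$ in (21), with $p(\mathbf{y}_i \mid h_i, x)$ in (64), we see that both PDFs depend on the correlation between the received signals $\mathbf{y}_i$ and the transmitted symbols $\mathbf{c}(x)$. With known CSI, the optimal estimator is a coherent algorithm, since (64) relies on the real part of the correlation $\mathbf{y}_i^H \mathbf{c}(x)$ weighted by $h_i$. With unknown CSI, the optimal estimator is a non-coherent algorithm, since (21) depends on the squared magnitude $|\mathbf{y}_i^H \mathbf{c}(x)|^2$. Because $\mathbf{y}_i = \sqrt{E_d}\, h_i\, \mathbf{c}(x_i) + \mathbf{n}_{c,i}$, both MLEs depend on the cross-correlation of the transmission symbols, $\mathbf{c}^H(x_i)\mathbf{c}(x)$.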
Taking digital communications as an example, if the transmission codebook contains two symbol vectors $c_m$ and $c_n$ that differ only by a common phase rotation, i.e., $c_n = e^{j\phi} c_m$, then $|y_i^H c_m|^2 = |y_i^H c_n|^2$ for every $y_i$, and $p(y_i|x)$ will have two identical extrema, since the MLE with unknown CSI depends only on $|y_i^H c(x)|^2$. Such a phase ambiguity leads to severe performance degradation of the decentralized estimator. Therefore, the auto-correlation matrix of the codebook plays a critical role in the performance of the MLE, especially when CSI is unknown.
Many transmission schemes suffer from this phase ambiguity. For example, when the natural binary code and BPSK modulation are used to represent and transmit each quantized observation, then for any $c_m$ in the resulting transmission codebook, denoted $C_{tn}$, there exists $c_{m'}$ in $C_{tn}$ such that $c_{m'} = -c_m$. Therefore, $C_{tn}$ is not a proper codebook.
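A small check of the ambiguity in $C_{tn}$, assuming the bit-to-symbol mapping $b \mapsto 1 - 2b$ (the sign convention is an assumption; any BPSK mapping gives the same conclusion). It builds all $2^K$ natural-binary BPSK codewords, confirms that the set is closed under negation, and compares a coherent metric $\Re\{h^* c^H y\}$ with the non-coherent metric $|c^H y|^2$: the former flips sign between $c$ and $-c$, while the latter cannot tell them apart.

```python
import itertools


def natural_binary_bpsk_codebook(K):
    """Sketch of C_tn: all K-bit natural binary words, bit b mapped to BPSK symbol 1 - 2b."""
    return [tuple(1 - 2 * b for b in bits) for bits in itertools.product((0, 1), repeat=K)]


def closed_under_negation(codebook):
    """True if -c is also a codeword for every codeword c."""
    cb = set(codebook)
    return all(tuple(-s for s in c) in cb for c in codebook)


def correlation(y, c):
    """c^H y for a real BPSK codeword c and a complex received vector y."""
    return sum(ck * yk for ck, yk in zip(c, y))


def coherent_metric(y, h, c):
    """Known-CSI metric Re{h^* c^H y}: sensitive to the sign of c."""
    return (h.conjugate() * correlation(y, c)).real


def noncoherent_metric(y, c):
    """Unknown-CSI metric |c^H y|^2: blind to a common sign flip of c."""
    return abs(correlation(y, c)) ** 2


C = natural_binary_bpsk_codebook(4)
y = [complex(0.9, -0.2), complex(-1.1, 0.3), complex(0.8, 0.1), complex(-0.7, -0.4)]
c = C[5]
neg_c = tuple(-s for s in c)
print(closed_under_negation(C), noncoherent_metric(y, c) == noncoherent_metric(y, neg_c))
```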
To cope with the phase ambiguity inherent in $C_{tn}$, we can simply insert training symbols into the transmission symbols. Though heuristic, this approach provides fairly good performance, because the MLE can exploit the training symbols to estimate the channel coefficients implicitly, as shown earlier.
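A minimal sketch of how prepending training symbols removes the ambiguity, assuming $L_p$ known pilots equal to $+1$ are placed before the data symbols (the pilot value and placement are assumptions; the codebook $C_{tp}$ in the paper may differ). With noiseless reception $y = h\,c_0$, the non-coherent metric $|c^H y|^2$ has two maximizers over $C_{tn}$ ($c_0$ and $-c_0$) but a single maximizer over the pilot-augmented codebook, because the pilots make $-c_0$ a non-codeword. For a real BPSK codebook, a sign flip is the only phase rotation that keeps a codeword real, so checking negation suffices here.

```python
import itertools


def natural_binary_bpsk_codebook(K):
    """All K-bit natural binary words, bit b mapped to BPSK symbol 1 - 2b."""
    return [tuple(1 - 2 * b for b in bits) for bits in itertools.product((0, 1), repeat=K)]


def with_training_symbols(codebook, Lp, pilot=1):
    """Hypothetical C_tp: prepend Lp known pilot symbols to every codeword."""
    return [(pilot,) * Lp + c for c in codebook]


def closed_under_negation(codebook):
    cb = set(codebook)
    return all(tuple(-s for s in c) in cb for c in codebook)


def noncoherent_metric(y, c):
    """|c^H y|^2 for a real codeword c."""
    return abs(sum(ck * yk for ck, yk in zip(c, y))) ** 2


def noncoherent_decision(y, codebook):
    """Indices of all codewords maximizing |c^H y|^2 (more than one means ambiguity)."""
    scores = [noncoherent_metric(y, c) for c in codebook]
    best = max(scores)
    return [i for i, s in enumerate(scores) if abs(s - best) < 1e-9 * max(1.0, best)]


C_tn = natural_binary_bpsk_codebook(4)
C_tp = with_training_symbols(C_tn, 2)
h = complex(-0.3, 0.9)
print(noncoherent_decision([h * s for s in C_tn[6]], C_tn))
print(noncoherent_decision([h * s for s in C_tp[6]], C_tp))
```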
Since the MLEs depend on the auto-correlation matrix of the transmission codebook, the performance of the estimators can be enhanced by systematically designing the codebook. Nonetheless, this is beyond the scope of this paper. Some preliminary results on optimizing the transmission codebooks are shown in [29].

B. Asymptotic Performance of the MLEs with respect to N
We first consider the Cramér-Rao lower bound (CRLB) when CSI is unknown at the FC, which is, where $p'(S_m|\theta)$ and $p''(S_m|\theta)$ are the first- and second-order partial derivatives of $p(S_m|\theta)$ with respect to θ, respectively.
This shows that the CRLB of the MLE with unknown CSI decreases as $1/N$, the same scaling as the BLUE lower bound of centralized estimation [17]. This is because, given θ, the received signals $y_i$ from different sensors are independent and identically distributed. Therefore, all these signals contribute equally to reducing the estimation error.
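A minimal sketch of the $1/N$ scaling, under a deliberate simplification: error-free channels, Gaussian observation noise, and a hypothetical $M$-level uniform quantizer on $[-W, W]$ whose outer cells extend to infinity. For i.i.d. sensors the Fisher information adds, so the bound is $1/(N I_1(\theta))$ with $I_1(\theta) = \sum_m p'(S_m|\theta)^2 / p(S_m|\theta)$. This is not the unknown-CSI bound itself, only an illustration of the additivity argument above. For $M = 2$ with the threshold at θ, the bound evaluates to $\pi\sigma_s^2/2$; for fine quantization it approaches $\sigma_s^2$.

```python
import math


def phi(u):
    """Standard normal PDF (evaluates to 0 at +/-inf)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def Phi(u):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def quantizer_edges(M, W):
    """Hypothetical M-level uniform quantizer on [-W, W]; outer cells extend to +/-inf."""
    delta = 2.0 * W / M
    return [-math.inf] + [-W + m * delta for m in range(1, M)] + [math.inf]


def fisher_info_per_sensor(theta, sigma_s, edges):
    """I_1(theta) = sum_m p'(S_m|theta)^2 / p(S_m|theta) for Gaussian observation noise."""
    info = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        u_lo, u_hi = (lo - theta) / sigma_s, (hi - theta) / sigma_s
        p = Phi(u_hi) - Phi(u_lo)               # p(S_m | theta)
        dp = (phi(u_lo) - phi(u_hi)) / sigma_s  # d p(S_m | theta) / d theta
        if p > 1e-300:                          # skip numerically empty cells
            info += dp * dp / p
    return info


def crlb_iid(N, theta, sigma_s, edges):
    """CRLB for N i.i.d. sensors: Fisher information adds, so the bound is 1 / (N * I_1)."""
    return 1.0 / (N * fisher_info_per_sensor(theta, sigma_s, edges))


for N in (1, 10, 100):
    print(N, crlb_iid(N, 0.0, 1.0, quantizer_edges(16, 2.0)))
```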
When CSI is available at the FC, the received signals, given $h_i$, are no longer identically distributed. In this case, the CRLB depends on the channel realization and is very hard to derive. However, since more information can be exploited for estimation, we can infer that the CRLB with known CSI is no higher than that with unknown CSI. In other words, the asymptotic performance of the MLE with known CSI is no worse than that of the MLE with unknown CSI.

C. Computational Complexity
1) MLE: We take the MLE with known CSI as an example to analyze the computational complexity. The analysis for the MLE with unknown CSI is similar.
The MLE can be found by an exhaustive search. To make the MSE introduced by the discrete search negligible, we let the search step-size be no larger than $\Delta/N$. Since the search range spans the $M$ quantization intervals of width $\Delta$, we then need to compute the value of the likelihood function at least $M \times N$ times to obtain the MLE.
The FC applies (12), (13), and (14) to compute the values of the likelihood function for different θ. The exponential term in (13) is independent of θ, so it can be computed before the search and stored for later use.
Given θ, we still need to compute $p(S_m|\theta)$, $m = 0, \dots, M-1$, whose complexity is $O(M)$, and then perform $M$ additions and $M$ multiplications for each of the $N$ sensors to obtain each value of the likelihood function. Thus the computational complexity of computing one value of $\log p(\mathbf{Y}|\mathbf{h},\theta)$ is $O(MN)$.
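A minimal sketch of the evaluation structure described above, assuming the likelihood has the mixture form $\log p(\mathbf{Y}|\mathbf{h},\theta) = \sum_i \log \sum_m p(S_m|\theta)\, p(y_i|h_i, c_m)$ with Gaussian channel noise; the quantizer edges, the natural-binary BPSK codebook, and all function names are hypothetical stand-ins for (12)-(14). The θ-independent exponential terms are tabulated once (an $N \times M$ table); each θ then costs $O(M)$ for $p(S_m|\theta)$ plus $N \times M$ multiply-adds.

```python
import math


def Phi(u):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def uniform_edges(M, W):
    """Hypothetical M-level uniform quantizer on [-W, W]; outer cells extend to +/-inf."""
    delta = 2.0 * W / M
    return [-math.inf] + [-W + m * delta for m in range(1, M)] + [math.inf]


def nb_bpsk_codebook(K):
    """Codeword m = natural binary code of m (MSB first), bit b mapped to 1 - 2b."""
    return [tuple(1 - 2 * ((m >> (K - 1 - k)) & 1) for k in range(K)) for m in range(2 ** K)]


def level_probs(theta, sigma_s, edges):
    """p(S_m | theta), m = 0..M-1: the only theta-dependent part, O(M) per theta."""
    return [Phi((hi - theta) / sigma_s) - Phi((lo - theta) / sigma_s)
            for lo, hi in zip(edges[:-1], edges[1:])]


def precompute_channel_terms(y, h, codebook, sigma_c2):
    """Theta-independent terms exp(-||y_i - h_i c_m||^2 / sigma_c2), computed once.

    Each row is scaled by its maximum to avoid underflow; the log of that scale is kept.
    """
    table, offsets = [], []
    for yi, hi in zip(y, h):
        logs = [-sum(abs(yk - hi * ck) ** 2 for yk, ck in zip(yi, c)) / sigma_c2
                for c in codebook]
        top = max(logs)
        offsets.append(top)
        table.append([math.exp(v - top) for v in logs])
    return table, offsets


def log_likelihood(theta, table, offsets, sigma_s, edges):
    """log p(Y | h, theta) up to a constant: N sensors x M multiply-adds, i.e. O(MN)."""
    p = level_probs(theta, sigma_s, edges)
    total = 0.0
    for row, off in zip(table, offsets):
        s = sum(pm * t for pm, t in zip(p, row))
        if s <= 0.0:
            return -math.inf
        total += off + math.log(s)
    return total
```

Dropping the normalization constants of the Gaussian kernels is safe because they do not depend on θ.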
Accounting for the operations required by the exhaustive search, the overall complexity of the MLE is $O(M^2N^2)$.
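A minimal sketch of the counting argument, assuming a hypothetical setting with $M = 16$ cells of width $\Delta = 0.5$ spanning $[-4, 4]$ and $N = 10$ sensors. A step of $\Delta/N$ over a range of $M\Delta$ gives $MN + 1$ grid points, and each evaluation costs about $MN$ multiply-adds, so the total is about $(MN)^2$. The quadratic stand-in for the log-likelihood is only there to make the search observable.

```python
import math


def grid_search_mle(loglik, lo, hi, step):
    """Exhaustive search of loglik over lo, lo + step, ..., hi; returns (argmax, #evaluations)."""
    n_points = int(round((hi - lo) / step)) + 1
    best_theta, best_val = lo, -math.inf
    for k in range(n_points):
        theta = lo + k * step
        val = loglik(theta)
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta, n_points


# Hypothetical setting: M = 16 cells of width delta = 0.5 span [-4, 4]; N = 10 sensors.
M, N, delta = 16, 10, 0.5
lo, hi = -M * delta / 2, M * delta / 2
theta_hat, evals = grid_search_mle(lambda t: -(t - 1.23) ** 2, lo, hi, delta / N)
cost_per_eval = M * N              # O(MN) multiply-adds per likelihood value
total_ops = evals * cost_per_eval  # roughly (MN)^2, i.e. O(M^2 N^2)
print(theta_hat, evals, total_ops)
```

Here the search needs 161 evaluations of 160 operations each, about 25,760 operations in total.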
2) Suboptimal Estimator: The estimator presented in Section IV uses an iterative algorithm. In each iteration, we compute $\hat{S}_{mi}$ and its variance with (38) and (40), and then obtain the estimate of θ with (41). The complexity is similar to that of computing the log-likelihood function, i.e., $O(MN)$.
If the algorithm converges after $I_t$ iterations, the complexity of the suboptimal estimator is $O(I_t MN)$.
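As a rough worked example, ignoring constant factors and taking the values used in the simulations (a 4-bit quantizer, so $M = 16$; $N = 10$ sensors; $I_t = 2$ iterations), the ratio of the two complexity orders is:

```latex
\frac{O(M^2 N^2)}{O(I_t M N)} \sim \frac{M N}{I_t}
  = \frac{16 \times 10}{2} = 80
```

This is only an order-of-magnitude comparison; the measured times in Table III also reflect implementation details.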

VII. SIMULATIONS
We use the MSE of estimating θ as the metric to evaluate the performance of the estimators. The observation SNR considered in the simulations is defined as in [13]. We use $E_d$, the energy consumed by each sensor to transmit one observation, to define the communication SNR, so that the energy efficiency of estimators with different transmission schemes can be compared fairly. The codebooks used in the simulations are summarized in Table II. Since WSNs usually transmit short data packets and each sensor is low cost, we use a simple error-control coding scheme, a cyclic redundancy check (CRC) code with generator polynomial $G(x) = x^4 + x + 1$, as an example of coded transmission. Its codebook is denoted as $C_{tc}$. For comparison, uncoded transmission, whose codebook is denoted as $C_{tn}$, is also evaluated. All codebooks use BPSK modulation. Because the codeword length of the uncoded transmission is shorter than that of the coded transmission, the energy per symbol is higher for a given $E_d$. Due to the phase ambiguity problem discussed in Section VI-A, we use the codebook with training symbols (TS), $C_{tp}$, whenever we evaluate the estimators with unknown CSI, unless otherwise specified. Two estimators with ideal communications serve as baselines, and their MSEs serve as performance lower bounds: the BLUE, whose MSE is $\sigma_s^2/N$, and the Quasi-BLUE, which accounts for the quantization noise [17]. The Quasi-BLUE bound is the more practical lower bound for comparison, since we consider quantization for all estimators in this paper except the estimator with AF transmission.
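A minimal sketch of the coded transmission, assuming a systematic CRC (message bits followed by check bits) with $G(x) = x^4 + x + 1$, a 4-bit quantizer, the BPSK mapping $b \mapsto 1 - 2b$, and $E_d$ split evenly over the symbols of a codeword; the systematic layout, the mapping, and the even energy split are assumptions. It shows that each 4-bit word becomes an 8-symbol codeword, so each CRC symbol carries $E_d/8$ instead of $E_d/4$.

```python
def crc_remainder(bits, poly=(1, 0, 0, 1, 1)):
    """Remainder of bits(x) * x^r modulo poly(x) over GF(2); poly listed MSB first.

    The default poly encodes G(x) = x^4 + x + 1, so r = 4 check bits.
    """
    r = len(poly) - 1
    work = list(bits) + [0] * r        # multiply the message polynomial by x^r
    for i in range(len(bits)):
        if work[i]:                    # leading 1: subtract (XOR) the shifted generator
            for j, p in enumerate(poly):
                work[i + j] ^= p
    return work[-r:]


def crc_codeword(bits):
    """Systematic codeword: message bits followed by the CRC check bits."""
    return list(bits) + crc_remainder(bits)


def bpsk(bits):
    """Assumed BPSK mapping: bit 0 -> +1, bit 1 -> -1."""
    return [1 - 2 * b for b in bits]


def energy_per_symbol(E_d, L):
    """Per-symbol energy when E_d is split evenly over L symbols (assumption)."""
    return E_d / L


msg = [1, 0, 0, 0]
print(bpsk(crc_codeword(msg)), energy_per_symbol(1.0, 4), energy_per_symbol(1.0, 8))
```

A valid codeword leaves a zero remainder when divided by $G(x)$, which is the error check the FC relies on to discard corrupted data.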

A. Influence of the Quantization Bit-Rate
We first examine the impact of the quantization bit-rate of the sensors. Three WSNs are considered, where the sensors in different WSNs use different quantization bit-rates, K = 1, 2, and 4, respectively. The sensors use $C_{tn}$ as the transmission codebook, so the length of the transmitted symbols is L = K. In this simulation, the total energy and the total bandwidth consumed by the networks are kept identical across quantization bit-rates. Due to the total bandwidth constraint, the numbers of active sensors for K = 1, 2, and 4 are 40, 20, and 10, respectively. Due to the total energy constraint, the energy consumed by each sensor to transmit one observation also differs: if the transmit energy of a sensor is $E_d$ when N = 40, it is $2E_d$ and $4E_d$ when N = 20 and 10, respectively.
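The resource bookkeeping above can be reproduced with a few lines of arithmetic, assuming one BPSK symbol per bit (L = K), a total bandwidth of 40 symbols per snapshot (from N·K = 40), and a total energy equal to that of the 40-sensor network in which each sensor spends $E_d$. The helper name is hypothetical.

```python
import math


def equal_resource_configs(bit_rates, total_symbols=40):
    """Sensors, per-sensor energy (units of E_d) and its dB offset under fixed totals.

    Assumptions: L = K BPSK symbols per observation, total_symbols symbols per snapshot,
    and total energy equal to the lowest-rate network with E_d per sensor.
    """
    ref_N = total_symbols // min(bit_rates)
    configs = []
    for K in bit_rates:
        N = total_symbols // K                 # bandwidth constraint: N * K symbols
        energy = ref_N / N                     # energy constraint: N * energy is constant
        configs.append((K, N, energy, 10.0 * math.log10(energy)))
    return configs


for K, N, e, db in equal_resource_configs([1, 2, 4]):
    print(f"K={K}: N={N}, energy={e:g} E_d, +{db:.2f} dB")
```

The offsets of about 3.01 dB and 6.02 dB relative to binary quantization match the 0, 3, and 6 dB communication SNRs listed in the Fig. 2 caption.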
We compare the MSEs of the MLE with known CSI and the Quasi-BLUE lower bound for different quantization bit-rates in Fig. 2. Low quantization bit-rates are advantageous only at extremely low observation SNR. At medium and high observation SNR levels, the optimal estimator with 1- or 2-bit quantization is inferior to that with 4-bit quantization under the same total energy and bandwidth constraints. This indicates that the quantization bit-rate should be chosen according to the observation SNR to reduce the resource consumption of the network: at high observation SNR, we should employ a high bit-rate with fewer active sensors. A similar conclusion is drawn in [15], except that [15] considers error-free communications, whereas we consider communications over fading channels.

B. Convergence of the Suboptimal Estimators
We then study the convergence of the suboptimal estimators. Figure 3 depicts the MSEs of the suboptimal estimators as a function of the number of iterations. The MSEs converge after two iterations at all the communication SNRs considered, whether or not CSI is known. To demonstrate the performance gain of the proposed estimators, which jointly optimize demodulation and parameter estimation, two traditional fusion based estimators and an MRC based estimator are simulated. In the fusion based estimators, the FC first demodulates the data transmitted by each sensor, then reconstructs the observation of each sensor from the demodulated symbols following the quantization rule, and finally combines these reconstructed observations with the BLUE fusion rule to produce the final estimate of θ. When ECCs are applied at the sensors, the receiver at the FC exploits their error detection ability to discard the data that fail the error check. In the MRC based estimator, the FC first combines the received signals from all sensors and then demodulates the transmitted symbols. Finally, the FC obtains the estimate of θ from the detected symbols according to the quantization rule.
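A minimal sketch of the uncoded fusion based baseline (no ECC), assuming known CSI, a hypothetical uniform quantizer on $[-W, W]$ with cell-center reconstruction points, natural binary labels with the BPSK mapping $b \mapsto 1 - 2b$, and equal observation-noise variances, in which case the BLUE fusion rule reduces to the sample mean. Each step is a hard decision, so a single symbol error corrupts the reconstructed observation; this is the weakness that joint demodulation and estimation avoids.

```python
import math


def quantize(x, M, W):
    """Hypothetical uniform quantizer: index of the cell of width 2W/M containing x (clipped)."""
    delta = 2.0 * W / M
    return min(M - 1, max(0, int(math.floor((x + W) / delta))))


def level_to_bits(m, K):
    """Natural binary code, MSB first."""
    return [(m >> (K - 1 - k)) & 1 for k in range(K)]


def bits_to_level(bits):
    m = 0
    for b in bits:
        m = 2 * m + b
    return m


def reconstruct(m, M, W):
    """Reconstruction point: center of quantization cell m (assumption)."""
    delta = 2.0 * W / M
    return -W + (m + 0.5) * delta


def demodulate_bpsk(y_symbols, h):
    """Coherent hard decisions with known h; bit 0 <-> +1, bit 1 <-> -1 (assumed mapping)."""
    return [0 if (h.conjugate() * yk).real >= 0 else 1 for yk in y_symbols]


def fusion_estimate(received, channels, M, W):
    """Demodulate each sensor, reconstruct its observation, then fuse.

    With equal observation-noise variances, the BLUE fusion rule is the sample mean.
    """
    values = [reconstruct(bits_to_level(demodulate_bpsk(y, h)), M, W)
              for y, h in zip(received, channels)]
    return sum(values) / len(values)
```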

C. MSE versus the Communication SNR
Except for the fusion based estimator with ECC, which uses codebook $C_{tc}$, all estimators use codebook $C_{tn}$ in this simulation. Fig. 4 shows the MSEs of the estimators with known CSI as a function of the communication SNR when N = 10 and γs = 20 dB.
The MLE and suboptimal estimators outperform both the MRC based and the fusion based estimators. The MSEs of the MLE and suboptimal estimator rapidly approach the Quasi-BLUE lower bound as the communication SNR increases, although the suboptimal estimator degrades slightly at low SNR. The MSE of the MLE using AF transmission is larger than that using digital transmission, since AF transmission is no longer optimal in fading channels.
According to the performance analysis of BPSK modulation in Rayleigh fading channels [30], the BER of the transmission scheme with codebook $C_{tn}$ exceeds 0.15 when the communication SNR is lower than 3 dB. ECC can improve the transmission performance at high communication SNR, but it causes more errors at low SNR. For the transmission schemes using CRC, the BER is even worse, because longer codewords reduce the transmit energy per symbol. At such high BERs, the fusion based estimators, especially those with ECCs, do not perform well: most of the demodulated data are dropped by the error check, so the fusion estimators do not have enough information to exploit, which leads to worse MSE performance.
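The BER figures can be checked with the textbook average BER of coherent BPSK over Rayleigh fading, $P_b = \tfrac{1}{2}\left(1 - \sqrt{\bar\gamma/(1+\bar\gamma)}\right)$. The sketch assumes that the per-observation communication SNR $\gamma_c$ is split evenly over the $L$ symbols of a codeword, so the per-symbol SNR is $\gamma_c/L$; the paper's exact SNR normalization is not reproduced here.

```python
import math


def bpsk_rayleigh_ber(avg_snr):
    """Average BER of coherent BPSK over Rayleigh fading: 0.5 * (1 - sqrt(g / (1 + g)))."""
    return 0.5 * (1.0 - math.sqrt(avg_snr / (1.0 + avg_snr)))


def ber_at_comm_snr(gamma_c_db, L):
    """BER when gamma_c (in dB) is split evenly over L BPSK symbols (assumption)."""
    gamma_c = 10.0 ** (gamma_c_db / 10.0)
    return bpsk_rayleigh_ber(gamma_c / L)


for L, name in ((4, "C_tn, uncoded"), (8, "C_tc, CRC")):
    print(name, round(ber_at_comm_snr(3.0, L), 3))
```

Under this assumption, a 3 dB communication SNR gives a BER of about 0.21 for the 4-symbol uncoded codewords and about 0.28 for the 8-symbol CRC codewords, consistent with the statements above.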
The performance of the MRC based estimator is much worse than that of the proposed estimators, which shows the significant impact of the observation noise.
In Fig. 5, the MSEs of the MLEs with unknown CSI are shown. Two MLEs that use the training symbols differently are considered: one is the MLE with training symbols in (31), and the other is the estimator in (33), which treats the estimated channel coefficients as their true values. Besides the codebook $C_{tn}$ without training symbols, we also evaluate codebooks with 2 and 5 training symbols. If $C_{tn}$ is used as the codebook for the MLE with unknown CSI, the MLE exhibits a high MSE that cannot be reduced by increasing the communication SNR. This validates our analysis in Section VI-A that the phase ambiguity of $C_{tn}$ leads to severe performance degradation of the estimator. When training symbols are inserted, the performance of the MLE with unknown CSI improves significantly, but it is still much worse than that of the MLE with known CSI at low communication SNR levels. Interestingly, using more training symbols does not necessarily improve the performance of the MLE. This is because the energy for transmitting an observation is fixed, so inserting training symbols reduces the energy available for the data symbols. Our simulations show that the best performance is obtained when $L_p = 2$. This is consistent with the observation of [31], where the optimal $L_p$ equals $\sqrt{K}$.
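The energy trade-off behind the choice of $L_p$ can be made concrete, assuming every one of the $K + L_p$ symbols of a codeword receives equal energy $E_d/(K + L_p)$ (an assumption; the paper does not state the power allocation here). With $K = 4$, two training symbols already take one third of $E_d$ away from the data, and five take more than half, while $\sqrt{K} = 2$ matches the best-performing $L_p$ reported above.

```python
import math


def data_energy_fraction(K, Lp):
    """Share of E_d left for the K data symbols when Lp training symbols are added,
    assuming all K + Lp symbols of a codeword get equal energy (an assumption)."""
    return K / (K + Lp)


K = 4
for Lp in (0, 2, 5):
    print(Lp, round(data_energy_fraction(K, Lp), 3))
print("sqrt(K) =", math.sqrt(K))
```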
To further observe the impact of different levels of CSI on the optimal and suboptimal estimators, Fig. 6 shows the MSEs of the MLE and suboptimal estimators with known CSI and with unknown CSI using two training symbols. As with known CSI, the suboptimal estimator with training symbols is inferior to the MLE at low communication SNR. However, the suboptimal estimator suffers less degradation from channel estimation errors than the MLE.
Figure 7 and Fig. 8 show the MSEs of the estimators with known CSI and unknown CSI as a function of the number of sensors, N. The MSEs of all the estimators decrease as 1/N for large enough N, but they cannot reach the lower bound due to communication errors. Comparing the MSEs of the MLEs, we see that the results validate our asymptotic performance analysis in Section VI-B for MLEs with both known and unknown CSI. Fig. 7 also shows that the proposed estimators perform much better than the fusion based and MRC based estimators. This means that networks using the traditional approaches must activate more sensors to achieve the same MSE performance as those using our estimators, which leads to lower energy and bandwidth efficiency.

E. Computational Complexity of the Estimators
To evaluate the computational complexity, we record the time consumed by 10,000 Monte Carlo simulations of the proposed estimators with known CSI. Table III shows the computation time in seconds at different communication SNR levels. The step-size of the exhaustive search for the MLE is set to ∆/N, and the number of iterations of the suboptimal estimator is set to 2 according to the convergence analysis.
The computation time of the suboptimal estimator is much less than that of the MLE and is almost invariant with the communication SNR, since the number of iterations is fixed. The computation time of the MLE varies slightly, which stems from the implementation of the truncated exponential function in the simulation code.

VIII. CONCLUSION
In this paper, we studied decentralized estimation of a deterministic parameter using digital communications with a uniform multiple-bit quantizer over orthogonal multiple-access fading channels. By introducing a general messaging function, the proposed estimators can be applied to digital communication systems using various quantization, coding, and modulation schemes, and to analog communication systems such as those using the well-studied AF transmission.
We derived the MLEs with known and unknown CSI. When training symbols are inserted before the data symbols, the MLE with unknown CSI estimates the channels implicitly and exploits the channel estimates in an optimal way. Following the structure of the MLE, we designed a suboptimal estimator that has affordable complexity and converges rapidly. It performs as well as the MLE at high communication SNR and suffers only a minor performance loss at low communication SNR.
Simulation results show that both the MLEs and the suboptimal estimators outperform the traditional MRC based and fusion based estimators, and that the estimators using digital communications outperform those using AF transmission in Rayleigh fading channels. Compared with a WSN using binary quantization for decentralized estimation, a system using multiple-bit quantization has superior energy and bandwidth efficiency at medium and high observation SNR levels. Therefore, even under strict bandwidth constraints, we suggest that WSNs use multiple-bit quantization rather than binary quantization when the observation SNR is relatively high.

Fig. 1. The diagram of the decentralized estimation system we considered.

Fig. 3. The convergence of the suboptimal estimators when γs = 20 dB and N = 10. In the legend, NoCSI indicates the suboptimal estimator with unknown CSI, and CSI indicates the suboptimal estimator with known CSI. The communication SNRs of 3 dB, 6 dB, and 9 dB are marked in the legend.

Figure 4 depicts the MSEs of the estimators with known CSI. Except for the estimator using AF transmission, all other estimators use digital communications with a 4-bit uniform quantizer (M = 16).
Fig. 4. The MSEs of the estimators with known CSI as a function of the communication SNR when N = 10 and γs = 20 dB. In the legend, "Fusion-CRC" and "Fusion-NoECC" stand for the two fusion based estimators using codebooks Ctc and Ctn, "MRC" stands for the MRC based estimator, "Analog AF" stands for the MLE with AF transmission, "MLE" and "Subopt" denote the MLE and suboptimal estimators, respectively, and "Q-BLUE Bound" and "BLUE Bound" stand for the two lower bounds with ideal communications.

Fig. 6. The MSEs of the MLE and suboptimal estimators with training symbols and with known CSI, where N = 10 and γs = 20 dB. In the legend, MLE and Subopt denote the MLE and suboptimal estimators, respectively.

Fig. 7. The MSEs of the estimators with known CSI, where γc = 6 dB and γs = 20 dB. The legend is the same as in Fig. 4.

Fig. 8. The MSEs of the estimators with training symbols and estimated CSI, where γc = 6 dB and γs = 20 dB. The legend is the same as in Fig. 5.

TABLE I: THE SPECIAL CASES OF THE MLES WITH KNOWN AND UNKNOWN CSI.

TABLE II: THE SUMMARY OF THE CODEBOOKS CONSIDERED.
Fig. 2. The MSEs of the MLE with different K. The MSE of the MLE with known CSI is marked as MLE CSI in the legend, and the Quasi-BLUE lower bound is marked as Q-BLUE. The communication SNR is 6 dB for 4-bit quantization, 3 dB for 2-bit quantization, and 0 dB for binary quantization.

TABLE III: THE COMPUTATION TIME IN SECONDS CONSUMED BY SIMULATING THE ESTIMATORS WITH KNOWN CSI.