Decentralized estimation over orthogonal multiple-access fading channels in wireless sensor networks--optimal and suboptimal estimators
EURASIP Journal on Advances in Signal Processing volume 2011, Article number: 132 (2011)
We study optimal and suboptimal decentralized estimators in wireless sensor networks over orthogonal multiple-access fading channels. Considering multiple-bit quantization for digital transmission, we develop maximum likelihood estimators (MLEs) with both known and unknown channel state information (CSI). When training symbols are available, we derive an MLE that is a special case of the MLE with unknown CSI. It implicitly uses the training symbols to estimate the CSI, exploits the channel estimate in an optimal way, and performs best in realistic scenarios where CSI must be estimated and transmission energy is constrained. To reduce the computational complexity of the MLE with unknown CSI, we propose a suboptimal estimator. These optimal and suboptimal estimators exploit both signal- and data-level redundant information to combat the observation noise and the communication errors. Simulation results show that the proposed estimators are superior to the existing approaches and that the suboptimal estimator performs close to the optimal MLE.
Wireless sensor networks (WSNs) consist of a number of sensors deployed in a field to collect information, for example, measuring physical parameters such as temperature and humidity. Since the sensors are usually powered by batteries and have very limited processing and communication abilities, the parameters are often estimated in a decentralized way. In typical WSNs for decentralized estimation, there exists a fusion center (FC). The sensors transmit their locally processed observations to the FC, and the FC generates the final estimate based on the received signals.
Both observation noise and communication errors deteriorate the performance of decentralized estimation. Traditional fusion-based estimators are able to minimize the mean square error (MSE) of the parameter estimation by assuming perfect communication links (see  and references therein). They reduce the observation noise by exploiting the redundant observations provided by multiple sensors. However, their performance degrades dramatically when communication errors cannot be ignored or corrected. On the other hand, various wireless communication technologies that aim at achieving transmission capacity or improving reliability do not minimize the MSE of the parameter estimation. For example, although diversity combining reduces the bit error rate (BER), it requires that the signals transmitted from multiple sensors be identical, which is not true in the context of WSNs due to the observation noise at the sensors. This motivates optimizing the estimator at the FC under realistic observation and channel models so as to minimize the MSE of the parameter estimation.
The bandwidth and energy constraints are two critical issues for the design of WSNs. Under a strict bandwidth constraint, decentralized estimation in which the sensors transmit only one bit for each observation, that is, using binary quantization, is studied in [4–9]. When communication channels are noiseless, a maximum likelihood estimator (MLE) is introduced and optimal quantization is discussed in . A universal and isotropic quantization rule is proposed in , and adaptive binary quantization methods are studied in [7, 8]. When channels are noisy, the MLE in additive white Gaussian noise (AWGN) channels is studied and several low-complexity suboptimal estimators are derived in . It has been found that binary quantization is sufficient for decentralized estimation at low observation signal-to-noise ratio (SNR), but more bits are required for each observation at high observation SNR .
When the energy constraint and general multi-level quantizers are considered, various issues of decentralized estimation are studied under different channels. When communications are error free, the quantization at the sensors is designed in [10–12]. The optimal trade-off between the number of active sensors and the quantization bit rate of each sensor is investigated under a total energy constraint in . In binary symmetric channels (BSCs), power scheduling is proposed to reduce the estimation MSE when the best linear unbiased estimator (BLUE) and a quasi-BLUE, in which quantization noise is taken into account, are used at the FC . Nonetheless, to the best of the authors' knowledge, the optimal decentralized estimator using multiple-bit quantization in fading channels is still unavailable. Although the MLE proposed for AWGN channels  can be applied to fading channels if the channel state information (CSI) is known at the FC, it only considers binary quantization.
Besides decentralized estimation based on digital communications, estimation based on analog communications has received considerable attention due to the important conclusions drawn from studies of the multi-terminal coding problem [15, 16]. The most popular scheme is amplify-and-forward (AF) transmission, which is proved to be optimal in quadratic Gaussian sensor networks under multiple-access channels (MACs) with AWGN . The power scheduling and energy efficiency of AF transmission are studied under AWGN channels in , where AF transmission is shown to be more energy efficient than digital communications. However, in fading channels, AF transmission is no longer optimal in orthogonal MACs [19–21]. The outage laws of the estimation diversity with AF transmission in fading channels are studied in  and  in different asymptotic regimes. These studies, especially the results in , indicate that the separate source-channel coding scheme is optimal in fading channels with orthogonal multiple-access protocols and outperforms AF transmission, a simple joint source-channel coding scheme.
In this paper, we develop optimal and suboptimal decentralized estimators for a deterministic parameter considering digital communication. The observations of the sensors are quantized, coded and modulated, and then transmitted to the FC over Rayleigh fading orthogonal MACs. Because the binary quantization is only applicable at low observation SNR levels [4, 13], a general multi-bit quantizer is considered.
We aim to derive MLEs and a feasible suboptimal estimator for different local processing and communication strategies. To this end, we first present a general messaging function to represent various quantization and transmission schemes. We then derive the MLE for an unknown parameter with known CSI at the FC.
In typical WSNs, the sensors usually cannot transmit many training symbols for the receiver to estimate the channel coefficients because of both energy and bandwidth constraints. Therefore, we consider realistic scenarios in which the CSI is unknown at the FC and no or only a few training symbols are available. It is known that channel information has a large impact on the structure and the performance of decentralized estimation. In orthogonal MACs, most of the existing works assume that perfect CSI is available at the FC. Recently, the impact of channel estimation errors on decentralized detection in WSNs has been studied in , and its impact on decentralized estimation when using AF transmission has been investigated in . However, decentralized estimation with unknown CSI for digital communications is still not well understood.
Our contributions are summarized as follows. We develop the decentralized MLEs with known and unknown CSI at the FC over orthogonal MACs with Rayleigh fading. The performance of the MLE with known CSI can serve as a practical performance lower bound for decentralized estimation, whereas the MLE with unknown CSI is more realistic. For the special cases of error-free communications or noiseless observations, we show that the MLEs degenerate into the well-known centralized fusion estimator, BLUE, or into a maximal ratio combiner (MRC)-based estimator when CSI is known and a subspace-based estimator when CSI is unknown. This indicates that our estimators exploit both data-level redundancy and signal-level redundancy provided by multiple sensors. To provide a feasible estimator with affordable complexity, we propose a suboptimal algorithm, which can be viewed as a modified expectation-maximization (EM) algorithm .
The rest of the paper is organized as follows. Section 2 describes the system models. Section 3 presents the MLEs with known and unknown CSI and their special cases, and Section 4 introduces the suboptimal estimator. In Section 5, we analyze the asymptotic performance and complexity of the presented MLEs and discuss the codebook issue. Simulation results are provided in Section 6, and the conclusions are given in Section 7.
2 System model
We consider a typical WSN that consists of N sensors and an FC to measure an unknown deterministic parameter θ, with no inter-sensor communications. The sensors process their observations of the parameter θ before transmission. For digital communications, the processing includes quantization, channel coding and modulation. For analog communications, the processing may simply be amplifying the observations before transmission. A messaging function c(x) is used to describe the local processing. Though we can use c(x) for both digital and analog communication systems, we focus on digital transmission since the popular analog transmission scheme, AF, has been shown not to be optimal in fading channels [19–21].
2.1 Observation model
The observation for the unknown parameter provided by the i th sensor is

$$x_i = \theta + n_{s,i}, \quad i = 1, \ldots, N, \tag{1}$$

where $n_{s,i}$ is the independent and identically distributed (i.i.d.) Gaussian observation noise with zero mean and variance $\sigma_s^2$, and θ is bounded within a dynamic range [-V, +V].
2.2 Quantization, coding, and modulation
We use the messaging function $c(x): \mathbb{R} \to \mathbb{C}^L$ to represent all the processing at the sensors including quantization, coding and modulation, which maps the observations to the transmit symbols. To facilitate analysis, the energy of the transmit symbols is normalized to 1, that is,

$$c(x)^H c(x) = 1, \quad \forall x \in \mathbb{R}. \tag{2}$$
We consider uniform quantization by regarding θ as a uniformly distributed parameter. Uniform quantization is the Lloyd-Max quantizer that minimizes the quantization distortion of uniformly distributed sources [25, 26]. For an M-level uniform quantizer, define the dynamic range of the quantizer as [-W, +W], and then all the possible quantized values of the observations can be written as

$$S_m = m\Delta - W, \quad m = 0, \ldots, M-1, \tag{3}$$

where $\Delta = 2W/(M-1)$ is the quantization interval.
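The observations are rounded to the nearest $S_m$, so that c(x) is a piecewise constant function described as

$$c(x) = \begin{cases} c_0, & x < S_0 + \Delta/2, \\ c_m, & S_m - \Delta/2 \le x < S_m + \Delta/2, \quad m = 1, \ldots, M-2, \\ c_{M-1}, & x \ge S_{M-1} - \Delta/2, \end{cases} \tag{4}$$

where $c_m = [c_{m,1}, \ldots, c_{m,L}]^T$ is the vector of L symbols corresponding to the quantized observation $S_m$, m = 0, ..., M - 1.
Under the assumption that W is much larger than the dynamic range of θ, the probability that $|x_i| > W$ can be ignored. Then, c(x) is simplified as

$$c(x) = c_m, \quad S_m - \Delta/2 \le x < S_m + \Delta/2, \quad m = 0, \ldots, M-1. \tag{5}$$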
Define the transmission codebook as

$$C_t = [c_0, c_1, \ldots, c_{M-1}], \tag{6}$$

which can be used to describe any coding and modulation scheme following the M-level quantization.
The sensors can use various codes such as natural binary codes to represent the quantized observations. In this paper, our focus is to design decentralized estimators; therefore, we will not address the transmission codebook optimization for parameter estimation.
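As an illustration of the quantizer and codebook described above, the following is a minimal Python sketch under stated assumptions. The function names are hypothetical, and the choice of a natural binary code with BPSK symbols scaled to unit total energy is only one example of a codebook; the paper does not prescribe it.

```python
import math

def quantization_levels(M, W):
    """Return the M uniform levels S_m = m*delta - W on [-W, +W] and delta."""
    delta = 2.0 * W / (M - 1)
    return [m * delta - W for m in range(M)], delta

def quantize_index(x, M, W):
    """Index m of the level nearest to x, clipped to 0..M-1."""
    delta = 2.0 * W / (M - 1)
    m = int(math.floor((x + W) / delta + 0.5))
    return min(max(m, 0), M - 1)

def natural_binary_bpsk_codebook(M):
    """Assumed example codebook: index m -> log2(M) BPSK symbols, unit energy."""
    L = int(round(math.log2(M)))
    assert 2 ** L == M, "M must be a power of two for this sketch"
    scale = 1.0 / math.sqrt(L)  # enforces c^H c = 1
    codebook = []
    for m in range(M):
        bits = [(m >> (L - 1 - k)) & 1 for k in range(L)]
        codebook.append([scale * (1.0 if b else -1.0) for b in bits])
    return codebook

levels, delta = quantization_levels(M=4, W=3.0)
codebook = natural_binary_bpsk_codebook(4)
m = quantize_index(0.9, M=4, W=3.0)   # nearest level to 0.9 is S_2 = 1.0
```

Each row of `codebook` plays the role of one code word $c_m$; the scaling by $1/\sqrt{L}$ is what satisfies the unit-energy normalization.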
2.3 Received signals
Since we consider orthogonal MACs, the FC can perfectly separate and synchronize to the received signals from different sensors. Assume that the channels are block fading, that is, the channel coefficients are invariant during the period in which a sensor transmits the L symbols representing one observation. After matched filtering and symbol-rate sampling, the L received samples corresponding to the L transmitted symbols from the i th sensor can be expressed as

$$y_i = \sqrt{\mathcal{E}_d}\, h_i\, c(x_i) + n_{c,i}, \tag{7}$$

where $y_i = [y_{i,1}, \ldots, y_{i,L}]^T$, $h_i$ is the channel coefficient, which is i.i.d. and follows a complex Gaussian distribution with zero mean and unit variance, $n_{c,i}$ is a vector of thermal noise at the receiver following a complex Gaussian distribution with zero mean and covariance matrix $\sigma_c^2 I$, and $\mathcal{E}_d$ is the transmission energy for each observation.
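The received-signal model can be simulated with a few lines of Python. This is only a sketch: the helper names are hypothetical, and `energy` and `noise_var` stand for the per-observation transmission energy and the thermal noise variance in the model above.

```python
import math
import random

def complex_gaussian(rng, variance=1.0):
    """Draw one circularly symmetric complex Gaussian sample with the given variance."""
    s = math.sqrt(variance / 2.0)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def received_signal(codeword, energy, noise_var, rng):
    """Return (y, h): L received samples over one Rayleigh block-fading channel use."""
    h = complex_gaussian(rng)  # one coefficient for the whole block of L symbols
    y = [math.sqrt(energy) * h * c + complex_gaussian(rng, noise_var)
         for c in codeword]
    return y, h

rng = random.Random(0)
codeword = [1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0)]
y, h = received_signal(codeword, energy=10.0, noise_var=0.5, rng=rng)
```

Drawing `h` once per call reflects the block-fading assumption; calling the function once per sensor gives independent channels across sensors.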
3 Optimal estimators with or without CSI
In this section, we derive MLEs when CSI is known or unknown at the receiver of the FC, respectively. To understand how they deal with both the communication errors and the observation noises, we study two special cases. The MLE using training symbols in the transmission codebook is also studied as a special form of the MLE with unknown CSI.
3.1 MLE with known CSI
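Given θ, the received signals from different sensors are statistically independent. If the CSI is known at the receiver of the FC, the log-likelihood function is

$$\log p(Y \mid h, \theta) = \sum_{i=1}^{N} \log p(y_i \mid h_i, \theta) = \sum_{i=1}^{N} \log \int_{-\infty}^{+\infty} p(y_i \mid h_i, x)\, p(x \mid \theta)\, dx, \tag{8}$$

where $Y = [y_1, \ldots, y_N]$, $h = [h_1, \ldots, h_N]^T$ is the vector of channel coefficients, and p(x|θ) is the conditional probability density function (PDF) of the observation given θ. Following the observation model shown in (1), we have

$$p(x \mid \theta) = \frac{1}{\sqrt{2\pi}\,\sigma_s} \exp\left(-\frac{(x-\theta)^2}{2\sigma_s^2}\right). \tag{9}$$

According to the received signal model shown in (7), the PDF of the received signals given the CSI and the observation of the sensor is

$$p(y_i \mid h_i, x) = \frac{1}{(\pi\sigma_c^2)^L} \exp\left(-\frac{\|y_i - \sqrt{\mathcal{E}_d}\, h_i\, c(x)\|_2^2}{\sigma_c^2}\right), \tag{10}$$

where $\|z\|_2 = (z^H z)^{1/2}$ is the $l_2$ norm of vector z.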
Substituting (9) and (10) into (8), we obtain the log-likelihood function for estimating θ, which can be used for any messaging function c(x), whether it describes analog or digital communications.
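For digital communications, c(x) is a piecewise constant function as shown in (4). To simplify the analysis, we use its approximate form shown in (5) in the rest of this paper. After substituting (5) into (10) and then into (8), we have

$$\log p(Y \mid h, \theta) = \sum_{i=1}^{N} \log \sum_{m=0}^{M-1} p(y_i \mid h_i, c_m)\, p(S_m \mid \theta), \tag{11}$$

where $p(y_i \mid h_i, c_m)$ is the PDF of the received signals given the CSI and the transmitted symbols of the sensors, which is

$$p(y_i \mid h_i, c_m) = \frac{1}{(\pi\sigma_c^2)^L} \exp\left(-\frac{\|y_i - \sqrt{\mathcal{E}_d}\, h_i\, c_m\|_2^2}{\sigma_c^2}\right), \tag{12}$$

and $p(S_m \mid \theta)$ is the probability mass function (PMF) of the quantized observation given θ, which is

$$p(S_m \mid \theta) = \int_{S_m - \Delta/2}^{S_m + \Delta/2} p(x \mid \theta)\, dx = Q\left(\frac{S_m - \Delta/2 - \theta}{\sigma_s}\right) - Q\left(\frac{S_m + \Delta/2 - \theta}{\sigma_s}\right), \tag{13}$$

where $Q(t) = \frac{1}{\sqrt{2\pi}} \int_t^{\infty} e^{-u^2/2}\, du$ is the Gaussian tail function.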
The MLE is obtained by maximizing the log-likelihood function shown in (11).
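To make the maximization concrete, the sketch below evaluates the known-CSI log-likelihood for digital transmission (a mixture over code words weighted by the PMF of the quantized observation) and maximizes it by a grid search over [-V, +V]. It is an illustration under assumptions, not the authors' implementation: the function names, the grid size, and the toy noise-free data are all hypothetical, and the constant factor of the Gaussian PDF is dropped because it does not affect the maximizer.

```python
import math

def pmf_quantized(levels, delta, theta, sigma_s):
    """p(S_m | theta): Gaussian probability mass of each quantization cell."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return [cdf((s + delta / 2 - theta) / sigma_s)
            - cdf((s - delta / 2 - theta) / sigma_s) for s in levels]

def log_likelihood_known_csi(theta, Y, h, codebook, levels, delta,
                             energy, sigma_s, noise_var):
    """Log-likelihood up to an additive constant independent of theta."""
    pmf = pmf_quantized(levels, delta, theta, sigma_s)
    amp = math.sqrt(energy)
    total = 0.0
    for y_i, h_i in zip(Y, h):
        expo = [-sum(abs(y - amp * h_i * c) ** 2 for y, c in zip(y_i, cw)) / noise_var
                for cw in codebook]
        top = max(expo)  # log-sum-exp shift for numerical stability
        mix = sum(math.exp(e - top) * p for e, p in zip(expo, pmf))
        total += top + math.log(max(mix, 1e-300))
    return total

def mle_known_csi(Y, h, codebook, levels, delta, energy, sigma_s, noise_var,
                  V, steps=401):
    """Exhaustive one-dimensional search over a uniform grid on [-V, +V]."""
    grid = [-V + 2.0 * V * k / (steps - 1) for k in range(steps)]
    return max(grid, key=lambda t: log_likelihood_known_csi(
        t, Y, h, codebook, levels, delta, energy, sigma_s, noise_var))

# Toy data (assumed): three sensors all send the code word of S_2 = 1, no noise.
levels, delta = [-3.0, -1.0, 1.0, 3.0], 2.0
s = 1.0 / math.sqrt(2.0)
codebook = [[-s, -s], [-s, s], [s, -s], [s, s]]
h = [1.0 + 0.0j, 0.5 - 0.5j, -0.8j]
Y = [[math.sqrt(10.0) * h_i * c for c in codebook[2]] for h_i in h]
theta_hat = mle_known_csi(Y, h, codebook, levels, delta,
                          energy=10.0, sigma_s=0.5, noise_var=0.1, V=2.0)
```

In this toy case every sensor's likelihood is dominated by the code word of $S_2$, so the search settles near the center of that quantization cell.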
3.1.1 Special case when $\sigma_s^2 \to 0$
When the observation SNR tends to infinity, the observations of the sensors are perfect, that is, $x_i = \theta$, ∀ i = 1, ..., N. The PDF of the observation $x_i$ given θ degenerates to

$$p(x \mid \theta) = \delta(x - \theta), \tag{14}$$

where δ(x) is the Dirac delta function.
In this case, the log-likelihood function for both analog and digital communications has the same form, which can be obtained by substituting (14) into (8). After ignoring all terms that do not affect the estimation, the log-likelihood function is simplified as

$$\log p(Y \mid h, \theta) = -\sum_{i=1}^{N} \|y_i - \sqrt{\mathcal{E}_d}\, h_i\, c(\theta)\|_2^2, \tag{15}$$

where c(θ) is the vector of transmitted symbols when the observations of the sensors are θ.
For digital communications, c(θ) is a code word of C t and is a piecewise constant function. Therefore, we cannot get θ by taking partial derivative of (15). Instead, we first regard c(θ) as the parameter to be estimated and obtain the MLE for estimating c(θ). Then, we use it as a decision variable to detect the transmitted symbols and reconstruct θ according to the quantization rule with the decision results.
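The log-likelihood function in (15) is concave with respect to (w.r.t.) c(θ), and its only maximum is obtained by solving the equation ∂ log p(Y|h, θ)/∂c(θ) = 0, which gives

$$\hat{c}(\theta) = \frac{\sum_{i=1}^{N} h_i^{*}\, y_i}{\sqrt{\mathcal{E}_d}\, \sum_{i=1}^{N} |h_i|^2}. \tag{16}$$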
It follows that when the observations are perfect, the structure of the MLE is the MRC concatenated with data demodulation and parameter reconstruction. This is no surprise since in this case, all the signals transmitted by different sensors are identical; thus, the receiver at the FC is able to apply the conventional diversity technology to reduce the communication errors.
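The MRC structure can be sketched as follows. The combining step weights each received vector by the conjugate channel coefficient and normalizes by the total channel gain; the minimum-distance detection that follows is an assumed, simple choice for the "data demodulation" step, and all names and the toy data are hypothetical.

```python
import math

def mrc_estimate(Y, h, energy):
    """Combine the received vectors into an estimate of the common code word."""
    L = len(Y[0])
    gain = math.sqrt(energy) * sum(abs(h_i) ** 2 for h_i in h)
    return [sum(h_i.conjugate() * y_i[l] for y_i, h_i in zip(Y, h)) / gain
            for l in range(L)]

def detect_index(c_hat, codebook):
    """Assumed detector: pick the code word closest to the combined estimate."""
    return min(range(len(codebook)),
               key=lambda m: sum(abs(a - b) ** 2
                                 for a, b in zip(c_hat, codebook[m])))

s = 1.0 / math.sqrt(2.0)
codebook = [[-s, -s], [-s, s], [s, -s], [s, s]]
levels = [-3.0, -1.0, 1.0, 3.0]
h = [0.5 + 1.0j, -1.2j, 0.3 - 0.4j]
# Noise-free toy data: every sensor sends the same code word (index 1).
Y = [[math.sqrt(10.0) * h_i * c for c in codebook[1]] for h_i in h]
c_hat = mrc_estimate(Y, h, energy=10.0)
theta_hat = levels[detect_index(c_hat, codebook)]   # reconstruct theta
```

Without noise the combined vector equals the transmitted code word exactly, whatever the channel phases are, which is the coherent gain the text refers to.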
3.1.2 Special case when $\sigma_c^2 \to 0$
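When the communications are perfect, $y_i = \sqrt{\mathcal{E}_d}\, h_i\, c(x_i)$. It means that $y_i$ merely depends on $c(x_i)$ or, equivalently, on the quantized observation of the i th sensor, denoted here as $S_{m_i}$. Then, the log-likelihood function becomes a function of the quantized observations $S_{m_i}$, i = 1, ..., N.
The log-likelihood function with perfect communications becomes

$$\log p(S_{m_1}, \ldots, S_{m_N} \mid \theta) = \sum_{i=1}^{N} \log \left[ Q\left(\frac{S_{m_i} - \Delta/2 - \theta}{\sigma_s}\right) - Q\left(\frac{S_{m_i} + \Delta/2 - \theta}{\sigma_s}\right) \right], \tag{17}$$

where Q(·) is the Gaussian tail function.
By setting the derivative of (17) to 0, we obtain the likelihood equation

$$\sum_{i=1}^{N} \frac{\frac{1}{\sqrt{2\pi}\,\sigma_s}\left[ \exp\left(-\frac{(S_{m_i} - \Delta/2 - \theta)^2}{2\sigma_s^2}\right) - \exp\left(-\frac{(S_{m_i} + \Delta/2 - \theta)^2}{2\sigma_s^2}\right) \right]}{Q\left(\frac{S_{m_i} - \Delta/2 - \theta}{\sigma_s}\right) - Q\left(\frac{S_{m_i} + \Delta/2 - \theta}{\sigma_s}\right)} = 0. \tag{18}$$

Generally, this likelihood equation has no closed-form solution. Nonetheless, a closed-form solution can be obtained when the quantization noise is very small, that is, Δ → 0. Under this condition, $S_{m_i} \to x_i$ and (18) becomes

$$\sum_{i=1}^{N} \frac{x_i - \theta}{\sigma_s^2} = 0. \tag{19}$$

The MLE obtained from (19) is

$$\hat{\theta} = \frac{1}{N} \sum_{i=1}^{N} x_i. \tag{20}$$

It is also no surprise to see that the MLE reduces to BLUE, which is often applied in centralized estimation, where the FC can obtain all raw observations of the sensors.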
3.2 MLE with unknown CSI
In practical WSNs, the FC usually has no CSI, and the sensors can transmit training symbols to facilitate channel estimation. The training symbols can be incorporated into the messaging function c(x). Then, the MLE with training symbols available is a special form of the MLE with unknown CSI. In the following, we derive the MLE with unknown CSI for a general c(x); the case with training symbols in c(x) is treated in the next subsection.
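When CSI is unknown at the FC, the log-likelihood function is

$$\log p(Y \mid \theta) = \sum_{i=1}^{N} \log p(y_i \mid \theta) = \sum_{i=1}^{N} \log \int_{-\infty}^{+\infty} p(y_i \mid x)\, p(x \mid \theta)\, dx, \tag{21}$$

which has a similar form to the likelihood function with known CSI shown in (8).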
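According to the received signal model, given x, $y_i$ follows a zero-mean complex Gaussian distribution, that is,

$$p(y_i \mid x) = \frac{1}{\pi^L \det(R_y)} \exp\left(-y_i^H R_y^{-1} y_i\right), \tag{22}$$

where $R_y$ is the covariance matrix of $y_i$, which is

$$R_y = \mathbb{E}\left[y_i y_i^H\right] = \mathcal{E}_d\, c(x) c(x)^H + \sigma_c^2 I. \tag{23}$$

Since the energy of the transmit symbols is normalized as shown in (2), we have

$$R_y\, c(x) = \mathcal{E}_d\, c(x) c(x)^H c(x) + \sigma_c^2 c(x) = (\mathcal{E}_d + \sigma_c^2)\, c(x). \tag{24}$$

Therefore, c(x) is an eigenvector of $R_y$, and the corresponding eigenvalue is $\mathcal{E}_d + \sigma_c^2$.
For any vector orthogonal to c(x), denoted as $c_\perp(x)$, we have

$$R_y\, c_\perp(x) = \sigma_c^2\, c_\perp(x). \tag{25}$$

Therefore, the eigenvalues corresponding to the remaining L - 1 eigenvectors are all $\sigma_c^2$. The determinant of $R_y$ is

$$\det(R_y) = (\mathcal{E}_d + \sigma_c^2)\, \sigma_c^{2(L-1)}. \tag{26}$$

Following the Matrix Inversion Lemma , we have

$$R_y^{-1} = \frac{1}{\sigma_c^2} I - \frac{\mathcal{E}_d}{\sigma_c^2 (\mathcal{E}_d + \sigma_c^2)}\, c(x) c(x)^H. \tag{27}$$

Substituting (26) and (27) into (22), we have

$$p(y_i \mid x) = \alpha \exp\left(\frac{\mathcal{E}_d\, |c(x)^H y_i|^2}{\sigma_c^2 (\mathcal{E}_d + \sigma_c^2)}\right), \tag{28}$$

where α is a constant that does not depend on x.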
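The eigenvalue and inverse expressions for the rank-one-plus-identity covariance can be checked numerically. The sketch below builds the covariance for an arbitrary unit-energy vector, multiplies it by the closed-form inverse, and measures the deviation from the identity; the helper names and the numerical values are assumptions made for the check only.

```python
def rank_one_covariance(c, energy, noise_var):
    """energy * c c^H + noise_var * I for a unit-energy vector c."""
    L = len(c)
    return [[energy * c[r] * c[k].conjugate() + (noise_var if r == k else 0.0)
             for k in range(L)] for r in range(L)]

def closed_form_inverse(c, energy, noise_var):
    """Inverse obtained from the matrix inversion lemma."""
    L = len(c)
    w = energy / (noise_var * (energy + noise_var))
    return [[(1.0 / noise_var if r == k else 0.0) - w * c[r] * c[k].conjugate()
             for k in range(L)] for r in range(L)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][j] * B[j][k] for j in range(n)) for k in range(n)]
            for r in range(n)]

c = [0.6, 0.8j]                      # |0.6|^2 + |0.8j|^2 = 1
R = rank_one_covariance(c, 10.0, 0.5)
R_inv = closed_form_inverse(c, 10.0, 0.5)
product = matmul(R, R_inv)
max_dev = max(abs(product[r][k] - (1.0 if r == k else 0.0))
              for r in range(2) for k in range(2))
# R c should equal (energy + noise_var) * c, i.e. c is an eigenvector.
Rc = [sum(R[r][k] * c[k] for k in range(2)) for r in range(2)]
```

The same check works for any length L as long as the vector has unit energy, since that normalization is what makes the eigenvalue equal to the sum of the signal energy and the noise variance.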
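Upon substituting (28) and (9) into (21), the log-likelihood function becomes, up to an additive constant that does not depend on θ,

$$\log p(Y \mid \theta) = \sum_{i=1}^{N} \log \int_{-\infty}^{+\infty} \exp\left(\frac{\mathcal{E}_d\, |c(x)^H y_i|^2}{\sigma_c^2 (\mathcal{E}_d + \sigma_c^2)} - \frac{(x-\theta)^2}{2\sigma_s^2}\right) dx. \tag{29}$$

Then the MLE is obtained as

$$\hat{\theta} = \arg\max_{\theta}\, \log p(Y \mid \theta). \tag{30}$$

When considering digital communications, by substituting (5) into (29), the log-likelihood function is obtained as

$$\log p(Y \mid \theta) = \sum_{i=1}^{N} \log \sum_{m=0}^{M-1} p(y_i \mid c_m)\, p(S_m \mid \theta), \tag{31}$$

where $p(S_m \mid \theta)$ is shown in (13), and

$$p(y_i \mid c_m) = \alpha \exp\left(\frac{\mathcal{E}_d\, |c_m^H y_i|^2}{\sigma_c^2 (\mathcal{E}_d + \sigma_c^2)}\right). \tag{32}$$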
3.2.1 Special case when $\sigma_s^2 \to 0$
Similarly to the log-likelihood function with known CSI, the log-likelihood function with unknown CSI for perfect observations has the same form for both analog and digital communications.
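Upon substituting (14) into (21) and ignoring all terms that do not affect the estimation, the log-likelihood function becomes

$$\log p(Y \mid \theta) = \sum_{i=1}^{N} |c(\theta)^H y_i|^2 = c(\theta)^H\, Y Y^H\, c(\theta). \tag{33}$$

Again, since c(θ) is not differentiable for digital communications, we regard c(θ) as the parameter to be estimated. Recall that the energy of c(θ) is normalized. Then, the problem of finding c(θ) to maximize (33) is a solvable quadratically constrained quadratic program (QCQP) :

$$\max_{c(\theta)}\; c(\theta)^H\, Y Y^H\, c(\theta) \quad \text{s.t.} \quad c(\theta)^H c(\theta) = 1. \tag{34}$$

The solution of (34) can be obtained as

$$\hat{c}(\theta) = v_{\max}\left(Y Y^H\right), \tag{35}$$

where $v_{\max}(M)$ is the eigenvector corresponding to the maximal eigenvalue of the matrix M.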
This shows that when CSI is unknown at the FC in the case of noise-free observations, the MLE becomes a subspace-based estimator.
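A subspace-based estimate of this kind can be sketched with a plain power iteration, which finds the dominant eigenvector of the sample correlation of the received vectors without forming the matrix explicitly. The function name, the starting vector and the iteration count are assumptions; the starting vector must not be orthogonal to the dominant eigenvector, and the result is only defined up to a phase factor.

```python
import math

def dominant_eigenvector(Y, iterations=100):
    """Power iteration for the top eigenvector of sum_i y_i y_i^H."""
    L = len(Y[0])
    v = [complex(1.0, 0.0)] + [complex(0.3, 0.1)] * (L - 1)  # arbitrary start
    for _ in range(iterations):
        w = [0j] * L
        for y in Y:
            proj = sum(a.conjugate() * b for a, b in zip(y, v))  # y^H v
            for l in range(L):
                w[l] += y[l] * proj
        norm = math.sqrt(sum(abs(a) ** 2 for a in w))
        v = [a / norm for a in w]
    return v

# Noise-free toy data: all sensors send the same unit-energy vector c.
c = [0.6, 0.8j]
h = [0.5 + 1.0j, -1.2j, 0.3 - 0.4j]
Y = [[math.sqrt(10.0) * h_i * c_l for c_l in c] for h_i in h]
v = dominant_eigenvector(Y)
# |c^H v| equals 1 when v matches c up to a phase rotation.
alignment = abs(sum(a.conjugate() * b for a, b in zip(c, v)))
```

The phase of `v` is arbitrary, so only the magnitude of the correlation with the true vector is meaningful here.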
3.2.2 Special case when $\sigma_c^2 \to 0$
When the communication SNR tends to infinity, the receiver of the FC can recover the quantized observations of the sensors without error if a proper codebook, which will be discussed in Section 5.3, is applied. Then, the MLE with unknown CSI also degenerates into the BLUE shown in (20).
3.3 MLE with unknown CSI using training symbols
Define $c_p$ as a vector of $L_p$ training symbols for the receiver to estimate the channels, which is predesigned and known at both the transmitter and the receiver. Each transmission for an observation begins with the training symbols, followed by the data symbols, which are denoted as $c_d(x)$. In this case, the messaging function becomes

$$c(x) = \left[c_p^T,\; c_d(x)^T\right]^T. \tag{36}$$

Substituting (36) into the signal model shown in (7), the received signal $y_i$ can be decomposed into two parts that correspond to $c_p$ and $c_d(x)$, respectively. The received signal from the i th sensor corresponding to $c_p$ is

$$y_{i,p} = \sqrt{\mathcal{E}_d}\, h_i\, c_p + n_{cp,i}, \tag{37}$$

and the received signal from the i th sensor corresponding to $c_d(x)$ is

$$y_{i,d} = \sqrt{\mathcal{E}_d}\, h_i\, c_d(x_i) + n_{cd,i}, \tag{38}$$

where both $n_{cp,i}$ and $n_{cd,i}$ are vectors of thermal noise at the receiver. Note that $y_{i,p}$ is independent of the observation $x_i$.
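We let $c_p^H c_p = L_p/L$ and $c_d(x)^H c_d(x) = 1 - L_p/L$ in order to satisfy the normalization condition of c(x). Ignoring all the terms that do not affect the estimation, we obtain the log-likelihood function as

$$\log p(Y \mid \theta) = \sum_{i=1}^{N} \log \int_{-\infty}^{+\infty} \exp\left(\frac{\mathcal{E}_d\, |c_p^H y_{i,p} + c_d(x)^H y_{i,d}|^2}{\sigma_c^2 (\mathcal{E}_d + \sigma_c^2)}\right) p(x \mid \theta)\, dx + \beta, \tag{39}$$

where $y_{i,p}$ and $y_{i,d}$ are, respectively, the received signals corresponding to the training symbols and the data symbols, and β is a constant.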
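Now we show that $c_p^H y_{i,p}$ in (39) can be regarded as the minimum mean square error (MMSE) estimate of the channel coefficient $h_i$ up to a constant factor. Since both $h_i$ and the receiver thermal noise are complex Gaussian distributed, the MMSE estimate of $h_i$ is equivalent to the linear MMSE estimate, that is,

$$\hat{h}_i = R_{hy}\, R_{y_p}^{-1}\, y_{i,p}, \tag{40}$$

where $R_{hy} = \mathbb{E}[h_i\, y_{i,p}^H] = \sqrt{\mathcal{E}_d}\, c_p^H$, and $R_{y_p}$ is the covariance matrix of $y_{i,p}$, which is

$$R_{y_p} = \mathcal{E}_d\, c_p c_p^H + \sigma_c^2 I. \tag{41}$$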
Let , then we have . Substituting it into (39), we obtain
In the sequel, we will show that the MLE in this case is equivalent to a two-stage estimator. During the first stage, the FC uses (40) to obtain the MMSE estimate of h i . During the second stage, the FC conducts the MLE using . The channel estimate can be modeled as , where is the estimation error subjecting to the complex Gaussian distribution with zero mean, and its variance is equal to the MSE of the linear MMSE estimator of h i , which is 
where can be obtained following Matrix Inversion Lemma .
Substituting into (7), the received signal of the data symbols becomes
where n ci, d is the receiver thermal noise.
By deriving the conditional PDF from (44), we can obtain a log-likelihood function that is exactly the same as that shown in (42). This implies that the MLE with unknown CSI exploits the available training symbols implicitly to provide an optimal channel estimate and then uses it to provide the optimal estimation of θ.
Note that the log-likelihood function in (42) is different from the log-likelihood function that uses the estimated CSI as the true value of CSI, which is
By maximizing (45), we obtain a coherent estimator since there only exists the coherent term in this log-likelihood function. By contrast, there exists a coherent term as well as a non-coherent term in the log-likelihood function in (42). This means that the MLE obtained from (42) uses the channel estimate as a "partial" CSI that accounts for the channel estimation errors. The true value of the channel coefficients contained in the channel estimate corresponds to the coherent term in the log-likelihood function, whereas the uncertainty in the channel estimate, that is, the estimation errors, leads to the non-coherent term. We will compare the performance of the two estimators through simulations in Section 6.
4 Suboptimal estimator
In the previous section, we developed the MLE with known CSI, which is not feasible in real-world systems since perfect CSI cannot be provided, especially in WSNs with a strict energy constraint. Nevertheless, its performance can serve as a practical lower bound when both the observation noise and the communication errors are present.
The MLE with unknown CSI is more practical but is too complex for application. Nonetheless, its structure provides some useful hints for deriving a low-complexity estimator. In the following, we derive a suboptimal algorithm for the case with unknown CSI.
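We first consider an approximation of the PMF, $p(S_m \mid \theta)$. Following the Lagrange Mean Value Theorem , there exists ξ in the interval $\left[\frac{S_m - \Delta/2 - \theta}{\sigma_s}, \frac{S_m + \Delta/2 - \theta}{\sigma_s}\right]$ that satisfies

$$Q\left(\frac{S_m - \Delta/2 - \theta}{\sigma_s}\right) - Q\left(\frac{S_m + \Delta/2 - \theta}{\sigma_s}\right) = \frac{\Delta}{\sqrt{2\pi}\,\sigma_s} \exp\left(-\frac{\xi^2}{2}\right), \tag{46}$$

where Q(·) is the Gaussian tail function.
If the quantization interval Δ is small enough, we can let ξ equal the middle value of the interval, that is, $\xi = (S_m - \theta)/\sigma_s$, and obtain an approximate expression of the PMF as

$$p(S_m \mid \theta) \approx \frac{\Delta}{\sqrt{2\pi}\,\sigma_s} \exp\left(-\frac{(S_m - \theta)^2}{2\sigma_s^2}\right). \tag{47}$$

Substituting (47) into (31) and taking its partial derivative with respect to θ, the likelihood equation is

$$\sum_{i=1}^{N} \frac{\sum_{m=0}^{M-1} p(y_i \mid c_m)\, \frac{\partial p(S_m \mid \theta)}{\partial \theta}}{\sum_{m=0}^{M-1} p(y_i \mid c_m)\, p(S_m \mid \theta)} = 0, \tag{48}$$

where $\partial p(S_m \mid \theta)/\partial \theta$ can be derived as

$$\frac{\partial p(S_m \mid \theta)}{\partial \theta} = \frac{S_m - \theta}{\sigma_s^2}\, p(S_m \mid \theta). \tag{49}$$

Substituting (49) into (48), the likelihood equation can be simplified as

$$\theta = \frac{1}{N} \sum_{i=1}^{N} \frac{\sum_{m=0}^{M-1} S_m\, p(y_i \mid c_m)\, p(S_m \mid \theta)}{\sum_{m=0}^{M-1} p(y_i \mid c_m)\, p(S_m \mid \theta)}, \tag{50}$$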
which is the necessary condition for the MLE.
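The small-Δ approximation of the PMF used in this derivation can be checked numerically. The sketch below compares the exact Gaussian mass of one quantization cell with the midpoint approximation; the function names and the numerical values are assumptions chosen for illustration.

```python
import math

def pmf_exact(s_m, delta, theta, sigma_s):
    """Exact Gaussian probability of the cell [s_m - delta/2, s_m + delta/2]."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return (cdf((s_m + delta / 2 - theta) / sigma_s)
            - cdf((s_m - delta / 2 - theta) / sigma_s))

def pmf_midpoint(s_m, delta, theta, sigma_s):
    """Midpoint approximation: cell width times the Gaussian density at s_m."""
    return (delta / (math.sqrt(2.0 * math.pi) * sigma_s)
            * math.exp(-(s_m - theta) ** 2 / (2.0 * sigma_s ** 2)))

def relative_error(s_m, delta, theta, sigma_s):
    exact = pmf_exact(s_m, delta, theta, sigma_s)
    return abs(pmf_midpoint(s_m, delta, theta, sigma_s) - exact) / exact

err_fine = relative_error(0.3, delta=0.1, theta=0.0, sigma_s=1.0)
err_coarse = relative_error(0.3, delta=2.0, theta=0.0, sigma_s=1.0)
```

With a cell width of one tenth of the noise standard deviation the two expressions agree closely, while a cell width of two standard deviations gives a visibly larger error, which is why the approximation is stated for small Δ.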
Unfortunately, we cannot obtain an explicit estimator for θ from this equation because the right-hand side of the likelihood equation also contains θ. However, considering the property of the conditional PDF, we can rewrite (50) as

$$\theta = \frac{1}{N} \sum_{i=1}^{N} \sum_{m=0}^{M-1} S_m\, p(S_m \mid y_i, \theta). \tag{51}$$

The term inside the outer sum on the right-hand side of the likelihood equation shown in (51) is actually the MMSE estimator of the quantized observation of the i th sensor for a given θ. This indicates that we can regard the MLE as a two-stage estimator.
During the first stage, it estimates the quantized observation of each sensor from the signals received from that sensor. During the second stage, it combines these estimates with a sample mean estimator.
We present a suboptimal estimator with a similar two-stage structure. This estimator can be viewed as a modified EM algorithm  since its two-stage structure is similar to that of the EM algorithm. Because the likelihood function shown in (31) has multiple extrema and the equation shown in (50) is only a necessary condition, the initial value of the iterative computation is critical to the convergence of the iterative algorithm. To obtain a good initial value, the suboptimal estimator first estimates the quantized observations by assuming them to be uniformly distributed. Furthermore, since the estimation quality of the first stage is available, we use BLUE to obtain the estimate of θ so as to exploit this quality information, instead of using the MLE in the M-step as in the standard EM algorithm.
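During the first stage of the iterative computation, the suboptimal algorithm estimates the quantized observation of each sensor under the MMSE criterion. This estimator requires the a priori probability of the quantized observation, which depends on the unknown parameter θ. The initial distribution of the quantized observation is set to be uniform, that is, the estimate of the a priori probability is $\hat{p}(S_m) = 1/M$. After a temporary estimate of θ, denoted here as $\hat{\theta}$, has been obtained, we use $p(S_m \mid \hat{\theta})$ to update $\hat{p}(S_m)$. The MMSE estimator during the first stage is

$$\hat{S}_i = \sum_{m=0}^{M-1} S_m\, \hat{p}(S_m \mid y_i), \tag{52}$$

where, by Bayes' rule,

$$\hat{p}(S_m \mid y_i) = \frac{p(y_i \mid S_m)\, \hat{p}(S_m)}{\sum_{n=0}^{M-1} p(y_i \mid S_n)\, \hat{p}(S_n)}. \tag{53}$$

Because there is a one-to-one and onto mapping between $S_m$ and $c_m$, $p(y_i \mid S_m)$ is equal to $p(y_i \mid c_m)$, which is shown in (32). After replacing $p(y_i \mid S_m)$ in (53) with $p(y_i \mid c_m)$ and substituting it into (52), we have

$$\hat{S}_i = \frac{\sum_{m=0}^{M-1} S_m\, p(y_i \mid c_m)\, \hat{p}(S_m)}{\sum_{m=0}^{M-1} p(y_i \mid c_m)\, \hat{p}(S_m)}. \tag{54}$$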
Now we derive the mean and variance of the MMSE estimate, which will be used in the BLUE of θ.
If the a priori probability estimate equals its true value, the MMSE estimator in (54) is unbiased because
However, the a priori probability estimate in our algorithm is not the true value since we use the temporary estimate of θ instead of θ to get it. Therefore, the MMSE estimate may be biased. Because it is hard to obtain this bias in practical systems, we regard the MMSE estimator as an unbiased estimate in our suboptimal algorithm and evaluate the resulting performance loss via simulations later.
The variance of the MMSE estimate can be derived as
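Then, the BLUE for estimating θ is

$$\hat{\theta} = \frac{\sum_{i=1}^{N} \hat{S}_i / \sigma_i^2}{\sum_{i=1}^{N} 1 / \sigma_i^2}, \tag{57}$$

where $\hat{S}_i$ is the MMSE estimate of the quantized observation of the i th sensor and $\sigma_i^2$ denotes its variance given in (56).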
Let k denote the index of the iteration. The iterative algorithm performed at the FC can be summarized as follows:
(S1) When k = 1, set $\hat{p}(S_m) = 1/M$, m = 0, ..., M - 1, as the initial value.
(S2) Compute $\hat{S}_i$, i = 1, ..., N, and its variance with (54) and (56).
(S3) Substitute $\hat{S}_i$ and its variance into (57) to get $\hat{\theta}^{(k)}$.
(S4) Update $\hat{p}(S_m)$ using $\hat{\theta}^{(k)}$, i.e., $\hat{p}(S_m) = p(S_m \mid \hat{\theta}^{(k)})$.
(S5) Repeat steps (S2) to (S4) to obtain $\hat{\theta}^{(k+1)}$ until the algorithm converges or a predetermined number of iterations is reached.
Note that this suboptimal algorithm differs from the one proposed in , which applies the maximum a posteriori (MAP) criterion to detect binary observations of sensors and then uses the results as the true values of the observations in an MLE derived for noise-free channels. Our suboptimal algorithm inherits the structure of the MLE developed for fading channels, which first gives "soft" estimates of the quantized observations and then combines them with a linear optimal estimator. By conducting these two stages iteratively, the estimation accuracy is improved rapidly. Although the suboptimal algorithm may converge to locally optimal solutions due to the non-convexity of the original optimization problem, it still performs fairly well, as will be shown in the simulation results. The convergence behavior of the algorithm will be studied in Section 5.4.
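The two-stage iteration can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function names are hypothetical, the channel likelihood uses only the non-coherent correlation between each code word and the received vector, and, because the variance expression is not reproduced here, the per-sensor variance is assumed to be the posterior spread of the quantized observation plus the observation noise variance. The toy codebook prepends one training symbol to two BPSK symbols so that code words are distinguishable without CSI.

```python
import math

def pmf_quantized(levels, delta, theta, sigma_s):
    """p(S_m | theta): Gaussian probability mass of each quantization cell."""
    def cdf(t):
        return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
    return [cdf((s + delta / 2 - theta) / sigma_s)
            - cdf((s - delta / 2 - theta) / sigma_s) for s in levels]

def posterior_over_levels(y, codebook, prior, energy, noise_var):
    """Posterior of the quantized observation given y, with unknown CSI."""
    w = energy / (noise_var * (energy + noise_var))
    expo = [w * abs(sum(c.conjugate() * v for c, v in zip(cw, y))) ** 2
            for cw in codebook]
    top = max(expo)  # shift exponents for numerical stability
    post = [math.exp(e - top) * p for e, p in zip(expo, prior)]
    z = sum(post)
    return [p / z for p in post]

def suboptimal_estimate(Y, codebook, levels, delta, energy, sigma_s,
                        noise_var, iterations=10):
    M = len(levels)
    prior = [1.0 / M] * M                      # (S1) uniform initial prior
    theta = 0.0
    for _ in range(iterations):                # (S5) fixed iteration budget
        num = den = 0.0
        for y in Y:                            # (S2) soft estimate per sensor
            post = posterior_over_levels(y, codebook, prior, energy, noise_var)
            s_hat = sum(s * p for s, p in zip(levels, post))
            # ASSUMPTION: variance = posterior spread + observation noise variance.
            var = sum((s - s_hat) ** 2 * p
                      for s, p in zip(levels, post)) + sigma_s ** 2
            num += s_hat / var
            den += 1.0 / var
        theta = num / den                      # (S3) inverse-variance weighting
        prior = pmf_quantized(levels, delta, theta, sigma_s)   # (S4)
    return theta

# Toy data (assumed): one pilot plus two BPSK symbols, noise-free reception.
s = 1.0 / math.sqrt(3.0)
codebook = [[s, -s, -s], [s, -s, s], [s, s, -s], [s, s, s]]
levels, delta = [-3.0, -1.0, 1.0, 3.0], 2.0
h = [1.0 + 0.0j, 0.7 - 0.7j, -0.9j, 0.8 + 0.6j]
sent = [2, 2, 1, 2]                            # quantized values 1, 1, -1, 1
Y = [[math.sqrt(10.0) * h_i * c for c in codebook[m]] for h_i, m in zip(h, sent)]
theta_hat = suboptimal_estimate(Y, codebook, levels, delta,
                                energy=10.0, sigma_s=1.0, noise_var=0.1)
```

In this toy case the posteriors are essentially point masses, so all sensors receive equal weight and the result is the plain average of the transmitted quantized values.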
5 Performance analysis and discussion
5.1 Asymptotic performance w.r.t. number of the sensors
Now we discuss the asymptotic performance of the MLEs w.r.t. the number of sensors N by studying the Fisher information as well as the Cramér-Rao lower bound (CRLB) of the estimators.
We first consider the MLE with unknown CSI, where the channel coefficients are i.i.d. random variables. In this case, given θ, the received signals from different sensors are i.i.d.; thus, the Fisher information, defined as $I(\theta) = \mathbb{E}\left[\left(\partial \log p(Y \mid \theta)/\partial \theta\right)^2\right]$, increases linearly with the number of sensors. Therefore, the CRLB, which is the reciprocal of the Fisher information, decreases at a rate of 1/N, which is the same as the BLUE lower bound of centralized estimation.
When CSI is available at the FC, the received signals are no longer identically distributed. In this case, the Fisher information depends on the channel realizations. In the sequel, we will show that the mathematical expectation of the Fisher information over h is never lower than the Fisher information with unknown CSI, which means that knowledge of the channels provides more information to improve the estimation quality.
Denote the Fisher information with known CSI as I C (θ), which depends on the channel coefficient vector h. Considering that p(Y|θ) = E h [p(Y|h, θ)], we have
The terms in the integration of (58) are convex in p(Y|h, θ) because
Since the integration can be viewed as a nonnegative weighted summation, which will preserve the convexity of the functions , (58) is a convex function of p(Y|h, θ). Following Jensen's inequality and the convexity of (58), we have
Therefore, the asymptotic performance of the MLE with known CSI is superior to that of the MLE with unknown CSI, where the CRLB of the latter decreases at the speed of 1/N.
5.2 Computational complexity
5.2.1 MLE with known CSI
Since the parameter being estimated is a scalar, one-dimensional searching algorithms can be used to obtain the maximum of the log-likelihood function. However, because the log-likelihood function shown in (11) is non-concave and has multiple extrema, we need to find all its local maxima to get the global maximum.
Exhaustive search can be used to find the global maximum. To make the MSE introduced by the discrete search negligible, we let the search step size be less than Δ/N; thus, we need to compute the value of the likelihood function at least M × N times to obtain the MLE.
The FC applies (11), (12) and (13) to compute the values of the likelihood function with different θ. The exponential term in (12) is independent from θ; thus, it can be computed before searching and be stored for future use.
Given θ, we still need to compute $p(S_m \mid \theta)$, m = 0, ..., M - 1, whose complexity is O(M), and then obtain each value of the likelihood function with M additions and M multiplications per sensor. Therefore, the computational complexity of getting one value of log p(Y|h, θ) is O(MN).
After considering the operations required by the exhaustive search, the overall complexity of the MLE is $O(M^2N^2)$.
5.2.2 MLE with unknown CSI
The difference between the MLEs with known and unknown CSI is that $p(y_i \mid c_m)$ is used in the MLE with unknown CSI instead of $p(y_i \mid h_i, c_m)$. Since $p(y_i \mid c_m)$ can also be computed before the search, this difference has no impact on the complexity of the MLE with unknown CSI. The computational complexity of the MLE with unknown CSI is also $O(M^2N^2)$.
5.2.3 Suboptimal estimator
For each iteration of the suboptimal estimator, we need to get the MMSE estimate of the quantized observation of each sensor and its variance with (54) and (56) and then obtain the estimate of θ with (57). The complexity is similar to that of computing the log-likelihood function, which is O(MN). If the algorithm converges after $I_t$ iterations, the complexity of the suboptimal estimator will be $O(I_t MN)$.
5.3 Discussion about transmission codebook issues
As we have discussed, the transmission codebooks can represent various quantization, coding and modulation schemes as well as the training symbols. Here, we discuss the impact of the codebooks on the decentralized MLEs.
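We rewrite the conditional PDF with known CSI shown in (10) as

$$p(y_i \mid h_i, x) = \frac{1}{(\pi\sigma_c^2)^L} \exp\left(-\frac{\|y_i\|_2^2 + \mathcal{E}_d |h_i|^2 - 2\sqrt{\mathcal{E}_d}\, \mathrm{Re}\{h_i^{*}\, c(x)^H y_i\}}{\sigma_c^2}\right). \tag{61}$$

Comparing the conditional PDF with unknown CSI $p(y_i \mid x)$ shown in (28) with $p(y_i \mid h_i, x)$ shown in (61), we see that both PDFs depend on the correlation between the received signals $y_i$ and the transmitted symbols c(x). With known CSI, the optimal estimator is a coherent algorithm, since (61) relies on the real part of the correlation, $\mathrm{Re}\{h_i^{*}\, c(x)^H y_i\}$. With unknown CSI, the optimal estimator is a non-coherent algorithm, since (28) depends on the squared norm of $c(x)^H y_i$. Because $y_i = \sqrt{\mathcal{E}_d}\, h_i\, c(x_i) + n_{c,i}$, both MLEs depend on the cross-correlation of the transmit symbols, $c(x_i)^H c(x)$.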
If there exist two transmit symbols c m and c n in the transmission codebook that have the same norm, that is,
then p(y i |x) will have two identical extrema since the MLE with unknown CSI only depends on . Such a phase ambiguity will lead to severe performance degradation of the decentralized estimator. Therefore, the autocorrelation matrix of the codebook plays a critical role in the performance of the MLE, especially when CSI is unknown.
Many transmission schemes have this phase ambiguity problem, for example, when the natural binary code and BPSK are used to represent and modulate each quantized observation. For any c m in such a transmission codebook, denoted as C tn , there exists c m′ in C tn that satisfies c m′ = -c m . Therefore, C tn is not a proper codebook. Another example is AF, whose messaging function is c(x) = Gx, where G is the amplification gain. The MLE with unknown CSI is unable to distinguish x from -x when using this messaging function.
In order to handle the phase ambiguity problem inherent in the codebook C tn , we can simply insert training symbols into the transmit symbols. Though heuristic, this approach provides fairly good performance because the MLE exploits the training symbols to estimate the channel coefficients implicitly, as we have shown. Moreover, the later simulations show that the MLE without CSI and without training symbols does not perform well, so training symbols should be inserted when the decentralized estimator is applied.
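The phase ambiguity can be checked numerically with a short sketch. It assumes that the non-coherent statistic is the squared magnitude of the correlation between the received vector and a candidate codeword, that BPSK maps bit 0 to +1 and bit 1 to -1, and that training symbols are fixed +1 symbols placed before the data; the function names are hypothetical.

```python
from itertools import product

def bpsk_codebook(K):
    """All 2**K natural-binary words mapped to BPSK (assumed: 0 -> +1, 1 -> -1)."""
    return [tuple(1 - 2 * b for b in bits) for bits in product((0, 1), repeat=K)]

def with_training(codebook, num_training):
    """Prepend num_training known +1 symbols to every codeword (assumed layout)."""
    return [(1,) * num_training + c for c in codebook]

def noncoherent_metric(y, c):
    """|c^H y|^2 for a real-valued codeword c and a complex received vector y."""
    corr = sum(ci * yi for ci, yi in zip(c, y))
    return abs(corr) ** 2

def ambiguous_codewords(codebook):
    """Codewords whose negation also belongs to the codebook."""
    members = set(codebook)
    return [c for c in codebook if tuple(-x for x in c) in members]

if __name__ == "__main__":
    plain = bpsk_codebook(4)
    trained = with_training(plain, 2)
    print("ambiguous codewords without training:", len(ambiguous_codewords(plain)))
    print("ambiguous codewords with training:   ", len(ambiguous_codewords(trained)))
```

Because the metric is unchanged when a codeword is negated, any codebook that contains both c m and -c m gives the non-coherent estimator two indistinguishable candidates. Prepending a fixed symbol removes the negated words from the codebook.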
Since the MLEs are associated with the autocorrelation matrix of the transmission codebook, the performance of the estimators can be enhanced by systematically designing the codebook. Nonetheless, this is beyond the scope of this paper. Some preliminary results for optimizing the transmission codebooks are shown in .
5.4 Convergence of the suboptimal estimator
For an iterative algorithm θ(k+1) = T(θ(k)), we say that the algorithm is convergent if the distance between θ(k+1) and a fixed point of T(θ) is smaller than the distance between θ(k) and this fixed point, where the fixed points of T(θ) are the points that satisfy the equation θ = T(θ). This means that after each iteration, the output of the algorithm is closer to a fixed point.
Define Φ as a fixed point of T(θ) in (ϕ1, ϕ2). The algorithm is convergent if |θ(k+1) - Φ| < |θ(k) - Φ| for all θ(k) ∈ (ϕ1, ϕ2).
In the following, we first study the convergence behavior of an iterative algorithm obtained directly from the likelihood equation (50), which is mathematically tractable, where T(θ) is defined as the right-hand side of equation (50). The iterative algorithm of the suboptimal estimator can be regarded as a modified version of this algorithm, which will be discussed afterward.
To simplify the notation, we rewrite T(θ) as a function of . From Eqs. (48), (49) and (50), we have
Since the iterative function shown in (63) is derived from the likelihood equation, all stationary points of the log-likelihood function are fixed points of T(θ). Denote Φ n , n = 1, 2, ..., as the local maxima of the log-likelihood function, which are sorted in ascending order. Since the log-likelihood function is a continuous function of θ, there exists a minimum between two adjacent maxima. The minimum between Φ n and Φ n+1 is defined as ϕ n . We will show in the following that in each interval (ϕ n-1 , ϕ n ), the algorithm converges to Φ n after ignoring the effect of the non-extremal stationary points of the log-likelihood function.
Assume that there is no non-extremal stationary point in (ϕn-1, ϕ n ). Because Φ n is a maximum, the sign of is always different from the sign of (θ(k)- Φ n ) for all ϕn-1< θ(k)< ϕ n . Following the corollary shown in Appendix, the algorithm is convergent if
Taking the second-order partial derivative of log p(Y|θ), we have
we have f m, i ≥ 0 and . Therefore, f m, i , m = 0, ..., M - 1 can be regarded as a PMF. Then, the term in (65) can be rewritten as
which satisfies (64). Therefore, the iterative algorithm is convergent.
Now we discuss the non-maximum stationary points of the log-likelihood function. Considering a minimum ϕ n , for any θ ∈ (Φ n , Φ n+1 ), the sign of is the same as that of (θ - ϕ n ) on both sides of ϕ n , which does not satisfy the sufficient and necessary condition shown in Appendix. Therefore, the algorithm does not converge to ϕ n unless θ(k) exactly equals ϕ n . Any disturbance will move θ(k+1) away from this minimum point. As to any non-extremal stationary point , the sign of is the same as that of at one side of this point. A disturbance in the proper direction will also move θ(k+1) away from this point.
When the communication SNR tends to infinity, that is, σ c → 0, there is only one p(y i |c m ), m = 0, ..., M - 1, that can be positive. All other p(y i |c m ) tend to 0. By substituting this into (65), we have . It is not hard to verify that in this case, |θ(k+1)- Φ m | = 0 for any θ(k). It means that the iterative algorithm converges to a local maximum of the log-likelihood function exactly after one iteration.
At practical communication SNR levels, , which will affect the convergence speed of the algorithm.
Now we consider the iterative algorithm of the suboptimal estimator. Similar to the previous discussion, we rewrite the suboptimal algorithm (57) as a function of p(y i |θ) and its partial derivatives. After taking the first- and second-order partial derivatives of p(y i |θ) and comparing them with (54), (56) and (57), the suboptimal estimator can be rewritten as
This estimator has the same form as the algorithm defined by (63). Therefore, following the same argument, we can show that a sufficient condition that the suboptimal estimator be convergent is
By letting N = 1, we can obtain from (68) that for all i, , and all w i (θ) > 0. Therefore,
which satisfies the condition (71).
When the communication SNR tends to infinity, all tend to -1 as discussed. The estimator shown in (57) degenerates into the algorithm shown in (63). It is also convergent to a local maximum of the log-likelihood function exactly after one iteration.
At practical communication SNR levels, we can see from (72) that is weighted by itself since w i (θ) depends on . A larger will make the weight w i (θ) smaller. Therefore, the value of the partial derivative in (73) is closer to -1 compared with the iterative algorithm defined with (63) given y i and , which increases the speed of convergence.
6 Simulation results
We use the Monte Carlo method to evaluate the performance of the estimators. In each trial, the parameter θ is generated from a uniformly distributed source within its dynamic range. We use the MSE of estimating θ, that is, , as a performance metric. The observation SNR considered in simulations is defined as
We use , the energy consumed by each sensor to transmit one observation, to define the communication SNR in order to fairly compare the energy efficiency of the estimators with different transmission schemes. The communication SNR is defined as
An M = 16 level uniform quantizer is considered, where each quantized value can be represented by a K = 4 bit binary sequence. We do not consider the binary quantizer, which only performs well in low observation SNR.
The codebooks used in the simulations are summarized in Table 1. Considering the general features of WSNs, that is, short data packets and low-cost sensors, we use a simple error control coding (ECC) scheme, the cyclic redundancy check (CRC) code with generator polynomial G(x) = x^4 + x + 1, as an example of coded transmission. The codebook is denoted as C tc . For comparison, uncoded transmission is also evaluated, where the natural binary code is applied to represent each quantized value; this codebook is denoted as C tn . We consider BPSK modulation for all codebooks. Because the code length of the uncoded transmission is shorter than that of the coded transmission, the energy to transmit each symbol will be higher for a given energy per observation. Due to the phase ambiguity problem discussed in Section 5.3, we also consider the codebook with training symbols C tp .
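As an illustration of the coded transmission, the following sketch appends a 4-bit CRC computed with the generator polynomial x^4 + x + 1 to a 4-bit quantization index and maps the result to BPSK. The resulting 8-symbol code length, the bit-to-symbol mapping and the even split of a fixed energy per observation across symbols are assumptions made for the example (Table 1 is the authoritative description of the codebooks); the function names are hypothetical.

```python
GENERATOR = (1, 0, 0, 1, 1)  # coefficients of x^4 + x + 1, highest power first

def poly_mod(word, poly=GENERATOR):
    """Remainder of the binary polynomial `word` divided by `poly` over GF(2)."""
    r = len(poly) - 1
    work = list(word)
    for i in range(len(work) - r):
        if work[i]:
            for j, p in enumerate(poly):
                work[i + j] ^= p
    return work[-r:]

def crc_encode(bits, poly=GENERATOR):
    """Systematic codeword: data bits followed by the CRC remainder."""
    r = len(poly) - 1
    return list(bits) + poly_mod(list(bits) + [0] * r, poly)

def crc_ok(codeword, poly=GENERATOR):
    """A received word passes the check if it is divisible by the generator."""
    return not any(poly_mod(codeword, poly))

def bpsk(bits, energy_per_observation=1.0):
    """BPSK symbols (assumed: 0 -> +A, 1 -> -A) sharing a fixed total energy."""
    amplitude = (energy_per_observation / len(bits)) ** 0.5
    return [amplitude * (1 - 2 * b) for b in bits]

if __name__ == "__main__":
    data = [1, 0, 1, 1]
    coded = crc_encode(data)
    print("codeword:", coded, "passes check:", crc_ok(coded))
    print("uncoded symbol amplitude:", abs(bpsk(data)[0]))
    print("coded symbol amplitude:  ", abs(bpsk(coded)[0]))
```

With the energy per observation held fixed, the shorter uncoded word gets a larger amplitude per symbol, which is the trade-off described above.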
When CSI is known at the FC, we evaluate the performance of the MLE with codebook C tn . The simulation results are marked as "MLE CSI" in the legend. When CSI is unknown and the codebook is still C tn , the legends for the MLE and the suboptimal estimator are "MLE NoCSI" and "Subopt NoCSI," respectively. When CSI is unknown and the codebook is C tp , where 2 or 5 training symbols are inserted, the simulation results are marked as "MLE NoCSI TS2/5" and "Subopt NoCSI TS2/5." We also evaluate the performance of the MLE with a near-optimal codebook obtained in , which is marked as "MLE NoCSI OPT." As discussed in Section 3.2, the FC can use the training symbols to estimate the CSI and use the estimated CSI as the known CSI to estimate θ. We evaluate this estimator with the codebook C tp , which is marked as "MLE EstCH TS2/5."
To demonstrate the performance gain of the proposed estimators, two traditional fusion-based estimators and AF transmission are simulated. In the fusion-based estimators, the FC first demodulates the transmitted data from each sensor, then reconstructs the observation of each sensor from the demodulated symbols following the quantization rule and finally combines these estimated observations with the BLUE fusion rule to produce the estimate of θ. When ECCs are applied at the sensors, the receiver at the FC exploits their error detection ability to discard the data that cannot pass the error check. The fusion-based estimators using codebooks C tn and C tc are denoted as "Fusion-NoECC" and "Fusion-CRC" in the legends of the figures, respectively. For AF, the amplification gain G is designed to make the average transmission power of the sensors equal to that of the digital communication schemes. We also use the MLE at the FC to estimate θ, which is marked as "AF" in the legend.
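The final combining step of the fusion-based baselines can be sketched as follows. The sketch uses the standard BLUE rule for independent unbiased observations, an inverse-variance weighted average, and optionally drops observations whose packets failed the error check. The per-sensor variances are assumed to be known at the FC, and the function name and the `accepted` argument are hypothetical.

```python
def blue_fuse(estimates, variances, accepted=None):
    """Inverse-variance weighted average of the reconstructed observations.

    estimates: reconstructed observation of each sensor
    variances: ASSUMED known variance of each reconstructed observation
    accepted:  optional flags, False for data discarded by the error check
    Returns None when every observation has been discarded.
    """
    if accepted is None:
        accepted = [True] * len(estimates)
    kept = [(x, 1.0 / v) for x, v, ok in zip(estimates, variances, accepted) if ok]
    if not kept:
        return None
    weight_sum = sum(w for _, w in kept)
    return sum(w * x for x, w in kept) / weight_sum
```

When many packets fail the check, few terms remain in the sum, which is one way to see why discarding data hurts the fusion-based estimators at low communication SNR.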
The MSE of the Quasi-BLUE  is shown as the performance lower bound with legend "Q-BLUE Bound." This MSE is obtained in perfect communication scenarios with the same M-level quantizer as other estimators.
6.1 Convergence of the suboptimal estimator
We first study the convergence of the suboptimal estimator. Figure 1 depicts the MSEs of the suboptimal estimator as a function of the number of iterations. As discussed in Section 5.4, at high communication SNR levels, the suboptimal estimator converges after one iteration, that is, the MSE does not decrease with further iterations. At low communication SNR levels, the convergence becomes slower.
6.2 MSE versus the communication SNR
Figure 2 depicts the MSEs of the estimators with known and unknown CSI.
When CSI is known at the FC, Figure 2a shows that the MLE outperforms the fusion-based estimators. The MSE of the MLE approaches the Quasi-BLUE lower bound rapidly as the communication SNR increases. As expected, the MLE with AF transmission, marked as AF, is inferior to the MLE with digital communication using 4-bit quantization, marked as MLE CSI. This is consistent with the conclusions in [19–21], which show that AF is not optimal in fading channels.
According to the performance analysis for BPSK modulation in Rayleigh fading channels , the BER of the transmission scheme with codebook C tn exceeds 0.15 when γ s < 3 dB. ECC can improve the transmission performance at high communication SNR, but it causes more errors at low SNR. For the transmission schemes using CRC, the BER is even worse because longer codes reduce the transmission energy per symbol. At such a high BER, the fusion-based estimators do not perform well. Most of the demodulated data are dropped by the error check; thus, the fusion-based estimators do not have enough information to exploit, which leads to a worse MSE.
When CSI is unknown at the FC, the MSEs of the MLE with unknown CSI and with two different ways of using training symbols for channel estimation are shown in Figure 2b. One is the MLE obtained from the log-likelihood function in (42), and the other is the estimator obtained from (45), which uses the estimated channel coefficients as their true values. As expected, our MLE shown in (42) performs better, because it takes into account the uncertainty of the channel estimation.
Because phase ambiguity exists in the schemes with C tn and AF transmission, simulation results show that the MSEs of the MLE and the suboptimal estimator using these two transmission schemes are very large and do not decrease when γ c increases. Therefore, they are not shown in the figures.
When we insert training symbols, the performance of the MLE with unknown CSI improves significantly, but it is still much worse than that of the MLE with known CSI at low communication SNR levels. It is interesting to see that using more training symbols does not improve the performance of the MLE as expected, because inserting training symbols will reduce the energy for the data symbols when the energy for transmitting an observation is fixed. Our simulations show that the best performance is obtained when L p = 2. This is consistent with the observation of , where the optimal L p equals to .
As discussed, inserting training symbols is a heuristic way to improve the performance. It is shown in the figure that a codebook designed by using optimization method outperforms all the codebooks with training symbols.
6.3 MSE versus the number of sensors
Figure 3 shows the MSEs of the estimators with known CSI and unknown CSI as a function of the number of sensors N. We see that the MSEs of all the estimators decrease at a rate of 1/N for large enough N, but the MSEs do not approach the lower bound due to the communication errors. This validates our asymptotic performance analysis for the MLEs both with known CSI and with unknown CSI in Section 5.1. Moreover, we observe that the proposed estimators perform much better than the fusion-based estimators. This means that networks with conventional approaches must activate more sensors to achieve the same MSE performance as those with our estimators, which leads to low energy and bandwidth efficiency.
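The 1/N decay can be illustrated with a toy Monte Carlo experiment. The sketch below is not the simulation used for Figure 3: it assumes unquantized observations, error-free links and a plain sample-mean estimator, so it only shows what a 1/N rate looks like when N times the MSE stays roughly constant. All names and parameter values are hypothetical.

```python
import random

def mse_of_sample_mean(num_sensors, sigma, num_trials, rng):
    """Monte Carlo MSE of the sample mean of num_sensors noisy observations.

    theta is drawn uniformly from [-1, 1] in each trial (assumed dynamic range),
    and each sensor observes theta plus Gaussian noise of standard deviation sigma.
    """
    total = 0.0
    for _ in range(num_trials):
        theta = rng.uniform(-1.0, 1.0)
        observations = [theta + rng.gauss(0.0, sigma) for _ in range(num_sensors)]
        estimate = sum(observations) / num_sensors
        total += (estimate - theta) ** 2
    return total / num_trials

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (10, 20, 40, 80):
        mse = mse_of_sample_mean(n, 0.5, 2000, rng)
        # For a 1/N rate, n * mse should be roughly the same in every row.
        print(n, mse, n * mse)
```

On a log-log plot of MSE versus N, a 1/N rate appears as a straight line with slope -1. A floor caused by communication errors shifts that line upward without changing its slope.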
7 Conclusion
In this paper, we studied decentralized estimation for a deterministic parameter using digital communications over orthogonal multiple-access fading channels with a multiple-bit quantizer. By introducing a general messaging function, the proposed estimators can be applied to various quantization, coding and modulation schemes, including AF transmission, binary quantization and transmission with or without training symbols.
We derived the MLEs with both known and unknown CSI. The MLE with known CSI can serve as a practical performance lower bound of existing decentralized estimators. It is shown that the MLE with multi-level quantization outperforms the MLE with AF as well as the fusion-based estimators.
The MLE with unknown CSI is more realistic. Without training symbols, it does not perform well due to the phase ambiguity. When training symbols are inserted before the data symbols, it estimates the channel coefficients implicitly and exploits the channel estimates in an optimal way. Under the energy constraint, only a few training symbols are beneficial, while more training symbols lead to performance degradation. To design an estimator with affordable complexity, we developed a suboptimal estimator that converges rapidly. The proposed estimator performs well: its performance is similar to that of the MLE at high SNRs, with a minor loss at low SNRs.
Appendix
Proposition: For an iterative algorithm θ(k+1) = T(θ(k)) of the form T(θ) = f(θ) + θ, the algorithm converges to a fixed point Φ of T(θ) if and only if
Proof: We first prove that (76) and (77) are sufficient conditions. For the function T(θ) = f(θ) + θ and its fixed point Φ, we have
When θ(k)- Φ > 0, substituting (76) into (78), we have
which shows that the algorithm is convergent. When θ(k)- Φ < 0, substituting (77) into (78), we also obtain the inequality shown in (79). Therefore, (76) and (77) are sufficient conditions of the convergence.
Now we prove that they are also necessary conditions. If the algorithm is convergent, we have
When θ(k)- Φ > 0, (80) can be rewritten as
After the simplifications, we can obtain (76) from (81).
Similarly, when θ(k)- Φ < 0, (77) can be obtained following the same procedure. Therefore, (76) and (77) are necessary conditions. □
Corollary: A sufficient condition that the algorithm converges to Φ is f(θ)(θ - Φ) < 0, ∀θ ≠ Φ, and f′(θ) > -2.
Proof: Since Φ is a fixed point of T(θ), we have Φ = T(Φ) = f(Φ) + Φ, thus f(Φ) = 0. When θ(k)- Φ > 0 and f′(θ) > -2, we have
When θ(k)- Φ < 0 and f′(θ) > -2, we have
Therefore, the first inequality in (76) and the second inequality in (77) are satisfied. From the condition f(θ)(θ - Φ) < 0, it is not hard to see that the second inequality in (76) and the first inequality in (77) are also satisfied. Thus, the iterative algorithm is convergent by the Proposition. □
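The Corollary can be checked numerically on a simple case. The sketch below iterates θ(k+1) = θ(k) + f(θ(k)) for two hypothetical linear choices of f with fixed point Φ = 1: one with f′ = -1.5, which satisfies both conditions of the Corollary, and one with f′ = -2.5, which violates f′(θ) > -2. The functions are illustrative only and are not the iteration functions of the estimators.

```python
def iterate(f, theta0, num_iters):
    """Run theta <- theta + f(theta) and return the whole trajectory."""
    thetas = [theta0]
    for _ in range(num_iters):
        thetas.append(thetas[-1] + f(thetas[-1]))
    return thetas

def is_convergent(thetas, phi):
    """True if the distance to the fixed point phi strictly decreases each step."""
    distances = [abs(t - phi) for t in thetas]
    return all(b < a for a, b in zip(distances, distances[1:]) if a > 0)

PHI = 1.0

def f_good(theta):
    """f(theta)(theta - PHI) < 0 and f'(theta) = -1.5 > -2."""
    return -1.5 * (theta - PHI)

def f_bad(theta):
    """f(theta)(theta - PHI) < 0 but f'(theta) = -2.5 < -2."""
    return -2.5 * (theta - PHI)

if __name__ == "__main__":
    print("f' = -1.5 convergent:", is_convergent(iterate(f_good, 3.0, 20), PHI))
    print("f' = -2.5 convergent:", is_convergent(iterate(f_bad, 3.0, 20), PHI))
```

In the first case, each step overshoots the fixed point but halves the distance to it. In the second case, the overshoot is larger than the original distance, so the iterates move away from Φ.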
References
Akyildiz IF, Su W, Sankarasubramaniam Y, Cayirci E: Wireless sensor networks: a survey. Comput Netw 2002,38(4):393-422. 10.1016/S1389-1286(01)00302-4
Xiao J-J, Ribeiro A, Luo Z-Q, Giannakis GB: Distributed compression-estimation using wireless sensor networks. IEEE Signal Process Mag 2006,23(7):27-41.
Li XR, Zhu Y, Wang J, Han C: Optimal linear estimation fusion--part I: unified fusion rules. IEEE Trans Inf Theory 2003,49(9):2192-2208. 10.1109/TIT.2003.815774
Ribeiro A, Giannakis GB: Bandwidth-constrained distributed estimation for wireless sensor networks--part I: Gaussian case. IEEE Trans Signal Process 2006,54(3):1131-1143.
Ribeiro A, Giannakis GB: Bandwidth-constrained distributed estimation for wireless sensor networks--part II: unknown probability density function. IEEE Trans Signal Process 2006,54(7):2784-2796.
Luo Z-Q: An isotropic universal decentralized estimation scheme for a bandwidth constrained ad hoc sensor network. IEEE J Sel Areas Commun 2005,23(4):735-744.
Li H, Fang J: Distributed adaptive quantization and estimation for wireless sensor networks. IEEE Signal Process Lett 2007,14(10):669-672.
Fang J, Li H: Distributed adaptive quantization for wireless sensor networks: from delta modulation to maximum likelihood. IEEE Trans Signal Process 2008,56(10):5246-5257.
Aysal T, Barner K: Constrained decentralized estimation over noisy channels for sensor networks. IEEE Trans Signal Process 2008,56(4):1398-1410.
Lam WM, Reibman AR: Design of quantizers for decentralized estimation systems. IEEE Trans Commun 1993,41(11):1602-1605. 10.1109/26.241739
Papadopoulos HC, Wornell GW, Oppenheim AV: Sequential signal encoding from noisy measurements using quantizers with dynamic bias control. IEEE Trans Inf Theory 2001,47(3):978-1002. 10.1109/18.915654
Xiao J-J, Luo Z-Q: Decentralized estimation in an inhomogeneous sensing environment. IEEE Trans Inf Theory 2005,51(10):3564-3575. 10.1109/TIT.2005.855580
Li J, AlRegib G: Distributed estimation in energy-constrained wireless sensor networks. IEEE Trans Signal Process 2009,57(10):3746-3758.
Xiao J-J, Cui S, Luo Z-Q, Goldsmith AJ: Power scheduling of universal decentralized estimation in sensor networks. IEEE Trans Signal Process 2006,54(2):413-422.
Gastpar M: To code or not to code, PhD Dissertation, Ecole Polytechnique Fédérale de Lausanne, EPFL. 2002.
Gastpar M, Vetterli M: Source-Channel Communication in Sensor Networks. Lecture Notes in Computer Science. 2003, 2634: 162-177.
Gastpar M: Uncoded transmission is exactly optimal for a simple Gaussian "sensor" network. 2007 Information Theory and Applications Workshop 2007, 5247-5251.
Cui S, Xiao J-J, Goldsmith AJ, Luo Z-Q, Poor HV: Energy-efficient joint estimation in sensor networks: Analog versus digital. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP' 05 2005, IV: 745-748.
Xiao J-J, Luo Z-Q: Multiterminal source-channel communication over an orthogonal multiple access channel. IEEE Trans Inf Theory 2007,53(9):3255-3264.
Cui S, Xiao J-J, Goldsmith AJ, Luo Z-Q, Poor HV: Estimation diversity and energy efficiency in distributed sensing. IEEE Trans Signal Process 2007,55(9):4683-4695.
Bai K, Senol H, Tepedelenlioğlu C: Outage scaling laws and diversity for distributed estimation over parallel fading channels. IEEE Trans Signal Process 2009,57(8):3182-3192.
Ahmadi HR, Vosoughi A: Impact of channel estimation error on decentralized detection in bandwidth constrained wireless sensor networks. IEEE Military Communications Conference, MILCOM' 08 2008, 1-7.
Senol H, Tepedelenlioğlu C: Performance of distributed estimation over unknown parallel fading channels. IEEE Trans Signal Process 2008,56(12):6057-6068.
Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 1977,39(1):1-38.
Max J: Quantizing for minimum distortion. IRE Trans Inf Theory 1960,6(1):7-12. 10.1109/TIT.1960.1057548
Lloyd SP: Least squares quantization in PCM. IEEE Trans Inf Theory 1982,28(2):129-137. 10.1109/TIT.1982.1056489
[Online]. Available: http://en.wikipedia.org/wiki/Woodbury_matrix_identity
Boyd S, Vandenberghe L: Convex Optimization. Cambridge University Press, Cambridge; 2004.
Kay SM: Fundamentals of Statistical Signal Processing, vol. I: Estimation Theory. Prentice Hall PTR, New Jersey; 1993.
[Online]. Available: http://en.wikipedia.org/wiki/Mean_value_theorem
Wang X, Yang C: Optimal transmission codebook design in fading channels for decentralized estimation in wireless sensor networks. IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP' 09 2009, 2293-2296.
Proakis JG: Digital Communications. 4th edition. The McGraw-Hill Companies, Inc., New York; 2001.
Wang M, Yang C: Distributed estimation in wireless sensor networks with imperfect channel estimation. 9th International Conference on Signal Processing, ICSP' 08 2008, 3: 2649-2652.
This work was supported by the National Natural Science Foundation of China under Grant 60672103. Parts of this work were presented at IEEE Globecom'07, Washington, DC, United States, Nov. 2007.
The authors declare that they have no competing interests.
Cite this article
Wang, X., Yang, C. Decentralized estimation over orthogonal multiple-access fading channels in wireless sensor networks--optimal and suboptimal estimators. EURASIP J. Adv. Signal Process. 2011, 132 (2011). https://doi.org/10.1186/1687-6180-2011-132