Distortion outage minimization in Nakagami fading using limited feedback

We focus on a decentralized estimation problem via a clustered wireless sensor network measuring a random Gaussian source where the clusterheads amplify and forward their received signals (from the intra-cluster sensors) over orthogonal independent stationary Nakagami fading channels to a remote fusion center that reconstructs an estimate of the original source. The objective of this paper is to design clusterhead transmit power allocation policies to minimize the distortion outage probability at the fusion center, subject to an expected sum transmit power constraint. In the case when full channel state information (CSI) is available at the clusterhead transmitters, the optimization problem can be shown to be convex and is solved exactly. When only rate-limited channel feedback is available, we design a number of computationally efficient sub-optimal power allocation algorithms to solve the associated non-convex optimization problem. We also derive an approximation for the diversity order of the distortion outage probability in the limit when the average transmission power goes to infinity. Numerical results illustrate that the sub-optimal power allocation algorithms perform very well and can close the outage probability gap between the constant power allocation (no CSI) and full CSI-based optimal power allocation with only 3-4 bits of channel feedback.


Introduction
Wireless sensor networks are a promising technology with applications across a wide range of fields, such as environmental and wildlife habitat monitoring, target tracking for defense applications, aged healthcare, and many other areas of human life. Wireless sensor networks are composed of sensor nodes (usually in large numbers) that are distributed geographically to monitor certain physical phenomena (e.g., chemical concentration in a factory or soil moisture in a nursery). Normally, a central processing unit, often called a fusion center (FC), collects all or part of the noisy measurements from the sensor nodes via wireless links and reconstructs the quantities of interest by applying a suitable estimation algorithm. Energy consumption is an important issue in wireless sensor networks performing such distributed estimation tasks because, once the sensors are deployed, replacing their batteries is difficult and can be very expensive, if not impossible, due to access difficulties. Due to random fading in wireless channels, the quality of the estimate at the FC, measured by a distortion measure (such as a squared error criterion), becomes a random variable. In delay-limited settings, instead of minimizing a long-term average distortion (or expected distortion for ergodic fading channels), it is more appropriate to minimize the probability that the distortion of each estimate exceeds a certain threshold, the so-called distortion outage probability. This is similar to the idea of minimizing the information outage probability in block-fading wireless communication channels in the information-theoretic context [1]. Optimal power allocation at the sensor transmitters for such outage minimization under various types of transmit power constraints is an important problem from the point of view of reducing energy consumption in sensor networks, or equivalently, prolonging the lifetime of the network.
The problem of distributed estimation and estimation outage in wireless sensor networks has been studied in [2] for additive white Gaussian noise (AWGN) orthogonal channels and in [3] for AWGN multiaccess channels (MAC). The former solved the problem of minimizing the distortion under power constraints, and its dual problem, for estimating a scalar point Gaussian source, and introduced the concepts of estimation outage and estimation diversity when the orthogonal channels between the sensors and the FC undergo independent and identically distributed Rayleigh block fading. The work in [3] solved the problem of minimizing the total power subject to a distortion constraint over a MAC. These power allocation schemes assume that the channels are static and do not take into account fading channels, for which meeting a strict distortion constraint may not always be possible. The optimal power control over fading channels has been obtained in [1] in the context of information outage probability, which is defined as the probability that the instantaneous mutual information of the channel falls below the transmitted code rate. The optimal power allocation for distortion outage minimization over Rayleigh fading for a clustered wireless sensor network is obtained in [4]. The works in [1,4] assume full instantaneous channel state information (CSI) at both the transmitter and the receiver. Channel state information at the transmitter (CSIT) relies on perfect channel state feedback from the FC to the transmitters, which can be expensive or infeasible to implement in practice. Many works in the literature have looked at power control in multiple-input multiple-output (MIMO) beamforming systems with partial CSIT using limited feedback [5,6]. The optimal power allocation scheme for systems employing limited feedback is in general complex and hence difficult to obtain. In [7], the authors studied average reliable throughput maximization over slow fading channels and found properties of the optimal power allocation policy that aid in the design of power allocation algorithms. A suboptimal power allocation scheme with finite-rate feedback power control is proposed in [8] for a single-user system with multiple transmit antennas and a single receive antenna. Such schemes, although suboptimal, can provide significant gains over no CSIT even with a small number of feedback bits. A recent paper [9] studies the effect of partial CSIT in a distributed estimation problem over a multiaccess channel, where various forms of partial CSI are assumed to be available at the sensor transmitters, and investigates their effect on the minimization of distortion or estimation error. Finally, a related performance criterion in distributed estimation, called the distortion exponent, measures the slope of the average end-to-end distortion on a log-log scale at high signal-to-noise ratio (SNR) [10]. This metric is similar to the diversity gain studied in this paper (also termed estimation diversity in [2]), which measures the rate at which the distortion outage probability, rather than the expected distortion, decays at high SNR.
The main novelty of this paper lies in finding efficient power allocation schemes for distortion (or estimation) outage minimization in a clustered wireless sensor network measuring a point Gaussian source, unlike previous papers where either the distortion for static channels or an average distortion (averaged over ergodic fading channels) is minimized with respect to the sensor transmit powers. The other novel contribution of this paper lies in considering partial channel information in the form of limited feedback from the FC, as opposed to the availability of full CSIT at the sensor transmitters in our earlier work [4]. This work also generalizes our earlier work [11], where the clusterhead-to-FC channels were assumed to undergo Rayleigh block fading: here we consider the more general Nakagami-m fading model for these channels, of which Rayleigh fading is the special case m = 1. The idea behind limited-feedback-based power allocation is that a quantized power codebook of size L and a channel partition are computed at the FC by solving the distortion/estimation outage minimization problem purely on the basis of the statistics of the fading channels, which are assumed to be known at the FC and to remain invariant during the estimation task. This power codebook is then communicated a priori to the sensor transmitters. Once the estimation task begins, the FC, based on its knowledge of full CSI (obtained, for example, via transmission of pilot tones from the sensors), decides which element of the power codebook should be used and multicasts the index of this codebook entry to the sensor transmitters over a finite-rate, delay-free and error-free feedback channel of rate R = log_2 L bits. The sensors can then use the appropriate transmit power for that particular fading block. In general, the distortion outage optimization problem that jointly optimizes the channel partitions and the quantized power codebook is a difficult non-convex problem. The absence of an analytical expression for the distortion outage probability makes this problem even harder. In this work, therefore, we adopt a number of well-justified approximations based on various assumptions on the number of quantization levels (i.e., the number of feedback bits available) and the available average power, drawing on some existing results and some results newly derived here. After applying these approximations, we design a number of power allocation algorithms by directly solving the necessary Karush-Kuhn-Tucker (KKT) optimality conditions of the constrained approximate optimization problems. For comparison purposes, we also design a simulation-based stochastic optimization algorithm for locally optimal power allocation for the original distortion outage minimization problem using the simultaneous perturbation stochastic approximation (SPSA) method. Numerical results show that these sub-optimal but low-complexity algorithms perform very well compared to the locally optimal SPSA-based algorithm, which has very high computational complexity. It is also seen that a small number (3-4) of feedback bits substantially closes the gap between the distortion outage performance with no CSIT and with full CSIT. We also study the asymptotic behavior of the outage probability and diversity gain as the available average power becomes unlimited and obtain an approximate expression for the diversity gain.

The rest of the paper is organized as follows. In Section 2, we provide the sensor network model and problem formulation. Power allocation schemes based on various CSI assumptions and approximations, as well as the diversity gain for the limited-feedback network, are presented in Section 3. Simulation results are presented in Section 4, and concluding remarks are given in Section 5.

Sensor network model and problem formulation
A schematic diagram of the wireless sensor network studied in this paper is shown in Figure 1. The network consists of N clusters, where the n-th cluster, n = 1, ..., N, has M_n sensors and a clusterhead (CH). The sensors measure a single point source, denoted by θ[k], over discrete time instants k = 0, 1, 2, ..., and send their measurements to the corresponding CH. θ[k] is assumed to be an independent and identically distributed (i.i.d.) band-limited Gaussian random process with zero mean and variance $\sigma_\theta^2$. The m-th sensor measurement within the n-th cluster at time k is

$$x_m^n[k] = \theta[k] + N_m^n[k],$$

where the measurement noise $N_m^n[k]$ of the m-th sensor within the n-th cluster is assumed to be i.i.d. Gaussian with zero mean and variance $(\sigma_m^n)^2$. We assume that the sensors within a cluster simultaneously amplify-and-forward their observations to the CH via a non-orthogonal multi-access scheme such that the received sensor signals at the CH add up coherently. Note that this can be achieved by distributed beamforming, a technique that synchronizes all sensor transmissions within a given cluster. Hence, the signal received by the n-th CH is

$$y_n[k] = \sum_{m=1}^{M_n} \alpha_m^n \sqrt{g_m^n}\, x_m^n[k] + N_{C1}^n[k],$$

where $\alpha_m^n$ is the amplifier gain, and $g_m^n$ and $N_{C1}^n[k]$ are the channel power gain and the channel noise, respectively, for transmissions from the m-th sensor of the n-th cluster to the n-th CH. We assume that the channels between sensors and CHs are static (for example, due to shorter distances and a strong direct line-of-sight component), which implies that the channel gains $g_m^n$ are time-invariant and can easily be pre-determined. We assume that $N_{C1}^n[k]$ is AWGN with zero mean and variance $(\sigma_{C1}^n)^2$. We also assume that the signals received at a given CH do not suffer interference from other clusters (which can easily be accomplished by using time division multiple access to schedule intra-cluster sensor transmissions). We assume that the CHs, being more powerful devices capable of transmitting with larger power than the sensors, amplify-and-forward $y_n[k]$ to the FC using orthogonal multi-access [e.g., frequency division multiple access (FDMA)]. The FC receives a vector of signals whose n-th element is

$$z_n[k] = b_n \sqrt{h_n}\, y_n[k] + N_{C2}^n[k],$$

where $b_n$ is the amplifier gain at the n-th CH transmitter, and $h_n$ and $N_{C2}^n[k]$ are the channel power gain and the channel noise, respectively, for transmissions from the n-th CH to the FC. We assume that $N_{C2}^n[k]$ is AWGN with zero mean and variance $(\sigma_{C2}^n)^2$. We assume that the channels between the CHs and the FC are stationary, ergodic and subject to independent Nakagami-m block fading; hence, the channel power gain $h_n \in \mathbb{R}_+$ follows a gamma distribution with mean equal to the inverse of the square of the transmission distance. In other words, the probability density function (p.d.f.) of $h_i$, i = 1, 2, ..., N, is given as

$$f_i(h_i) = \frac{(m_i\lambda_i)^{m_i}\, h_i^{m_i-1}}{\Gamma(m_i)}\, e^{-m_i\lambda_i h_i}, \quad h_i \ge 0, \qquad (1)$$

where $1/\lambda_i$ is the mean channel power gain and $m_i \ge 0.5$ is a real parameter that indicates the severity of the fading. $\Gamma(\cdot)$ is the Gamma function, defined as $\Gamma(m) = \int_0^\infty t^{m-1}e^{-t}\,dt$. We include the subscript i in $f_i(\cdot)$ because the distributions are independent but not necessarily identical. For the special case of Rayleigh fading, the channel power gain is exponentially distributed, $f_i(h_i) = \lambda_i e^{-\lambda_i h_i}$, which is obtained by substituting $m_i = 1$ into (1). In our work, we assume either full CSI at both the FC receiver (CSIR) and the CH transmitters (CSIT), or CSIR and partial CSIT. The type of partial CSIT considered in this paper is quantized (or rate-limited) feedback.

In what follows, we suppress the time index k for simplicity (due to the assumed stationarity of the fading channels and the i.i.d. nature of the source). We can write the received signal at the FC in vector form as $\mathbf{z} = \mathbf{s}\theta + \mathbf{v}$, where

$$[\mathbf{s}]_n = b_n\sqrt{h_n}\sum_{m=1}^{M_n}\alpha_m^n\sqrt{g_m^n}, \qquad [\mathbf{v}]_n = b_n\sqrt{h_n}\Big(\sum_{m=1}^{M_n}\alpha_m^n\sqrt{g_m^n}\,N_m^n + N_{C1}^n\Big) + N_{C2}^n.$$

The fusion center uses a linear minimum mean square error (MMSE) estimator to reconstruct the source θ, given by

$$\hat{\theta} = \Big(\frac{1}{\sigma_\theta^2} + \mathbf{s}^T\mathbf{C}^{-1}\mathbf{s}\Big)^{-1}\mathbf{s}^T\mathbf{C}^{-1}\mathbf{z},$$

where $\mathbf{C}$ is the (diagonal) covariance matrix of $\mathbf{v}$, with n-th diagonal element

$$[\mathbf{C}]_{nn} = b_n^2 h_n\Big(\sum_{m=1}^{M_n}(\alpha_m^n)^2 g_m^n(\sigma_m^n)^2 + (\sigma_{C1}^n)^2\Big) + (\sigma_{C2}^n)^2.$$

The mean squared error of $\hat{\theta}$ is given by

$$E\big[(\hat{\theta} - \theta)^2\big] = \Big(\frac{1}{\sigma_\theta^2} + \mathbf{s}^T\mathbf{C}^{-1}\mathbf{s}\Big)^{-1}.$$

Denote by $q_n$ the total power of the sensors in the n-th cluster and by $P_n$ the transmit power of the n-th CH. Following the assumption made in [4] that all sensors within a cluster transmit with equal power ($q_n/M_n$), we obtain expressions for the sensor amplify-and-forward gains within the n-th cluster, the n-th CH transmission power, and the distortion at the FC, which involve the cluster parameters $U_n$, $V_n$ and $C_n$. Note that $U_n$, $V_n$, $C_n$ are parameters available at the CH and contain information about the topology of each cluster.
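To make the channel model concrete, here is a minimal Python sketch (standard library only). It draws Nakagami-m channel power gains from the gamma density (1), using shape m and scale 1/(mλ) so that the mean is 1/λ, and evaluates the LMMSE error variance (1/σθ² + Σ_n [s]_n²/[C]_nn)^(-1) for z = sθ + v with a diagonal noise covariance. The function names and numerical values are illustrative assumptions, not part of the paper.

```python
import random


def nakagami_power_gain(m: float, lam: float, rng: random.Random) -> float:
    """Draw one channel power gain h with p.d.f. (1): gamma with shape m and mean 1/lam."""
    # random.gammavariate(alpha, beta) has mean alpha * beta, so beta = 1/(m * lam) gives mean 1/lam.
    return rng.gammavariate(m, 1.0 / (m * lam))


def lmmse_error_variance(s, c_diag, var_theta: float) -> float:
    """Error variance of the LMMSE estimate of theta from z = s*theta + v, Cov(v) = diag(c_diag)."""
    precision = 1.0 / var_theta + sum(sn * sn / cn for sn, cn in zip(s, c_diag))
    return 1.0 / precision


if __name__ == "__main__":
    rng = random.Random(1)
    samples = [nakagami_power_gain(2.0, 0.5, rng) for _ in range(200_000)]
    print(sum(samples) / len(samples))  # should be close to 1/lam = 2.0
    print(lmmse_error_variance([1.0, 2.0], [1.0, 4.0], 1.0))
```

The same sampler covers Rayleigh fading by setting m = 1.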
With this sensor network configuration and these modeling assumptions, we first present the optimal power allocation problem assuming CSIR and full CSIT in Section 2-A and then formulate the problem assuming partial CSIT using quantized channel feedback in Section 2-B. Note that power allocation here refers to the power control of the CH transmitters for transmission over a single fading block as a function of the CSIT, and long-term average power refers to the transmit power averaged over infinitely many fading blocks and over the number of CH transmitters. The performance metric used in this paper is the distortion outage probability, defined as the probability that the instantaneous distortion D at the FC (a random variable, since it depends on the fading realization of each block) exceeds a maximum allowable distortion threshold $D_{\max}$, i.e., $P_{\text{outage}} = \Pr(D > D_{\max})$, where Pr(A) denotes the probability of the event A.
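Since Pr(D > D_max) rarely has a closed form, a natural way to evaluate it is Monte Carlo simulation over independent fading blocks. The sketch below assumes, purely for illustration, a hypothetical toy distortion model D(P, h) = (1/σθ² + Σ_n P_n h_n)^(-1) in place of the paper's expression, with constant CH powers and Nakagami-m gains drawn as in (1). For a single cluster with Rayleigh fading (m = 1) and power P, this toy model is in outage exactly when h < γ_th/P with γ_th = 1/D_max − 1/σθ², so the estimate can be checked against 1 − exp(−λγ_th/P).

```python
import math
import random


def toy_distortion(powers, gains, var_theta):
    """Hypothetical toy model (not the paper's): D = 1 / (1/var_theta + sum_n P_n * h_n)."""
    return 1.0 / (1.0 / var_theta + sum(p * h for p, h in zip(powers, gains)))


def outage_probability_mc(powers, m, lam, var_theta, d_max, trials=200_000, seed=0):
    """Estimate Pr(D > d_max) with constant CH powers over independent Nakagami-m blocks."""
    rng = random.Random(seed)
    outages = 0
    for _ in range(trials):
        # One fading block: one gamma-distributed power gain per CH-to-FC channel.
        gains = [rng.gammavariate(mi, 1.0 / (mi * li)) for mi, li in zip(m, lam)]
        if toy_distortion(powers, gains, var_theta) > d_max:
            outages += 1
    return outages / trials


if __name__ == "__main__":
    var_theta, d_max, power, lam = 1.0, 0.2, 10.0, 1.0
    gamma_th = 1.0 / d_max - 1.0 / var_theta  # 4.0 for these values
    estimate = outage_probability_mc([power], [1.0], [lam], var_theta, d_max)
    exact = 1.0 - math.exp(-lam * gamma_th / power)  # Rayleigh, single cluster
    print(estimate, exact)
```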

A. Power allocation with CSIR and full CSIT
In this section, we simply restate the power allocation problem with CSIR and full CSIT studied in [4] for block-fading channels. The aim is to obtain the optimal power allocation scheme that minimizes the distortion outage probability subject to a long-term average power constraint $P_{av}$, formally given as

$$\min_{\mathbf{P}(\mathbf{h})} \; \Pr\big(D(\mathbf{P}(\mathbf{h}), \mathbf{h}) > D_{\max}\big) \quad \text{s.t.} \quad E\Big[\frac{1}{N}\sum_{i=1}^{N} P_i(\mathbf{h})\Big] \le P_{av}, \qquad (2)$$

where $D(\mathbf{P}(\mathbf{h}), \mathbf{h})$ is the distortion achieved at the FC for a given fading block, as a function of the channel gains and the CH transmission powers, which are themselves functions of the channel gains due to the availability of full CSIT.

B. Power allocation with CSIR and quantized CSIT
In wireless sensor networks with rate-limited feedback links, only a finite set of power values can be conveyed from the receiver (FC) to the transmitters (CHs). We denote this finite set of power values as a power codebook $\mathcal{P}^{(N,L)}$, where N and L are the number of CH transmitters and the number of power levels, respectively. It is often more practical to convert L into R binary bits using the relationship $L = 2^R$ and to express the feedback resolution in bits. For an R-bit broadcast feedback channel and N clusters in the network, we quantize the vector channel space $\mathbb{R}_+^N$ into L regions. Denote the regions as $\mathcal{R}_j^{(N)}$ and the power codeword associated with the j-th quantized region as $\mathbf{P}_j^{(N)} \in \mathcal{P}^{(N,L)}$, j = 1, ..., L. The j-th codeword $\mathbf{P}_j^{(N)} = [P_{1,j}, \ldots, P_{N,j}]^T$ contains N power values specifying the CH transmit powers. We assume that the CHs and the FC know this (pre-computed) power codebook, since it can be computed offline, purely based on the channel statistics and the available average power. We first present the single-cluster problem formulation, as it is simple and provides useful intuition and properties that will be used later in formulating the multi-cluster problem.
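The codebook conventions above can be illustrated with a small data structure: L = 2^R codewords, each an N-vector of CH powers, with the FC feeding back a codeword index as an R-bit word. The class `PowerCodebook`, the choice to send the 1-based index j as the binary number j − 1, and the example numbers are hypothetical illustrations rather than the paper's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PowerCodebook:
    """Hypothetical container for an (N, L) power codebook with L = 2**R codewords."""
    codewords: List[List[float]]  # codewords[j - 1] = [P_{1,j}, ..., P_{N,j}]

    @property
    def num_clusters(self) -> int:
        return len(self.codewords[0])

    @property
    def feedback_bits(self) -> int:
        levels = len(self.codewords)
        bits = levels.bit_length() - 1
        if 2 ** bits != levels:
            raise ValueError("L must be a power of two so that L = 2**R")
        return bits

    def encode(self, j: int) -> str:
        """R-bit feedback word for the 1-based codeword index j (sent as j - 1 in binary)."""
        return format(j - 1, f"0{self.feedback_bits}b")

    def decode(self, word: str) -> List[float]:
        """CH transmit powers selected by a received R-bit feedback word."""
        return self.codewords[int(word, 2)]

    def sum_power(self, j: int) -> float:
        """Element-wise sum of the j-th codeword (1-based)."""
        return sum(self.codewords[j - 1])
```

For example, a 2-bit codebook for two clusters has four codewords, and index 3 travels over the feedback link as "10".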
1) Power allocation with quantized CSIT for a single cluster (N = 1): Suppose we have an arbitrary power codebook $\mathcal{P}^{(1,L)} = [P_{1,1}, \ldots, P_{1,L}]^T$ assigned deterministically to L quantization regions in $h_1 \in \mathbb{R}_+$; that is, whenever $h_1$ belongs to the j-th quantization region, the CH uses the transmission power $P_{1,j}$ with probability one. Without loss of generality, we assume that $P_{1,1} > \cdots > P_{1,L} \ge 0$. Before we define the quantization regions, we first state a property that the optimal quantizer (one that minimizes the outage probability) possesses. When N = 1, it can easily be shown that the distortion and the outage probability are monotonically decreasing functions of power. These two properties are the same as in the problems studied in [12-14], and hence it can be shown in a similar fashion that the optimal (deterministic) index mapping achieving the minimum outage probability also has a circular structure (one that wraps around), as in [12-14]. It is straightforward to show that, for a given fading block, in the case of non-outage the index is assigned to the minimum power that meets the distortion threshold, and in the case of outage, which occurs when none of the powers in the power codebook can meet the distortion threshold, the index is assigned to the smallest power. We now introduce a set of channel thresholds defining the boundaries of the quantized channel regions as an alternative way of defining the problem, because it is easier to express the cumulative distribution function (c.d.f.) of the fading distribution and the outage probability in terms of channel thresholds than in terms of powers. Throughout this paper, however, we use channel thresholds and power levels interchangeably, whichever is more convenient in the given context. The channel thresholds are one-to-one functions of the quantized power values, given as $s_{1,j} = \phi_1/P_{1,j}$, j = 1, ..., L, where $\phi_1 = C_1(\sigma_{C2}^1)^2\gamma_{th}/(U_1 - V_1\gamma_{th})$, so that $s_{1,1} < \cdots < s_{1,L}$ (with $s_{1,L} = \infty$ if $P_{1,L} = 0$). We collect them in $\mathbf{s}^{(1,L)} = [s_{1,1}, \ldots, s_{1,L}]^T \in \mathcal{S}^{(1,L)}$, where the superscript 1 denotes N = 1 and L denotes that there are L power feedback levels or quantization regions. Denote the regions as $\mathcal{R}_j^{(1)}$, j = 1, ..., L. The circular index mapping allows us to naturally define

$$\mathcal{R}_j^{(1)} = \{h_1 : s_{1,j} \le h_1 < s_{1,j+1}\}, \quad j = 1, \ldots, L-1, \qquad \mathcal{R}_L^{(1)} = \{h_1 : h_1 \ge s_{1,L}\} \cup \{h_1 : h_1 < s_{1,1}\}.$$

Let $F_1(\cdot)$ denote the c.d.f. of the channel gain for N = 1. The outage probability is then simply $F_1(s_{1,1})$. The problem of minimizing the outage probability subject to a long-term average power constraint can then be formulated as

$$\min_{P_{1,1} > \cdots > P_{1,L} \ge 0} \; F_1(s_{1,1}) \quad \text{s.t.} \quad \sum_{j=1}^{L-1} P_{1,j}\big[F_1(s_{1,j+1}) - F_1(s_{1,j})\big] + P_{1,L}\big[1 - F_1(s_{1,L}) + F_1(s_{1,1})\big] \le P_{av}. \qquad (4)$$

2) Power allocation with quantized CSIT when N ≥ 2: We begin by illustrating the complexity of the structure of the quantization regions for N ≥ 2 through an example. Figure 2 shows the quantization regions of a suboptimal solution for N = 2 and L = 4 obtained using an iterative Lloyd's algorithm incorporating a simulation-based randomized optimization method called SPSA (simultaneous perturbation stochastic approximation [15]). The first step of the algorithm finds the optimal channel partitions for a given set of quantized power values, and the second step uses SPSA to find a locally optimal set of quantized power values for these channel partitions. These two steps are iterated until a satisfactory convergence criterion is met. For more details on this algorithm and on SPSA as a stochastic optimization tool, see Section 3-B1, where we present this SPSA-based algorithm, which outperforms our quantized power allocation algorithms but at the cost of high computational complexity. Figure 2 shows how irregular the regions can be, already for N = 2 and L = 4. In the general case of an N ≥ 2 cluster network with L-level power feedback, the optimal quantizer is unknown. Hence, in order to make the quantized power allocation problem for distortion outage minimization analytically tractable, we impose a restriction on the ordering of the powers. This restriction gives the quantization regions a structure that can be exploited for analytical tractability, at the cost of a small performance loss.
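For a single cluster with Rayleigh fading, the circular mapping makes both the outage probability and the long-term average power easy to evaluate from the thresholds s_{1,j} = φ_1/P_{1,j}: region j < L is [s_{1,j}, s_{1,j+1}), and region L wraps around to cover both h ≥ s_{1,L} and the outage set h < s_{1,1}. The sketch below computes the region probabilities, the outage probability F_1(s_{1,1}) and the average power Σ_j P_{1,j} Pr(region j); it treats φ_1 as a given positive constant, and the function names and numbers are illustrative assumptions.

```python
import math


def rayleigh_cdf(s: float, lam: float) -> float:
    """F_1(s) = 1 - exp(-lam * s); an infinite threshold gives 1."""
    return 1.0 if math.isinf(s) else 1.0 - math.exp(-lam * s)


def single_cluster_quantized(powers, phi, lam):
    """Region probabilities, outage probability and average power for N = 1.

    powers: [P_{1,1} > ... > P_{1,L} >= 0]; phi: phi_1 > 0; lam: Rayleigh parameter.
    """
    if any(a <= b for a, b in zip(powers, powers[1:])) or powers[-1] < 0:
        raise ValueError("need P_{1,1} > ... > P_{1,L} >= 0")
    # Channel thresholds s_{1,j} = phi / P_{1,j}; zero power means the threshold is +inf.
    s = [phi / p if p > 0 else math.inf for p in powers]
    F = [rayleigh_cdf(x, lam) for x in s]
    L = len(powers)
    probs = [F[j + 1] - F[j] for j in range(L - 1)]
    # Region L: h >= s_{1,L} (non-outage) plus the outage set h < s_{1,1}.
    probs.append(1.0 - F[L - 1] + F[0])
    outage = F[0]
    avg_power = sum(p * q for p, q in zip(powers, probs))
    return probs, outage, avg_power
```

Setting the last power to zero reproduces the case in which no power is spent in the outage region.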
Recall that the power codewords of an (N, L) power codebook are given by $\mathbf{P}_j^{(N)} = [P_{1,j}, \ldots, P_{N,j}]^T$, j = 1, ..., L. We impose the following restriction on the ordering of the power codewords:

$$\mathbf{P}_1^{(N)} \succ \mathbf{P}_2^{(N)} \succ \cdots \succ \mathbf{P}_L^{(N)},$$

where ≻ denotes component-wise inequality. We first show, in a similar way to [14], that the optimal (deterministic) index mapping achieving the minimum outage probability for N ≥ 2 also has a circular structure. The component-wise ordering of the power codewords implies that $\Lambda_1 > \cdots > \Lambda_L$, where $\Lambda_j = \sum_{i=1}^{N} P_{i,j}$, j = 1, ..., L. Note also that the distortion and the outage probability are monotonically decreasing functions of $P_{i,j}$. We are interested in finding an index mapping scheme that achieves the minimum outage probability subject to a long-term average power constraint. We first consider the set of channel gains that are not in outage, assumed to have non-zero probability measure: $\mathcal{S} = \{\mathbf{h} : D(\mathbf{P}_1^{(N)}, \mathbf{h}) \le D_{\max}\}$. The optimal index mapping strategy for a channel h in this set is for the receiver to feed back the index

$$i = \max\{j \in \{1, \ldots, L\} : D(\mathbf{P}_j^{(N)}, \mathbf{h}) \le D_{\max}\},$$

i.e., the index of the codeword with the smallest sum power that still meets the distortion threshold. Denote by $\mathcal{I}$ the set of channel realizations that are assigned the index i. Now assume, to the contrary, that it is optimal to feed back some j ≠ i for $\mathbf{h} \in \mathcal{H} \subseteq \mathcal{I}$, where $\mathcal{H}$ has non-zero probability measure. If j < i, construct a new scheme that maps all elements of $\mathcal{H}$ to i instead. The newly constructed scheme clearly uses less average power, since $\Lambda_i < \Lambda_j$, while the outage probability remains the same. If j > i, an outage occurs for $\mathbf{h} \in \mathcal{H}$. Thus, the outage probability increases, which contradicts the assumption that feeding back j ≠ i is optimal. Now consider the set of channels in outage, namely $\{\mathbf{h} : D(\mathbf{P}_1^{(N)}, \mathbf{h}) > D_{\max}\}$, with non-zero probability measure. It is easy to see that the optimal feedback index for these channels is L, since it results in the smallest average power consumption while achieving the same outage probability, as $\Lambda_L < \Lambda_j$ for all j < L.
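The index-mapping rule just derived can be sketched directly: for codewords ordered P_1 ≻ ... ≻ P_L, feed back the largest index whose codeword still meets D_max, and feed back L when even P_1 fails. The distortion function is passed in as a parameter; the toy model used in the checks, D = (1/σθ² + Σ_n P_n h_n)^(-1), is a hypothetical stand-in for the paper's expression.

```python
from typing import Callable, Sequence


def feedback_index(codebook: Sequence[Sequence[float]], h: Sequence[float],
                   distortion: Callable[[Sequence[float], Sequence[float]], float],
                   d_max: float) -> int:
    """Circular index mapping for an ordered codebook P_1 > ... > P_L (1-based index).

    Non-outage: the largest j with D(P_j, h) <= d_max (smallest sum power that works).
    Outage (even P_1 fails): index L, the cheapest codeword.
    """
    L = len(codebook)
    for j in range(L, 0, -1):  # try the cheapest codeword first
        if distortion(codebook[j - 1], h) <= d_max:
            return j
    return L


def toy_distortion(powers, gains, var_theta=1.0):
    """Hypothetical stand-in for D(P, h): 1 / (1/var_theta + sum_n P_n h_n)."""
    return 1.0 / (1.0 / var_theta + sum(p * g for p, g in zip(powers, gains)))
```

Because the distortion decreases in every power and the codebook is ordered, scanning from index L downwards returns the largest feasible index as soon as one is found.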
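To illustrate the structure of the quantization regions under the above restriction on the quantized power values, Figure 3 shows an example of an N = 2 network with R = log_2 L-bit feedback. As in the N = 1 case, we quantize the channel space into L regions according to a circular quantization structure. Following the index mapping above, the regions are

$$\mathcal{R}_j^{(N)} = \{\mathbf{h} : D(\mathbf{h}, \mathbf{P}_j^{(N)}) \le D_{\max} < D(\mathbf{h}, \mathbf{P}_{j+1}^{(N)})\}, \quad j = 1, \ldots, L-1,$$
$$\mathcal{R}_L^{(N)} = \{\mathbf{h} : D(\mathbf{h}, \mathbf{P}_L^{(N)}) \le D_{\max}\} \cup \mathcal{R}_{\text{out}}^{(N)}, \qquad \mathcal{R}_{\text{out}}^{(N)} = \{\mathbf{h} : D(\mathbf{h}, \mathbf{P}_1^{(N)}) > D_{\max}\}.$$

Denote the boundaries that divide the channel space into L regions as $\mathcal{B}_j(\mathbf{s}_j^{(N)})$, j = 1, ..., L, where $\mathbf{s}_j^{(N)} = \{s_{1,j}, \ldots, s_{N,j}\} \in \mathcal{S}^{(N,L)}$. The circular quantizer structure implies that there exists only a single outage region, $\mathcal{R}_{\text{out}}^{(N)}$, which is contained in $\mathcal{R}_L^{(N)}$. It also implies that $s_{i,j} = \phi_i/P_{i,j}$, where $\phi_i = C_i(\sigma_{C2}^i)^2\gamma_{th}/(U_i - V_i\gamma_{th})$. To ensure that no outage occurs outside the set $\mathcal{R}_{\text{out}}^{(N)}$, the distortion must be constant and equal to $D_{\max}$ on every boundary between two quantized regions. This allows us to write down the expressions that define the boundaries after substituting $P_{i,j} = C_i\beta_{i,j}^2$; for this reason, we also call the boundaries distortion curves. With this quantizer structure, we are interested in minimizing the distortion outage probability subject to a long-term average power constraint in the vector channel quantization space. Define $F_N(\mathbf{s}_j^{(N)}) \triangleq \Pr(\mathbf{h} \prec \mathcal{B}_j)$, where $\{\mathbf{h} \prec \mathcal{B}_j\} \triangleq \{\mathbf{h} : D(\mathbf{h}, \mathbf{P}_j^{(N)}) > D_{\max}\}$. The quantized power allocation problem for outage minimization with this quantizer structure, for N clusters and R-bit feedback, is given by

$$\min_{\mathcal{S}^{(N,L)}} \; F_N(\mathbf{s}_1^{(N)}) \quad \text{s.t.} \quad \frac{1}{N}\Big\{\sum_{j=1}^{L-1}\Lambda_j\big[F_N(\mathbf{s}_{j+1}^{(N)}) - F_N(\mathbf{s}_j^{(N)})\big] + \Lambda_L\big[1 - F_N(\mathbf{s}_L^{(N)}) + F_N(\mathbf{s}_1^{(N)})\big]\Big\} \le P_{av}, \quad \mathbf{P}_1^{(N)} \succ \cdots \succ \mathbf{P}_L^{(N)}, \qquad (5)$$

where $\Lambda_j = \sum_{i=1}^{N} P_{i,j}$ denotes the element-wise sum of the power codeword $\mathbf{P}_j^{(N)}$.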

Power allocation schemes and solutions
A. CSIR and full CSIT

Problem (2) is solved in [4] for block-fading channels with CSIR and full CSIT. Before stating the result, we introduce some notation and definitions. Define the regions $\mathcal{R}(u)$ and $\bar{\mathcal{R}}(u)$, and the boundary surface $\mathcal{B}(u)$, for some non-negative u. In order to obtain $u^*$, we define the two average power sums $P(u) = \int_{\mathcal{R}(u)} P(\mathbf{h})\,dF(\mathbf{h})$ and $\bar{P}(u) = \int_{\bar{\mathcal{R}}(u)} P(\mathbf{h})\,dF(\mathbf{h})$, where $F(\mathbf{h})$ denotes the c.d.f. of h. Finally, the power sum threshold $u^*$ and the weight $w^*$ are given by $u^* = \sup\{u : P(u) < P_{av}\}$ and $w^* = \dfrac{P_{av} - P(u^*)}{\bar{P}(u^*) - P(u^*)}$, respectively.
Variables with a bar on top indicate that they depend on h.
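As an illustration of the threshold structure in the single-cluster case (N = 1), assume, consistently with the threshold relation s = φ_1/P used for the quantized problem, that the minimum CH power avoiding outage on a block with gain h is φ_1/h. A power-sum threshold u* then amounts to transmitting φ_1/h whenever h ≥ s* = φ_1/u* and switching off otherwise; with continuous fading the boundary has zero probability, so the weight w* plays no role. The sketch below, assuming Rayleigh fading and treating φ_1 as given, finds s* by bisection so that the average power φ_1 λ ∫_{s*}^∞ e^{−λh}/h dh equals P_av, evaluating the integral with Simpson's rule after the substitution h = s e^t. This specialization is our illustration, not a restatement of the general solution in [4].

```python
import math


def avg_power_truncated_inversion(s: float, phi: float, lam: float, steps: int = 4000) -> float:
    """Average power of P(h) = phi / h for h >= s (zero otherwise), Rayleigh fading.

    E[P] = phi * lam * int_s^inf exp(-lam*h)/h dh = phi * lam * int_0^T exp(-lam*s*e^t) dt,
    with T chosen so that the neglected tail is below exp(-60). steps must be even.
    """
    x = lam * s
    if x >= 60.0:
        return 0.0  # the integrand never exceeds exp(-60)
    T = math.log(60.0 / x)
    dt = T / steps
    total = 0.0
    for k in range(steps + 1):
        w = 1 if k in (0, steps) else (4 if k % 2 else 2)  # Simpson weights
        total += w * math.exp(-x * math.exp(k * dt))
    return phi * lam * total * dt / 3.0


def full_csit_threshold(p_av: float, phi: float, lam: float, iterations: int = 80) -> float:
    """Bisection (in log scale) for s* such that the average power equals p_av.

    Assumes p_av is below the average power at the lower bracket s = 1e-12 / lam.
    """
    lo, hi = 1e-12 / lam, 60.0 / lam
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if avg_power_truncated_inversion(mid, phi, lam) > p_av:
            lo = mid
        else:
            hi = mid
    return hi
```

The resulting outage probability is 1 − exp(−λ s*), which can be compared with 1 − exp(−λ φ_1/P_av) for a constant power P_av (no CSIT).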

B. CSIR and partial CSIT
Problem (5) is non-convex in general, but we can find a locally optimal solution using the standard Lagrange multiplier-based optimization technique and the associated KKT necessary optimality conditions. It can easily be shown that the second constraint in (5) is satisfied with strict inequality; we therefore discard this constraint in what follows, as it does not affect the result. The Lagrangian is formed by adding to the objective $F_N(\mathbf{s}_1^{(N)})$ the average power constraint of (5) weighted by μ, where μ is the Lagrange multiplier. For ease of viewing, we use a compact notation for the partial derivatives of the c.d.f. $F_N(\mathbf{s}_j^{(N)})$ and of the sum power function $\Lambda_j$ with respect to any of their variables in $\mathbf{s}_j^{(N)}$.

Single-cluster network (N = 1)
In this case, the c.d.f. $F_1(s_{1,j})$ can be obtained by integrating (1) from 0 to $s_{1,j}$. For Nakagami-m fading, the c.d.f. is given by the regularized lower incomplete Gamma function, $F_1(s_{1,j}) = \gamma(m\lambda s_{1,j}, m)/\Gamma(m)$, where $\gamma(x, m) = \int_0^x t^{m-1}e^{-t}\,dt$ is the lower incomplete Gamma function.
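The Python standard library provides `math.gamma` and `math.lgamma` but no incomplete Gamma function, so the sketch below evaluates the Nakagami-m c.d.f. F_1(s) = γ(mλs, m)/Γ(m) with the standard power series of the regularized lower incomplete Gamma function, P(m, x) = x^m e^{−x} Σ_{n≥0} x^n/Γ(m + n + 1). This is adequate for moderate arguments; a production implementation would switch to a continued fraction for large x or call a library routine such as SciPy's `scipy.special.gammainc`. The function names are our own.

```python
import math


def regularized_lower_gamma(a: float, x: float, tol: float = 1e-15, max_terms: int = 10_000) -> float:
    """gamma(x, a) / Gamma(a) in the paper's argument order, via the power series (moderate x)."""
    if x <= 0.0:
        return 0.0
    # Leading term x^a e^{-x} / Gamma(a + 1), computed in log space for stability.
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)  # ratio of consecutive series terms
        total += term
        if term < tol * total:
            break
    return min(total, 1.0)


def nakagami_cdf(s: float, m: float, lam: float) -> float:
    """F_1(s) for Nakagami-m fading with mean channel power gain 1/lam."""
    return regularized_lower_gamma(m, m * lam * s)
```

For m = 1 this reduces to the Rayleigh c.d.f. 1 − e^{−λs}, and for m = 2 to 1 − e^{−x}(1 + x) with x = 2λs.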
For Rayleigh fading channels, the c.d.f. has the simple closed-form expression $F_1(s_{1,j}) = 1 - e^{-\lambda s_{1,j}}$, and the KKT conditions for Problem (4) with m = 1 and $P_{1,j} > 0$ form a system of nonlinear equations, (9), in the thresholds $s_{1,j}$ and the multiplier μ. The last KKT condition relates to the long-term average power constraint, which must be met with equality, as implied by the optimality conditions. Problem (9) can then be solved by fixed-point iterative methods for nonlinear equations or by any other suitable nonlinear equation solver. The corresponding equations for Nakagami-m fading can be solved similarly; we do not include them here to avoid repetition.
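As a numerical cross-check of the single-cluster Rayleigh case, the sketch below solves the one-bit problem (L = 2) by brute force instead of the fixed-point route: the average power is P_{1,1}[F_1(s_{1,2}) − F_1(s_{1,1})] + P_{1,2}[1 − F_1(s_{1,2}) + F_1(s_{1,1})] with P_{1,j} = φ_1/s_{1,j}. For a trial outage threshold s_{1,1} it minimizes this power over s_{1,2} > s_{1,1} on a logarithmic grid (plus s_{1,2} = ∞, i.e., zero power in the outage region), then bisects on s_{1,1} until the minimum average power meets P_av. The search method, grid and parameter values are our assumptions; the bisection relies on the minimum average power being non-increasing in s_{1,1}, which can be verified directly from the power expression.

```python
import math


def avg_power_two_level(s1: float, s2: float, phi: float, lam: float) -> float:
    """Average power of a 1-bit codebook (P1 = phi/s1 > P2 = phi/s2), Rayleigh fading."""
    F = lambda s: 1.0 if math.isinf(s) else 1.0 - math.exp(-lam * s)
    p1 = phi / s1
    p2 = 0.0 if math.isinf(s2) else phi / s2
    # Region 1: s1 <= h < s2 uses P1; region 2 wraps around (h >= s2 or outage h < s1) and uses P2.
    return p1 * (F(s2) - F(s1)) + p2 * (1.0 - F(s2) + F(s1))


def min_power_given_s1(s1: float, phi: float, lam: float, grid: int = 400, span: float = 1e4) -> float:
    """Minimum average power over s2 in (s1, s1*span] (log grid) and s2 = inf."""
    best = avg_power_two_level(s1, math.inf, phi, lam)
    for k in range(1, grid + 1):
        s2 = s1 * span ** (k / grid)
        best = min(best, avg_power_two_level(s1, s2, phi, lam))
    return best


def one_bit_allocation(p_av: float, phi: float, lam: float, iterations: int = 60):
    """Smallest s1 (hence smallest outage F(s1)) whose minimum average power is <= p_av."""
    lo, hi = 1e-9, 50.0 / lam
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if min_power_given_s1(mid, phi, lam) > p_av:
            lo = mid
        else:
            hi = mid
    return hi, 1.0 - math.exp(-lam * hi)
```

With φ_1 = λ = P_av = 1, the resulting outage falls between the full-CSIT value and the no-CSIT value 1 − e^{−1}.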

Multi-cluster network (N ≥ 2)
The KKT conditions of (5) for N ≥ 2 and $P_{1,j} > 0$ are given by (10). In general, computing the c.d.f.s $F_N(\mathbf{s}_j^{(N)})$ for N > 1 involves evaluating multi-dimensional integrals whose limits are determined by the distortion curves, and these cannot be expressed in closed form. We can, however, approximate each distortion curve by a straight line (or a hyperplane if N > 2) that intersects the axes at the same points as the distortion curve, shown as the straight line above the distortion curve in Figure 4. We call this the outer-straight-line approximation and denote the j-th such plane as $\tilde{\mathcal{B}}_j$. We can also construct another straight line/hyperplane that is parallel to $\tilde{\mathcal{B}}_j$ and tangential to $\mathcal{B}_j$, shown as the straight line below the distortion curve in Figure 4. We call this the inner-straight-line approximation and denote the j-th such plane as $\hat{\mathcal{B}}_j$. Simulation results show that these two approximations give very comparable outage performance; hence, the rest of the paper is based on the outer-straight-line approximation, which we refer to simply as the straight-line approximation (SLA). The approximate c.d.f. obtained by SLA is defined as $\tilde{F}_N(\mathbf{s}_j^{(N)}) \triangleq \Pr(\mathbf{h} \prec \tilde{\mathcal{B}}_j)$. In the literature, a number of different expressions for this c.d.f. exist for Nakagami-m fading. In [16,17], the c.d.f. is expressed in the form of iterative equations. Reig and Cardona [18] provide an expression that approximates the multivariate c.d.f. by an equivalent scalar regularized lower incomplete Gamma function. In [19], the c.d.f. is expressed in integral form. In [20], the c.d.f. is given as an infinite-series representation, valid for $P_{i,j} > 0$ ∀ i, j, from which the partial derivatives of the c.d.f. can be obtained. The KKT conditions in (10) constitute a set of nonlinear equations whose number grows exponentially with the number of feedback bits. In this section, we develop a number of suboptimal algorithms by combining some existing and some newly derived approximations for the special cases of high and low average power, respectively. For moderate
to large numbers of feedback bits, we use an existing approximation called equal average power per region (EPPR), derived in [5,8] using the Mean Value Theorem of real analysis. However, before we can write down the problem formulation using the EPPR approximation, we must decide whether or not to allocate power in the outage region. It seems counter-intuitive to allocate power in the outage region, and indeed, when full channel information is available, the optimal solution is to allocate no power there. This is not true, however, when only quantized channel information is available: as shown in [8,13], it is then optimal to use the smallest power from the power codebook in the outage region. With nonzero power in the outage region (NZPOR), the channel space is quantized into L regions, consisting of L − 1 non-outage regions and an L-th region that contains a non-outage region as well as the outage region, due to the circular structure mentioned earlier. It may nevertheless be near-optimal to allocate zero power in the outage region (ZPOR) when the average power is very low, as also noted in [14]. In this case, there are L regions, consisting of L − 1 non-outage regions and an L-th region containing only the outage region. Numerical results indeed confirm that, combined with the EPPR approximation, ZPOR performs nearly optimally when the available average power is very low. Note that the actual threshold below which ZPOR performs near-optimally depends on N, m and R; see Section 4 (Simulation Results) for further details on these threshold values of $P_{av}$. The EPPR + ZPOR algorithm has the added advantage of low implementation complexity, as will be evident below. We now provide the problem formulations using the EPPR approximation for NZPOR and ZPOR, (13) and (14), respectively. The following lemma shows that at high average power, and using SLA, the optimal power allocation scheme can be further simplified.
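Assuming that each distortion curve meets the i-th axis at h_i = s_{i,j} = φ_i/P_{i,j}, the outer-straight-line approximation replaces boundary j by the hyperplane Σ_i h_i/s_{i,j} = 1, so the approximate c.d.f. becomes Pr(Σ_i h_i/s_{i,j} < 1). For Rayleigh fading (m_i = 1) the terms h_i/s_{i,j} are independent exponentials with rates μ_i = λ_i s_{i,j}, and when these rates are distinct their sum has the hypoexponential c.d.f. 1 − Σ_i Π_{k≠i} μ_k/(μ_k − μ_i) e^{−μ_i}. The sketch below evaluates this closed form and checks it by Monte Carlo; it illustrates only the Rayleigh special case, not the infinite-series expression of [20] for general Nakagami-m fading.

```python
import math
import random


def sla_cdf_rayleigh(s, lam):
    """Pr(sum_i h_i / s_i < 1) for independent h_i ~ Exp(lam_i); rates lam_i*s_i must be distinct."""
    mu = [l * si for l, si in zip(lam, s)]
    if len(set(mu)) != len(mu):
        raise ValueError("closed form needs distinct rates mu_i")
    tail = 0.0
    for i, mi in enumerate(mu):
        coeff = 1.0
        for k, mk in enumerate(mu):
            if k != i:
                coeff *= mk / (mk - mi)
        tail += coeff * math.exp(-mi)  # hypoexponential survival function at 1
    return 1.0 - tail


def sla_cdf_monte_carlo(s, lam, trials=200_000, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if sum(rng.expovariate(l) / si for l, si in zip(lam, s)) < 1.0:
            hits += 1
    return hits / trials
```

Note that the closed form becomes numerically ill-conditioned when two rates are nearly equal; the Monte Carlo estimate does not have that issue.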
Lemma 3.1: Based on SLA, for Nakagami-m fading with $\mathbf{m} = [m_1, \ldots, m_N]^T$ being the fading parameters of the channels, as $P_{av} \to \infty$ it is asymptotically optimal to transmit with $P_{i,j} = \frac{m_i}{m_k}P_{k,j}$, ∀ i, k ∈ {1, ..., N}, j = 1, ..., L. If all the fading parameters are identical, it is asymptotically optimal to transmit with equal power per CH in every quantization region, i.e., $P_{i,j} = P_{k,j}$, ∀ i, k ∈ {1, ..., N}, j = 1, ..., L.
This proof, as well as the proofs of the other lemmas and theorems, can be found in the Appendix. Hence, at high average power, Problems (13) and (14) can be further simplified by letting all CHs transmit with equal power when all $m_i$ are identical. Note again that the exact value of $P_{av}$ that qualifies as "high average power" depends on the values of $N$, $m$ and $R$ for a given sensor network configuration; see Section 4 for further details. In what follows, we abbreviate equal power per CH as EPPC. Each region boundary can now be expressed as a function of a single scalar variable; for simplicity, we use $P_{1,j}$. Since $s_{i,j} = \varphi_i / P_{1,j}$, the channel thresholds belonging to the same boundary can be expressed as functions of $s_{1,j}$, namely $s_{i,j} = (\varphi_i / \varphi_1)\, s_{1,j}$. When all channels from the CHs to the FC are independent and identically distributed, using SLA, EPPR and EPPC, Problem (13) reduces to a problem in the scalar variables $P_{1,j}$, which we refer to as Problem (15). For low values of the long-term average power, we solve Problem (14) using the constrained nonlinear solver fmincon from MATLAB's Optimization Toolbox, and for high values of the long-term average power, we solve Problem (15) using a simple binary search algorithm. The two sets of results are then combined, and for each value of $P_{av}$ the solution with the better outage performance is selected. Note that the component-wise ordering constraint on the powers in Problem (15) is automatically satisfied under the EPPC and EPPR approximations. In Problem (14), we can preserve the power-ordering constraint by breaking the problem down into a series of nested sub-problems: we first solve for $\mathbf{s}_{L-1}$, then for $\mathbf{s}_{L-2}$, and so on, until we eventually solve for $\mathbf{s}_1$. Note that all elements of $\mathbf{s}_L$ are equal to positive infinity. The sub-problems are given as min $F_N(\mathbf{s}$ … One can easily show, by verifying the KKT conditions, that solving this series of sub-problems is equivalent to solving Problem (14). In each sub-problem, once $\mathbf{s}_{j+1}$ is obtained, we can solve for $\mathbf{s}_j$ while ensuring that $\mathbf{s}_j \prec \mathbf{s}_{j+1}$, $j = 1, \ldots, L-2$.
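The binary search used for Problem (15) is not spelled out. As a hedged illustration only, the sketch below assumes that, after EPPR and EPPC, the long-term average power is a continuous function that decreases monotonically in a single scalar variable `x` (for example a channel threshold), so the value meeting the power constraint with equality can be bracketed and bisected. The function `average_power` is hypothetical: it stands in for whatever Problem (15) actually evaluates and simply models transmitting power `phi / x` whenever a unit-mean exponential gain exceeds `x`.

```python
import math
from typing import Callable


def bisect_decreasing(f: Callable[[float], float], target: float,
                      lo: float, hi: float, tol: float = 1e-10,
                      max_iter: int = 200) -> float:
    """Return x in [lo, hi] with f(x) ~= target.

    Assumes f is continuous and monotonically decreasing on [lo, hi]
    with f(lo) >= target >= f(hi).
    """
    if not (f(lo) >= target >= f(hi)):
        raise ValueError("target is not bracketed by f(lo) and f(hi)")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid  # power still above the budget: move toward larger x
        else:
            hi = mid  # budget met: move toward smaller x
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def average_power(x: float, phi: float = 1.0) -> float:
    """Hypothetical stand-in: power phi/x used with probability P(h >= x) = exp(-x)."""
    return (phi / x) * math.exp(-x)


x_star = bisect_decreasing(average_power, target=2.0, lo=1e-6, hi=10.0)
print(x_star, average_power(x_star))
```

Bisection only needs function evaluations and monotonicity, which is why a one-dimensional search is attractive once the problem has been collapsed onto a single scalar.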
1) Power allocation for quantized CSI using a simultaneous perturbation stochastic approximation (SPSA) algorithm: The vector channel quantization problem can be formulated as the classical vector quantization problem with a modified distortion measure, and its solution can be found using an iterative Lloyd's algorithm incorporating SPSA [21]. Since this method does not use any approximations, its results can serve as benchmarks for performance comparison. Lloyd's algorithm with SPSA can find a locally optimal power codebook that minimizes the outage probability subject to a long-term average power constraint. Each Lloyd iteration for codebook improvement involves two steps. In the first step, given the power codebook $\mathbf{P}^{(N,L)}$, one finds the optimal partition into quantization cells using the nearest neighbor condition, i.e., by solving the arg min problem (16). Problem (16) can be solved numerically using Monte Carlo simulation for a given $\mathbf{P}^{(N,L)}$. Its solution consists of a set of $L$ regions, or cells, $\mathcal{R}^{(N)}_j$, $j = 1, \ldots, L$, in the vector channel space, including the region where none of the power vectors in the power codebook can achieve the distortion constraint.
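The first Lloyd step can be sketched with Monte Carlo samples. Problem (16) is not reproduced here, so the sketch makes two assumptions of its own: each channel realization is mapped to the codeword with the smallest total power that meets the distortion constraint, and that constraint is replaced by the SLA-type test $\sum_i h_i P_{i,j}/\varphi_i \ge 1$ (i.e. thresholds $s_{i,j} = \varphi_i/P_{i,j}$). The constants `phi` and the codebook values are hypothetical.

```python
import numpy as np


def partition(h: np.ndarray, codebook: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """Assign each channel realization to a codeword index (-1 means outage).

    h:        (num_samples, N) channel power gains
    codebook: (L, N) power vectors, one row per codeword
    phi:      (N,) hypothetical per-CH constants, threshold s_i = phi_i / P_i
    """
    # feasible[k, j]: codeword j satisfies sum_i h_i * P_i / phi_i >= 1 for sample k
    feasible = (h / phi) @ codebook.T >= 1.0
    total_power = codebook.sum(axis=1)                   # (L,)
    # Among the feasible codewords, pick the one with the smallest total power
    cost = np.where(feasible, total_power[None, :], np.inf)
    idx = np.argmin(cost, axis=1)
    idx[~feasible.any(axis=1)] = -1                      # no codeword works: outage
    return idx


rng = np.random.default_rng(0)
m, N = 0.5, 2
h = rng.gamma(shape=m, scale=1.0 / m, size=(100_000, N))  # unit-mean Nakagami-m power gains
codebook = np.array([[1.0, 1.0], [4.0, 4.0]])             # hypothetical 1-bit power codebook
cells = partition(h, codebook, phi=np.ones(N))
print("estimated outage probability:", np.mean(cells == -1))
```

With these toy values, outage occurs when $h_1 + h_2 < 0.25$; since the sum of two Gamma(0.5, 2) variables is exponential with mean 2, the printed estimate should be close to $1 - e^{-0.125} \approx 0.1175$.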
In the second step, we find the improved power codebook by solving the optimization problem (17), where $\mathbf{1}(\cdot)$ is the indicator function. Because we do not have an explicit expression for the outage probability, we resort to SPSA, a stochastic optimization algorithm, to search numerically for the new power codebook [22]. SPSA chooses the search direction randomly and iterates toward a locally optimal solution. Denote by $\hat{\mathbf{P}}$ the $NL \times 1$ column vector formed by stacking the elements of the power codebook, and define a loss function $J(\hat{\mathbf{P}})$, where $\lambda$ is the Lagrange multiplier associated with the long-term average power constraint. Since the loss function can be viewed as the objective function of an unconstrained optimization problem, $P_{av}$ has to be obtained numerically as a function of $\lambda$. Once the new power codebook is found, we repeat steps 1 and 2 until the stopping criterion is met. The two-sided SPSA algorithm used in this paper can be summarized by the following steps [15]:
(1) Initialization and coefficient selection: Set the counter index $k = 0$. Use a random initial power codebook $\hat{\mathbf{P}}_0$ and set the non-negative coefficients $a$, $c$, $A$, $\alpha$ and $\gamma$ in the SPSA gain sequences $a_k = a/(A + k + 1)^{\alpha}$ and $c_k = c/(k + 1)^{\gamma}$. For additional guidelines on choosing these coefficients, see [15].
(2) Generation of simultaneous perturbation: Generate an $NL$-dimensional random perturbation column vector $\Delta_k$, whose components are i.i.d. Bernoulli $\pm 1$ distributed with probability 0.5 for each outcome.
(3) Loss function evaluations: Obtain two measurements of the loss function based on the simultaneous perturbations around the current power codebook $\hat{\mathbf{P}}_k$: $J(\hat{\mathbf{P}}_k + c_k \Delta_k)$ and $J(\hat{\mathbf{P}}_k - c_k \Delta_k)$, with $c_k$ and $\Delta_k$ as defined in Steps 1 and 2.
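(4) Gradient approximation: Generate the simultaneous perturbation approximation to the unknown gradient,
$$\hat{\mathbf{g}}_k(\hat{\mathbf{P}}_k) = \frac{J(\hat{\mathbf{P}}_k + c_k \Delta_k) - J(\hat{\mathbf{P}}_k - c_k \Delta_k)}{2 c_k} \left[\Delta_{k,1}^{-1}, \ldots, \Delta_{k,NL}^{-1}\right]^T,$$
where $\Delta_{k,i}$ is the $i$th component of the vector $\Delta_k$.
(5) Updating the power codebook estimate: Apply the standard SPSA recursion $\hat{\mathbf{P}}_{k+1} = \hat{\mathbf{P}}_k - a_k \hat{\mathbf{g}}_k(\hat{\mathbf{P}}_k)$.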
(6) Iteration or termination: Return to Step 2 with k + 1 replacing k.Terminate the algorithm if there is little change in several successive iterations or the maximum allowable number of iterations has been reached.
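The steps above can be sketched as a generic two-sided SPSA loop. To keep it self-contained, the example minimizes a hypothetical noisy quadratic loss rather than the Monte Carlo outage-plus-power loss $J$ used here; the gain sequences follow Step 1, the update is the standard SPSA recursion, a fixed iteration budget replaces the termination test of Step 6, and the coefficient values are illustrative rather than those used in the simulations. No projection onto non-negative powers is included.

```python
import numpy as np


def spsa_minimize(loss, theta0, a, c, A, alpha=0.602, gamma=0.101,
                  num_iter=2000, seed=0):
    """Two-sided SPSA with a fixed number of iterations as the stopping rule."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(num_iter):
        a_k = a / (A + k + 1) ** alpha                     # step-size gain sequence
        c_k = c / (k + 1) ** gamma                         # perturbation gain sequence
        delta = rng.choice([-1.0, 1.0], size=theta.shape)  # Bernoulli +/-1 perturbation
        y_plus = loss(theta + c_k * delta)                 # two loss measurements
        y_minus = loss(theta - c_k * delta)
        g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)   # simultaneous perturbation gradient
        theta = theta - a_k * g_hat                        # standard SPSA update
    return theta


target = np.array([1.0, -2.0, 0.5])
noise = np.random.default_rng(1)


def noisy_loss(x):
    # Hypothetical loss: quadratic bowl plus small measurement noise
    return float(np.sum((x - target) ** 2) + noise.normal(scale=0.01))


theta_hat = spsa_minimize(noisy_loss, np.zeros(3), a=0.5, c=0.1, A=50)
print(theta_hat)
```

Only two loss evaluations are needed per iteration regardless of the dimension $NL$, which is what makes SPSA attractive when each evaluation is itself a Monte Carlo estimate.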
Remark 1: SPSA is computationally intensive and requires tuning $\lambda$ and all the coefficients whenever network parameters change, such as the average power constraint or the number of feedback bits. Convergence can be slow, and the algorithm may settle in different local minima depending on the initial points chosen. Hence, in the next section, we only provide limited SPSA results (up to 4 bits of feedback) as a performance benchmark for our various approximate distortion outage minimization algorithms.

C. Asymptotic behavior of outage probability and diversity gain in quantized feedback
In this section, we briefly present some results on the asymptotic behavior of the distortion outage probability as the available long-term average power $P_{av}$ goes to infinity. We also provide an approximation for the diversity gain (defined below), which indicates how fast the outage probability decays with increasing $P_{av}$. The asymptotic behavior of the outage probability as $P_{av} \to \infty$ is given in the following lemma.
Lemma 3.2: Suppose the fading channels between the clusterheads and the FC undergo independent Nakagami-m fading, with the $i$th clusterhead having fading parameter $m_i$. As $P_{av} \to \infty$, the asymptotic distortion outage probability achieved by the SLA-based power allocation algorithm with quantized channel feedback of $R = \log_2 L$ bits is $P_{\text{outage}} \approx F_N(s_{1,1})$, whose explicit form in terms of $P_{av}$, $N$, $L$ and $Q = \sum_{i=1}^{N} m_i$ is given by (30) in the Appendix.
The diversity gain $d$ is defined as
$$d = -\lim_{P_{av} \to \infty} \frac{\log P_{\text{outage}}}{\log P_{av}}. \qquad (19)$$
Theorem 1: Under the same conditions as in Lemma 3.2, the diversity gain achieved by the SLA-based power allocation algorithm with quantized channel feedback of $R = \log_2 L$ bits is given by $d \approx Q^L + \cdots + Q^2 + Q$, where $Q = \sum_{i=1}^{N} m_i$.
Remark 2: A number of approximations (all of them analytically justified) are used to derive the above results, as can be seen in their proofs in the Appendix. It is for this reason that we express the asymptotic expressions as approximate relationships. Whether these limiting values hold exactly with equality is left open for future research.
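Theorem 1 is easy to evaluate for the configurations used in the simulations. The helper below is a direct transcription of $d \approx Q^L + \cdots + Q^2 + Q$ with $L = 2^R$ and $Q = \sum_i m_i$; the function name is ours.

```python
def approx_diversity_gain(m: list[float], R: int) -> float:
    """Approximate diversity gain d ~= Q^L + ... + Q^2 + Q from Theorem 1."""
    Q = sum(m)      # Q = m_1 + ... + m_N
    L = 2 ** R      # number of quantization regions for R feedback bits
    return sum(Q ** k for k in range(1, L + 1))


print(approx_diversity_gain([0.5, 0.5], R=1))  # N = 2, m = 0.5, 1 bit  -> 2.0
print(approx_diversity_gain([2.0], R=1))       # N = 1, m = 2, 1 bit    -> 6.0
```

When $Q > 1$ the sum is dominated by $Q^L$, so the predicted slope grows very quickly with $R$; when $Q < 1$ it is a geometric series that stays below $Q/(1-Q)$ however many bits are fed back.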

Simulation results
Our simulation results are based on the topology given in Figure 6. The topology for N = 1 (one cluster) is obtained by discarding all clusters except the one on the top left. For N = 2, we keep the top-left and bottom-right clusters. For N = 6, the topology is as given in Figure 6. The sensors in each cluster are placed on four equally spaced concentric circles, and the numbers of sensors on the circles are 6, 12, 18 and 24, from the smallest to the largest circle. All clusters have a radius of 40 m. All sensors transmit with a power of $q_n/M_n = 1$ mW. For simplicity, the clusterheads are located at the center of each cluster. Each CH is 100 m from the nearest other CH (for the 6-cluster network). The FC is located 500 m away from the source. The channel noise variances are set to $(\sigma_n^{C1})^2 = 10^{-12}$ W and $(\sigma_n^{C2})^2 = 10^{-10}$ W $\forall n$. The source variance is set to $\sigma_\theta^2 = 1$ W. The maximum distortion threshold $D_{\max}$ is set to 0.0043 (10% of the minimum achievable distortion of the 6-cluster network). Recall that no closed-form expressions for the outage probability exist for N ≥ 2; hence, we obtain the outage probability via Monte Carlo simulation over 1,000,000 channel realizations, using the locally optimal power allocation (for N = 1 and for the SPSA algorithm) and the strictly sub-optimal power allocation obtained via SLA combined with other approximations such as EPPC and EPPR. For very low average power values, the outage performance is obtained using the ZPOR algorithm. As mentioned before, the actual threshold values below which the ZPOR-based algorithm performs near-optimally depend on the specific values of N, m and R.
For example, when N = 2 and m = 0.5, the low-$P_{av}$ threshold values are -41.0 dBW, -38.7 dBW and -32.7 dBW for 1, 2 and 4 bits of feedback, respectively. The exact analytical characterization of this $P_{av}$ threshold is beyond the scope of the current paper.
The first simulation result comparing the outage performances of the inner SLA and outer SLA is provided in Figure 7 for the 6-cluster network for the Rayleigh fading case (m = 1).These simulation results are computed using EPPR and EPPC approximations and show a close match between the two SLA methods.Similar results were seen for other choices of m.From here onwards, for the rest of the simulation results, SLA refers to the outer SLA.
We now present the simulation results for N = 1, 2, 6, based on three different Nakagami-m fading parameters, namely, m = 0.5 (severe fading), m = 1 (Rayleigh fading) and m = 2 (less severe fading). We assume that the fading channels between the CHs and the FC have identical fading parameters ($m_i = m_k$ $\forall i, k$). The outage performance of the single-cluster limited-feedback problem is obtained using the solutions to the KKT conditions for Problem (4) for 1-bit feedback, and using the EPPR approximation for 2, 4 or 6 bits of feedback. The corresponding results with Nakagami fading parameter m = 0.5 are shown in Figure 8. Although in the single-cluster network we only quantize a scalar channel space, studying its performance gives fundamental insights into quantizing the multidimensional vector channel space. The outage performance of equal power allocation (EPA), which allocates the same power to all CHs, and of the optimal power allocation scheme for full CSI using (6) are also shown in the figure as performance benchmarks. Figure 8 shows a progression of performance improvement from EPA, which has no knowledge of CSIT, to partial CSIT with feedback resolution increasing from 1 bit to 6 bits, to full CSI (complete knowledge of CSIT). At $P_{\text{outage}} = 0.1$, the power gap (in dB) between 1-bit feedback and full CSI is roughly half of that between EPA and full CSI. With R = 6, the outage performance is already very close to that of full CSI.
Figure 9 gives some indication of how good the approximation methods (SLA and SLA + EPPR) are for N = 2, R = 1 and m = 0.5, 1, 2. The benchmark here is the optimal outage performance obtained using an exhaustive search (ES) method. Exhaustive search is used because closed-form outage expressions are difficult to obtain for N > 1. ES is carried out over 100,000 search points in $\mathbb{R}_+^2$. Figure 9 shows that both SLA and SLA + EPPR are good approximations, at least under this topology setting, as both give results that closely match the optimal outage performance.
Figures 10 and 11 show the outage performance obtained by the SPSA algorithm, EPA, SLA + EPPR and full CSI for N = 2 and R = 1, 2, 4, for m = 0.5 and m = 2, respectively. Comparing these two figures, we find that a larger average power is required in Figure 10 to achieve the same outage probability, owing to the more severe fading. We also observe that as the number of CHs increases from one to two, less average power is required to achieve the same outage probability, owing to diversity gain. For example, for m = 0.5, R = 4 and N = 2, the long-term average power required to achieve an outage probability of 0.1 is -39 dBW, 7.4 dB less than for N = 1 with the same settings. Note also that the power gap between 4-bit feedback and full CSI has widened; this gap becomes more prominent for N = 6. SPSA gives results very similar to those of SLA + EPPR. The coefficients used in the SPSA algorithm are set to $c = 10^{-5}$, $A = 80$, $\alpha = 0.602$, $\gamma = 0.101$ and $a = 10^{-6}(A + 1)^{\alpha} / (\text{mean magnitude of } \hat{\mathbf{g}}_0(\hat{\mathbf{P}}_0))$, where $\hat{\mathbf{g}}_0(\hat{\mathbf{P}}_0)$ is computed via Step 4 of SPSA and the mean is computed by averaging over $\Delta_k$. In step 2 of Lloyd's algorithm outlined in Section 3-B1, the probabilities are calculated by Monte Carlo simulation over 100,000 vector channel realizations.
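The rule for choosing $a$ can be written out as below. It averages the magnitude of the Step 4 gradient estimate at the initial point over several random perturbations, so that the first step $a_0 \hat{\mathbf{g}}_0 = a\,\hat{\mathbf{g}}_0/(A+1)^{\alpha}$ has a typical size of about $10^{-6}$. The number of averaging draws, reading "mean magnitude" as the mean absolute component value, and the toy loss are assumptions for illustration.

```python
import numpy as np


def calibrate_a(loss, theta0, c, A, alpha=0.602, target_step=1e-6,
                num_draws=50, seed=0):
    """a = target_step * (A + 1)^alpha / mean |g_hat_0(theta0)|."""
    rng = np.random.default_rng(seed)
    theta0 = np.asarray(theta0, dtype=float)
    c_0 = c  # c_k = c / (k + 1)^gamma equals c at k = 0
    mags = []
    for _ in range(num_draws):
        delta = rng.choice([-1.0, 1.0], size=theta0.shape)
        g_hat = (loss(theta0 + c_0 * delta) - loss(theta0 - c_0 * delta)) / (2.0 * c_0 * delta)
        mags.append(np.mean(np.abs(g_hat)))  # average |component| of this gradient estimate
    return target_step * (A + 1) ** alpha / np.mean(mags)


def linear_loss(x):
    return 3.0 * x[0]  # hypothetical loss with constant gradient 3


a = calibrate_a(linear_loss, theta0=[1.0], c=1e-5, A=80)
print(a)
```

Scaling $a$ by the observed gradient magnitude makes the initial step size insensitive to the absolute scale of the loss, which matters when $J$ mixes a probability with a $\lambda$-weighted power term.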
The near-optimality of the EPPC-based algorithm at high average power is illustrated in Figure 12, which shows how the EPPC-based algorithm (SLA combined with EPPR and EPPC) approaches the performance of the SLA-based algorithm (without any further approximations) as the average power increases, for the 2-cluster network with 1-bit feedback and m = 0.5, 1 and 2. For m = 2, the high-average-power region is roughly $P_{av} > -40$ dBW, as shown in Figure 12. Similar results (not included to avoid repetition) were seen for other values of N, m and R, albeit with different $P_{av}$ thresholds above which the EPPC-based approximation performs close to the SLA-based algorithm.
The outage performance for N = 6 and R = 1, 2, 4 obtained using EPA, SLA, SPSA and full CSI is shown in Figures 13 and 14 for m = 0.5 and m = 2, respectively. The SPSA parameters used here are the same as for N = 2. Observe again the effect of diversity gain with the increased number of clusters. The gap between 4-bit feedback and full CSI has widened, possibly because the feedback resolution per CH decreases as N increases for a fixed R. Simulation results show that at $P_{\text{outage}} = 0.1$, the power gap (in dB) between 4-bit feedback and full CSI is about half of that between EPA and full CSI.
The diversity gains are also shown in Figures 11 and 13 as solid straight lines just above the outage probability curves. From the definition of the diversity gain, it is simply the negative slope of the outage probability curve, plotted on a log-log scale, as $P_{av} \to \infty$. Note that these straight lines are inserted only to give a visual indication of the diversity gains through their slopes; they do not represent the actual outage performance. These straight … by using various levels of useful approximations. An extensive set of numerical results is presented to demonstrate the performance of these algorithms under different Nakagami-m fading conditions (including Rayleigh fading). The diversity gain of such networks is also studied, showing how the distortion outage probability decreases as a function of the long-term average power and the number of feedback bits when the long-term average power becomes arbitrarily large.
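A diversity gain can also be read off a simulated outage curve. The sketch below estimates $d$ as the negative least-squares slope of $\log_{10} P_{\text{outage}}$ against $\log_{10} P_{av}$ over the highest-power points, with $P_{av}$ given in dBW as in the figures and converted to watts first. The data are synthetic (an exact power law with $d = 2$), not results from this paper, and the number of tail points is an arbitrary choice.

```python
import numpy as np


def estimate_diversity(p_av_dbw, p_outage, num_tail=5):
    """Negative slope of log10(P_outage) vs log10(P_av) over the last num_tail points."""
    p_av = 10.0 ** (np.asarray(p_av_dbw, dtype=float) / 10.0)   # dBW -> W
    x = np.log10(p_av[-num_tail:])
    y = np.log10(np.asarray(p_outage, dtype=float)[-num_tail:])
    slope, _intercept = np.polyfit(x, y, deg=1)
    return -slope


p_av_dbw = np.arange(-50.0, -20.0, 2.0)
p_out = 1e-11 * (10.0 ** (p_av_dbw / 10.0)) ** (-2.0)   # synthetic curve with d = 2
print(estimate_diversity(p_av_dbw, p_out))
```

On real Monte Carlo curves the estimate depends on how far into the high-power regime the tail points lie, which is one reason the theoretical values are stated as approximations.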
Future extensions of this work may look at generalizing the point Gaussian source to a dynamical system where the distribution of the source changes with time in some fashion.Another direction is to extend the problem formulation to estimate a random field where the data collected by the sensors are spatially correlated, and/or the fading channels from the clusterheads to the FC are correlated instead of being statistically independent.

Appendix
Proof of Lemma 3.1: Recall the c.d.f. expressed in the "infinite-sum-series" form given in (11). Note that when j = 1, the expression corresponds to the outage probability, i.e., $F_N(\mathbf{s}^{(N)}_1) \equiv P_{\text{outage}}$. As $P_{av} \to \infty$, $s_{i,j} \to 0$, and (11) can be simplified as … The partial derivative of $F_N(\mathbf{s}$ … If all $m_i = m_k$ $\forall i, k$, then $P_{i,j} = P_{k,j}$ $\forall i, j, k$. This completes the proof. □
Proof of Lemma 3.2: In Lemma 3.1, we obtained $F_N(\mathbf{s}^{(N)}_j)$ and showed that as $P_{av} \to \infty$ it is asymptotically optimal to transmit with $P_{i,j} = \frac{m_i}{m_k} P_{k,j}$. We can use this result to express any $P_{i,j}$ in terms of $P_{1,j}$ $\forall i, j$. Hence, instead of dealing with a vector space, we can reduce the problem to a scalar problem by expressing (21) as a function of $P_{1,j}$, where $Q = \sum_{i=1}^{N} m_i$. The quantity $\sum_{i=1}^{N} P_{i,j}$ can also be written as a function of $P_{1,j}$. As $P_{av} \to \infty$, the channel thresholds become small, $s_{i,j} \to 0$ $\forall i, j$. Hence the long-term average power in each quantized region can be approximated to be the same (EPPR), as shown in [8]. Applying (24) and (25) to the constraints in Problem (13), we can derive the outage probability as a function of $P_{av}$, N and L. This expression is also used to obtain the diversity gain of the network. Starting from the last equation in (13), as $P_{av} \to \infty$, $s_{i,j} \to 0$, which gives (26). Since $s_{1,L}$ is small, $s_{1,L}^{Q} \ll s_{1,L}$ and we can discard the term with $s_{1,L}^{Q}$ in (26). After rearranging, we obtain the expression for $s_{1,L}$ in (27). Applying (24) and (25) to the constraint with j = L - 1 in Problem (13) gives the corresponding expression, whose last line is obtained after substituting (27). Repeating the above steps for the remaining constraints in (13), we obtain the remaining thresholds, and hence the outage probability (30). □
Proof of Theorem 1: Let $J(Q) = Q^L + \cdots + Q^2 + Q$. The diversity gain of the limited-feedback system is obtained by substituting (30) into (19), which gives $d \approx J(Q)$. □
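The small-threshold simplification at the start of the proof can be checked numerically. Since (11) is not reproduced here, the sketch assumes (our assumption) that the SLA outage event is $\sum_i h_i / s_i < 1$ with unit-mean Gamma$(m_i, 1/m_i)$ power gains, for which the small-threshold limit is $\prod_i (m_i s_i)^{m_i} / \Gamma(Q+1)$, and compares that expression with a Monte Carlo estimate. The thresholds are illustrative.

```python
import math

import numpy as np


def small_threshold_approx(m, s):
    """prod_i (m_i s_i)^{m_i} / Gamma(Q + 1), with Q = sum_i m_i."""
    Q = sum(m)
    return math.prod((mi * si) ** mi for mi, si in zip(m, s)) / math.gamma(Q + 1)


def monte_carlo_prob(m, s, num_samples=2_000_000, seed=0):
    """Estimate P(sum_i h_i / s_i < 1) with h_i ~ Gamma(m_i, scale=1/m_i)."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m, dtype=float)
    h = rng.gamma(shape=m, scale=1.0 / m, size=(num_samples, m.size))
    return float(np.mean((h / np.asarray(s)).sum(axis=1) < 1.0))


m, s = [0.5, 0.5], [0.02, 0.02]
print(small_threshold_approx(m, s), monte_carlo_prob(m, s))
```

For this choice the exact probability is $1 - e^{-0.01} \approx 0.00995$ (the sum of two Gamma(0.5, 2) variables is exponential with mean 2), so both printed values should lie close to it.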

Figure 1 Schematic diagram of a wireless sensor network for distributed estimation.

Figure 3 Vector channel quantization regions formed by a series of distortion curves for a 2-cluster network.
the outer-straight-line approximation [referred to in this paper simply as the straight-line approximation (SLA)]. A visual illustration comparing the actual outage region and the SLA approximation for N = 3 is shown in Figure 5. However, it is difficult to illustrate what the regions would look like for N > 3.

Figure 4 Inner and outer straight-line approximations.

Figure 5 Exact outage region and SLA approximation in $\mathbb{R}_+^3$.

Figure 6 Wireless sensor network topology.

Figure 7 Inner SLA versus outer SLA of the 6-cluster network with m = 1.

Figure 8 Figure 9
Figure 8 Outage performance of a single-cluster network employing EPA, 1, 2, 4 and 6 feedback bits and optimal full-CSI power allocation for m = 0.5.

Figure 14 Outage performance of 1, 2 and 4-bit feedback, full CSI and EPA of the 6-cluster network for m = 2.