Conditional downsampling for energy-efficient communications in wireless sensor networks
EURASIP Journal on Advances in Signal Processing volume 2013, Article number: 101 (2013)
Abstract
This paper deals with the power limitations of wireless sensor networks. Concretely, we propose using a conditional downsampling encoder (CDE) at the sensing nodes as an energy-efficient solution to the communication problem. The CDE exploits knowledge of the signal structure, which is assumed to be time-correlated, to decrease the sampling rate and hence reduce the number of transmissions within the network. We analytically assess the performance of the CDE in terms of quadratic distortion and derive closed-form expressions when it is combined with either of two decoders: the step decoder and the predictive decoder. Moreover, we propose two methodologies for designing the CDE so that it guarantees a given coding rate. We also compare the CDE, both analytically and experimentally, with two classical decimation techniques: the deterministic downsampling encoder and the probabilistic downsampling encoder. Numerical simulations validate our analytical results. Finally, we compare the obtained quadratic distortions and draw conclusions about the capabilities of the studied encoding-decoding schemes.
1 Introduction
1.1 Motivation and previous work
Wireless sensor network (WSN) design is currently one of the most challenging topics in the communications field. In particular, WSNs are severely energy-constrained because they consist of many small, cheap, and power-limited nodes whose batteries, in most cases, cannot be recharged. Hence, energy-efficient algorithms are crucial.
With this motivation, many energy-efficient strategies have been proposed to reduce energy costs and hence extend the lifetime of the WSN. Without aiming to be exhaustive, we point out some examples:

Energy-aware routing for cooperative WSNs and ad hoc networks [1, 2]. These techniques seek the optimal path that minimizes the total energy spent in multi-hop WSNs.

Signal processing techniques for minimum-power distributed transmission schemes [3, 4]. Using distributed beamforming, the nodes can decrease their transmitted power while increasing the total throughput of the network.

Data-aware techniques that reduce energy through efficient information processing [5, 6]. By means of signal processing, the network exploits the inherent structure and properties of the measured signal to sample the data efficiently and thereby reduce the associated energy costs.
Our study falls into the third category and may be complementary to the other approaches. Specifically, we propose to encode the sensed data by removing redundancy in the time domain. Many transmission schemes use noncausal transmissions such as block coding. In these cases, the source collects a number of contiguous time samples and compresses them by removing part (or all) of the redundancy among them. A large number of different techniques fall within this group of encoders-decoders. Although these schemes are well suited to high-rate and/or delay-tolerant communications, they may not be applicable in some scenarios, because block transmission is not always allowed due to delay constraints and/or low symbol rates at the source.
For delay-sensitive applications such as real-time monitoring in WSNs, where the signal must be reconstructed at the same time instant as the corresponding input measurement, causal source codes are more convenient. A source code is said to be causal if the nth decoded sample depends on the output signal only through its first n components, in other words, if it depends on the past and present outputs but not on future ones. Quantizers, delta modulators, differential pulse code modulators, and adaptive versions of these are all causal in this sense. The basic properties of causal source codes were introduced in 1982 in [7], and related work has been extended since then. The work in [8] extends the general results of [7] to the case where side information, i.e., extra information correlated with the source, is available at the encoder and the decoder. In addition, a causal source code is called a zero-delay or sequential code if both the encoder and the decoder are causal (note that the definition of a causal source code assumes causality only at the decoder) [9, 10].
The literature contains several zero-delay coding systems. One of the most common is the well-known differential pulse code modulation (DPCM). In a nutshell, the current sample to be coded is predicted from previously coded samples. This prediction is used as a reference and compared with the current sample, so the output of the encoder is the prediction error. The inverse operation takes place at the decoder side. According to [11], DPCM was first introduced in a US patent by C. Cutler in 1952. Since then, many results have appeared. In particular, the autoregressive (AR) model has received special attention in the study of zero-delay coding schemes, with some of the early works dating back to the 1960s. The works in [12, 13] analyze the quadratic rate distortion of DPCM (an extended description of rate distortion can be found in Chapter 13 of [14]). The work in [15] extends these results by assuming a Gaussian distribution of the prediction error. Other works proposed algorithms for nonuniform quantizers optimized to minimize the distortion rate [16].
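For readers less familiar with DPCM, the following Python sketch shows a toy first-order loop. It is only an illustration under our own assumptions (a fixed predictor coefficient `a`, a uniform quantizer of step `step`, and a hypothetical function name `dpcm_encode_decode`) and does not implement any specific scheme from [12, 13, 15, 16, 17]. Note that it emits one quantized residual per input sample.

```python
import numpy as np


def dpcm_encode_decode(x, a, step):
    """Toy first-order DPCM with a uniform quantizer.

    The prediction a * x_hat[n-1] uses only previously decoded samples, so the
    encoder and the decoder stay synchronized. One quantized residual is produced
    for every input sample, i.e., DPCM does not change the sample rate.
    """
    residuals = np.empty(len(x))
    x_hat = np.empty(len(x))
    prev = 0.0  # last reconstructed sample, shared by encoder and decoder
    for n, sample in enumerate(x):
        pred = a * prev                                  # prediction from the past decoded sample
        e_q = step * np.round((sample - pred) / step)    # quantized prediction error (what is sent)
        residuals[n] = e_q
        x_hat[n] = pred + e_q                            # inverse operation at the decoder
        prev = x_hat[n]
    return residuals, x_hat


x = np.sin(np.linspace(0.0, 6.0, 100))
r, x_rec = dpcm_encode_decode(x, a=0.95, step=0.05)
print(len(r), np.max(np.abs(x - x_rec)))  # 100 residuals; error never exceeds step / 2
```

Because the reconstruction error equals the quantization error of the residual, it is bounded by half the quantizer step regardless of the predictor coefficient.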
Later works, such as [17], specialize the DPCM results to the low-bit-rate case. In this regime the system performance degrades, so the classic DPCM encoder is modified to achieve better rate-distortion performance.
Recent works in this field have tried to unify the theoretical limits of DPCM (and other zero-delay schemes) for AR models with other information theory concepts. The authors in [11] provide analytical results for the duality between the rate distortion of an AR process and the capacity of intersymbol interference channels. By contrast, other works such as [9] follow an information-theoretic approach that establishes upper and lower bounds on the rate distortion of generic zero-delay schemes, using the mutual information as a measure of the achievable rate.
1.2 Our contribution
Our work follows the sequential transmission approach described above. Concretely, the approach of this paper is similar to that of [6], where the authors seek the optimal sampling in a WSN scenario with correlated sources. However, we present the problem from a more realistic energy-efficiency perspective. According to results in the literature on energy consumption in sensor networks [18], the main source of energy expenditure in a sensor is the power needed to keep it awake; most of this energy is consumed by the front-end elements [19]. Therefore, our goal is to reduce the total number of transmissions so that the sensors can remain in sleep mode as long as possible.
Note that a complete characterization of the performance of real communication systems requires evaluating several metrics, e.g., the robustness against noise in terms of the signal-to-noise ratio, the quantization error as a function of the coding scheme, or the bit error rate of the selected modulation. In this paper, however, we focus only on the downsampling distortion (see Section 2.2) as a figure of merit for the quadratic reconstruction error that a downsampling technique introduces at the fusion center. Other performance metrics, although interesting, are beyond the scope of this paper.
In particular, we study downsampling techniques in which each sample of an input signal is either blocked or transmitted according to a given criterion. For that purpose, we propose a downsampling encoding scheme called the conditional downsampling encoder (CDE). The CDE exploits the time correlation of the measured signal to build the decimation pattern sequentially. WSN readings are typically space-time-correlated, so strategies in both domains can potentially improve the accuracy of the signal recovered at the receiver side. However, exploiting space correlation in addition to time correlation at the sensing nodes would require intensive inter-node communication. Since this would be costly in terms of signaling, complexity, and energy consumption, we have discarded it. In short, the CDE predicts the current sample using a linear estimator and takes this prediction as a reference. The sample is then blocked if the prediction error does not exceed a given threshold and transmitted otherwise. A key step of the CDE design is therefore to determine the threshold that reduces the sample rate by a factor γ; this paper proposes two different threshold designs.
Clearly, the CDE shares some similarities with DPCM in the sense that both schemes use (linear) prediction as a reference to encode the input signal. However, they also present important differences, which can be summarized as follows:

A DPCM encoder produces one output sample for each input sample; in other words, it does not change the sample rate. In contrast, the CDE (like the deterministic downsampling encoder (DDE) and the probabilistic downsampling encoder (PDE)) reduces the sample rate. This behavior is very convenient in energy-constrained scenarios such as WSNs, since the total number of transmissions is reduced by a factor γ, which increases the energy efficiency of the network.

While DPCM works at the symbol level, the CDE works at the sample level. Thus, the downsampling encoder-decoder schemes studied in this paper are not mutually exclusive with DPCM or other zero-delay coding techniques; they can actually be applied on top of them when the signal is transmitted.
In addition, we compare the performance loss of the CDE with that of other encoding-decoding pairs when the number of samples is reduced by a factor γ. In particular, we study the following two downsampling criteria: (1) the DDE and (2) the PDE.
The DDE works as a decimator, i.e., it reduces the number of samples following a deterministic pattern: it selects only one in every $\gamma^{-1}$ samples, where $\gamma^{-1}$ must be a natural number.
The PDE differs slightly from a common decimator in that it reduces the number of samples following a probabilistic pattern, i.e., each sample is transmitted with probability γ and blocked with probability 1−γ. This method removes the restriction that $\gamma^{-1}$ be a natural number. However, we analytically show that the DDE outperforms the PDE in terms of quadratic distortion.
At the fusion center, the decoder recovers the original sampling rate by upsampling the signal. We study two possible decoders: (1) a step decoder (SD) and (2) a predictive decoder (PD). The SD reconstructs the missing samples by replicating the last decoded sample and does not require any side information. In contrast, the PD reconstructs the missing samples by linear prediction (as in the CDE). We analytically show the improvement in quadratic distortion obtained when the samples are predicted rather than simply replicated.
Hence, we give analytical expressions for the quadratic distortion of the following downsampling encoding-decoding pairs: DDE-SD, DDE-PD, PDE-SD, and PDE-PD. Furthermore, we provide accurate approximations for the quadratic distortion of CDE-SD and CDE-PD. Numerical simulations support the proposed analytical expressions.
1.3 Organization of the paper
The rest of the paper is organized as follows. In Section 2, we introduce the assumptions and the scenario considered throughout the paper. Section 3 presents the proposed CDE as well as the other encoding-decoding schemes under study. Section 4 details the analytical expressions of the downsampling distortion for the proposed CDE and presents two different design strategies. The analytical expressions of the downsampling distortion for the other encoding-decoding schemes are detailed in Section 5. Simulation results are shown in Section 6, and conclusions and suggestions for future research are drawn in Section 7.
2 System model and assumptions
Let us consider a WSN configured in a star topology that monitors a given physical scalar magnitude such as temperature or humidity. The network is composed of two types of nodes: (1) a set of S sensing nodes that wirelessly transmit their measurements to (2) one fusion center that manages, gathers, and processes the measurements from the sensing nodes.
2.1 Assumptions on the signal model
We model the signal as an S-dimensional stochastic process, namely^{a},
where $\mathbf{x}(n)=[x_1(n)\ x_2(n)\ \cdots\ x_S(n)]^T$, $x_s(n)$ denotes the measurement of the sth sensor at sample time n, and N denotes the number of time samples in the observation window. Let $x_s(n)$ be a real, discrete-time autoregressive process of order 1 (AR1) with variance $\sigma_x^2$, sampled at a rate $\mathcal{R}$; this model is commonly assumed in the signal processing literature to model real sources [20]. It is defined as
$$x_s(n)=\rho\,x_s(n-1)+z(n).$$
The autoregression coefficient is denoted by $\rho\in[0,1]$ and is assumed to be constant during the transmission. The random process z(n) is a sequence of independent Gaussian random variables with zero mean and variance $\sigma_z^2$.
Without loss of generality, we also assume that the variance of the measurement $x_s(n)$, i.e., $\sigma_x^2$, is equal to 1. Since stationarity implies $\sigma_x^2=\rho^2\sigma_x^2+\sigma_z^2$, the noise variance is then known: $\sigma_z^2=1-\rho^2$.
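A short sketch of this unit-variance AR1 source in Python, assuming NumPy and a hypothetical helper name `generate_ar1`. The first sample is drawn from N(0, 1) so that the process starts in its stationary regime, and the innovations use $\sigma_z^2=1-\rho^2$.

```python
import numpy as np


def generate_ar1(n_samples, rho, rng=None):
    """Unit-variance AR1: x(n) = rho * x(n-1) + z(n), with z(n) ~ N(0, 1 - rho**2)."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.sqrt(1.0 - rho**2) * rng.standard_normal(n_samples)
    x = np.empty(n_samples)
    x[0] = rng.standard_normal()  # stationary start: x(0) ~ N(0, 1)
    for n in range(1, n_samples):
        x[n] = rho * x[n - 1] + z[n]
    return x


x = generate_ar1(200_000, rho=0.9, rng=np.random.default_rng(0))
print(np.var(x))                          # close to 1
print(np.corrcoef(x[:-1], x[1:])[0, 1])   # close to rho = 0.9
```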
2.2 Assumptions on the system model
We do not assume any coordination among sensing nodes; hence, each node acts non-cooperatively. This reduces the required signaling compared with cooperative communications and also allows us to focus our analysis, without loss of generality, on the communication between a single sensing node and the fusion center. The transmission model under consideration is generic and is illustrated in Figure 1.
Note that, for simplicity, we replace the notation $x_s(n)$ by x(n) from now on. Furthermore, we require that the signal x(n) be transmitted from the source to the destination in a zero-delay manner. Throughout this paper, by zero-delay transmission we mean that, for each sample time n, the receiver has a reconstruction of the signal x(n). Moreover, at time instant n we are no longer interested in x(n−1), so delay-tolerant strategies (such as block encoding schemes) are not feasible. Under this constraint, we look for encoders that reduce the sample rate sample by sample in real time.
Hence, we consider a nonlinear encoder with coding rate γ at the sensing nodes. In our particular case, the encoder selects which samples of x(n) are transmitted, at a rate γ, and discards the rest. The selected samples are denoted by y(n); note that y(n) is only defined for the time slots in which the encoder decides to transmit.
Moreover, we consider nonlinear decoders that recover an approximation of x(n), denoted $\tilde{x}(n)$, from y(n) at the fusion center. Roughly speaking, the decoder constructs $\tilde{x}(n)$ by copying y(n) when a transmission exists and by predicting x(n) otherwise.
Definition 1
For a given encoder-decoder pair, the sink obtains $\tilde{x}(n)$ with a certain downsampling distortion, defined as the quadratic distortion introduced by the downsampling encoder-decoder pair (e, d) as
3 Downsampling transmission schemes
3.1 Different encoding alternatives
We compare the proposed CDE with two downsampling encoders chosen among many possibilities: (1) the DDE and (2) the PDE. They were chosen because they are simple and because many other strategies can be derived from them.
In order to describe the selected encoders, we first need to introduce the following definition:
Definition 2
The transmission support function of an encoder e, denoted $g_e(n)$, is an indicator function that takes the value 1 when a transmission exists at time n and 0 otherwise.
3.1.1 Deterministic downsampling encoder
This encoder is the simplest and acts as a typical decimator. Its transmission support function is
Note that, for uniform downsampling, the DDE is only defined for compression rates γ such that $\gamma^{-1}\in\mathbb{N}$.
3.1.2 Probabilistic downsampling encoder
This encoder removes the DDE restriction that $\gamma^{-1}$ be a natural number. In short, the sample x(n) is transmitted following a given probabilistic pattern. Thus, the transmission support function is
Since each sample is transmitted with probability p, the expected fraction of transmitted samples is p; hence, to guarantee a compression rate of γ, the transmission probability must be p=γ.
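The two support functions can be sketched in a few lines of Python. This is an illustration with hypothetical names (`dde_support`, `pde_support`); for the DDE we assume a zero phase offset, i.e., the kept samples are n = 0, 1/γ, 2/γ, and so on.

```python
import numpy as np


def dde_support(n_samples, gamma):
    """DDE support: g(n) = 1 for one sample out of every 1/gamma (zero phase offset assumed)."""
    period = round(1.0 / gamma)
    if period < 1 or not np.isclose(period * gamma, 1.0):
        raise ValueError("the DDE requires 1/gamma to be a natural number")
    return (np.arange(n_samples) % period == 0).astype(int)


def pde_support(n_samples, gamma, rng=None):
    """PDE support: each sample is transmitted independently with probability p = gamma."""
    rng = np.random.default_rng() if rng is None else rng
    return (rng.random(n_samples) < gamma).astype(int)


print(dde_support(8, 0.25))                                         # [1 0 0 0 1 0 0 0]
print(pde_support(100_000, 0.3, np.random.default_rng(1)).mean())   # close to 0.3
```

A rate such as γ = 0.3 is rejected by the DDE sketch but is handled by the PDE, which is exactly the restriction the PDE removes.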
3.1.3 Conditional downsampling encoder
The previous encoders do not assume any memory or prior information about the signal of interest x(n). In contrast, the CDE uses the available information to decide whether the sample should be transmitted. In particular, we analyze two cases: the available information is either the last decoded sample $\tilde{x}(n-1)$ or a linear prediction obtained with the linear Wiener filter (LWF) solution in [21] from a given observation vector $\tilde{\mathbf{x}}(n)$. The available information is compared with the signal of interest x(n). If the absolute value of the difference exceeds a given threshold Δ, the encoder transmits the sample; otherwise, the transmission is blocked. Mathematically, for the first case,
$$g_{\text{CDE}}(n)=\begin{cases}1, & |x(n)-\tilde{x}(n-1)|>\Delta,\\ 0, & \text{otherwise.}\end{cases}$$
For the LWF prediction, the CDE is
$$g_{\text{CDE}}(n)=\begin{cases}1, & |x(n)-\mathbf{w}^{T}\tilde{\mathbf{x}}(n)|>\Delta,\\ 0, & \text{otherwise,}\end{cases}$$
where $\mathbf{w}$ is the LWF coefficient vector.
Although this scheme is quite simple, it has two main complications: (1) the LWF predictor assumes knowledge of the correlation statistics (the autocorrelation matrix of the observation vector and its cross-correlation with x(n)), or at least good estimates of them, and (2) the threshold Δ must be designed so that it ensures a coding rate γ. The first problem adds some complexity to the system but can be efficiently solved using existing correlation estimators [22]. The second one is addressed in Section 4.
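Under the AR1 model of Section 2.1, the correlation statistics needed by the LWF are known in closed form, $\mathbb{E}[x(n)x(n-k)]=\rho^{k}$. The following Python sketch (a hypothetical helper, `lwf_weights_ar1`, assuming NumPy and an observation vector of true samples) solves the Wiener-Hopf equations for the prediction weights.

```python
import numpy as np


def lwf_weights_ar1(rho, order):
    """Wiener-Hopf solution w = R^{-1} r for predicting x(n) from [x(n-1), ..., x(n-order)].

    Assumes a unit-variance AR1 source, so E[x(n) x(n-k)] = rho**k, and an
    observation vector made of true (not predicted) samples.
    """
    lags = np.arange(order)
    R = rho ** np.abs(lags[:, None] - lags[None, :])  # autocorrelation matrix of the observations
    r = rho ** (lags + 1.0)                           # cross-correlation with x(n)
    w = np.linalg.solve(R, r)
    mmse = 1.0 - r @ w                                # minimum MSE of the linear predictor
    return w, mmse


w, mmse = lwf_weights_ar1(rho=0.8, order=5)
print(np.round(w, 6), round(mmse, 6))  # weights close to [0.8, 0, 0, 0, 0], MSE close to 0.36
```

Only the most recent sample receives a nonzero weight, so for an AR1 source the LWF prediction reduces to ρ times the most recent sample, and its MSE is $1-\rho^{2}=\sigma_z^{2}$ regardless of the order.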
3.2 Different decoding alternatives
As with the encoding strategies, we select two decoders from among many possible solutions. The first is probably the simplest and does not require any knowledge of the correlation parameters, whereas the second exploits the signal correlation to achieve higher prediction accuracy.
3.2.1 Step decoder
This is the simplest decoder. It copies the value of y(n) into $\tilde{x}(n)$ when $g_e(n)=1$ and holds the last decoded value $\tilde{x}(n-1)$ when $g_e(n)=0$. The decoder function is
$$\tilde{x}(n)=\begin{cases}y(n), & g_e(n)=1,\\ \tilde{x}(n-1), & g_e(n)=0.\end{cases}$$
This approach is very common when the source senses a time-correlated phenomenon: since the magnitude is assumed to change slowly, its value is held until an update is received.
3.2.2 Predictive decoder
By exploiting the time correlation of x(n), we can obtain a lower downsampling distortion than with the SD. The PD behaves like the SD, except that when $g_e(n)=0$ it predicts x(n) using the LWF instead of replicating $\tilde{x}(n-1)$. Mathematically,
$$\tilde{x}(n)=\begin{cases}y(n), & g_e(n)=1,\\ \mathbf{w}^{T}\tilde{\mathbf{x}}(n), & g_e(n)=0.\end{cases}$$
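The two decoders can be sketched in Python as follows. This is an illustration under our own assumptions: the source is a unit-variance AR1, for which the Wiener-Hopf weights are $[\rho\ 0\ \cdots\ 0]^T$, so the LWF prediction reduces to $\rho\,\tilde{x}(n-1)$; the function names are hypothetical; and the decoder output before the first reception is set to the mean, 0. The usage lines pair both decoders with a DDE with γ = 1/4.

```python
import numpy as np


def step_decoder(x, g):
    """SD: copy x(n) when g(n) = 1, otherwise hold the last decoded value.

    Only the transmitted samples x[n] with g[n] == 1, i.e., y(n), are read.
    """
    x_tilde, prev = np.empty(len(x)), 0.0  # 0.0 (the mean) until the first reception
    for n in range(len(x)):
        prev = x[n] if g[n] else prev
        x_tilde[n] = prev
    return x_tilde


def predictive_decoder(x, g, rho):
    """PD for a unit-variance AR1 source: the LWF prediction is rho * x_tilde(n-1)."""
    x_tilde, prev = np.empty(len(x)), 0.0
    for n in range(len(x)):
        prev = x[n] if g[n] else rho * prev
        x_tilde[n] = prev
    return x_tilde


def distortion(x, x_tilde):
    """Empirical quadratic distortion: average of (x(n) - x_tilde(n))**2."""
    return float(np.mean((x - x_tilde) ** 2))


# Usage: unit-variance AR1 source, DDE with gamma = 1/4 (one sample out of four).
rng = np.random.default_rng(0)
rho, n_samples = 0.9, 100_000
z = np.sqrt(1.0 - rho**2) * rng.standard_normal(n_samples)
x = np.empty(n_samples)
x[0] = rng.standard_normal()
for n in range(1, n_samples):
    x[n] = rho * x[n - 1] + z[n]
g = (np.arange(n_samples) % 4 == 0).astype(int)

d_sd = distortion(x, step_decoder(x, g))
d_pd = distortion(x, predictive_decoder(x, g, rho))
print(d_sd, d_pd)  # the PD value should be the smaller of the two
```

For ρ = 0.9 and γ = 1/4, averaging $2(1-\rho^{t})$ (SD) and $1-\rho^{2t}$ (PD) over the positions t = 0, 1, 2, 3 inside each block of four gives roughly 0.28 and 0.25, which is what the two printed values should approach.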
4 Downsampling distortion of the conditional downsampling encoder
4.1 Signal prediction using incomplete observation vectors
Let the observation vector $\tilde{\mathbf{x}}(n)\in\mathbb{R}^N$, where $\tilde{\mathbf{x}}(n)=[\tilde{x}(n-1)\ \tilde{x}(n-2)\ \cdots\ \tilde{x}(n-N)]^T$, be an incomplete version of the vector of true past samples $[x(n-1)\ \cdots\ x(n-N)]^T$. The vector $\tilde{\mathbf{x}}(n)$ is constructed from the N last decoded samples, because the decoder does not necessarily know all the values of x(n), only the decoded ones. Hence, some entries of $\tilde{\mathbf{x}}(n)$ are replicas of true samples x(n), and the rest are predicted values $\widehat{x}(n)$.
Definition 3
Let the vector $\tilde{\mathbf{x}}_t$ be an instance of $\tilde{\mathbf{x}}(n)$ in which the last true sample was received at time n−t. Mathematically,
Theorem 1
If $\tilde{\mathbf{x}}_t(n)$ is used as the observation vector of the LWF, the mean square error (MSE) degrades as
Proof
It is proved by induction. First, consider the case $\tilde{\mathbf{x}}_2(n)=[\widehat{x}(n-1)\ x(n-2)\ \cdots\ x(n-N)]^T$, that is, all entries of the vector are true measurements except the first one. In this case,
For the case $\tilde{\mathbf{x}}_3(n)=[\widehat{x}(n-1)\ \widehat{x}(n-2)\ x(n-3)\ \cdots\ x(n-N)]^T$, the MSE degrades as
It is then straightforward to conclude that, in the general case $\tilde{\mathbf{x}}_t(n)=[\widehat{x}(n-1)\ \cdots\ \widehat{x}(n-t+1)\ x(n-t)\ \cdots\ x(n-N)]^T$, the MSE degrades as
□
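As a worked check of Theorem 1 (our own derivation, assuming the unit-variance AR1 model and the LWF weights $\mathbf{w}=[\rho\ 0\ \cdots\ 0]^T$, which are the Wiener-Hopf solution for this model), unrolling the chained predictions in $\tilde{\mathbf{x}}_t(n)$ gives:

```latex
\widehat{x}(n-t+1) = \rho\,x(n-t), \quad
\widehat{x}(n-t+2) = \rho^{2}x(n-t), \quad \dots, \quad
\widehat{x}(n) = \rho^{t}x(n-t),
\\[4pt]
x(n) - \widehat{x}(n) = \sum_{i=0}^{t-1}\rho^{i}z(n-i)
\quad\Longrightarrow\quad
\text{MSE}_t = (1-\rho^{2})\sum_{i=0}^{t-1}\rho^{2i} = 1-\rho^{2t}.
```

For t = 1 this is $\sigma_z^2=1-\rho^2$, and the result depends only on t and not on N, in line with Corollary 1.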
Corollary 1
For a given ρ and an AR1 process, the MSE is only a function of the position of the last true measurement in the observation vector. Furthermore, it does not depend on the dimension N of $\tilde{\mathbf{x}}_t(n)$.
Proof
The proof of the first statement is straightforward: it suffices to verify that the MSEs obtained with $\tilde{\mathbf{x}}_t(n)$ and $\tilde{\mathbf{x}}_t^{\prime}(n)$ coincide, where
Consider, for example, t=2,
Moreover, for observation vectors that contain only estimated samples (i.e., t>N), the MSE also follows (11). For instance, if t=N+1, the MSE is
□
Similarly, if the last transmitted sample x(n−t) is used directly as the reference (prediction), the MSE when the observation vector is $\tilde{\mathbf{x}}_t$ degrades as
The probability that the last true sample of $\tilde{\mathbf{x}}(n)$ is in position t depends directly on the downsampling criterion used at the encoder. Therefore, to compute the downsampling distortion of the CDE, we need the probability of occurrence of the event t or, equivalently, the probability that the observation vector $\tilde{\mathbf{x}}$ is actually $\tilde{\mathbf{x}}_t$. Next, we model the CDE problem using a Markov chain (MC).
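A quick Monte Carlo check of the two degradation laws for the unit-variance AR1 source, written in Python. Predicting x(n) by $\rho^{t}x(n-t)$ (the LWF chain) should give an MSE of $1-\rho^{2t}$, while using the replica x(n−t) directly gives $\mathbb{E}[(x(n)-x(n-t))^2]=2(1-\rho^{t})$; both closed forms are our own computation for this model, and `prediction_mse` is a hypothetical helper.

```python
import numpy as np


def prediction_mse(rho, t, n_samples=200_000, seed=0):
    """Empirical MSE of two ways of estimating x(n) when the last true sample is x(n - t).

    Returns (mse_lwf, mse_replica) for a unit-variance AR1 source:
      mse_lwf     -> prediction rho**t * x(n - t), theory 1 - rho**(2 t)
      mse_replica -> prediction x(n - t),          theory 2 (1 - rho**t)
    """
    rng = np.random.default_rng(seed)
    z = np.sqrt(1.0 - rho**2) * rng.standard_normal(n_samples)
    x = np.empty(n_samples)
    x[0] = rng.standard_normal()
    for n in range(1, n_samples):
        x[n] = rho * x[n - 1] + z[n]
    past, now = x[:-t], x[t:]
    mse_lwf = float(np.mean((now - rho**t * past) ** 2))
    mse_replica = float(np.mean((now - past) ** 2))
    return mse_lwf, mse_replica


for t in (1, 2, 5):
    mse_lwf, mse_replica = prediction_mse(0.9, t)
    print(t, mse_lwf, 1 - 0.9 ** (2 * t), mse_replica, 2 * (1 - 0.9**t))
```

The gap between the two laws, $(1-\rho^{t})^{2}$, grows with t, which is why predicting the missing samples pays off more the longer a sensor stays silent.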
4.2 The Markov chain solution for the incomplete observation vector case
Let an MC model be a discrete-time process in which a random variable E(n) changes in time. MCs have the property that the probability of being in state t, i.e., E(n)=t, depends only on the previous state E(n−1). This property is very convenient for modeling AR1 processes. Moreover, an MC is said to be homogeneous when the transition probabilities between the states of E(n) are invariant in time, i.e.,
Definition 4
Let the matrix $\mathbf{T}\in\mathbb{R}^{T\times T}$ denote the transition matrix of a homogeneous MC with T states, where
and each row is a probability distribution, so that each row sums to one, i.e., $\mathbf{T}\mathbf{1}=\mathbf{1}$.
Definition 5
Let the vector $\mathbf{p}\in\mathbb{R}^{T}$ denote the stationary probability vector of a homogeneous MC with T states, i.e., any probability vector that satisfies the stationary conditions
where $\mathbf{p}=[P_0\ P_1\ \cdots\ P_{T-1}]^T$ contains the probabilities of being in each state t=0,1,…,T−1 in the stationary regime of the MC.
4.3 The Markov chain model for the CDE
In this section, we analytically evaluate the performance of the proposed CDE with both PD and SD decoders in terms of the downsampling distortion.
The CDE can be modeled by the infinite Markov chain in Figure 2. The state E(n)=0 means that a transmission exists at time n. Similarly, the state E(n)=t, for t≠0, means that sample n−t was the last one transmitted. The transition matrix (with dimension T→∞) that describes the CDE process is
From the stationary condition in (21), we can obtain the following relations:
where, by definition, $\sum_{i=1}^{\infty}P_i=1-P_0$. Moreover, after some algebraic manipulation,
It is easy to see that the transition probabilities $p_{i,j}$ are not unique: infinitely many choices satisfy these relations for a given $P_0$. Thus, we address the design and the corresponding performance in the following sections.
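Because a state t ≥ 1 can only be reached from state t − 1, stationarity gives $P_t=P_{t-1}\,p_{t-1,t}$, and normalization then fixes $P_0$. The Python sketch below is a hypothetical illustration that truncates the infinite chain at 200 states and solves the stationary equations numerically. It shows two different sets of transition probabilities that both yield $P_0=\gamma=1/4$: the constant choice $p_{t-1,t}=1-\gamma$ used later in Lemma 4, and $p_{0,1}=0.6$ with $p_{t-1,t}=0.8$ for t ≥ 2.

```python
import numpy as np


def cde_transition_matrix(p_forward):
    """Truncated CDE chain: M[i, j] = P{E(n) = j | E(n-1) = i}.

    p_forward[t] is p_{t,t+1}, the probability of blocking again and moving from
    state t to state t+1; the remaining mass goes back to state 0 (a transmission).
    The last state is forced back to 0 so that every row sums to one.
    """
    n_states = len(p_forward) + 1
    M = np.zeros((n_states, n_states))
    for t, p in enumerate(p_forward):
        M[t, t + 1] = p
        M[t, 0] = 1.0 - p
    M[-1, 0] = 1.0
    return M


def stationary_vector(M):
    """Solve p^T M = p^T together with sum(p) = 1 (least squares on the stacked system)."""
    n_states = M.shape[0]
    A = np.vstack([M.T - np.eye(n_states), np.ones((1, n_states))])
    b = np.zeros(n_states + 1)
    b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p


gamma = 0.25
design_a = np.full(199, 1.0 - gamma)        # p_{t-1,t} = 1 - gamma for all t
design_b = np.r_[0.6, np.full(198, 0.8)]    # p_{0,1} = 0.6, p_{t-1,t} = 0.8 for t >= 2
p_a = stationary_vector(cde_transition_matrix(design_a))
p_b = stationary_vector(cde_transition_matrix(design_b))
print(p_a[0], p_b[0])   # both close to 0.25
print(p_a[:4])          # close to gamma * (1 - gamma) ** t
```

Any sequence of $p_{t-1,t}$ with the same value of $\sum_{t\ge1}\prod_{i=1}^{t}p_{i-1,i}$ (here 3 in both designs) yields the same $P_0$, which is why an extra design criterion is needed.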
4.4 Approximations for the downsampling distortion of the CDEPD and CDESD
Following the scheme in (7), our aim is to design the threshold Δ so that the source transmits only a fraction γ of the total samples. In the general case, we may use a different value of Δ for each state t of the MC. Therefore, we define $\Delta_t$ as the threshold applied in state t.
The blocking condition in (7) truncates the probability density function (pdf) of the prediction error.
Definition 6
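Let the conditional pdf $f(x \mid |x|<\Delta_t)$ be the pdf of x conditioned on $|x|<\Delta_t$. Mathematically,
$$f(x \mid |x|<\Delta_t)=\frac{f(x)\,\Pi\!\left(\frac{x}{2\Delta_t}\right)}{\beta(\Delta_t)},$$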
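where f(x) is the original pdf of x and $\beta(\Delta_t)\in(0,1)$ is the probability of the conditioning event,
$$\beta(\Delta_t)=P\{|x|<\Delta_t\}=\int_{-\Delta_t}^{\Delta_t}f(x)\,dx.$$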
Moreover, the rectangular function Π(x) is defined as follows: Π(x)=0 if |x|>0.5, Π(x)=1 if |x|<0.5, and Π(x)=0.5 if |x|=0.5. This definition is summarized in Figure 3.
Lemma 1
Let $x\sim\mathcal{N}(0,\sigma^2)$. Then, the variance of the conditional pdf $f(x \mid |x|<\Delta_t)$ is
Proof
Let $x^{\prime}$ denote the random variable
where ${x}_{1}\sim \mathcal{N}(0,{\sigma}^{2})$. Hence,
Using the relation
we obtain
The term $P\{|x|<\Delta_t\}$ in the denominator is
So,
Applying the same relation as that in (30), we obtain
where the term $P\{|x|<\Delta_t \mid x\}$ is
Thus,
which follows from the relation
□
Definition 7
We define the conditional function $h(\sigma^2\mid\Delta_t):\mathbb{R}\to\mathbb{R}$ as
$$h(\sigma^2\mid\Delta_t)=\mathbb{E}\left[x^2 \mid |x|<\Delta_t\right],\qquad x\sim\mathcal{N}(0,\sigma^2).$$
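For a zero-mean Gaussian this conditional second moment has a standard closed form. Writing $a=\Delta_t/\sigma$, $\phi$ for the standard normal pdf, and $\beta(\Delta_t)=\mathrm{erf}(a/\sqrt{2})$, integration by parts gives $h(\sigma^2\mid\Delta_t)=\sigma^2\left(1-2a\,\phi(a)/\beta(\Delta_t)\right)$. The Python sketch below evaluates it with a hypothetical helper `h` and checks it against a Monte Carlo estimate; it is the textbook symmetric truncated-normal result, offered as a check rather than as the paper's own expression for Lemma 1.

```python
import math

import numpy as np


def h(sigma2, delta):
    """E[x^2 | |x| < delta] for x ~ N(0, sigma2) (symmetrically truncated Gaussian)."""
    a = delta / math.sqrt(sigma2)
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)  # standard normal pdf at a
    beta = math.erf(a / math.sqrt(2.0))                      # P{|x| < delta}
    return sigma2 * (1.0 - 2.0 * a * phi / beta)


# Monte Carlo check with sigma^2 = 0.5 and delta = 0.6
rng = np.random.default_rng(0)
x = rng.normal(0.0, math.sqrt(0.5), 1_000_000)
kept = x[np.abs(x) < 0.6]
print(h(0.5, 0.6), np.mean(kept**2))  # the two values should nearly coincide
```

As Δ grows, h tends to σ², and for small Δ it approaches Δ²/3, the second moment of a uniform distribution on (−Δ, Δ).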
4.4.1 The pair CDEPD
Prior information about the signal can notably reduce the MSE at the decoder compared with other classical methods. This is because only the samples with a small prediction error are predicted, i.e., those that satisfy $|x(n)-\mathbf{w}^{T}\tilde{\mathbf{x}}_t(n)|<\Delta_t$, since they introduce less noise power at the decoder.
Lemma 2
Let $\text{MSE}_t^{\text{CDE-PD}}$ be the mean square error when the observation vector is $\tilde{\mathbf{x}}_t(n)$. Then, $\underline{\text{MSE}}_t^{\text{CDE-PD}}$ is an approximation of $\text{MSE}_t^{\text{CDE-PD}}$ (i.e., of the error introduced by the CDE-PD pair in state t), defined as
Proof
For t=1, the error $\text{MSE}_1^{\text{CDE-PD}}$ follows the conditional variance^{b} such that
Using Definition 7 and since $z(n)\sim\mathcal{N}(0,\sigma_z^2)$ with $\sigma_z^2=1-\rho^2$, $\text{MSE}_1^{\text{CDE-PD}}$ is
For t=2, the available knowledge is twofold: (1) we know that $|x(n)-\mathbf{w}^{T}\tilde{\mathbf{x}}_2(n)|<\Delta_2$, and (2) we also know that at t=1 the error satisfied $|z(n-1)|<\Delta_1$. Therefore, $\text{MSE}_2^{\text{CDE-PD}}$ can be written as
The expectation in (42) can be computed as
This expression is actually the variance of a bivariate truncated normal distribution. The solution for a singly truncated bivariate distribution can be found in [23]. For higher orders, i.e., t>2, the solution requires the variance of a truncated multivariate normal distribution [24]. Although a solution exists in the literature, it is quite complex, and its complexity increases with t. For that reason, we consider the following approximation:
although, in general, this variable does not necessarily follow a Gaussian distribution. The variance $\mathbb{E}\left[(\rho z(n-1)+z(n))^2 \mid |z(n-1)|<\Delta_1\right]$ can also be expressed as
so, the MSE introduced at t=2 is approximated by
It is easy to conclude that, in the general case t, $\underline{\text{MSE}}_t^{\text{CDE-PD}}$ is
□
Hence, $\mathcal{D}(\text{CDE},\text{PD})$ is approximated by
However, this is still an open problem, because the values of $P_t$ are not yet determined. We study this issue in Section 4.5.
4.4.2 The pair CDESD
If $\widehat{x}(n)$ is constructed from a linear prediction using the LWF, the prediction MSE is directly $\sigma_z^2=1-\rho^2$. With other strategies, however, the error increases, as we have seen in (18). In particular, the CDE-SD pair constructs $\widehat{x}(n)$ as the last transmitted sample, i.e., $\widehat{x}(n)=x(n-t)$. This prediction scheme introduces an error due not only to z(n) but also to the signal x(n) itself.
Lemma 3
$\underline{\text{MSE}}_t^{\text{CDE-SD}}$ is an approximation of $\text{MSE}_t^{\text{CDE-SD}}$ (i.e., of the error introduced by the CDE-SD pair in state t), defined as
Proof
Similarly to the CDE-PD case, for t=1 the error $\text{MSE}_1^{\text{CDE-SD}}$ follows the conditional variance such that
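where $z^{\prime}(n)=z(n)-(1-\rho)x(n-1)$ contains the error contributions of both z(n) and x(n−1), and its variance $\sigma_{z}^{\prime 2}$ is equal to
$$\sigma_{z}^{\prime 2}=(1-\rho^{2})+(1-\rho)^{2}=2(1-\rho),$$
since z(n) is independent of x(n−1).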
Therefore, $\text{MSE}_1^{\text{CDE-SD}}$ is
For t=2, the available information is twofold: (1) we know that $|x(n)-x(n-2)|<\Delta_2$, and (2) we also know that at t=1 the error satisfied $|z^{\prime}(n-1)|<\Delta_1$. Therefore, $\text{MSE}_2^{\text{CDE-SD}}$ can be written as
Solving for $\text{MSE}_t^{\text{CDE-SD}}$ recursively may be harder than in the CDE-PD case, because we cannot directly apply the conditional function: the expectation in (53) is not of the form $h(\sigma_x^2\mid\Delta)=\mathbb{E}[x^2 \mid |x|<\Delta]$. Hence, for simplicity, we propose a lower bound for (53) such that
One can easily check that it is in fact a lower bound since
Our proposed lower bound is very close to the true value for high values of ρ. Using the same approximation as in the CDE-PD case and after some simple algebra, we find the lower bound of (53) as
It is easy to conclude that, in the general case t, $\underline{\text{MSE}}_t^{\text{CDE-SD}}$ is
□
Hence, $\mathcal{D}(\text{CDE},\text{SD})$ is lower-bounded by
As for the CDE-PD pair, this is still an open problem, which is studied in Section 4.5.
4.5 Design of the CDE-SD and the CDE-PD
From the design point of view, our aim is to obtain a set of thresholds $\Delta_t$ that ensures a coding rate γ at the CDE. However, as pointed out in (24), there are infinitely many solutions. That is why we propose two possible approaches to the design of $\Delta_t$:

Fixed $\Delta_t$, i.e., $\Delta_t=\Delta$ for all t.

Variable $\Delta_t$, chosen to keep the transition probabilities constant, i.e., $p_{t-1,t}=p$ for all t.
4.5.1 Fixed $\Delta_t$ design
This is probably the simplest approach to design the CDE, since the encoder does not have to adapt $\Delta_t$ to the current state: $\Delta_t=\Delta$ for all t.
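First, we make explicit the relation between Δ and $p_{t-1,t}$: the sample at state t is skipped when the error lies in $(-\Delta,\Delta)$, so that $p_{t-1,t}=\int_{-\Delta}^{\Delta}f_t(x)\,dx$,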
where f _{ t }(x) is the pdf of the error at state t.
Following the assumption in (44), the error $x(n)-\hat{x}(n)$ follows a Gaussian distribution with zero mean and variance $\text{MSE}_t^{\text{CDE}}(\Delta)$, where
Thus^{c}, $p_{t-1,t}=\text{erf}\left(\frac{\Delta}{\sqrt{2\,\text{MSE}_t^{\text{CDE}}(\Delta)}}\right)$,
where erf(·) is the error function. Using (24), the value of Δ that guarantees $P_0=\gamma$ can be approximated numerically as the unique solution of
Figure 4 shows the resulting Δ for the different values of γ and ρ.
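To make the fixed-Δ design concrete, the Python sketch below solves $P_0(\Delta)=\gamma$ by bisection for the CDE-PD. It is a sketch under stated assumptions, not the paper's exact expressions: a unit-variance AR(1) source, a PD that propagates $\hat{x}(n)=\rho\hat{x}(n-1)$ between transmissions, a Gaussian pre-decision error, and the variance recursion $s_{t+1}=\rho^2h(s_t\mid\Delta)+1-\rho^2$ with $s_1=1-\rho^2$. Each non-transmission probability is $\text{erf}(\Delta/\sqrt{2s_t})$, and $P_0=\left(1+\sum_{t\ge1}\prod_{k\le t}p_{k-1,k}\right)^{-1}$ is the stationary probability of state 0 of a chain that either advances one state (skip) or returns to state 0 (transmit). The function names and search interval are hypothetical.

```python
import math


def h(sigma2: float, delta: float) -> float:
    """E[e^2 | |e| < delta] for e ~ N(0, sigma2) (truncated-normal second moment)."""
    if delta <= 0.0:
        return 0.0
    c = delta / math.sqrt(sigma2)
    pdf_c = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    return sigma2 * (1.0 - 2.0 * c * pdf_c / math.erf(c / math.sqrt(2.0)))


def p0_fixed_delta(delta: float, rho: float, t_max: int = 5000) -> float:
    """Approximate P_0 of a CDE-PD with a fixed threshold delta (requires 0 <= rho < 1)."""
    sigma_z2 = 1.0 - rho ** 2
    s = sigma_z2          # pre-decision error variance right after a transmission
    survive = 1.0         # probability of having skipped every sample so far
    total = 1.0           # 1 + sum over t >= 1 of prod_{k <= t} p_{k-1,k}
    for _ in range(t_max):
        p = math.erf(delta / math.sqrt(2.0 * s))   # P(|e| < delta): the sample is skipped
        survive *= p
        total += survive
        if survive < 1e-15:
            break
        s = rho ** 2 * h(s, delta) + sigma_z2      # error variance at the next state
    return 1.0 / total


def solve_fixed_delta(gamma: float, rho: float, lo: float = 1e-9, hi: float = 20.0) -> float:
    """Bisection for the delta with P_0(delta) = gamma (P_0 decreases with delta)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if p0_fixed_delta(mid, rho) > gamma:
            lo = mid      # too many transmissions: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)


delta = solve_fixed_delta(0.25, 0.9)
print(delta, p0_fixed_delta(delta, 0.9))
```

Bisection is adequate here because, in this model, a larger threshold skips more samples at every state, so $P_0$ decreases monotonically with Δ and the root is unique.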
4.5.2 Variable $\Delta_t$ design
This approach allows a slightly easier computation of the $\Delta_t$'s. The main difference with respect to the previous design is that we can use the result in the following lemma:
Lemma 4
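The uniform solutions for the nonzero transition probabilities and for the stationary probability vector are $p_{t-1,t}=1-\gamma$ and $p_{t-1,0}=\gamma$ for $t\ge1$, and $P_t=\gamma(1-\gamma)^t$ for $t\ge0$.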
Proof
Let us first impose $P_0=\gamma$. For the uniform probability case $p_{t-1,t}=p$, (24) gives $P_0=\left(1+\sum_{t\ge1}p^t\right)^{-1}=1-p=\gamma$.
Hence $p=p_{0,1}=1-\gamma$, and therefore $p_{0,0}=1-p_{0,1}=\gamma$. To compute the probability of each state, (23) gives $P_t=p\,P_{t-1}=\gamma(1-\gamma)^t$.
□
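Lemma 4 can be checked numerically. The Python sketch below truncates the infinite chain to a finite number of states (an approximation made only for this check), builds the stationary vector from $P_t\propto\prod_{k\le t}p_{k-1,k}=(1-\gamma)^t$, and verifies that one transition step leaves it unchanged. State t counts the samples since the last transmission, and from every state the chain returns to state 0 with probability γ. The function names are hypothetical.

```python
def stationary_uniform(gamma: float, t_max: int = 200) -> list[float]:
    """Stationary probabilities of the downsampling MC with p_{t-1,t} = 1 - gamma.

    The infinite chain is truncated to t_max states for the numerical check.
    """
    p = 1.0 - gamma
    unnormalized = [p ** t for t in range(t_max)]   # P_t proportional to prod of p_{k-1,k}
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def apply_transition(probs: list[float], gamma: float) -> list[float]:
    """One step of the chain: go to state 0 w.p. gamma, otherwise from t to t + 1."""
    p = 1.0 - gamma
    nxt = [0.0] * len(probs)
    nxt[0] = gamma * sum(probs)                     # a transmission resets the state
    for t in range(1, len(probs)):
        nxt[t] = p * probs[t - 1]                   # a skipped sample advances the state
    return nxt


probs = stationary_uniform(0.25)
print(probs[:4])   # gamma (1 - gamma)^t for t = 0..3
```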
Hence, $\Delta_t$ follows directly as
where $\text{MSE}_0^{\text{CDE-SD}}=\text{MSE}_0^{\text{CDE-PD}}=0$; hence, $\text{MSE}_0^{\text{CDE}}=1-\rho^2$ (as in (60)).
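With uniform transition probabilities, each threshold only has to satisfy $\text{erf}(\Delta_t/\sqrt{2s_t})=1-\gamma$, where $s_t$ is the variance of the error on which the CDE decides. The Python sketch below assumes a unit-variance AR(1) source, linear prediction at the PD, and a Gaussian pre-decision error; then $\Delta_t=c\sqrt{s_t}$ with $c=\Phi^{-1}(1-\gamma/2)$, the truncation shrinks the variance by a constant factor $h(s\mid c\sqrt{s})/s$, and the recursion becomes explicit. The starting value $s_1=1-\rho^2$ mirrors $\text{MSE}_0^{\text{CDE}}=1-\rho^2$; the function name is hypothetical and the sketch is illustrative rather than the paper's exact expression.

```python
import math
from statistics import NormalDist


def variable_thresholds(gamma: float, rho: float, n_states: int = 10):
    """Thresholds Delta_t (t = 1..n_states) keeping p_{t-1,t} = 1 - gamma.

    Sketch for the CDE-PD under a Gaussian-error approximation with a
    unit-variance AR(1) source (0 <= rho < 1).
    """
    c = NormalDist().inv_cdf(1.0 - gamma / 2.0)          # P(|e| < c*sigma) = 1 - gamma
    pdf_c = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    shrink = 1.0 - 2.0 * c * pdf_c / (1.0 - gamma)      # h(s | c*sqrt(s)) / s, same for all s
    sigma_z2 = 1.0 - rho ** 2
    s = sigma_z2                                         # state right after a transmission
    deltas, variances = [], []
    for _ in range(n_states):
        deltas.append(c * math.sqrt(s))
        variances.append(s)
        s = rho ** 2 * shrink * s + sigma_z2             # variance at the next state
    return deltas, variances


deltas, variances = variable_thresholds(0.25, 0.9)
print([round(d, 3) for d in deltas])
```

The thresholds grow with t and converge, since the pre-decision variance increases towards the fixed point $(1-\rho^2)/(1-\rho^2 g)$, with g the shrink factor.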
To validate our design framework graphically, we performed the following experiment:
Experiment 1
We simulated the CDE-SD and the CDE-PD for γ∈{1/8, 1/4, 1/2} and ρ∈[0,1]. For each value of ρ, the signal is an AR(1) process of 5,000 samples. We computed the probability of transmission P_0 obtained with our threshold design framework.
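A simplified version of Experiment 1 for the CDE-PD can be sketched with the Python standard library. The sketch assumes a unit-variance AR(1) source, an encoder that tracks the decoder's prediction $\rho\hat{x}(n-1)$ and transmits when the prediction error reaches $\Delta_t$, and thresholds from a Gaussian-approximation recursion of the variable design ($\Delta_t=c\sqrt{s_t}$, $c=\Phi^{-1}(1-\gamma/2)$, $s_{t+1}=\rho^2 g\,s_t+1-\rho^2$, with g the constant variance reduction caused by the truncation). The sample length, seed, and names are arbitrary choices for the illustration; under these assumptions the estimated $P_0$ should land close to γ.

```python
import math
import random
from statistics import NormalDist


def simulate_cde_pd(gamma: float, rho: float, n: int = 20000, seed: int = 1) -> float:
    """Monte Carlo estimate of P_0 for a CDE-PD with variable thresholds (sketch)."""
    rng = random.Random(seed)
    sigma_z = math.sqrt(1.0 - rho ** 2)

    # Thresholds from the Gaussian-approximation recursion (variable design).
    c = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    pdf_c = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    shrink = 1.0 - 2.0 * c * pdf_c / (1.0 - gamma)
    s, deltas = 1.0 - rho ** 2, []
    for _ in range(200):
        deltas.append(c * math.sqrt(s))
        s = rho ** 2 * shrink * s + (1.0 - rho ** 2)

    x = rng.gauss(0.0, 1.0)      # stationary unit-variance start
    x_hat = x                    # the first sample is transmitted
    state, transmissions = 0, 1  # state = samples since the last transmission
    for _ in range(n - 1):
        x = rho * x + rng.gauss(0.0, sigma_z)     # AR(1) source
        prediction = rho * x_hat                  # PD: linear prediction of x(n)
        delta = deltas[min(state, len(deltas) - 1)]
        if abs(x - prediction) >= delta:          # too unpredictable: transmit
            x_hat, state = x, 0
            transmissions += 1
        else:                                     # skip: the decoder keeps its prediction
            x_hat, state = prediction, state + 1
    return transmissions / n


print(simulate_cde_pd(0.25, 0.9))
```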
From Experiment 1, we plotted the probability of transmission P_0 as a function of ρ for each value of γ, using the variable $\Delta_t$ design. Figure 5 compares the obtained results with the target coding rate γ: for the CDE-PD the fit is very accurate, whereas for the CDE-SD it is slightly worse because of the approximation in (53). As noted above, however, this approximation improves as ρ→1, and this behavior can be observed in Figure 5.
5 Downsampling distortion of other typical strategies
To benchmark the CDE, we also evaluate other encoder-decoder pairs in terms of the downsampling distortion, namely the DDE-SD, DDE-PD, PDE-SD, and PDE-PD.
5.1 The pair DDE-SD
The index t denotes the time elapsed between the last available sample and the current one. Thus, we can compute $\text{MSE}_t^{\text{DDE-SD}}$ using the result in (18) for each observation vector $\tilde{\mathbf{x}}_t$, and the downsampling distortion is the sum of the MSE contributions of all states, each weighted by its stationary probability. Applying the definition of the stationary probability vector in Definition 5, we obtain $P_i=P_j$ for all $i,j=0,1,\dots,T-1$. Since we impose a coding rate γ, the probability of transmission is $P_0=1/T=\gamma$, and the stationary probability vector is $\mathbf{p}=\gamma\mathbf{1}$. Hence, the downsampling distortion for the DDE-SD can be computed as $\mathcal{D}(\text{DDE,SD})=\gamma\sum_{t=0}^{T-1}\text{MSE}_t^{\text{DDE-SD}}$.
5.2 The pair DDE-PD
The correlation parameters are known at the PD, so it can predict the non-transmitted samples using the LWF. Following Theorem 1, $\text{MSE}_t^{\text{DDE-PD}}=1-\rho^{2t}$. Hence, the downsampling distortion for the DDE-PD can be computed as $\mathcal{D}(\text{DDE,PD})=\gamma\sum_{t=0}^{T-1}\left(1-\rho^{2t}\right)$.
5.3 The pair PDE-SD
The PDE can also be modeled by the infinite MC in Figure 2. Hence, its transition matrix $\mathbf{T}_{\text{PDE}}$ has the same structure as $\mathbf{T}_{\text{CDE}}$ in (22), and (23) and (24) remain valid. The remaining expressions, however, differ.
For simplicity, we assume that all $p_{t-1,t}$ are equal, i.e., the uniform probability case, so the results of Lemma 4 also apply here. This choice has two main advantages:

1.
It is the easiest solution to implement in practice: the source decides whether or not to transmit regardless of the current state t.

2.
It yields a closed-form solution.
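With $P_t=\gamma(1-\gamma)^t$ (Lemma 4), the PDE distortions reduce to geometric series. The Python sketch below compares truncated sums with the closed forms they yield, $\mathcal{D}(\text{PDE,SD})=\frac{2(1-\gamma)(1-\rho)}{1-(1-\gamma)\rho}$ and $\mathcal{D}(\text{PDE,PD})=\frac{(1-\gamma)(1-\rho^2)}{1-(1-\gamma)\rho^2}$. It assumes a unit-variance source, for which the SD error $x(n)-x(n-t)$ has variance $2(1-\rho^t)$ and the PD error has variance $1-\rho^{2t}$ (Theorem 1); the SD expression is an assumption for (18), not a quotation of it, and the function names are hypothetical.

```python
def pde_sum(gamma, mse, t_max=2000):
    """Truncated sum over states of P_t * MSE_t with P_t = gamma (1 - gamma)^t."""
    return sum(gamma * (1.0 - gamma) ** t * mse(t) for t in range(t_max))


def pde_sd_closed(gamma, rho):
    """Closed form of the PDE-SD distortion (unit-variance source)."""
    return 2.0 * (1.0 - gamma) * (1.0 - rho) / (1.0 - (1.0 - gamma) * rho)


def pde_pd_closed(gamma, rho):
    """Closed form of the PDE-PD distortion (unit-variance source)."""
    return (1.0 - gamma) * (1.0 - rho ** 2) / (1.0 - (1.0 - gamma) * rho ** 2)


for gamma in (1 / 8, 1 / 4, 1 / 2):
    for rho in (0.1, 0.5, 0.9):
        sd = pde_sum(gamma, lambda t: 2.0 * (1.0 - rho ** t))   # step decoder
        pd = pde_sum(gamma, lambda t: 1.0 - rho ** (2 * t))     # predictive decoder
        print(gamma, rho, sd, pde_sd_closed(gamma, rho), pd, pde_pd_closed(gamma, rho))
```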
Using the results for $\text{MSE}_t$ in (18) corresponding to the SD, we obtain $\mathcal{D}(\text{PDE,SD})=\sum_{t=0}^{\infty}\gamma(1-\gamma)^t\,\text{MSE}_t^{\text{PDE-SD}}$.
5.4 The pair PDE-PD
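The MSE associated with state t is given by Theorem 1, i.e., $\text{MSE}_t^{\text{PDE-PD}}=1-\rho^{2t}$. The downsampling distortion for the PDE-PD can therefore be computed as $\mathcal{D}(\text{PDE,PD})=\sum_{t=0}^{\infty}\gamma(1-\gamma)^t\left(1-\rho^{2t}\right)=\frac{(1-\gamma)(1-\rho^2)}{1-(1-\gamma)\rho^2}$.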
6 Performance evaluation
In this section, we evaluate and compare the different encoder-decoder pairs in terms of the downsampling distortion, and we validate our theoretical results experimentally. To that end, we generated a signal x(n) of 5,000 samples using the AR(1) model in (2) for values of the autoregressive parameter ρ∈[0,1] with a resolution of 0.01. The results are computed for γ∈{1/8, 1/4, 1/2}.
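The test signal of this section can be generated as below. The Python sketch assumes that the AR(1) model in (2) has the form $x(n)=\rho x(n-1)+z(n)$ with unit stationary variance, so that $z(n)\sim\mathcal{N}(0,1-\rho^2)$, and it starts the recursion from the stationary distribution; the seed and function names are arbitrary. At ρ=1 the innovation vanishes and the sequence is constant.

```python
import math
import random


def ar1(rho: float, n: int = 5000, seed: int = 0) -> list[float]:
    """Unit-variance AR(1) sequence x(n) = rho x(n-1) + z(n), z ~ N(0, 1 - rho^2)."""
    rng = random.Random(seed)
    sigma_z = math.sqrt(1.0 - rho ** 2)
    x = [rng.gauss(0.0, 1.0)]                 # start in the stationary distribution
    for _ in range(n - 1):
        x.append(rho * x[-1] + rng.gauss(0.0, sigma_z))
    return x


def lag1_corr(x: list[float]) -> float:
    """Sample lag-1 autocorrelation, an estimate of rho."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den


rhos = [k / 100 for k in range(101)]          # rho in [0, 1] with resolution 0.01
x = ar1(0.5)
print(len(x), lag1_corr(x))
```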
6.1 The pair DDE-SD and the pair DDE-PD
We analyze the downsampling distortion of the DDE-SD and DDE-PD pairs and compare the theoretical results with the experimental ones. Figure 6 confirms the validity of our theoretical model for the downsampling distortion.
We also compare the performance of the two decoders. The PD takes the signal correlation into account in the decoding process, which notably improves the performance for low values of ρ. Conversely, as ρ→1 both decoders perform similarly, since $x(n)-\rho^t x(n-t)\approx x(n)-x(n-t)$.
Figure 6 also shows the impact of γ. In our scenario, the DDE transmits one out of every 8, 4, or 2 samples of x(n), following a uniform pattern. The larger γ, the lower the distortion; hence, there is a trade-off between the downsampling distortion and the compression rate.
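Both effects discussed in this subsection follow from the DDE expressions of Section 5. The Python sketch below evaluates $\gamma\sum_{t=0}^{\gamma^{-1}-1}\text{MSE}_t$ for both decoders, assuming a unit-variance source with $\text{MSE}_t^{\text{SD}}=2(1-\rho^t)$ (the error of holding the last sample) and $\text{MSE}_t^{\text{PD}}=1-\rho^{2t}$ (Theorem 1). Since $2(1-\rho^t)-(1-\rho^{2t})=(1-\rho^t)^2\ge0$, the PD is never worse, and the gap closes as ρ→1. The function name is hypothetical.

```python
def dde_distortion(gamma: float, rho: float, decoder: str) -> float:
    """Downsampling distortion of the DDE: states 0..T-1 equally likely, T = 1/gamma."""
    period = round(1.0 / gamma)
    if abs(period * gamma - 1.0) > 1e-12:
        raise ValueError("the DDE needs 1/gamma to be a natural number")
    if decoder == "SD":
        mse = [2.0 * (1.0 - rho ** t) for t in range(period)]   # hold the last sample
    elif decoder == "PD":
        mse = [1.0 - rho ** (2 * t) for t in range(period)]     # Theorem 1
    else:
        raise ValueError(f"unknown decoder {decoder!r}")
    return gamma * sum(mse)


for gamma in (1 / 8, 1 / 4, 1 / 2):
    print(gamma, [round(dde_distortion(gamma, rho, "PD"), 4) for rho in (0.2, 0.5, 0.9)])
```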
6.2 The pair PDE-SD and the pair PDE-PD
Figure 7 plots the downsampling distortion for the PDE-SD and the PDE-PD. The conclusions regarding the accuracy of the proposed analytical model and the influence of ρ and γ on the downsampling distortion are similar to those established in Section 6.1. For clarity, the downsampling distortion of the different pairs is compared in Section 6.4.
6.3 The pair CDE-SD and the pair CDE-PD
Conditional transmission at the encoder side notably improves on the performance of the previous encoder-decoder pairs. In particular, we study and compare the downsampling distortion of the two design approaches, i.e., the fixed $\Delta_t$ design and the variable $\Delta_t$ design (with uniform transition probabilities), depicted in Figures 8 and 9, respectively. As for the previous pairs, we compare the experimental results with the theoretical ones. In this case, however, the theoretical results are approximations rather than the exact system performance. Even so, the approximations are very accurate in all the simulations. For the CDE-PD, the approximation is so close to the system performance that the difference cannot be observed, because it is masked by the small simulation noise. For the CDE-SD, the difference is slightly larger because of the approximation in (55).
Another conclusion is that the downsampling distortion is notably higher for the fixed design. This is because its transition probabilities $p_{t-1,t}$ increase with t, which makes the higher states t of the MC (i.e., those with higher $\text{MSE}_t$'s) more likely. In contrast, the variable design concentrates the probability on lower values of t.
From a practical point of view, the CDE is simpler with a fixed design, since the encoder only needs to know the value of Δ and does not need to track the current state t. From a computational point of view, however, the variable design is simpler, since its thresholds are computed analytically rather than numerically.
6.4 Comparison of the downsampling distortion
Finally, we compare the performance of the different encoder-decoder pairs. Figure 10 does not provide new information, but it places all the schemes side by side for an easier comparison. For simplicity, we only show the theoretical results for γ=0.25.
The DDE and the PDE perform similarly, although the deterministic encoder is slightly better: it only uses the lowest γ^{−1} states of a finite MC, whereas the PDE also reaches higher states, which are associated with larger errors. The main disadvantage of the DDE with respect to the PDE is its lack of flexibility, since the uniform pattern is only valid when γ^{−1} is a natural number. Furthermore, the PDE with uniform transition probabilities does not need to track the current state t of the process, which makes it simpler.
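The argument above can be quantified by comparing the state distributions directly. In the Python sketch below, the DDE spreads its probability uniformly over states $0,\dots,\gamma^{-1}-1$, whereas the PDE with uniform transition probabilities has $P_t=\gamma(1-\gamma)^t$ (Lemma 4) and therefore places a mass $(1-\gamma)^{\gamma^{-1}}$ on states the DDE never visits (about 0.32 for γ=0.25). The PD errors $1-\rho^{2t}$ of Theorem 1 are used for both encoders, assuming a unit-variance source; the helper names are hypothetical.

```python
def dde_state_probs(gamma):
    """DDE: states 0..T-1 equally likely, which requires T = 1/gamma to be an integer."""
    period = round(1.0 / gamma)
    if abs(period * gamma - 1.0) > 1e-12:
        raise ValueError("1/gamma must be a natural number for the DDE")
    return [gamma] * period


def pde_state_probs(gamma, t_max=2000):
    """PDE with uniform transition probabilities: P_t = gamma (1 - gamma)^t (truncated)."""
    return [gamma * (1.0 - gamma) ** t for t in range(t_max)]


def distortion_pd(probs, rho):
    """Sum of P_t * MSE_t with the PD error MSE_t = 1 - rho^(2t) (unit-variance source)."""
    return sum(p * (1.0 - rho ** (2 * t)) for t, p in enumerate(probs))


gamma = 0.25
dde, pde = dde_state_probs(gamma), pde_state_probs(gamma)
print("PDE mass beyond the DDE states:", sum(pde[len(dde):]))  # (1 - gamma)^4, about 0.316
for rho in (0.1, 0.5, 0.9):
    print(rho, distortion_pd(dde, rho), distortion_pd(pde, rho))
```

The PDE also accepts rates such as γ=0.3, for which `dde_state_probs` raises an error, which illustrates the flexibility argument.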
The largest gain in performance is obtained with the CDE. This encoder suppresses the transmission of the samples that carry the most redundant information, so that only the most ‘unpredictable’ samples are transmitted.
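The size of this gain can be estimated from a Gaussian-approximation recursion of the variable design. The Python sketch below is an approximation under stated assumptions (unit-variance AR(1), linear prediction at the PD, Gaussian pre-decision error, $\Delta_t=c\sqrt{s_t}$ with $c=\Phi^{-1}(1-\gamma/2)$) and compares $\sum_{t\ge1}\gamma(1-\gamma)^t\,\text{MSE}_t$ for the CDE-PD with the DDE-PD and PDE-PD distortions at γ=0.25. It is not the exact expression derived in Section 4, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def cde_pd_distortion(gamma, rho, t_max=2000):
    """Approximate D(CDE,PD) for the variable design (Gaussian-error sketch, 0 <= rho < 1)."""
    c = NormalDist().inv_cdf(1.0 - gamma / 2.0)
    pdf_c = math.exp(-0.5 * c * c) / math.sqrt(2.0 * math.pi)
    shrink = 1.0 - 2.0 * c * pdf_c / (1.0 - gamma)    # variance kept after truncation
    sigma_z2 = 1.0 - rho ** 2
    s, total = sigma_z2, 0.0                           # MSE_0 = 0 (transmitted sample)
    for t in range(1, t_max):
        total += gamma * (1.0 - gamma) ** t * shrink * s   # P_t * MSE_t
        s = rho ** 2 * shrink * s + sigma_z2               # variance at the next state
    return total


def dde_pd(gamma, rho):
    """DDE-PD distortion: uniform pattern with T = 1/gamma states."""
    return gamma * sum(1.0 - rho ** (2 * t) for t in range(round(1.0 / gamma)))


def pde_pd(gamma, rho):
    """PDE-PD distortion with uniform transition probabilities."""
    return (1.0 - gamma) * (1.0 - rho ** 2) / (1.0 - (1.0 - gamma) * rho ** 2)


for rho in (0.1, 0.5, 0.9):
    print(rho, cde_pd_distortion(0.25, rho), dde_pd(0.25, rho), pde_pd(0.25, rho))
```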
7 Conclusions
In this paper, we have evaluated the performance of different encoding-decoding strategies that reduce the number of transmitted samples and hence the power spent in transmission, as an energy-efficient solution for the wireless sensor network communication problem. In particular, we defined the downsampling distortion function to evaluate the trade-off between compression rate and distortion at the fusion center for the combinations of three downsampling encoders (the DDE, the PDE, and the CDE) with two decoders (the SD and the PD).
We have obtained closed-form expressions for the pairs DDE-SD, DDE-PD, PDE-SD, and PDE-PD, and accurate approximations for the CDE-SD and CDE-PD. Moreover, we have proposed two strategies to design the thresholds of the CDE: the fixed threshold design and the variable threshold design.
The simulation results validate our theoretical results. Furthermore, we have compared the performance of the different pairs and shown the impact of taking the signal model into account in the encoding-decoding process. In particular, the CDE-PD pair (with variable threshold design) clearly outperforms the other studied strategies. Extending the CDE analysis to higher-order AR models, or to other time-correlated signal models, remains an open problem.
Endnotes
^{a} Notation. Boldface uppercase letters denote matrices, boldface lowercase letters denote column vectors, and italics denote scalars. $(\cdot)^T$, $(\cdot)^*$, and $(\cdot)^H$ denote transpose, complex conjugate, and conjugate transpose (Hermitian), respectively. $[\mathbf{X}]_{i,j}$ and $[\mathbf{x}]_i$ are the (i,j)th element of matrix $\mathbf{X}$ and the ith element of vector $\mathbf{x}$, respectively. $[\mathbf{X}]_i$ denotes the ith column of $\mathbf{X}$. $|\cdot|$ is the absolute value. $\|\mathbf{a}\|$ represents the Euclidean norm of $\mathbf{a}$. $\hat{a}$ refers to the estimated value of variable a. $\mathbb{E}[\cdot]$ is the statistical expectation. Function erf(·) represents the error function.
^{b} The conditional variance of a continuous random variable X with zero conditional mean, given the condition Y=y, is defined as $\text{var}(X\mid Y=y)=\mathbb{E}[X^2\mid Y=y]=\int_{-\infty}^{\infty}x^2 f(x\mid Y=y)\,dx$, where $f(x\mid Y=y)$ is the conditional pdf of X given Y=y.
^{c} It follows from the cumulative distribution function of a zero-mean Gaussian variable with variance $\sigma^2$, $\int_{-\infty}^{a}f(x)\,dx=\frac{1}{2}\left(1+\text{erf}\left(\frac{a}{\sqrt{2\sigma^2}}\right)\right)$.
References
 1.
Toh CK: Maximum battery life routing to support ubiquitous mobile computing in wireless ad hoc networks. Commun. Mag., IEEE 2001, 39(6):138–147. 10.1109/35.925682
 2.
Younis O, Fahmy S: HEED: a hybrid, energy-efficient, distributed clustering approach for ad hoc sensor networks. Mobile Comput. IEEE Trans 2004, 3(4):366–379. 10.1109/TMC.2004.41
 3.
Mudumbai R, Brown D, Madhow U, Poor H: Distributed transmit beamforming: challenges and recent progress. Commun. Mag., IEEE 2009, 47(2):102–110.
 4.
Zarifi K, Zaidi S, Affes S, Ghrayeb A: A distributed amplify-and-forward beamforming technique in wireless sensor networks. Signal Process., IEEE Trans 2011, 59(8):3657–3674.
 5.
Pradhan S, Kusuma J, Ramchandran K: Distributed compression in a dense microsensor network. Signal Process. Mag., IEEE 2002, 19(2):51–60. 10.1109/79.985684
 6.
Sun N, Wu J: Optimum sampling in spatial-temporally correlated wireless sensor networks. EURASIP J. Wireless Commun. Netw 2013, 2013: 5. 10.1186/1687-1499-2013-5
 7.
Neuhoff D, Gilbert R: Causal source codes. Inf. Theory, IEEE Trans 1982, 28(5):701–713. 10.1109/TIT.1982.1056552
 8.
Weissman T, Merhav N: On causal source codes with side information. Inf. Theory, IEEE Trans 2005, 51(11):4003–4013. 10.1109/TIT.2005.856978
 9.
Derpich M: Improved upper bounds to the causal quadratic rate-distortion function for Gaussian stationary. Inf. Theory, IEEE Trans 2012, 58(99):3131–3152.
 10.
Viswanathan H, Berger T: Sequential coding of correlated sources. Inf. Theory, IEEE Trans 2000, 46: 236–246. 10.1109/18.817521
 11.
Zamir R, Kochman Y, Erez U: Achieving the Gaussian rate distortion function by prediction. Inf. Theory, IEEE Trans 2008, 54(7):3354–3364.
 12.
O’Neal JB: Delta modulation quantizing noise: analytic and computer simulation results for Gaussian and television input signals. Bell Syst. Tech. J. 1971, 45: 117–141.
 13.
Protonotarios EN: Slope overload noise in differential pulse code modulation systems. Bell Syst. Tech. J 1967, 46: 2119–2161.
 14.
Cover TM, Thomas JA: Elements of Information Theory. New York: Wiley; 1991.
 15.
O’Neal JB: Signal-to-quantizing-noise ratio for differential PCM. IEEE Trans. Commun. Technol 1971, 19: 568–570. 10.1109/TCOM.1971.1090668
 16.
Farvardin N, Modestino J: Rate-distortion performance of DPCM schemes for autoregressive sources. Inf. Theory, IEEE Trans 1985, 31(3):402–418. 10.1109/TIT.1985.1057040
 17.
Guleryuz O, Orchard M: On the DPCM compression of Gaussian autoregressive sequences. Inf. Theory, IEEE Trans 2001, 47(3):945–956. 10.1109/18.915650
 18.
Rugin R, Conti A, Mazzini G: Experimental investigation of the energy consumption for wireless sensor network with centralized data collection scheme. In Proceedings of the 15th International Conference on Software, Telecommunications and Computer Networks, 2007. SoftCOM 2007. Split-Dubrovnik; 27–29 Sept 2007:1–5.
 19.
Wang Q: Traffic analysis, modeling and their applications in energy-constrained wireless sensor networks: on network optimization and anomaly detection. (Mid Sweden University, 2010). Accessed 15 July 2012 http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva10690
 20.
Hashimoto T, Arimoto S: On the rate-distortion function for the nonstationary Gaussian autoregressive process. Inf. Theory, IEEE Trans 1980, 26(4):478–480.
 21.
Haykin S: Adaptive Filter Theory. Upper Saddle River: Prentice Hall; 2001.
 22.
Barcelo-Llado J, Morell A, Seco-Granados G: Enhanced correlation estimators for distributed source coding in large wireless sensor networks. IEEE Sensors J 2012, 12(9):2799–2806.
 23.
Rosenbaum S: Moments of a truncated bivariate normal distribution. J. R. Stat. Soc. Ser B (Methodological) 1961, 23(2):405–408.
 24.
Manjunath BG, Wilhelm S: Moments calculation for the double truncated multivariate normal density (Social Science Research Network 2009). Accessed 20 Aug 2012 http://dx.doi.org/10.2139/ssrn.1472153
Acknowledgements
This work is supported by the Spanish Government under project TEC2011-28219 and the Catalan Government under grant 2009 SGR 298.
Additional information
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Barceló-Lladó, J.E., Morell, A. & Seco-Granados, G. Conditional downsampling for energy-efficient communications in wireless sensor networks. EURASIP J. Adv. Signal Process. 2013, 101 (2013). https://doi.org/10.1186/1687-6180-2013-101
Keywords
 Mean Square Error
 Wireless Sensor Network
 Fusion Center
 Observation Vector
 Rate Distortion