Distributed Event-Region Detection in Wireless Sensor Networks

We propose a graph-based method for distributed event-region detection in a wireless sensor network (WSN). The method exploits the fact that the true event states at geographically neighboring sensors are statistically dependent in an event-region detection scenario. This spatial dependence amongst the sensors is modeled using graphical models (GMs) and serves as a regularization term to enhance detection accuracy. The method involves solving a linear system of equations, which can be readily implemented in a distributed fashion. Numerical results are presented to illustrate the performance of the proposed approach.


INTRODUCTION
With the emergence of low-cost and low-power sensors capable of limited computation and communication, the potential applications of WSNs for physical environment monitoring have become well appreciated and have received much attention over the past few years [1][2][3][4][5]. In this paper, we focus on one particular class of environment surveillance problems: determining the event regions in an environment from the sensors' noisy observations. Such a problem arises in many scenarios. For example, as part of a building safety system, a WSN may be used to monitor hot spots and smoke. Also, when using a WSN to sense the concentration of some chemical, one needs to identify which regions have a chemical concentration greater than some threshold.
Consider a WSN composed of N geographically distributed sensor nodes. Each sensor makes K noisy observations of its local signal value:

x_n(k) = μ_n(β_n) + w_n(k), k = 1, . . . , K,

where x_n denotes the nth sensor's measurements, w_n denotes zero-mean independent and identically distributed (i.i.d.) Gaussian noise, and β_n is a binary indicator with β_n = 1 indicating event (signal) presence and β_n = 0 indicating event (signal) absence at sensor n. We have μ_n(0) = 0 (signal absence) and μ_n(1) = θ_n, where θ_n is the unknown nonzero signal value.
The above model allows for space-varying signal values; that is, θ_n can differ across sensors. This corresponds to practical scenarios where the signal levels, such as the chemical concentration, vary across the event-region. The above formulation of event-region detection differs from the traditional distributed detection problem [3,4] in two respects. First, the probability distributions of the sensor observations are usually assumed known a priori in [3,4], whereas this is not the case for the event-region detection problem because the signals {θ_n} are generally unknown. Second, the objective of event-region detection is to identify the locations where an event occurs in a sensor network environment, which differs from previous detection techniques [3,4] developed for hypothesis testing of global phenomena. A simple approach for event-region detection is to let each sensor make its decision based only on its own measurements. This can be solved by the generalized likelihood ratio test (GLRT). The local one-sided (assuming θ_n > 0) GLRT at each sensor is given by [6,7]

β̂_n = I_(τ_GLRT, ∞)(x̄_n), (3)

where x̄_n ≜ (1/K) Σ_{k=1}^{K} x_n(k) is the sample mean and I_A(x̄_n) is the indicator function whose value is 1 if x̄_n ∈ A and 0 otherwise. The threshold τ_GLRT can be determined from a specified probability of false alarm P_FA, and such a choice of τ_GLRT is independent of {θ_n} [6,7]. This approach, albeit simple, ignores the dependence among neighboring sensors. In practice, for a densely distributed sensor network, an event-region usually spans an area that includes a certain number of sensors, and so does a non-event region. Hence the true event indicator values {β_n} of neighboring sensors are statistically dependent.
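As a concrete illustration of this purely local approach, the following sketch simulates the per-sensor GLRT on synthetic data. The field layout, parameter values, and the threshold are hypothetical choices for demonstration, not derived from a target P_FA:

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 300, 5            # number of sensors and observations per sensor
sigma_w = np.sqrt(0.5)   # noise standard deviation (variance 0.5)
theta = np.zeros(N)
theta[:50] = 1.0         # hypothetical event-region: first 50 sensors

# observation model: x_n(k) = mu_n(beta_n) + w_n(k)
x = theta[:, None] + sigma_w * rng.standard_normal((N, K))

x_bar = x.mean(axis=1)   # sample mean at each sensor
tau_glrt = 0.5           # illustrative threshold
beta_hat = (x_bar > tau_glrt).astype(int)   # indicator I_(tau, inf)(x_bar)
```

With more observations K, the sample means concentrate and the local decisions become more reliable; the isolated errors that remain motivate the spatial regularization developed next.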
By utilizing this spatial dependence, it is expected that we can remove most of the sporadic decision errors (false alarms and misses) caused by noise and the faulty measurements of unreliable sensors. Previous works on distributed event-region detection include [6,8]. The work [6] models the distributed observations as a random field with a Markovian dependence structure and proposes an iterative method. Another work [8] introduces a Bayesian decision algorithm based on local decisions from neighboring sensors to identify faulty measurements; it requires precise knowledge of the sensor fault probability, which may not be available in practice.
In this paper, we use graphical models (GMs) to model the spatial dependence amongst the sensors. A GM, like a Markov random field (MRF), provides a natural framework for representing the statistical dependency amongst a set of variables by means of a graph [9]. It has been widely employed in WSN applications, for example, [10][11][12][13][14]. Since the true event indicators {β_n}, as mentioned previously, are locally dependent in event-region detection scenarios, they can be modeled by a locally connected GM, in which only spatially neighboring sensors are connected by nonzero-weighted edges. The spatial dependence encoded by the GM serves as a regularization term to smooth the local GLRT decisions so that the final decisions, to some extent, match the expectation that geographically adjacent sensors should generally reach similar decisions. We formulate event-region detection as an optimization problem whose solution involves solving a linear system of equations. Because of the locally connected structure of the GM, solving the linear equations admits a simple distributed implementation using iterative matrix inverse techniques such as the Richardson iteration. The resulting implementation scheme only requires each sensor to exchange data with its neighbors and is thus energy and bandwidth efficient.

PROPOSED DISTRIBUTED EVENT-REGION DETECTION APPROACH
We model the WSN as an undirected graph G = (V, E) whose vertices V = {1, 2, . . . , N} are the sensors and whose edges E = {e_{i,j}} represent the connections between any two sensors. Each edge of the graph, joining vertices i and j, is assigned a weight g_{i,j} = g_{j,i} ≥ 0 to measure the statistical dependency between these two sensors. To capture the statistical dependency amongst geographically adjacent sensors, we only set nonzero weights on the edges connecting neighboring vertices (sensors); all other weights are set to zero. We choose g_{i,j} as (see [15] for a detailed discussion on the construction of a weighted graph)

g_{i,j} = exp(−d_{i,j}²/φ) if j ∈ mNN(i) or i ∈ mNN(j), and g_{i,j} = 0 otherwise, (5)

where d_{i,j} denotes the Euclidean distance between vertices (sensors) i and j, mNN(i) represents the m nearest neighbors of i in terms of Euclidean distance, and φ and m are user-chosen parameters that will be discussed later. We collect all the weights {g_{i,j}} to form an N × N symmetric weight matrix G.
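Assuming the heat-kernel weights exp(−d_{i,j}²/φ) on the symmetric mNN relation (one common construction in the spirit of [15]; the exact form used here is our assumption), the weight matrix G can be built as follows:

```python
import numpy as np

rng = np.random.default_rng(1)
N, m, phi = 30, 3, 2.0
pos = rng.uniform(0, 20, size=(N, 2))   # random sensor locations on a 20 m x 20 m field

# pairwise Euclidean distances between sensors
d = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)

# m nearest neighbors of each sensor (column 0 of argsort is the sensor itself)
nn = np.argsort(d, axis=1)[:, 1:m + 1]
adj = np.zeros((N, N), dtype=bool)
rows = np.repeat(np.arange(N), m)
adj[rows, nn.ravel()] = True
adj |= adj.T   # j in mNN(i) OR i in mNN(j) -> symmetric relation

# assumed heat-kernel weights on the neighboring pairs, zero elsewhere
G = np.where(adj, np.exp(-d**2 / phi), 0.0)
```

The `adj |= adj.T` step enforces the symmetry g_{i,j} = g_{j,i} even though the raw mNN relation is not symmetric.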

Graph-based decision-dependent regularization term
The statistical dependency amongst the neighboring sensors is measured by the weight matrix, which can serve as a regularization to update the initial estimates. We now discuss the construction of this regularization term. Consider a function f ≜ [f_1 f_2 · · · f_N]^T defined on the set of vertices, where f_i corresponds to vertex i. A natural way to measure how much the vector f deviates from our expected dependency amongst the neighboring vertices (sensors) is the quantity

(1/2) Σ_{i,j} g_{i,j} (f_i − f_j)² = f^T L f, (6)

where L ≜ D − G is the so-called graph Laplacian matrix [16] and D is the diagonal degree matrix with D_{i,i} = Σ_j g_{i,j}. It can be readily observed that L is symmetric positive semidefinite and has one null eigenvalue associated with the eigenvector 1, where 1 is the column vector of all ones. Clearly, the smaller the value in (6), the better the vector f matches the statistical dependency amongst the neighboring sensors, and vice versa.
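A quick numerical check of the identity between the pairwise sum and the quadratic form f^T L f, on toy weights chosen only for illustration:

```python
import numpy as np

# small symmetric weight matrix (toy example)
G = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.8],
              [0.5, 0.8, 0.0]])
D = np.diag(G.sum(axis=1))
L = D - G                      # graph Laplacian L = D - G

f = np.array([1.0, 0.2, -0.5])
quad = f @ L @ f
pairwise = 0.5 * sum(G[i, j] * (f[i] - f[j]) ** 2
                     for i in range(3) for j in range(3))
```

The quadratic form penalizes differences between strongly weighted neighbors, and the all-ones vector incurs zero penalty, which is exactly the null eigenvector property noted above.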

Hard decision regularization
We can use the regularization term defined in (6) to smooth the local GLRT decisions β̂ ≜ [β̂_1 β̂_2 · · · β̂_N]^T. Let β_r ≜ [β_{r,1} β_{r,2} · · · β_{r,N}]^T denote the regularized decisions; then we can formulate the estimation of β_r as the following constrained optimization problem:

min_{β_r} λ β_r^T L β_r + ‖β_r − β̂‖², subject to β_{r,n} ∈ {0, 1}, n = 1, . . . , N,

where, as indicated before, the first term serves as the regularization term accounting for the spatial dependence; λ is a positive coefficient controlling its participation degree, whose choice will be discussed later; and the second term represents the distance between the two vectors β̂ and β_r, which should be minimized along with the regularization term. Clearly, this optimization is essentially a tradeoff between smoothing the decisions (to match our defined statistical dependency) and fitting the data. Note that the spatial smoothing effect can be easily observed from the fact that the regularization term attains its minimal value, zero, when the decisions at all sensors are identical, whether they are all ones or all zeros. The optimization therefore penalizes isolated decisions that differ from their neighbors. Since decision errors (false alarms and misses) caused by noise and unreliable sensors usually occur in an independent and sporadic way, the optimization helps suppress false alarms and enhance event-region detection. Note that the above constrained optimization problem is NP-hard. To make it tractable, we relax β_r to take on real values. The real-valued solution β̂_r can be obtained by solving the equation

(λL + I) β_r = β̂, (9)

where I denotes the identity matrix, which gives

β̂_r = (λL + I)^{−1} β̂. (10)

This real-valued solution, obviously, will not satisfy the constraint β_{r,n} ∈ {0, 1}. Nevertheless, a splitting point (also called threshold) τ_R can be employed to transform this real-valued solution into a discrete form, that is,

β̃_n = I_(τ_R, ∞)(β̂_{r,n}), n = 1, . . . , N.

We will discuss the determination of τ_R in a later part of this paper.
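A minimal sketch of the relaxed hard-decision regularization on a hypothetical chain-graph topology (the weights, λ, and the threshold 0.5 are illustrative choices): an isolated false alarm is smoothed away, while an isolated miss inside the event-region is recovered.

```python
import numpy as np

N, lam = 60, 5.0

# chain-graph weights: each sensor linked to its two immediate neighbors
G = np.zeros((N, N))
idx = np.arange(N - 1)
G[idx, idx + 1] = G[idx + 1, idx] = 1.0
L = np.diag(G.sum(axis=1)) - G   # graph Laplacian

# local GLRT decisions: event at sensors 20..39, corrupted by sporadic errors
beta_hat = np.zeros(N)
beta_hat[20:40] = 1.0
beta_hat[5] = 1.0    # isolated false alarm
beta_hat[30] = 0.0   # isolated miss inside the event-region

# relaxed solution of (lam * L + I) beta_r = beta_hat
beta_r = np.linalg.solve(lam * L + np.eye(N), beta_hat)
decisions = (beta_r > 0.5).astype(int)   # discretize with tau_R = 0.5
```

After smoothing, the lone spike at sensor 5 falls below the threshold while the hole at sensor 30, surrounded by event decisions, is pulled above it.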

Soft decision regularization
For the case where the noise is i.i.d. across the entire network and the signal values inside the event-regions are constant, that is, θ_n = θ, ∀{n | β_n = 1}, the sample mean x̄_n ≜ (1/K) Σ_{k=1}^{K} x_n(k) can be regarded as a local soft decision at sensor n, since a larger value of x̄_n indicates a higher possibility of event presence. Hence we can further extend our approach by replacing β̂ in (9) with the sample mean vector x̄ ≜ [x̄_1 x̄_2 · · · x̄_N]^T. To simplify the exposition, here we abuse the notation β_r to denote the regularized soft decision vector, which takes real values. It can be solved from

(λL + I) β_r = x̄, (13)

and consequently,

β̂_r = (λL + I)^{−1} x̄. (14)

The soft decision vector can also be transformed into hard decisions by using a threshold. Compared with the hard decision-based regularization, the soft decision-based method can provide better performance since, in the hard decision case, some information about the observed data is lost after the 1-bit quantization, that is, in computing the local GLRT decisions.
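The soft-decision variant replaces the binary GLRT vector with the vector of sample means; a sketch on the same kind of toy chain graph (topology and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, lam = 60, 3, 5.0

# toy chain-graph Laplacian
G = np.zeros((N, N))
idx = np.arange(N - 1)
G[idx, idx + 1] = G[idx + 1, idx] = 1.0
L = np.diag(G.sum(axis=1)) - G

theta = np.zeros(N)
theta[20:40] = 1.0   # constant signal inside the event-region
x = theta[:, None] + np.sqrt(0.5) * rng.standard_normal((N, K))
x_bar = x.mean(axis=1)   # local soft decisions (sample means)

# regularized soft decisions: (lam * L + I) beta_r = x_bar
beta_r = np.linalg.solve(lam * L + np.eye(N), x_bar)
decisions = (beta_r > 0.5).astype(int)
```

Because no 1-bit quantization is applied before the smoothing, the regularized values retain the full contrast between event and non-event sensors.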

Choice of parameters
In this section, we discuss the choice of the parameters related to our proposed method. The parameters m and φ are used to quantify the statistical dependency among geographically adjacent sensors: m defines the degree to which the statistical dependency extends, and φ controls the values of the weights. Generally speaking, we can choose m from 1 to 4 according to the network topology. If the sensors are densely deployed in a 2D plane, a larger value such as m = 3 or m = 4 may capture the local statistical dependency better; if the sensors are placed along a line, then a smaller value such as m = 1 or m = 2 could be more appropriate. Such a choice of m indicates that one sensor is statistically correlated with its closest sensors, which is generally true for most event-region detection scenarios where the sensors are densely distributed. As for φ, it can be chosen to guarantee a < g_{i,j} ≤ 1 for any nonzero g_{i,j}, where a can be set to 0.5 or 1/e. Simulation results show that our proposed method is not sensitive to m and φ as long as they are set within the above ranges.
The parameter λ controls the participation degree of the spatial smoothing term in the optimization. Clearly, a λ that is too small may not provide sufficient involvement to suppress the false alarms. On the other hand, since 1 is the eigenvector of L associated with the smallest (zero) eigenvalue, a λ that is too large admits an excessive spatial smoothing effect that tends to make the decisions homogeneous. Therefore, an appropriate λ is desirable for our method. Generally speaking, the choice of λ depends on the signal values, the noise variance, and so forth. In practice, we may obtain some coarse information about the signal, noise, and the event-region from a small subset of sensor observations, which can help us determine an appropriate λ by a calibration procedure. A calibration procedure is also used in [6] for parameter choices. Generally speaking, a set of "training data" is randomly generated by simulating noisy realizations of a "calibration field." Note that the calibration field is not the true field to be detected but a field constructed from the coarse information about the signal, noise, and the event-region obtained from a small subset of sensor observations of the true field (see [6, Section VI.D] for details). Using the training data generated on the calibration field, we can determine an appropriate λ by minimizing the detection errors.
We now discuss the determination of the threshold τ_R used to discretize the real-valued vector β̂_r. Unlike the GLRT approach, since the real-valued vector β̂_r is obtained from β̂_r = (λL + I)^{−1} β̂ or β̂_r = (λL + I)^{−1} x̄, the resulting entries {β̂_{r,n}} are correlated and their joint distribution depends on the events. In this case, even if we have knowledge of the event-region and its related signal values, deriving an analytical expression for τ_R that satisfies a specified false alarm probability P_FA is difficult. In practice, training data can be used to help determine τ_R. Assume we have training data {(β̂_i, β_i)}, where β̂_i denotes the local GLRT decision and β_i the true event indicator value of sensor i. We compute the corresponding regularized decision vector β̂_r. With the knowledge of the true event indicator values, we can easily find the threshold τ_R that satisfies a specified false alarm probability on the training data. The above discussion applies to the soft decision case by simply replacing the local GLRT decisions {β̂_i} with the sample means {x̄_i}. We note that using training data has its disadvantages; for example, the accuracy of τ_R is affected by how well the training data capture the true field. In practice, the training data should be a good representation of the true field, for example, in the number of event-regions and their corresponding sizes.
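One way to pick τ_R from labeled training data is an empirical-quantile rule over the non-event entries; the following sketch (with synthetic training data, and a quantile rule that is our own illustrative choice rather than the paper's exact procedure) shows the idea:

```python
import numpy as np

def calibrate_threshold(beta_r, beta_true, p_fa=0.05):
    """Smallest threshold whose empirical false-alarm rate on the
    training data does not exceed p_fa (a simple quantile rule)."""
    null_vals = np.sort(beta_r[beta_true == 0])  # regularized values at non-event sensors
    k = int(np.floor(p_fa * null_vals.size))     # allow at most k null values above tau
    return null_vals[-(k + 1)] if k < null_vals.size else -np.inf

# synthetic training data: non-event values near 0, event values near 1
rng = np.random.default_rng(4)
beta_true = np.repeat([0, 1], 100)
beta_r = beta_true + 0.1 * rng.standard_normal(200)

tau_R = calibrate_threshold(beta_r, beta_true, p_fa=0.05)
fa_rate = np.mean(beta_r[beta_true == 0] > tau_R)
```

As noted above, the quality of the calibrated τ_R depends on how representative the training data are of the true field.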

DISTRIBUTED IMPLEMENTATION
We now discuss the distributed implementation of our proposed method. For WSNs with a fusion center (FC), we assume that the FC has knowledge of all sensors' geographical locations via GPS or some other localization technique. Therefore, it can compute the weight matrix G, and consequently the matrix (λL + I)^{−1}, in advance and store the result. We can have each sensor report its local GLRT decision β̂_n, along with its sensor index n, directly to the FC. The FC then computes β̂_r = (λL + I)^{−1} β̂, which can be turned into hard decisions by using the estimated threshold τ_R. However, this implementation scheme may be impractical for the soft decision-based approach (13) since it requires each sensor to send its real-valued data to the FC, which can be quite bandwidth- and power-consuming. Another feasible implementation, as in [6,8], is to let the sensors in the environment organize themselves and make decisions. This scheme is described as follows.
Note that both (10) and (14) involve the inverse of the sparse, positive definite matrix A ≜ λL + I. Hence iterative matrix techniques that are readily implemented in a distributed fashion can be used to compute the exact closed-form solution. Here we employ the modified Richardson iteration [18,19] to solve the linear equations (9) and (13), which gives

β_r^{(k+1)} = β_r^{(k)} + ω(β̂ − A β_r^{(k)}) and β_r^{(k+1)} = β_r^{(k)} + ω(x̄ − A β_r^{(k)}),

respectively, where ω > 0 is a parameter that must be chosen such that ω < 2/ρ(A), and ρ(A) denotes the spectral radius of A. This iteration produces a sequence {β_r^{(k)}} that converges to the correct solution; the proof of convergence of the Richardson iteration can be found in [18,19]. Note that, for each row of A, the nonzero off-diagonal entries only occur for those j ∈ N_i, where N_i ≜ {j | j is among the mNN of i, or i is among the mNN of j} denotes the neighborhood of sensor i. Thus the update equation at each sensor can be written as

β_{r,i}^{(k+1)} = β_{r,i}^{(k)} + ω(β̂_i − Σ_{j ∈ N_i ∪ {i}} a_{i,j} β_{r,j}^{(k)}) or β_{r,i}^{(k+1)} = β_{r,i}^{(k)} + ω(x̄_i − Σ_{j ∈ N_i ∪ {i}} a_{i,j} β_{r,j}^{(k)}), (16)

respectively, where a_{i,j} denotes the (i, j)th element of A. From (16), we can see that, at every iteration, each sensor only requires data from its neighborhood for the update.
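The modified Richardson iteration can be checked numerically against the direct solve on a toy topology (a chain graph here; ω is set just below 2/ρ(A)):

```python
import numpy as np

rng = np.random.default_rng(5)
N, lam = 40, 5.0

# toy chain-graph Laplacian and A = lam * L + I
G = np.zeros((N, N))
idx = np.arange(N - 1)
G[idx, idx + 1] = G[idx + 1, idx] = 1.0
L = np.diag(G.sum(axis=1)) - G
A = lam * L + np.eye(N)

beta_hat = (rng.uniform(size=N) > 0.7).astype(float)  # toy local decisions

# modified Richardson iteration: beta^(k+1) = beta^(k) + omega * (beta_hat - A beta^(k))
omega = 1.9 / np.max(np.linalg.eigvalsh(A))  # satisfies omega < 2 / rho(A)
beta_r = rng.uniform(size=N)                 # random initial estimate
for _ in range(500):
    beta_r = beta_r + omega * (beta_hat - A @ beta_r)
```

In a real deployment, the matrix-vector product A @ beta_r reduces to each sensor combining its own estimate with those of its neighbors, as in (16), so no global computation is needed.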
We summarize the implementation steps of this scheme as follows.
(1) For each sensor i ∈ V, we randomly generate an initial estimate β_{r,i}^{(0)}.
(2) At iteration k + 1 (k = 0, 1, . . .), each sensor broadcasts its estimate β_{r,i}^{(k)}, along with its sensor index i, to its neighborhood N_i, and in the meantime collects the data from its neighboring sensors; based on the received data, each sensor updates its estimate according to (16).
(3) Stop if some preset convergence condition is satisfied; otherwise, go to Step (2).
(4) Each sensor makes its decision based on its final estimate β̂_{r,i} and the specified threshold τ_R.
As we can see, this implementation scheme is parallel, involves communication only among neighboring sensors, and therefore consumes minimal communication energy. This makes it applicable to WSN applications where power and communication are of concern. Moreover, the convergence rate of the Richardson iteration can be controlled by the parameter ω. Specifically, it is determined by max_i |1 − ωμ_i|, where {μ_i} are the eigenvalues of A. We can maximize the convergence rate by choosing ω ∈ (0, 2/ρ(A)) to minimize max_i |1 − ωμ_i|. Besides the modified Richardson technique discussed here, there are other iterative algorithms, for example, [12,13], for solving the linear equation (14). These algorithms, like the modified Richardson iteration, admit distributed implementation; furthermore, they may provide faster convergence rates and be more robust to transmission errors amongst sensors.
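The minimizer of max_i |1 − ωμ_i| is the classical minimax step size ω* = 2/(μ_min + μ_max), which balances the two extreme eigenvalues; a numerical check on toy eigenvalues (the specific values are illustrative):

```python
import numpy as np

# eigenvalues of a toy symmetric positive definite matrix A
mu = np.array([1.0, 2.5, 6.0, 9.0])

def rate(omega):
    """Contraction factor of the Richardson iteration for step size omega."""
    return np.max(np.abs(1.0 - omega * mu))

# minimax choice balancing the smallest and largest eigenvalues
omega_star = 2.0 / (mu.min() + mu.max())

# grid search over (0, 2/rho(A)) confirms no step size does better
grid = np.linspace(1e-3, 2.0 / mu.max() - 1e-3, 2000)
best_grid_rate = min(rate(w) for w in grid)
```

For these eigenvalues the optimal contraction factor is (μ_max − μ_min)/(μ_max + μ_min) = 0.8, so a poorly conditioned A converges slowly no matter how ω is tuned.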

NUMERICAL RESULTS
We present simulation results to illustrate the performance of our proposed algorithm. Similarly to [6], we consider a WSN consisting of N = 300 sensors randomly distributed on a 20 m × 20 m grid with 1 m uniform spacing. Each sensor makes K local noisy observations {x_n(k)}_{k=1}^{K}; the noise w_n is i.i.d. Gaussian with zero mean and variance σ_w² = 0.5. The following results are obtained by simulating the distributed implementation scheme discussed in Section 3, in which we assume ideal data transmission amongst sensors. Experiments show that convergence can be achieved within a few tens of iterations. We note that although, in practice, the data transmission amongst sensors suffers from errors because of data quantization and channel noise, more sophisticated distributed matrix techniques [12,13], as mentioned in the last section, can be employed to enhance the robustness to these transmission errors.

Results of hard decision regularization
We consider a field containing two event-regions as shown in Figure 1. We have μ_n(1) = 1 for those sensors {n} in the rectangular event-region, and μ_n(1) = 0.8 for those sensors {n} in the circular event-region. In our simulations, the parameters φ and m in (5) are set to 2 and 4, respectively. The coefficient λ controlling the spatial smoothing effect is chosen to be 1, 5, and 10, respectively. The local GLRT decisions {β̂_n} are determined via (3), where the threshold τ_GLRT is chosen such that the false alarm probability is 0.05, that is, P_FA^GLRT = 0.05. From β̂ we compute a regularized decision vector β̂_r. The threshold τ_R used for the binary decisions on β̂_r is chosen such that P_FA^R = P_FA^GLRT = 0.05, where P_FA^R denotes the false alarm probability of the proposed regularization method. To overcome the difficulty mentioned previously (see Section 2.4) in obtaining τ_R, we use the knowledge of the true event indicators {β_n} to help determine τ_R to achieve the specified P_FA^R. Figure 2 shows the miss probabilities as functions of K for the local GLRT approach and for our proposed method under different choices of λ. The results are averaged over 500 independent runs. We observe that, compared with the local GLRT approach, our proposed method is effective in reducing the miss probabilities under different choices of λ, especially when the number of observations K is small. We also see that an appropriate choice of λ should be related to the signal-to-noise ratio (SNR): it is favorable to choose a large λ for a low SNR and a small λ for a high SNR (note that a large K has the effect of improving the SNR and vice versa). Letting λ = 1 and K = 3, we plot the miss probability versus the false alarm probability in Figure 3, where each point on the curves corresponds to a value of (P_FA, P_M) for a given threshold.
Note that, when plotting the figure, since our method does not provide an explicit expression for determining a threshold that yields a specified P_FA, we simply choose a set of thresholds and compute the (P_FA, P_M) pair associated with each threshold. From the figure, we see that our proposed algorithm presents a clear performance advantage over the GLRT.
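The threshold-sweep procedure described above can be sketched as follows, with synthetic scores standing in for the real-valued regularized decisions:

```python
import numpy as np

def roc_points(scores, beta_true, thresholds):
    """Empirical (P_FA, P_M) pair for each threshold applied to the scores."""
    pts = []
    for tau in thresholds:
        dec = scores > tau
        p_fa = np.mean(dec[beta_true == 0])   # false alarms among non-event sensors
        p_m = np.mean(~dec[beta_true == 1])   # misses among event sensors
        pts.append((p_fa, p_m))
    return pts

# synthetic scores standing in for the regularized decisions
rng = np.random.default_rng(6)
beta_true = np.repeat([0, 1], 500)
scores = beta_true + 0.4 * rng.standard_normal(1000)

pts = roc_points(scores, beta_true, np.linspace(-0.5, 1.5, 21))
```

Sweeping the threshold traces out the tradeoff curve: raising τ lowers P_FA while raising P_M, which is exactly what Figures 3 and 6 display.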

Results of soft decision regularization
We now consider the soft decision version of our proposed algorithm associated with (13). The field we test here contains a rectangular event-region with μ_n(1) = 1; see Figure 4. We set λ = 5 and K = 5. Figure 5 shows one realization of the averaged noisy observations {x̄_n} and the corresponding real-valued soft decisions {β̂_{r,n}} as functions of the sensor locations, where {x̄_n} and {β̂_{r,n}} are each proportionally scaled to [0, 1], and we use grey levels to linearly represent the magnitudes of the scaled values (the larger the value, the darker the point). It can be clearly seen that the potential sporadic false alarms have been successfully suppressed, whereas the event-region is intensified. To further investigate the performance, we plot the miss probability versus the false alarm probability in Figure 6. For our method, as in the hard decision case, we choose a set of thresholds and compute the (P_FA, P_M) pair associated with every threshold. We see that our method presents a clear performance advantage over the GLRT.

CONCLUSION
We have proposed a new method for distributed event-region detection in WSNs, where the spatial dependence amongst neighboring sensors is modeled using the GMs and serves as a regularization. The method admits an energy and bandwidth efficient distributed implementation. Numerical simulation results show that our proposed method presents a clear performance advantage over the local GLRT and is effective in improving detection accuracy.