Greedy selection of sensors with measurements under correlated noise

We address the sensor selection problem in which linear measurements corrupted by correlated noise are gathered at the selected nodes to estimate an unknown parameter. Since finding the best subset of sensor nodes that minimizes the estimation error incurs a prohibitive computational cost, especially for a large number of nodes, we propose a greedy selection algorithm that maximizes the log-determinant of the inverse estimation error covariance matrix. We further manipulate this metric by employing the QR and LU factorizations to derive a simple analytic rule that efficiently selects one node at each iteration in a greedy manner. We also analyze the complexity of the proposed algorithm and compare it with that of different selection methods, showing that the proposed algorithm is computationally competitive. For performance evaluation, we conduct numerical experiments using randomly generated measurements under correlated noise and demonstrate that the proposed algorithm achieves good estimation accuracy with a reasonable selection complexity compared with previously proposed selection methods.


Introduction
In wireless sensor networks, a large number of sensor nodes are spatially distributed and transmit their noise-corrupted measurements, which are typically modeled as a linear combination of a known observation matrix H and the parameter to be estimated. Sensor selection is conducted so as to optimize the estimation accuracy [1][2][3][4][5][6][7]: methods based on convex relaxation [1] and cross-entropy optimization [2] were presented, but with a prohibitive computational complexity for large sensor networks. Noting that a greedy approach yields a feasible complexity, the sensor selection problem has mostly been tackled in a greedy manner. To guarantee near-optimality with respect to the mean squared estimation error (MSE), a submodular cost function called the frame potential was devised and a greedy removal method was proposed to find optimal sensor locations [3]. A QR factorization-based greedy selection method was proposed to minimize the estimation error [4]. To further reduce the complexity of the selection process, the log-determinant of the inverse estimation error covariance matrix was employed as a metric, yielding simple greedy methods instead of directly minimizing the estimation error [5,6]. Specifically, analytic selection rules were established by proving the metric to be a monotone submodular function [5] and by employing the QR factorization [6], respectively. An efficient greedy algorithm was also developed for sensor placement by maximally projecting onto the minimum eigenspace of the dual observation matrix [7]. Clearly, selecting the best subset of p sensor nodes corresponds to constructing the matrix with the p most informative rows of H.
It should be noted that most of these selection methods were derived under the assumption of uncorrelated noise, that is, the off-diagonal elements of the noise covariance matrix are assumed to be approximately zero or negligible. The sensor selection problem in the presence of correlated noise is challenging since the Fisher information (equivalently, the inverse estimation error covariance matrix) is no longer linear in the selected sensors [8][9][10]. To solve the problem, various selection algorithms have been presented in [8][9][10][11][12]. A multi-step sensor selection strategy was proposed for linear dynamic systems under correlated noise [8]. Sparsity-aware sensor selection approaches, in centralized and distributed versions, were developed in the presence of correlated measurements using convex relaxation for maximum likelihood estimation (MLE) [9]. Two selection algorithms, based on convex relaxation and a greedy approach respectively, were derived to minimize the MSE in the estimation of random parameters [10]. Recently, a greedy selection algorithm maximizing the log-determinant of the inverse estimation error covariance matrix has been presented to provide an additional increase in estimation accuracy [11,12]. Whereas the sensor selection problem has been extensively studied for the additive Gaussian linear measurement model considered in this work, it has also been addressed for nonlinear measurement models [13,14]. In [13], a sensor selection algorithm for target tracking was proposed based on extended Kalman filtering for an additive Gaussian nonlinear model. In [14], sensor selection methods for a general nonlinear model were developed with the aid of a convex relaxation technique for the estimation of unknown parameters.
In this paper, we consider the scenario where a given number of sensor nodes whose measurements are corrupted by correlated noise are selected to estimate the parameter. We aim to find the subset of sensor nodes that minimizes the estimation error. We first formulate the estimation error covariance matrix as a function of the matrix H_S, with rows selected from H, and the covariance matrix K_S of the noise samples at the selected nodes. To expedite the selection procedure, we adopt the log-determinant of the inverse estimation error covariance matrix as the metric to be maximized, which allows us to avoid a large matrix inversion. We choose one node at each iteration in a greedy manner until the selection set of the desired cardinality is constructed. To simplify the metric, we factor the matrices H_S and K_S by the QR and LU factorizations, respectively, to obtain an upper triangular matrix R and a lower triangular matrix M. We show that the factored matrices R and M can be updated by simply appending a newly computed column or row vector to the last positions of the matrices from the previous iteration. This simple computation of the key matrices R and M leads to an analytic rule that enables an efficient selection of the maximizing node at each iteration. We also analyze the complexity of the proposed algorithm, obtaining a competitive complexity of the same order as that of [4]. We evaluate the performance of the proposed method through numerical experiments using randomly generated measurements under correlated noise and demonstrate its advantage over previous selection methods in terms of estimation performance and selection complexity.
Compared with [4,11,12], this paper makes the following new contributions:
• In [4], the MSE is minimized by applying the QR factorization to H_S under the assumption of uncorrelated noise. In the presence of correlated noise, however, the MSE is not manageable using the QR factorization alone. The main contributions here are twofold. First, in order to yield a simple selection process, the final selection rule should be expressed as a first term related to the previous iteration plus a second term determined only by the current selection; the LU factorization of the covariance matrix K_S is proved to generate this desired form of the final rule. Second, the overall derivation process is completely different from [4] except for the QR factorization of H_S.
• The only similarity with [11,12] is the optimization of the same metric, namely the log-determinant of the inverse estimation error covariance matrix. Beyond that, the optimization process used to derive the proposed selection rule is entirely different from [11,12]. Furthermore, the proposed method is shown in numerical experiments to run faster than the method in [11,12] while preserving the same estimation performance.
This paper is organized as follows. The problem is formulated in Sect. 2, where the metric is shown to be expressed in terms of the sampled matrix H_S, the covariance matrix K_S and the statistics of the parameter. In Sect. 3, the QR and LU factorizations are employed to simplify the metric and a simple selection criterion is derived. The complexity analysis of the proposed algorithm is provided in Sect. 4.1. Extensive experiments are presented in Sect. 4.2 and conclusions in Sect. 5.

Problem Formulation
Suppose that N wireless sensors deployed in a sensor field generate the measurements y ∈ R^N. We seek to estimate the parameter vector θ ∈ R^p with a Gaussian distribution N(0, Σ_θ) by using n (< N) measurements gathered by the n selected sensors in the set S. We assume that the measurements are corrupted by additive correlated Gaussian noise w ~ N(0, K) ∈ R^N independent of θ, where the covariance matrix K is symmetric and positive definite. We also assume that the measurements are produced by a linear model with the known N × p full column rank observation matrix H with N row vectors h_n^T, n = 1, ..., N:

y = Hθ + w.  (1)

We denote the estimator by θ̂(y_S), where y_S is the |S| × 1 column vector with the entries of y indexed by S. In this work, we use an optimal Bayesian linear estimator (e.g., the maximum a posteriori (MAP) estimator or the minimum mean squared error (MMSE) estimator). Note that the MAP estimator is equivalent to the MMSE estimator in our formulation, where the parameter and the noise are assumed to be Gaussian. Then, the estimator θ̂(y_S) and the estimation error covariance matrix Σ(S) are given by [5,15]

θ̂(y_S) = (Σ_θ^{-1} + H_S^T K_S^{-1} H_S)^{-1} H_S^T K_S^{-1} y_S,  (2)

Σ(S) = (Σ_θ^{-1} + H_S^T K_S^{-1} H_S)^{-1},  (3)

where H_S is the matrix with the rows of H indexed by S and K_S is the |S| × |S| covariance matrix of the noise vector w_S with the entries of w indexed by S. Notice that the estimation of deterministic and unknown parameters can also be conducted by letting Σ_θ^{-1} = 0 in (2) and (3).
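As a concrete illustration, the model (1) and the estimator (2)-(3) can be sketched in a few lines of numpy. The dimensions, seed, selection set and covariance construction below are arbitrary choices made for the sketch, not the experimental setup of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, n = 30, 4, 8                          # nodes, parameter dimension, |S|
H = rng.normal(0.0, 0.1, (N, p))            # known observation matrix
A = rng.uniform(0.0, 1.0, (N, N))
K = A @ A.T + 1e-3 * np.eye(N)              # symmetric positive definite noise covariance
Sigma_theta = 0.01 * np.eye(p)              # prior covariance of theta

S = list(range(n))                          # an arbitrary selection set for illustration
H_S, K_S = H[S], K[np.ix_(S, S)]

theta = rng.multivariate_normal(np.zeros(p), Sigma_theta)
y_S = H_S @ theta + rng.multivariate_normal(np.zeros(n), K_S)   # model (1) restricted to S

# error covariance (3) and the Bayesian linear (MMSE/MAP) estimator (2)
Sigma_S = np.linalg.inv(np.linalg.inv(Sigma_theta) + H_S.T @ np.linalg.solve(K_S, H_S))
theta_hat = Sigma_S @ H_S.T @ np.linalg.solve(K_S, y_S)
```

The information form used here is algebraically equivalent to the covariance form Σ_θ H_S^T (H_S Σ_θ H_S^T + K_S)^{-1} y_S by the matrix inversion lemma.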
In this work, we aim to find the best subset S with |S| = p that maximizes the log-determinant of the inverse estimation error covariance matrix, log det Σ(S)^{-1}, which has been employed as a metric for the sensor selection problem in previous work [1,5,11,12] and shown to produce good estimation performance with a reduced complexity. We seek to select one node at each iteration in a greedy manner. More specifically, given the set S_i of the i nodes already selected up to the ith iteration, we select from the set of remaining nodes, S_i^C ≡ (V − S_i), the one node that maximizes the intermediate metric log det Σ(S_{i+1})^{-1} at the (i+1)th iteration, where S_{i+1} = S_i + {j}, j ∈ S_i^C. The selection process is expressed by

S_{i+1} = S_i + {j*},  j* = arg max_{j ∈ S_i^C} log det Σ(S_i + {j})^{-1},  (4)

where H_{S_{i+1}} is simply created from H_{S_i} and the jth row vector h_j^T selected from H,

H_{S_{i+1}} = [ H_{S_i} ; h_j^T ],  (5)

and K_{S_{i+1}} is constructed from K_{S_i} and the jth node selected from S_i^C,

K_{S_{i+1}} = [ K_{S_i}  k_{i+1} ; k_{i+1}^T  k ],  (6)

where the subscript (i) indicates the node selected at the ith iteration, k_{i+1}^T denotes the 1 × i row vector [k_{(1)j} ... k_{(i)j}], and k = k_jj. Notice that k_{(i)j} represents the covariance of the noise samples at the (i)th and jth nodes. The process in (4) is repeated until the selection set S with |S| = p is constructed.
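Before any factorization tricks, the greedy process (4) can be written down directly by evaluating log det Σ(S ∪ {j})^{-1} for every candidate at every iteration. The sketch below is this naive baseline, assuming Σ_θ^{-1} is supplied; it is what the derivations in the next section accelerate, not the proposed algorithm itself.

```python
import numpy as np

def greedy_logdet(H, K, Sigma_theta_inv, n):
    """Baseline greedy selection (process (4)): at each iteration add the node j
    maximizing log det(Sigma_theta_inv + H_T^T K_T^{-1} H_T), evaluated directly."""
    N = H.shape[0]
    S = []
    for _ in range(n):
        best_j, best_val = None, -np.inf
        for j in [j for j in range(N) if j not in S]:
            T = S + [j]
            H_T, K_T = H[T], K[np.ix_(T, T)]
            # slogdet avoids overflow/underflow of the raw determinant
            _, val = np.linalg.slogdet(Sigma_theta_inv + H_T.T @ np.linalg.solve(K_T, H_T))
            if val > best_val:
                best_j, best_val = j, val
        S.append(best_j)
    return S
```

Each iteration solves an |S|-dimensional linear system per candidate, which is exactly the cost the QR/LU manipulations below avoid.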

Method: efficient sampling algorithm
In this section, we present an analytic result for a simple selection process by manipulating the matrix H_{S_{i+1}}^T and the covariance matrix K_{S_{i+1}} based on the QR and LU factorizations, respectively. Note that the Householder transformation is employed to perform the QR factorization since it is advantageous in terms of complexity and sensitivity to rounding error in comparison with the Gram-Schmidt orthogonalization [16]. First, noting that the covariance matrix K_{S_{i+1}} is symmetric, we can factor it as

K_{S_{i+1}} = M_{S_{i+1}} M_{S_{i+1}}^T,  (7)

where M_{S_{i+1}} is an (i+1) × (i+1) lower triangular matrix. We also perform the QR factorization H_{S_{i+1}}^T = Q R̃_{i+1}, where Q is the p × p orthogonal matrix with p column vectors q_j, j = 1, ..., p, and R̃_{i+1} is a p × (i+1) matrix. Then, we manipulate the metric in (4) to derive a simpler form under the assumption Σ_θ = σ_θ² I_p, where I_p denotes the p × p identity matrix. Specifically,

log det Σ(S_{i+1})^{-1} = log det( (1/σ_θ²) I_p + H_{S_{i+1}}^T K_{S_{i+1}}^{-1} H_{S_{i+1}} )
                        = log det( (1/σ_θ²) I_p + R̃_{i+1} K_{S_{i+1}}^{-1} R̃_{i+1}^T ),  (8)

where (8) follows from det Q det Q^T = 1. Note that R̃_{i+1} can be written as

R̃_{i+1} = [ R_{i+1} ; 0_{(p−i−1)×(i+1)} ],  (9)

where R_{i+1} is the (i+1) × (i+1) upper triangular matrix and 0_{a×b} denotes the a × b zero matrix. Then, plugging (9) into (8) yields

log det Σ(S_{i+1})^{-1} = log det( (1/σ_θ²) I_{i+1} + R_{i+1} K_{S_{i+1}}^{-1} R_{i+1}^T ) + (p − i − 1) log(1/σ_θ²).  (10), (11)

Hence, we can further simplify the log-determinant in (8) as

log det Σ(S_{i+1})^{-1} ≈ log det( (1/σ_θ²) I_{i+1} + R_{i+1} K_{S_{i+1}}^{-1} R_{i+1}^T ),  (12)

where (12) follows since the second term in (11) is irrelevant to finding the maximizing node at the (i+1)th iteration. In this paper, noting that det(A) + det(B) ≤ det(A + B) for positive definite matrices A and B [17], we propose to maximize a lower bound of (12) to yield a low-complexity selection process. Specifically, the selection process in (4) is approximated by

log det Σ(S_{i+1})^{-1} ≥ log( det( (1/σ_θ²) I_{i+1} ) + det( R_{i+1} K_{S_{i+1}}^{-1} R_{i+1}^T ) ),  (13)

j* = arg max_{j ∈ S_i^C} det( R_{i+1} K_{S_{i+1}}^{-1} R_{i+1}^T ) = arg max_{j ∈ S_i^C} det(R_{i+1})² / det(M_{S_{i+1}})²,  (14)

where (13) follows from the fact that R_{i+1} K_{S_{i+1}}^{-1} R_{i+1}^T and (1/σ_θ²) I_{i+1} are positive definite since the covariance matrix K is positive definite and H has full column rank, and (14) from the irrelevancy of det((1/σ_θ²) I_{i+1}) in (13) to the selection at the (i+1)th iteration. From the optimization perspective, the selection process in (14) corresponds to a methodology which makes the worst case as good as
possible for the estimation of parameters with equal variances (i.e., Σ_θ = σ_θ² I_p). Notably, the proposed process in (14) is equivalent to the selection process for the estimation of unknown and deterministic parameters (i.e., Σ_θ^{-1} = 0 in (3)). Later, in Sect. 4.2, the assumption Σ_θ = σ_θ² I_p is relaxed to evaluate the proposed method in numerical experiments.
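The chain of identities leading from the full metric to (11) can be checked numerically. The sketch below, with arbitrary dimensions and seed, verifies that the p-dimensional log-determinant equals the (i+1)-dimensional one plus the constant (p − i − 1) log(1/σ_θ²).

```python
import numpy as np

rng = np.random.default_rng(1)
p, m, sigma2 = 6, 3, 0.01                   # parameter dim, |S_{i+1}|, sigma_theta^2
H_S = rng.normal(size=(m, p))               # selected rows of H
A = rng.normal(size=(m, m))
K_S = A @ A.T + 1e-3 * np.eye(m)            # noise covariance of the selected nodes

# H_S^T = Q R~, with Q p x p orthogonal and R~ p x m; the top m x m block is R (eq. (9))
Q, R_tilde = np.linalg.qr(H_S.T, mode='complete')
R = R_tilde[:m, :]

lhs = np.linalg.slogdet(np.eye(p) / sigma2 + H_S.T @ np.linalg.solve(K_S, H_S))[1]
rhs = (np.linalg.slogdet(np.eye(m) / sigma2 + R @ np.linalg.solve(K_S, R.T))[1]
       + (p - m) * np.log(1 / sigma2))
assert np.isclose(lhs, rhs)                 # identities (8)-(11)
```

The rotation by Q drops out of the determinant, and the zero block of R̃ contributes only the diagonal constant, which is why the reduced (i+1) × (i+1) form (12) suffices for the selection.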
We prove a theorem which presents a simple criterion enabling an efficient selection of the node that maximizes the log-determinant in (14) at each iteration.
Theorem  Let the (i+1)th column vectors r_{i+1} of R_{i+1} and k_{i+1} of K_{S_{i+1}} be given by r_{i+1} = [r̄^T r]^T and k_{i+1} = [k_{(1)j} ... k_{(i)j}]^T. Then, the node at the (i+1)th iteration that maximizes the log-determinant in (14) is simply selected from the set S_i^C consisting of the (N − i) remaining nodes as

j* = arg max_{j ∈ S_i^C} r² / m²,  (15)

where r = q_{i+1}^T h_j and m is the last diagonal entry of M_{S_{i+1}} given in (20).

Proof  We continue to simplify the metric in (14) to yield

j* = arg max_{j ∈ S_i^C} [ 2 log det R_{i+1} − 2 log det M_{S_{i+1}} ],  (16)

where (16) follows from det(A) = det(A^T) and det(A^{-1}) = 1/det(A). In order to compute det R_{i+1} and det M_{S_{i+1}}, note that the matrices R_{i+1} and M_{S_{i+1}} can be simply obtained from those at the previous iteration. Specifically, for each j ∈ S_i^C we compute K_{S_{i+1}} from (6) and construct R_{i+1} and M_{S_{i+1}} as

R_{i+1} = [ R_i  r̄ ; 0^T  r ],  (17)

M_{S_{i+1}} = [ M_{S_i}  0 ; m̄^T  m ].  (18)

Here, r_{i+1} = [r̄^T r]^T with r = q_{i+1}^T h_j. Also, the row vector m_{i+1}^T = [m̄^T m] can be easily computed from the LU factorization (7) of the symmetric matrix K_{S_{i+1}}:

M_{S_i} m̄ = k_{i+1},  m² = k − m̄^T m̄.  (19)

Then, we have from (19)

m̄ = M_{S_i}^{-1} k_{i+1},  m = (k − m̄^T m̄)^{1/2}.  (20)

Therefore, the log-determinant in (16) can be further simplified by using the triangular structure of (17) and (18):

2 log det R_{i+1} − 2 log det M_{S_{i+1}} = (2 log det R_i − 2 log det M_{S_i}) + log(r²/m²),  (21)

j* = arg max_{j ∈ S_i^C} r²/m²,  (22)

where (22) follows since the first term in (21) is irrelevant to finding the maximizing node at the (i+1)th iteration.
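The rank-one extension (18)-(20) of the triangular factor can be verified numerically. In the sketch below (arbitrary size and seed), the M M^T factorization of the symmetric positive definite matrix, called the LU factorization in the text, is produced by numpy's Cholesky routine, which yields exactly such a lower triangular factor.

```python
import numpy as np

rng = np.random.default_rng(2)
i = 4
A = rng.normal(size=(i + 1, i + 1))
K_next = A @ A.T + 1e-3 * np.eye(i + 1)       # K_{S_{i+1}}, symmetric positive definite
K_i, k_vec, k = K_next[:i, :i], K_next[:i, i], K_next[i, i]

M_i = np.linalg.cholesky(K_i)                 # lower triangular factor of K_{S_i}, eq. (7)
m_bar = np.linalg.solve(M_i, k_vec)           # eq. (20) by forward substitution
m = np.sqrt(k - m_bar @ m_bar)                # last diagonal entry of M_{S_{i+1}}

M_next = np.zeros((i + 1, i + 1))             # assemble M_{S_{i+1}} as in eq. (18)
M_next[:i, :i] = M_i
M_next[i, :i] = m_bar
M_next[i, i] = m
assert np.allclose(M_next @ M_next.T, K_next)                 # eq. (19)
# triangular determinant grows by the new diagonal entry, giving the increment in (21)
assert np.isclose(np.linalg.det(M_next), np.linalg.det(M_i) * m)
```

The same diagonal-product argument applied to R_{i+1} in (17) gives the ratio r²/m² in the selection rule (22).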
It should be noted that since M_{S_i} is lower triangular, m̄ in (20) can be obtained directly by forward substitution without computing the inverse of M_{S_i}. However, (20) then requires i floating-point divisions repeated over the (N − i) remaining nodes at the (i+1)th iteration, and a division operation typically takes much more time than a multiplication. Hence, we avoid those divisions by computing m̄ as

m̄ = M_{S_i}^{-1} k_{i+1},  (23)

where M_{S_{i+1}}^{-1} can be easily computed from M_{S_i}^{-1} without inversion of large matrices by using

M_{S_{i+1}}^{-1} = [ M_{S_i}^{-1}  0 ; −(1/m) m̄^T M_{S_i}^{-1}  1/m ].  (24)

Note that (24) can be verified by multiplying M_{S_{i+1}}^{-1} by M_{S_{i+1}} in (18). It is obvious that (23) and (24) run faster than (20) since (23), which involves no divisions, is repeated over the (N − i) remaining nodes, whereas (24), which involves one division, is conducted only once at each iteration.
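The block-triangular inverse update (24) can be checked directly. The sketch below builds an arbitrary lower triangular M_{S_i}, extends it by a row [m̄^T m], and confirms that the updated inverse multiplies back to the identity.

```python
import numpy as np

rng = np.random.default_rng(3)
i = 4
M_i = np.tril(rng.normal(size=(i, i))) + 2 * np.eye(i)   # nonsingular lower triangular M_{S_i}
m_bar = rng.normal(size=i)                               # m-bar from eq. (23)
m = 1.5                                                  # last diagonal entry
M_i_inv = np.linalg.inv(M_i)

# inverse update, eq. (24): only one division (1/m) per iteration
M_next_inv = np.zeros((i + 1, i + 1))
M_next_inv[:i, :i] = M_i_inv
M_next_inv[i, :i] = -(m_bar @ M_i_inv) / m
M_next_inv[i, i] = 1 / m

# assemble M_{S_{i+1}} as in eq. (18) and verify the update
M_next = np.zeros((i + 1, i + 1))
M_next[:i, :i] = M_i
M_next[i, :i] = m_bar
M_next[i, i] = m
assert np.allclose(M_next_inv @ M_next, np.eye(i + 1))
```

With M_{S_i}^{-1} maintained this way, scoring each candidate via (23) is a pure matrix-vector product, i.e., division-free.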
Initially, q_1 = h_j/‖h_j‖ and K_{S_1} = k_jj with k_{i+1} empty for j ∈ V. Then we have r = q_1^T h_j = ‖h_j‖ and m² = k_jj with m̄ = 0, and the first node is selected as

j* = arg max_{j ∈ V} ‖h_j‖² / k_jj.  (25)

Once the first node is selected, we update M_{S_{i+1}}^{-1} by (24) (e.g., M_{S_1}^{-1} = 1/m) and Q by the QR factorization. The updated matrices M_{S_{i+1}}^{-1} and Q are used at the next iteration. In what follows, the proposed selection algorithm is briefly explained.
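Putting the pieces together, the greedy procedure can be sketched in numpy as below. This is a minimal rendition under the stated assumptions: the squared last QR entry r² is computed as the squared residual ‖h_j‖² − ‖Q_i^T h_j‖² of h_j against the selected rows, and Q is refreshed by a full QR factorization at each iteration, whereas the paper updates it incrementally by Householder transformations.

```python
import numpy as np

def greedy_select(H, K, n):
    """Greedy selection under correlated noise: first node by rule (25),
    then criterion r^2 / m^2 of (22) with the division-free updates (23)-(24)."""
    N, p = H.shape
    # first node, rule (25): maximize ||h_j||^2 / k_jj
    S = [int(np.argmax(np.sum(H**2, axis=1) / np.diag(K)))]
    M_inv = np.array([[1.0 / np.sqrt(K[S[0], S[0]])]])   # M_{S_1}^{-1} = 1/m
    for i in range(1, n):
        Q_i = np.linalg.qr(H[S].T)[0]                    # p x i orthonormal basis of selected rows
        best_j, best_val, best_fac = None, -np.inf, None
        for j in range(N):
            if j in S:
                continue
            proj = Q_i.T @ H[j]
            r2 = H[j] @ H[j] - proj @ proj               # squared last entry of r_{i+1}
            m_bar = M_inv @ K[S, j]                      # eq. (23), division-free
            m2 = K[j, j] - m_bar @ m_bar                 # eq. (20); positive for PD K
            if r2 / m2 > best_val:
                best_j, best_val, best_fac = j, r2 / m2, (m_bar, np.sqrt(m2))
        m_bar, m = best_fac
        M_inv = np.block([[M_inv, np.zeros((i, 1))],     # inverse update, eq. (24)
                          [-(m_bar @ M_inv)[None, :] / m, np.array([[1.0 / m]])]])
        S.append(best_j)
    return S
```

By construction, the inverse of M_inv is a valid M M^T factor of the noise covariance restricted to the selected set, which is what the test below checks.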

Complexity analysis of proposed algorithm
Given the covariance matrix K of the correlated noise vector w in (1), the proposed algorithm constructs the best set S* by conducting two main tasks. First, the last column vector r_{i+1} of R_{i+1} is computed by using the QR factorization H_{S_{i+1}}^T = Q R̃_{i+1}. Specifically, given Q and R_i from the previous iteration, the last entry of r_{i+1} is computed for each h_j, j ∈ S_i^C. Once the maximizing node is selected, Q is updated for the next iteration by the Householder transformation. The second task is to compute m̄ by using (23), given M_{S_i}^{-1} and k_{i+1}, for each node j ∈ S_i^C. After the selection of the maximizing node, M_{S_{i+1}}^{-1} is updated for the next iteration by (24). Since these two tasks are repeated |S| − 1 times, the proposed algorithm yields a complexity of O(Np|S|²), which is of the same order as that of [4].
We evaluate three previous selection methods for performance comparison. First, we consider the method denoted greedy sensor selection (GSS) [5], which seeks to maximize the log-determinant of the inverse error covariance matrix Σ(S)^{-1}. GSS assumes uncorrelated noise (equivalently, K = I_N), and its metric is given by log det Σ(S)^{-1} ≈ log det H_S^T H_S under the assumption Σ_θ^{-1} = 0. Secondly, we compare with the method denoted greedy sensor selection based on QR factorization (GSS-QR) [4], which was developed by applying the QR factorization of the matrix H_S^T to the MSE in (3). Unlike GSS, it aims to directly minimize the MSE(S) with K = I_N; in this case, MSE(S) is given by tr((H_S^T H_S)^{-1}) when Σ_θ^{-1} = 0 is assumed. Thirdly, we evaluate the method denoted greedy sensor selection under correlated noise (GSS-CN) [11,12], which maximizes the same metric as the proposed algorithm. In Table 1, the complexity and the metrics of the selection methods are summarized for comparison. As shown, the proposed algorithm, GSS-QR and GSS-CN offer the same complexity order for the case |S| = p. In the following section, the estimation performance of the methods is experimentally investigated for the case |S| = p.

Experimental results
In this section, we conduct extensive experiments to validate the proposed sensor selection algorithm by comparing it with various selection methods in three different cases:
• Case 1: Random matrix H with Gaussian iid entries, h_ij ~ N(0, 0.1²).
• Case 2: Random matrix H with Bernoulli iid entries, h_ij taking binary values (0 or 1) with probability 0.5.
• Case 3: Random matrix H and covariance matrix K generated by linear reduced-order modeling [11,12].
In Cases 1 and 2, we generate 50 different realizations for each type of H ∈ R^{N×p} with N = 1000 and p = 10, 15, ..., 40. For each realization of H, we generate 100 test samples of the parameter vector θ and the correlated noise w drawn from N(0, Σ_θ) and N(0, K), respectively. The covariance matrix K is constructed from random data generated from the uniform distribution over [0, 1]. In Case 3, 100 measurement vectors y ∈ R^{1000} are generated from the normal distribution N(0, 1), and H and K are constructed by linear reduced-order modeling with the order equal to p = |S| (see [11,12] for details). From the matrices H and K, we construct the set S with cardinality |S| = p by using the various sensor selection methods, namely GSS, GSS-QR, GSS-CN and the proposed algorithm. We then select the measurements y_i, i ∈ S, from the measurement vector y and estimate the parameter vector θ by employing the optimal linear estimator in (2). To evaluate the estimation performance, we compare the MSEs of the selection methods given by E[‖θ − θ̂‖²/‖θ‖²].
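The Monte Carlo evaluation loop can be sketched as below. The dimensions and trial count are scaled down from the paper's setup, and the selection set S is a placeholder into which any method's output can be substituted; the noise covariance is built from uniform random data as described for Cases 1 and 2.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p, trials = 200, 10, 100
sigma_theta = 0.1

H = rng.normal(0.0, 0.1, (N, p))               # Case 1: Gaussian iid entries
A = rng.uniform(0.0, 1.0, (N, N))
K = A @ A.T + 1e-3 * np.eye(N)                 # correlated-noise covariance from uniform data
Sigma_theta = sigma_theta**2 * np.eye(p)

S = list(range(p))                             # placeholder selection; plug in any method's output
H_S, K_S = H[S], K[np.ix_(S, S)]

errs = []
for _ in range(trials):
    theta = rng.multivariate_normal(np.zeros(p), Sigma_theta)
    y_S = H_S @ theta + rng.multivariate_normal(np.zeros(p), K_S)
    # optimal linear estimator (2)
    theta_hat = Sigma_theta @ H_S.T @ np.linalg.solve(H_S @ Sigma_theta @ H_S.T + K_S, y_S)
    errs.append(np.sum((theta - theta_hat)**2) / np.sum(theta**2))
mse = np.mean(errs)                            # estimate of E[||theta - theta_hat||^2 / ||theta||^2]
```

Running this loop once per selection method, on the same realizations of θ and w, reproduces the kind of normalized-MSE comparison reported in the figures.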

Performance evaluation with respect to parameter dimension
In this experiment, we generate the parameters by assuming Σ_θ = σ_θ² I_p with σ_θ = 0.1, run the selection methods to construct the sets S with |S| = p while varying the parameter dimension p = 10, 15, ..., 40, and compare the MSEs, which are plotted in Figs. 1, 2 and 3. It is noted that the proposed method shows better performance than GSS and GSS-QR since, unlike those methods, the proposed algorithm takes into account the correlation of

[Table 1. Optimality criterion, decomposition, and operation count of the selection methods (GSS, GSS-QR, GSS-CN [11,12] and the proposed algorithm); the GSS-CN criterion is log det Σ(S)^{-1}.]
the noise to construct the set S. As expected, the proposed algorithm and GSS-CN yield identical MSEs because both optimize the same metric. Notably, as the parameter dimension increases, the estimation accuracy improves because the optimal linear estimator using more measurements can suppress the effect of correlated noise more efficiently. Furthermore, we investigate the estimation performance of the methods with p = |S| = 50 for different signal-to-noise ratios (SNRs) by varying the standard deviation of the parameter, σ_θ, from 0.1 to 0.5. In Fig. 4, the MSEs are shown for Case 1 (random Gaussian matrix H); the selection methods yield more accurate estimation for higher SNRs. We also relax the assumption Σ_θ = σ_θ² I_p and generate the parameters from a non-diagonal covariance matrix Σ_θ constructed from random data uniformly distributed over [0, 1]. Notice that the p diagonal entries of this covariance matrix Σ_θ take much higher values than σ_θ² = 0.1² used in the experiment for the case Σ_θ = σ_θ² I_p. The MSEs are provided in Fig. 5, showing the robust estimation performance of the proposed method for the case of a non-diagonal covariance matrix. The complexity of the selection methods is experimentally examined by measuring the execution time in seconds. Figure 6 plots the execution times of the methods in Case 1 (random Gaussian matrix H) with respect to the parameter dimension p = |S| = 10, ..., 40. The proposed method offers a competitive complexity compared with GSS-QR and GSS-CN; specifically, it operates about twice as fast as GSS-CN in all three cases.

Conclusions
In a situation where the measurements at sensor nodes are contaminated by correlated noise, we considered the problem of finding the best set of sensor nodes that minimizes the estimation error computed from the signal values at the selected nodes. Instead of directly minimizing the estimation error, we focused on maximizing the log-determinant of the inverse estimation error covariance matrix and applied the QR and LU factorizations to obtain a simple analytic selection rule. We also provided a complexity analysis of the proposed algorithm, revealing a competitive complexity compared with previously proposed selection methods. Finally, we investigated the performance of the proposed algorithm in various cases in terms of estimation accuracy and execution time and demonstrated that it yields more accurate estimation with reasonable complexity than the previous methods.


Fig. 1 Evaluation of estimation performance in Case 1 (random Gaussian matrix): the proposed algorithm is compared with different selection methods with σ_θ = 0.1 by varying the dimension of the parameter, p = |S|

Fig. 3 Evaluation of estimation performance in Case 3 (linear reduced-order model): the proposed algorithm is compared with different selection methods by varying the dimension of the parameter, p = |S|

Fig. 5 Evaluation of estimation performance for non-diagonal covariance matrix of the parameter in Case 1 (random Gaussian matrix): the proposed algorithm is compared with different selection methods by varying the dimension of the parameter, p = |S|