A variable step-size strategy for distributed estimation over adaptive networks

Considerable work has been done recently to develop algorithms that exploit the distributed structure of an ad hoc wireless sensor network to estimate a parameter of interest. One such algorithm is diffusion least mean squares (DLMS), which estimates the parameter of interest through cooperation between neighboring sensors in the network. The present work improves on the DLMS algorithm by using a variable step-size LMS (VSSLMS) algorithm. First, the well-known variants of VSSLMS algorithms are compared with each other in order to select the variant that provides the best trade-off between performance and complexity. Second, detailed convergence and steady-state analyses of the selected VSSLMS algorithm are performed. Finally, extensive simulations are carried out to test the robustness of the proposed algorithm under different scenarios. The simulation results corroborate the theoretical findings very well.


Introduction
The advent of ad hoc wireless sensor networks has created renewed interest in distributed computing and opened up new avenues for research in the estimation and tracking of parameters of interest where a robust, scalable, and low-cost solution is required. To illustrate this point, consider a set of N sensor nodes spread over a wide geographic area, as shown in Figure 1. Each node takes sensor measurements at every sampling instant. The goal is to estimate an unknown parameter of interest from these measurements. In a centralized network, all nodes transmit their readings to a fusion center where the processing takes place. However, such a system is vulnerable to failure of the fusion center. Furthermore, large amounts of energy and communication resources may be required to carry out the complete signal processing task between the center and the entire network. These resource requirements can become considerable as the distance between the nodes and the center increases [1].
An ad hoc network, on the other hand, relies on distributed processing: the nodes communicate only with their neighbors, and the processing takes place at the nodes themselves. As no hierarchical structure is involved, a node failure does not bring down the entire network. Extensive research in consensus-based distributed signal processing has produced a variety of algorithms [2].
In this work, a brief overview of the virtues and limitations of these algorithms [3][4][5][6][7][8][9][10] is first conducted, providing the background against which our contribution is justified. Incremental algorithms organize the network in a Hamiltonian cycle [3]. The estimation is completed by passing the estimate from node to node, improving its accuracy with each new set of data per node. This algorithm is termed the incremental least mean squares (ILMS) algorithm as it uses the LMS algorithm [11] with incremental steps within each iteration [3]. A general incremental stochastic gradient descent algorithm was developed in [4], of which [3] can be considered a special case. These algorithms depend heavily on the Hamiltonian cycle, and in case of a node failure, a new cycle has to be initiated. The problem of reconstructing the cycle is non-deterministic polynomial-time hard [12]. A quantized version of the ILMS algorithm was suggested in [5]. A fully distributed algorithm, called the diffusion least mean squares (DLMS) algorithm, was proposed in [6]. In the DLMS algorithm, neighboring nodes share their estimates in order to improve the overall performance; see also [7]. This algorithm is robust to node failure as the network does not depend on any single node. The authors in [8] introduced a scheme to adapt the combination weights at each iteration for each node, instead of using fixed weights for the shared data. In this case, performance improves if more weight is given to the estimates of the neighboring nodes that provide more improvement per iteration.
All the previously mentioned algorithms are generally based on an unconstrained optimization technique, except that of [8], which uses constrained optimization only to adapt the weights. In contrast, the authors in [9] use the constraint that all nodes converge to the same steady-state estimate to derive the distributed LMS algorithm. That algorithm is unfortunately hierarchical, which makes it complex and not completely distributed. To remedy this situation, a fully distributed algorithm based on the same constraint was suggested in [10]. The algorithms in [9,10] have been shown to be robust to inter-sensor noise, a property not possessed by the diffusion-based algorithms. However, it has been shown in [7] that diffusion-based algorithms perform better in general.
All the above-mentioned algorithms use a fixed step-size, which is generally kept the same for all nodes. As a result, the nodes with better signal-to-noise ratio (SNR) may converge quickly and provide reasonably good performance, whereas the nodes with poor SNR will not perform similarly despite cooperation from neighboring nodes. A distributed algorithm may therefore improve the average performance while some individual nodes still lag behind the others. One solution to this problem was provided by [8], which offers a computationally complex method to improve the performance of the DLMS algorithm. At every iteration, an error correlation matrix is formed for each node, and a decision on the weights of the neighbors is made based on this matrix. Thus, the combiner weights are adapted at every iteration according to the performance of the neighbors of each node. Simulation results from [8] show a slight improvement in performance, but this is achieved at the cost of high computational complexity.
In comparison, a much simpler solution was suggested in [13], using a variable step-size LMS algorithm. This resulted in the variable step-size diffusion LMS (VSSDLMS) algorithm. Preliminary results showed a remarkable improvement in performance at a reasonably low computational cost. The idea is to vary the step-size at each iteration based on the error performance; each node alters its step-size according to its own individual performance. As a result, not only does the average performance improve remarkably, but the individual performance of each node also gets better.
A different diffusion-based algorithm was proposed in [14], using the recursive least squares (RLS) algorithm to obtain the diffusion RLS (DRLS) algorithm. The DRLS algorithm provides excellent results in both convergence speed and steady-state performance. Another RLS-based distributed estimation algorithm has been studied in [15,16]. The latter algorithm is hierarchical in nature, which makes its complexity higher than that of the DRLS algorithm. The RLS algorithm is inherently far more complex than the LMS algorithm. In this work, it is shown that although the LMS algorithm is inferior to the RLS algorithm, using a variable step-size allows the LMS algorithm to achieve performance very close to that of the RLS algorithm.
In order to achieve better performance, various other algorithms have been proposed in the literature, such as in [17][18][19]. The works in [17,18] propose distributed Kalman filtering algorithms that provide efficient solutions for several applications. A survey of distributed particle filtering, covering several solutions proposed for nonlinear distributed estimation, is provided in [19]. However, the focus of this work is primarily on LMS-based algorithms, so these algorithms will not be discussed further.
Our work extends that of [13] and investigates in detail the performance of the VSSDLMS algorithm. Here, the most popular variable step-size LMS (VSSLMS) algorithms are first investigated and compared with each other. Based on the best complexity-performance trade-off, one VSSLMS variant is chosen for the proposed algorithm. Next, the incorporation of the selected VSSLMS algorithm in the diffusion scheme is described, and complete convergence and steady-state analyses are carried out. The stability of the algorithm is also analyzed. Finally, simulations are carried out, first to determine which of the selected VSSLMS algorithms provides the best trade-off between performance and complexity, and then to compare the proposed algorithm with similar existing algorithms. The performance of the proposed algorithm is assessed under different conditions, and a sensitivity analysis is performed on it. The theoretical findings are found to corroborate the simulation results very well.
The paper is organized as follows. Section 2 describes the problem statement and briefly introduces the DLMS algorithm. Section 3 incorporates the VSSLMS algorithm into the DLMS algorithm, and then complete convergence and stability analyses are carried out. Simulation results are given in Section 4, followed by a thorough discussion of the results. Finally, Section 5 concludes the work.
Notation. Boldface letters are used for vectors and matrices, and normal font for scalar quantities. Capital letters denote matrices and small letters denote vectors. The notation (·)^T stands for the transposition of vectors and matrices, and the expectation operation is denoted by E[·]. Any other mathematical operators will be defined as they are introduced in the paper.

Problem statement
Consider a network of N sensor nodes deployed over a geographical area to estimate an unknown parameter vector w^o of size (M × 1), as shown in Figure 1. Each node k has access to a time realization of a known regressor vector u_k(i) of size (1 × M) and a scalar measurement d_k(i) that are related by

d_k(i) = u_k(i) w^o + v_k(i),    (1)

where v_k(i) is a spatially uncorrelated zero-mean additive white Gaussian noise with variance σ²_{v,k} and i denotes the time index. The measurements d_k(i) and u_k(i) are used to generate an estimate w_k(i), of size (M × 1), of the unknown vector w^o. Assuming that each node cooperates only with its neighbors, each node k has access at every time instant i to the updates w_l(i) from its N_k neighbor nodes, l ∈ N_k, l ≠ k, in addition to its own estimate w_k(i). Two different schemes have been introduced in the literature for the diffusion algorithm. The adapt-then-combine (ATC) scheme [7] first updates the local estimate using the chosen adaptive algorithm and then fuses the intermediate estimates from the neighbors. The second scheme, called combine-then-adapt (CTA) [6], reverses this order. Since the ATC scheme has been found to outperform the CTA scheme [7], this work uses the ATC scheme.
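As a quick illustration, the linear data model (1) can be sketched as follows; the network size, filter length, and noise level below are illustrative choices, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 20, 4                                # nodes, unknown-vector length (illustrative)
w_o = rng.standard_normal(M)                # unknown (M x 1) parameter vector w^o

def measure(sigma_v=0.1):
    """One time realization at a node: d_k(i) = u_k(i) w^o + v_k(i)."""
    u = rng.standard_normal(M)              # known (1 x M) regressor u_k(i)
    v = sigma_v * rng.standard_normal()     # zero-mean white Gaussian noise v_k(i)
    return u, u @ w_o + v                   # regressor and scalar measurement d_k(i)

u, d = measure()
```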
The objective of the adaptive algorithm is to minimize the global cost function

J(w) = sum_{k=1}^{N} E |d_k(i) - u_k(i) w|².    (2)

The steepest descent algorithm is given as

w(i+1) = w(i) + μ sum_{k=1}^{N} ( r_{du,k} - R_{u,k} w(i) ),    (3)

where r_{du,k} = E[ d_k(i) u_k^T(i) ], R_{u,k} = E[ u_k^T(i) u_k(i) ], and μ is the step-size. The recursion defined in (3) requires full knowledge of the statistics of the entire network. A more practical solution utilizes the distributed nature of the network. The work in [6] gives a fully distributed solution, which has been modified and improved in [7]. The update equation is divided into two steps: the first performs the adaptation, while the second combines the intermediate updates from the neighboring nodes. The resulting scheme is called adapt-then-combine (ATC). Using the ATC scheme, the diffusion LMS algorithm is given as

ψ_k(i+1) = w_k(i) + μ_k u_k^T(i) [ d_k(i) - u_k(i) w_k(i) ],
w_k(i+1) = sum_{l ∈ N_k} c_lk ψ_l(i+1),    (4)

where ψ_k(i+1) is the intermediate update, μ_k is the step-size for node k, and c_lk is the weight connecting node k to its neighboring node l ∈ N_k, where N_k includes node k and sum_{l ∈ N_k} c_lk = 1. Further details on the formation of the weights c_lk can be found in [6,7].
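The two ATC steps described above can be sketched for all nodes at once as follows; the function name is ours, and the combination matrix C is assumed column-stochastic (its columns sum to one), as required of the weights c_lk:

```python
import numpy as np

def atc_dlms_step(W, U, d, C, mu):
    """One ATC diffusion-LMS iteration (a sketch, not the paper's code).

    W  : (N, M) current estimates w_k(i), one row per node
    U  : (N, M) regressors u_k(i); d : (N,) measurements d_k(i)
    C  : (N, N) combination weights with {C}_lk = c_lk, columns summing to 1
    mu : scalar or (N,) step-sizes mu_k
    """
    e = d - np.sum(U * W, axis=1)           # e_k(i) = d_k(i) - u_k(i) w_k(i)
    Psi = W + (mu * e)[:, None] * U         # adaptation: psi_k(i+1)
    return C.T @ Psi                        # combination: w_k(i+1) = sum_l c_lk psi_l(i+1)
```

With a fully connected network and uniform weights (C filled with 1/N), repeated calls drive every row of W toward w^o.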

Variable step-size diffusion LMS algorithm
The VSSLMS algorithms show marked improvement over the LMS algorithm at a low computational complexity [20][21][22][23][24][25]. Therefore, this variation is incorporated in the distributed algorithm to inherit the improved performance of the VSSLMS algorithm. Different variants have their own advantages and disadvantages. A complex step-size adaptation algorithm would not be suitable because of the physical limitations of the sensor nodes. As shown in [23], the algorithm proposed in [20] gives the best performance while having low complexity; therefore, it is well suited for this application. A further comparison of the performance of these variants in the present scenario confirms our choice of VSSLMS algorithm.
The proposed algorithm simply incorporates the VSSLMS algorithm into the diffusion scheme given by (4). With a VSSLMS algorithm, the step-size also becomes a variable in the system of equations defining the proposed distributed algorithm. The VSSDLMS algorithm is thus governed by

ψ_k(i+1) = w_k(i) + μ_k(i) u_k^T(i) e_k(i),
w_k(i+1) = sum_{l ∈ N_k} c_lk ψ_l(i+1),
μ_k(i+1) = f[ μ_k(i) ],    (5)

where e_k(i) = d_k(i) - u_k(i) w_k(i) and f[μ_k(i)] is the step-size adaptation function. Using the VSSLMS adaptation given in [20], the step-size update equation is

μ_k(i+1) = α μ_k(i) + γ e_k²(i),    (6)

where 0 < α < 1 and γ > 0 are tuning parameters. Since the nodes exchange data amongst themselves, the current update of each node is affected by the weighted average of the previous estimates of its neighbors. To account for this inter-node dependence, it is suitable to study the performance of the whole network. Hence, the local variables are transformed into the following global variables:

w(i) = col{ w_1(i), ..., w_N(i) },
U(i) = diag{ u_1(i), ..., u_N(i) },
d(i) = col{ d_1(i), ..., d_N(i) },
v(i) = col{ v_1(i), ..., v_N(i) }.

From these new variables, a completely new set of equations representing the entire network is formed, starting with the relation between the measurements

d(i) = U(i) w^(o) + v(i),    (7)

where w^(o) = col{ w^o, ..., w^o } is the global version of the unknown vector. Similarly, the update equations can be remodeled to represent the entire network:

w(i+1) = G [ w(i) + D(i) U^T(i) ( d(i) - U(i) w(i) ) ],
D(i+1) = α D(i) + γ E(i),    (8)

where G = C ⊗ I_M; C is an N × N weighting matrix with {C}_lk = c_lk; ⊗ is the Kronecker product; D(i) = diag{ μ_1(i) I_M, ..., μ_N(i) I_M } is the diagonal step-size matrix; and the error energy matrix E(i) is given by

E(i) = diag{ e_1²(i) I_M, ..., e_N²(i) I_M }.

Considering the above set of equations, the mean and mean-square analyses and the steady-state behavior of the VSSDLMS algorithm are carried out next. The mean analysis considers the stability of the algorithm and derives a bound on the step-size that guarantees convergence. The mean-square analysis derives transient and steady-state expressions for the mean square deviation (MSD) and the excess mean square error (EMSE). The MSD is defined as the mean-square error in the estimate of the unknown vector. With the weight-error vector for node k given by

w̃_k(i) = w^o - w_k(i),

the MSD is simply defined as

MSD_k = E ‖w̃_k(i)‖².

Similarly, the EMSE is derived from the a priori error

e_{a,k}(i) = u_k(i) w̃_k(i-1),

which can be solved further to get the following expression for the EMSE:

EMSE_k = E |e_{a,k}(i)|² = E ‖w̃_k(i-1)‖²_{R_k},

where R_k = E[ u_k^T(i) u_k(i) ] is the autocorrelation matrix for node k.
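Putting the diffusion step and the step-size recursion (6) together, one VSSDLMS iteration can be sketched as below; the values of alpha, gamma, and the clipping bounds on the step-size are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def vssdlms_step(W, mu, U, d, C, alpha=0.997, gamma=1e-3,
                 mu_min=1e-5, mu_max=0.5):
    """One VSSDLMS iteration (a sketch). W: (N, M) estimates; mu: (N,)
    per-node step-sizes; U: (N, M) regressors; d: (N,) measurements;
    C: (N, N) column-stochastic combination weights."""
    e = d - np.sum(U * W, axis=1)            # per-node errors e_k(i)
    Psi = W + (mu * e)[:, None] * U          # adaptation with node-specific mu_k(i)
    W_next = C.T @ Psi                       # diffusion combination step
    # step-size recursion (6): mu_k(i+1) = alpha*mu_k(i) + gamma*e_k(i)^2,
    # clipped to [mu_min, mu_max] (the clipping is our assumption)
    mu_next = np.clip(alpha * mu + gamma * e**2, mu_min, mu_max)
    return W_next, mu_next
```

Each node's step-size grows while its error is large and decays geometrically toward a small value as the error floor is reached, which is the mechanism behind the fast convergence and low misadjustment discussed above.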

Mean analysis
To begin with, let us introduce the global weight-error vector, defined as in [6,26] by

w̃(i) = w^(o) - w(i).

Since G w^(o) = w^(o), incorporating the global weight-error vector into (8) gives

w̃(i+1) = G [ I_MN - D(i) U^T(i) U(i) ] w̃(i) - G D(i) U^T(i) v(i).    (14)

Here we use the assumption that the step-size matrix D(i) is independent of the regressor matrix U(i) [20]. Accordingly, for small values of γ in (6), the following relation holds true asymptotically:

E [ D(i) U^T(i) U(i) ] ≈ E [ D(i) ] E [ U^T(i) U(i) ] = E [ D(i) ] R_U,    (15)

where R_U = E[ U^T(i) U(i) ] is the autocorrelation matrix of U(i). Taking the expectation on both sides of (14) then gives

E [ w̃(i+1) ] = G ( I_MN - E [ D(i) ] R_U ) E [ w̃(i) ],    (16)

where the expectation of the second term on the right-hand side of (14) is zero since the measurement noise is zero-mean and spatially uncorrelated with the regressors, as explained earlier.
From (16), we see that for stability in the mean we must have

‖ G ( I_MN - E [ D(i) ] R_U ) ‖ < 1.    (17)

Since G is built from C and ‖GB‖₂ ≤ ‖G‖₂ ‖B‖₂, we can safely infer that

‖ G ( I_MN - E [ D(i) ] R_U ) ‖₂ ≤ ‖ G ‖₂ ‖ I_MN - E [ D(i) ] R_U ‖₂.    (18)

Since there is already a condition that ‖C‖₂ = 1, and G = I_MN for the non-cooperative scheme, we can safely conclude that

‖ G ( I_MN - E [ D(i) ] R_U ) ‖₂ ≤ ‖ I_MN - E [ D(i) ] R_U ‖₂.    (19)

So the cooperation mode only enhances the stability of the system (for further details, refer to [6,7]). Since stability also depends on the step-size, the algorithm will be stable in the mean if ‖ I_MN - E [ D(i) ] R_U ‖₂ < 1, which holds true if the mean of the step-size is governed by

0 < E [ μ_k(i) ] < 2 / λ_max( R_{u,k} ),  k = 1, ..., N,    (20)

where λ_max(R_{u,k}) is the maximum eigenvalue of the autocorrelation matrix R_{u,k}. This differs from the fixed step-size case: here the system is stable in the mean as long as the mean of the step-size remains within the limits defined by (20).
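As a numeric illustration of the bound (20): for white regressors the maximum eigenvalue of R_{u,k} equals the regressor variance, so the mean step-size bound reduces to 2 divided by that variance (the variance and filter length below are illustrative choices):

```python
import numpy as np

sigma_u2, M = 1.0, 4
R_uk = sigma_u2 * np.eye(M)                  # autocorrelation of white regressors
lam_max = np.linalg.eigvalsh(R_uk).max()     # lambda_max(R_{u,k}) = sigma_u^2
mu_bound = 2.0 / lam_max                     # mean step-size must stay below this
# cooperation (||C||_2 = 1) does not tighten this bound, per (19)
```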

Mean-square analysis
In this section, the mean-square analysis of the VSSDLMS algorithm is carried out using a weighted norm instead of the regular norm. The motivation for a weighted norm stems from the fact that, although the MSD does not require one, the evaluation of the EMSE does. In order to accommodate both measures, a general analysis is conducted using a weighted norm, with the weighting matrix replaced by an identity matrix in the case of the MSD [26].
Taking the weighted norm of (14), for an arbitrary positive semi-definite weighting matrix Σ, and applying the expectation operator to both sides yields

E ‖w̃(i+1)‖²_Σ = E ‖w̃(i)‖²_{Σ̂} + E [ v^T(i) U(i) D(i) G^T Σ G D(i) U^T(i) v(i) ],    (21)

where

Σ̂ = [ I_MN - D(i) U^T(i) U(i) ]^T G^T Σ G [ I_MN - D(i) U^T(i) U(i) ],

and the cross terms vanish because the noise is zero-mean and independent of the regressors. Using the data independence assumption [26] and applying the expectation operator gives the deterministic weighting matrix E[Σ̂]. For ease of notation, we denote Σ' = E[Σ̂] for the remaining analysis.

Mean-square analysis for Gaussian data
The evaluation of the expectations in (24) is quite tedious for non-Gaussian data. Therefore, it is assumed here that the data are Gaussian in order to evaluate (24). The autocorrelation matrix can be decomposed as R_U = T Λ T^T, where Λ is a diagonal matrix containing the eigenvalues for the entire network and T is the matrix of corresponding eigenvectors. Using this eigenvalue decomposition, we define the transformed variables

w̄(i) = T^T w̃(i),  Ū(i) = U(i) T,  Σ̄ = T^T Σ T,

where the input regressors are considered independent of each other at each node; the step-size matrix D(i) is block-diagonal with scaled-identity blocks, so it is not transformed since T T^T = I. Using these relations, (21) and (24) can be rewritten in the transformed variables as (25) and (26), respectively. It can be seen that E[ Ū^T(i) Ū(i) ] = Λ. Also, using the bvec operator [27], we have σ = bvec{Σ̄}, where the bvec operator divides the matrix into smaller blocks and then applies the vec operator to each block. Now, let R_v = Λ_v ⊙ I_M denote the block-diagonal noise covariance matrix for the entire network, where ⊙ denotes the block Kronecker product [27] and Λ_v is the diagonal noise variance matrix for the network. Hence, the second term on the right-hand side of (25) can be written as b^T σ for a vector b determined by R_v, the step-size statistics, and Λ; the first term of (26) remains to be evaluated. Using the step-size independence assumption and the ⊙ operator, we get

bvec{ E [ Ū^T(i) Ū(i) Σ̄ Ū^T(i) Ū(i) ] } = A σ,

where, from [6], A = diag{ A_1, ..., A_N }, and each matrix A_k is expressed in terms of Λ_k, the diagonal eigenvalue matrix, and λ_k, the eigenvalue vector for node k.
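The two properties of the eigendecomposition used above, namely R_U = T Λ T^T and the orthonormality of T (which is why a block step-size matrix with scaled-identity blocks is left unchanged by the transformation), can be checked numerically on a sample autocorrelation matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5000, 4))
R_U = X.T @ X / 5000                         # sample autocorrelation matrix
lam, T = np.linalg.eigh(R_U)                 # Lambda = diag(lam), T orthonormal
ok_decomp = np.allclose(T @ np.diag(lam) @ T.T, R_U)   # R_U = T Lambda T^T
ok_orth = np.allclose(T.T @ T, np.eye(4))    # T^T T = I, so T^T (mu I) T = mu I
```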

The matrix E[ D(i) ⊙ D(i) ] follows directly from the statistics of the step-size recursion (6). Now, applying the bvec operator to the weighting matrix Σ̄', using the relation σ = bvec{Σ̄} (from which the original Σ̄ is recovered as Σ̄ = bvec⁻¹{σ}), we get

bvec{ Σ̄' } = F(i) σ,    (33)

where the matrix F(i) is built from Λ, G, E[D(i)], E[D(i) ⊙ D(i)], and A. Then (21) takes the following form:

E ‖w̄(i+1)‖²_σ = E ‖w̄(i)‖²_{F(i) σ} + b^T σ,    (34)

which characterizes the transient behavior of the network. Although (34) does not explicitly show the effect of the variable step-size (VSS) algorithm on the network's performance, this effect is in fact subsumed in the weighting matrix F(i), which varies at each iteration, unlike in the fixed step-size LMS algorithm, where the analysis shows that the weighting matrix remains fixed at all iterations. Also, (33) clearly shows the effect of the VSS algorithm through the presence of the diagonal step-size matrix D(i).

Learning behavior of the proposed algorithm
In this section, the learning behavior of the VSSDLMS algorithm, i.e., how the algorithm evolves with time, is evaluated. Starting with w̃(0) = w^(o) and D(0) = μ(0) I_MN, the recursion (34) is written out for iteration (i+1); subtracting the result at iteration i from that at iteration (i+1) and simplifying gives an update that can be evaluated iteratively. In order to evaluate the MSD and the EMSE, the corresponding weighting vector needs to be defined for each of them. Taking σ = (1/N) bvec{I_MN} = q_η with η(i) = (1/N) E ‖w̃(i)‖² yields the transient MSD recursion (46); similarly, taking σ = (1/N) bvec{Λ} = λ_ζ with ζ(i) = (1/N) E ‖w̄(i)‖²_Λ yields the transient EMSE recursion (47). The relations in (46) and (47) govern the transient behavior of the MSD and EMSE of the proposed algorithm. They show how the effect of the weighting matrix on the transient behavior changes from one iteration to the next, as the weighting matrix itself varies at each iteration. This is not the case for the simple fixed step-size DLMS in [6], where the weighting matrix remains constant for all iterations. Since the weighting matrix depends on the step-size matrix, which becomes very small asymptotically, both the norm and the influence of the weighting matrix also become asymptotically small. From the above relations, it is seen that both the MSD and the EMSE become very small at steady-state because the weighting matrix itself becomes small at steady-state and these relations then depend only on the product of the weighting matrices at each iteration.

Steady-state analysis
From the second relation in (8), it is seen that the step-size for each node is independent of the data received from other nodes. Even though the connectivity matrix G does not permit the weighting matrix F(i) to be evaluated separately for each node, this is not the case for the determination of the step-size at any node. Here, we define the misadjustment, M_k, as the ratio of the EMSE to the minimum mean square error; the misadjustment value is used in determining the steady-state performance of the algorithm [11]. Therefore, taking the approach of [20], we first find the misadjustment and then solve (36) and (37) along with (48), which leads to the steady-state values of the step-size and its square for each node. In particular, taking expectations of (6) at steady-state gives

E [ μ_{ss,k} ] = γ E [ e_k²(∞) ] / (1 - α) = γ σ²_{v,k} ( 1 + M_k ) / (1 - α),

with the corresponding expression for E[ μ²_{ss,k} ] obtained by squaring (6) before taking expectations.

Figure 2 MSD for various VSSLMS algorithms (Kwong, Mayyas, Costa, Mathews) applied to the diffusion scheme.
Incorporating these two steady-state relations into (33) yields the steady-state weighting matrix F_ss, where D_ss = diag{ μ_{ss,k} I_M }. The steady-state mean-square behavior is then obtained from (46) and (47) by taking σ = q_η for the MSD and σ = λ_ζ for the EMSE, respectively, which gives the steady-state values

MSD_ss = (1/N) b^T ( I - F_ss )⁻¹ q_η,
EMSE_ss = (1/N) b^T ( I - F_ss )⁻¹ λ_ζ.

The above two steady-state relations depend on the steady-state weighting matrix, which becomes very small at steady-state, as explained before. As a result, the steady-state errors of the proposed algorithm become very small compared with those of the fixed step-size DLMS algorithm.

Numerical results
In this section, several simulation scenarios are considered and discussed to assess the performance of the proposed VSSDLMS algorithm. Simulations have been conducted for different average SNR values. The performance measure used throughout is the MSD. The length of the unknown vector is taken as M = 4 and the size of the network is N = 20 nodes. The sensors are randomly placed in a unit square area. The input regressor vectors are assumed white Gaussian, with the autocorrelation matrix having the same variance for all nodes. Results averaged over 100 experiments are shown for an SNR of 20 dB, a normalized communication range of 0.3, and a Gaussian environment.
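The simulated network described above can be sketched as follows; the uniform combination weights in the last line are our simplification, whereas the paper forms the weights c_lk as in [6,7]:

```python
import numpy as np

rng = np.random.default_rng(1)
N, r = 20, 0.3                                       # nodes, normalized range
pos = rng.random((N, 2))                             # sensors in the unit square
# nodes within range r are neighbors; each node is its own neighbor (dist = 0)
dist = np.linalg.norm(pos[:, None] - pos[None], axis=2)
A = (dist <= r).astype(float)                        # symmetric adjacency matrix
C = A / A.sum(axis=0, keepdims=True)                 # uniform weights, columns sum to 1
```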
First, the previously discussed VSSLMS algorithms are compared with each other for the case of SNR = 20 dB, and the results of this comparison are reported in Figure 2. As can be seen from Figure 2, the algorithm of [20] performs best and is therefore chosen for our proposed VSSDLMS algorithm.
The sensitivity analysis of the selected VSSDLMS algorithm, operating in the scenario explained above, is discussed next. Since the VSSDLMS algorithm depends on the choice of α and γ, these values are varied to check their effect on the performance of the algorithm. As can be seen from Figure 3, the performance of the VSSDLMS algorithm degrades as α gets larger. Similarly, the performance of the proposed algorithm improves as γ increases, as depicted in Figure 4. This investigation therefore allows a proper choice of α and γ to be made. In order to show the importance of varying the step-size, two experiments were run separately. In the first experiment, the DLMS algorithm was simulated with a high step-size while the initial step-size of the proposed algorithm was set both low and high. In the second experiment, the step-size of the DLMS algorithm was changed to a low value. As can be seen from Figures 5 and 6, the proposed algorithm converges to the same error floor in both scenarios. The DLMS algorithm with a high step-size converges fast but to a higher error floor (Figure 5), whereas with a low step-size it converges to the same error floor as the proposed algorithm but very slowly. Thus, the proposed algorithm provides fast convergence as well as better performance.
Next, the proposed algorithm is compared with some key existing algorithms: the no-cooperation LMS, the distributed LMS [10], the DLMS [6], the DLMS with adaptive combiners (DLMSAC) [8], and the DRLS [14]. Figure 7 reports the performance of these different algorithms. As can be seen from this figure, the performance of the proposed VSSDLMS algorithm is second only to that of the DRLS algorithm, and the gap in performance is narrow. These results show that, compared with other algorithms of similar complexity, the VSSDLMS algorithm exhibits a significant improvement in performance. A similar behavior is observed at steady-state, as shown in Figure 8. As expected, the DRLS algorithm performs better than all other algorithms in this comparison, but the proposed algorithm remains second only to the DRLS algorithm in the steady-state regime. Also, the diffusion process can be viewed as an efficient, indirect way of adjusting the step-size in all neighboring nodes, which keeps the steady-state MSD of all nodes nearly the same in all cases.
Next, the comparison between the results predicted by the theoretical analysis of the proposed algorithm and the simulation results is reported in Figures 9 and 10. This is done for a network of 15 nodes with M = 2 and a communication range of 0.35; two values of α, namely α = 0.995 and α = 0.95, are chosen, whereas γ = 0.001. As can be seen from these figures, the simulation results corroborate the theoretical findings very well.
An important aspect of working with sensor nodes is the possibility of a node switching off, in which case the network may be required to adapt itself. The diffusion scheme is robust to such a change; this scenario is considered here, and the results are shown in Figure 11. A network of 50 nodes is chosen so that enough nodes can be switched off to study the performance of the proposed algorithm in this scenario. Two cases are considered. In the first case, 15 nodes are switched off after 50 iterations and a further 15 nodes after 300 iterations. In the second case, 15 nodes are switched off after 250 iterations and the next 15 after 750 iterations. In both cases, the performance degrades initially but recovers to a level similar to the case where no nodes are switched off; the difference between the best-case and worst-case scenarios is only about 2 dB. For the DLMS algorithm, however, the performance worsens the earlier the nodes are switched off, and the difference between the best-case and worst-case scenarios is almost 9 dB, which further confirms the robustness of the proposed algorithm.
Finally, the comparison between the theoretical and simulated steady-state values of the MSD and EMSE for two different input regressor autocorrelation matrices is given in Table 1. As can be seen from this table, a close agreement between theory and simulations is obtained.

Figure 1
Figure 1 Adaptive network of N nodes.

Figure 5
Figure 5 Comparison with the DLMS algorithm having a high step-size.

Figure 11
Figure 11 Robustness of algorithm at SNR = 20 dB.