  • Research
  • Open Access

Variable forgetting factor mechanisms for diffusion recursive least squares algorithm in sensor networks

EURASIP Journal on Advances in Signal Processing 2017, 2017:57

https://doi.org/10.1186/s13634-017-0490-z

Received: 9 November 2016

Accepted: 6 July 2017

Published: 15 August 2017

Abstract

In this work, we present low-complexity variable forgetting factor (VFF) techniques for diffusion recursive least squares (DRLS) algorithms. In particular, we propose low-complexity VFF-DRLS algorithms for distributed parameter and spectrum estimation in sensor networks. The proposed algorithms adjust the forgetting factor automatically according to the a posteriori error signal. We analyze the mean and mean square performance of the proposed algorithms and derive mathematical expressions for the mean square deviation (MSD) and the excess mean square error (EMSE). Simulation results show that the proposed low-complexity VFF-DRLS algorithms outperform the existing DRLS algorithm with a fixed forgetting factor in distributed parameter and spectrum estimation scenarios. The simulation results also closely match the proposed analytical expressions.

Keywords

  • Sensor networks
  • Distributed parameter estimation
  • Distributed spectrum estimation
  • Diffusion recursive least-squares
  • Variable forgetting factor

1 Introduction

Distributed estimation is commonly used for distributed data processing over sensor networks, and it offers increased robustness, flexibility, and system efficiency compared with centralized processing. Owing to these merits, distributed estimation has received growing attention and has been widely used in applications ranging from environmental monitoring [1], medical data collection for healthcare [2], animal tracking in agriculture [1], monitoring of physical phenomena [3], and localization of moving mobile terminals [4, 5] to national security. In particular, distributed estimation relies on cooperation among geographically spread sensor nodes to process locally collected data. Depending on the cooperation strategy employed, distributed estimation algorithms can be classified into the incremental type and the diffusion type. We consider the diffusion cooperation strategy in this paper, since the incremental strategy requires the definition of a path through the network and may not be suitable for large networks or dynamic configurations [6, 7]. Many distributed estimation algorithms with the diffusion strategy have been put forward recently, such as diffusion least-mean squares (LMS) [8, 9], diffusion sparse LMS [10–12], variable step size diffusion LMS (VSS-DLMS) [13, 14], diffusion recursive least squares (RLS) [6, 7], distributed sparse RLS [15], distributed sparse total least squares (LS) [16], diffusion information theoretic learning (ITL) [17], and the diffusion-based algorithm for distributed censor regression [18]. Among these distributed estimation algorithms, the RLS-based algorithms achieve superior performance to the LMS-based ones by inheriting the fast convergence and low steady-state misadjustment of the RLS technique. Thus, distributed estimation algorithms based on the diffusion strategy and the RLS adaptive technique are investigated in this paper.

However, the existing RLS-based distributed estimation algorithms use a fixed forgetting factor, which has some drawbacks. With a fixed forgetting factor, the algorithm fails to keep up with real-time variations in the environment, such as variations in the sensor network topology. Moreover, it is desirable to adjust the forgetting factor automatically according to the estimation errors rather than to choose an appropriate value for it through simulations. There have been several studies on variable forgetting factor (VFF) methods. Specifically, the classic gradient-based VFF (GVFF) mechanism was proposed in [19], and most of the existing VFF mechanisms are extensions of this method [20–24]. Nevertheless, the GVFF mechanism requires a large amount of computation. To reduce the computational complexity, improved low-complexity VFF mechanisms have been reported in [25, 26]. To the best of our knowledge, the existing VFF mechanisms are mostly employed in a centralized context and have not yet been considered in the field of distributed estimation.

In this work, the previously reported VFF mechanisms [25, 26] are applied to the diffusion RLS algorithm for distributed signal processing applications, with the inverse relation between the forgetting factor and the adaptation component simplified to lower the computational complexity. The resulting algorithms are referred to as the low-complexity time-averaged VFF diffusion RLS (LTVFF-DRLS) algorithm and the low-complexity correlated time-averaged VFF diffusion RLS (LCTVFF-DRLS) algorithm, respectively. Compared with the GVFF mechanism, the proposed LTVFF and LCTVFF mechanisms reduce the computational complexity significantly [25, 26]. We then analyze the proposed algorithms in terms of their mean and mean square error performance. Finally, we provide simulation results to verify the effectiveness of the proposed algorithms when applied to distributed parameter estimation and distributed spectrum estimation.

Our main contributions are summarized as follows:
  1) We propose the low-complexity VFF-DRLS algorithms for distributed estimation in sensor networks. To the best of our knowledge, the VFF mechanisms have not been considered in the distributed estimation algorithms yet.

  2) We study the mean and mean square performance for the proposed algorithms in a general case, and provide the transient analysis for a specialized case. Specifically, for the general case, in terms of the mean performance, we show that the mean value of the weight error vector approaches zero as the number of iterations goes to infinity, which implies the asymptotic convergence of the proposed algorithms; from the perspective of mean square performance, we derive the mathematical expressions for the steady-state MSD and EMSE values. In the specialized case, we study the transient behavior by focusing on the learning curve and prove that the proposed algorithms are convergent and that the convergence rate is related to the varying forgetting factors.

  3) We perform simulations to evaluate the performance of the proposed algorithms when applied to distributed parameter estimation and distributed spectrum estimation tasks. The simulation results indicate that the proposed algorithms exhibit remarkable improvements in convergence and steady-state performance when compared with the DRLS algorithm that has a fixed forgetting factor. Besides, the effectiveness of our analytical expressions for calculating the steady-state MSD and EMSE is verified by the simulation results. In addition, we also provide detailed simulation results regarding the choice of the parameters in the proposed algorithms to help with the parameter selection in practice.

This paper is organized as follows. Section 2 provides the system model for the distributed estimation over sensor networks. Besides, the DRLS algorithm with the fixed forgetting factor is described briefly. In Section 3, two low-complexity VFF mechanisms are presented, followed by the analyses for the variable forgetting factor in terms of steady-state statistical properties. Besides, the proposed LTVFF-DRLS algorithm and the LCTVFF-DRLS algorithm are presented. In the last part of this section, the computational complexity of the VFF mechanisms as well as the proposed algorithms is analyzed. In Section 4, detailed analyses based on mean and mean-square performance for the proposed algorithms are carried out and analytical expressions to compute MSD and EMSE are derived. In addition, transient analysis for a specialized case is provided in the last part of Section 4. In Section 5, simulation results are presented for distributed parameter estimation and distributed spectrum estimation. Section 6 draws the conclusions.

Notation: Boldface letters are used for vectors and matrices, and normal font for scalar quantities. Matrices are denoted by capital letters and vectors by small letters. We use the operator row{·} to denote a row vector, col{·} to denote a column vector, and diag{·} to denote a diagonal matrix. The operator E[·] stands for the expectation of some quantity, and Tr{·} represents the trace of a matrix. We use \((\cdot)^{T}\) and \((\cdot)^{-1}\) to denote the transpose and inverse operators, respectively, and \((\cdot)^{*}\) for complex conjugate transposition. We also use the symbol \(\mathbf{I}_{n}\) to represent an identity matrix of size n and \(\mathbf {\mathbb {I}}\) to denote a vector of appropriate size with all elements equal to one.

2 System model and diffusion-based DRLS algorithm

In this section, we first illustrate the system model for the distributed estimation over sensor networks. Following this, we review the conventional DRLS algorithm with the fixed forgetting factor briefly.

2.1 System model

Let us consider a sensor network consisting of N sensor nodes which are spatially distributed over a geographical area. The set of nodes connected to node k, including node k itself, is called the neighborhood of node k and is denoted by \(\mathcal {N}_{k}\). The number of nodes linked to node k is the degree of node k, denoted by \(n_{k}\). The system model for the distributed estimation over sensor networks is presented in Fig. 1.
Fig. 1 System model

At each time instant i, each node k has access to complex-valued time realizations \(\{d_{k,i},\mathbf{u}_{k,i}\}\), k=1,2,…,N, i=1,2,…, with \(d_{k,i}\) a scalar measurement and \(\mathbf{u}_{k,i}\) an M×1 input vector. The relation between the measurement \(d_{k,i}\) and the input vector \(\mathbf{u}_{k,i}\) can be characterized as
$$ d_{k,i}=\mathbf{u}_{k,i}^{*}\mathbf{w}^{o}+v_{k,i} $$
(1)

where \(\mathbf{w}^{o}\) is the unknown optimal weight vector of size M×1, and \(v_{k,i}\) is zero-mean additive white Gaussian noise with variance \({\sigma }_{v,k}^{2}\). We assume that the noise variance is known in advance. We also assume that the noise samples \(v_{k,i}\), k=1,2,…,N, i=1,2,…, are independent of each other as well as of the input vectors \(\mathbf{u}_{k,i}\). We aim to estimate the unknown optimal weight vector \(\mathbf{w}^{o}\) in a distributed manner. That is, each sensor node k obtains a local estimate \(\mathbf{w}_{k,i}\) of size M×1 that approaches the optimal weight vector \(\mathbf{w}^{o}\) as closely as possible. To this end, each node k not only uses its local measurement \(d_{k,i}\) and input vector \(\mathbf{u}_{k,i}\) but also cooperates with its closest neighbors to update its local estimate \(\mathbf{w}_{k,i}\). Specifically, through cooperation, each node k has access to its neighbors' data \(\{d_{l,i},\mathbf{u}_{l,i}\}\) and estimates \(\mathbf{w}_{l,i}\) at each time instant i, where \(l\in \mathcal {N}_{k}\), and each node k then fuses all the available information to update its local estimate \(\boldsymbol{\psi}_{k,i}\).
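The measurement model (1) can be sketched in a few lines of NumPy. This is only an illustration: the function name `generate_measurements`, the array layout, and the choice of unit-variance complex white Gaussian regressors are assumptions made here and are not specified by the paper.

```python
import numpy as np

def generate_measurements(w_o, noise_var, num_steps, rng):
    """Draw {d_{k,i}, u_{k,i}} from d = u^* w^o + v for N nodes (Eq. (1)).

    w_o: (M,) unknown weight vector; noise_var: (N,) per-node noise variances.
    Returns D of shape (num_steps, N) and U of shape (num_steps, N, M).
    """
    noise_var = np.asarray(noise_var, dtype=float)
    N, M = len(noise_var), len(w_o)
    # Assumption: regressors are complex white Gaussian with unit variance per entry.
    U = (rng.standard_normal((num_steps, N, M))
         + 1j * rng.standard_normal((num_steps, N, M))) / np.sqrt(2)
    # Circular complex Gaussian noise with variance noise_var[k] at node k.
    V = np.sqrt(noise_var / 2) * (rng.standard_normal((num_steps, N))
                                  + 1j * rng.standard_normal((num_steps, N)))
    # u^* w^o is the conjugate-transposed regressor times the weight vector.
    D = np.einsum('inm,m->in', U.conj(), w_o) + V
    return D, U
```

Storing the regressors as rows of `U` keeps the per-node inner product `u^* w` a single `np.vdot` call in later steps.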

Let us first introduce some vectors and matrices. At each time instant i, by collecting all nodes’ measurements into vector y i , noise samples into vector v i (both of length N), and input vectors into the matrix H i of size N×M, we obtain
$$ \begin{aligned} \mathbf{y}_{i}&=\text{col}\{d_{1,i} \ldots d_{N,i}\}\\ \mathbf{H}_{i}&=\text{col}\{\mathbf{u}_{1,i}^{*} \ldots \mathbf{u}_{N,i}^{*}\}\\ \mathbf{v}_{i}&=\text{col}\{v_{1,i} \ldots v_{N,i}\}. \end{aligned} $$
(2)
Following this, we define the covariance matrix of the noise vector v i as
$$ \mathbf{R}_{v}=E[\mathbf{v}_{i}\mathbf{v}_{i}^{*}]= \text{diag}\left\{ {\sigma}_{v,1}^{2},{\sigma}_{v,2}^{2},\ldots,{\sigma}_{v,N}^{2} \right\}. $$
(3)
Next, we stack y i , v i and H i from time instant 0 to time instant i into matrices respectively, which are given by
$$ \begin{aligned} \mathbf{\mathcal{Y}}_{i}&=\text{col}\{\mathbf{y}_{i} \ldots \mathbf{y}_{0}\}\\ \mathbf{\mathcal{H}}_{i}&=\text{col}\{\mathbf{H}_{i} \ldots \mathbf{H}_{0}\}\\ \mathbf{\mathcal{V}}_{i}&=\text{col}\{\mathbf{v}_{i} \ldots \mathbf{v}_{0}\}. \end{aligned} $$
(4)

Besides, we define \(\mathbf {\mathcal {R}}_{v,i}=E[\mathbf {\mathcal {V}}_{i}\mathbf {\mathcal {V}}_{i}^{*}]\).

2.2 Brief review of diffusion-based DRLS algorithm

In this part, we give a brief introduction to the diffusion-based DRLS algorithm [6, 7].

For the diffusion-based DRLS algorithm, the local optimization problem to estimate the optimal weight vector w o at each node k can be formulated as follows:
$$ \mathbf{\psi}_{k,i}=\mathop{\arg}\mathop{\min}_{\mathbf{w}} \left\{\|\mathbf{w}\|_{\mathbf{\Pi}_{i}}^{2}+\|\mathbf{\mathcal{Y}}_{i}- \mathbf{\mathcal{H}}_{i}\mathbf{w}\|_{\mathbf{\mathcal{W}}_{k,i}}^{2}\right\} $$
(5)

Note that the notation \(\|\mathbf {a}\|_{\boldsymbol {\Sigma }}^{2}=\mathbf {a}^{*}\boldsymbol {\Sigma }\mathbf {a}\) denotes the weighted squared norm of the vector a for any positive definite Hermitian matrix Σ. The matrix \(\boldsymbol{\Pi}_{i}\) is given by \(\boldsymbol{\Pi}_{i}=\lambda^{i+1}\boldsymbol{\Pi}\), where \(0\ll\lambda<1\) is the forgetting factor and \(\boldsymbol{\Pi}=\delta^{-1}\mathbf{I}_{M}\) with δ>0. Furthermore, the matrix \(\boldsymbol {\mathcal {W}}_{k,i}\) can be expressed as \(\boldsymbol {\mathcal {W}}_{k,i}=\boldsymbol {\mathcal {R}}_{v,i}^{-1}\boldsymbol {\Lambda }_{i}\text {diag}\{\mathbf {C}_{k},\mathbf {C}_{k},\ldots,\mathbf {C}_{k}\}\), where \(\boldsymbol{\Lambda}_{i}=\text{diag}\{\mathbf{I}_{N},\lambda\mathbf{I}_{N},\ldots,\lambda^{i}\mathbf{I}_{N}\}\) and \(\mathbf{C}_{k}\) is a diagonal matrix whose main diagonal is given by the kth column of the matrix C. The matrix C is the adaptation matrix for the diffusion-based DRLS algorithm and satisfies \(\mathbf {\mathbb {I}}^{T}\mathbf {C}=\mathbf {\mathbb {I}}^{T}\) and \(\mathbf {C}\mathbf {\mathbb {I}}=\mathbf {\mathbb {I}}\) [6]. In other words, C is a doubly stochastic matrix, that is, both a left stochastic matrix and a right stochastic matrix.

The optimization problem (5) can be rewritten as follows [6]:
$$ {{\begin{aligned} \boldsymbol{\psi}_{k,i}=\mathop{\arg}\mathop{\min}_{\mathbf{w}}\left\{\lambda^{i+1}\|\mathbf{w}\|_{\boldsymbol{\Pi}}^{2}+\sum\limits_{j=0}^{i}\lambda^{i-j}\sum\limits_{l=1}^{N}\frac{C_{l,k}|d_{l,j}-\mathbf{u}_{l,j}^{*}\mathbf{w}|^{2}}{{\sigma}_{v,l}^{2}}\right\} \end{aligned}}} $$
(6)
where C l,k represents the (l,k)th element of the matrix C. The closed-form solution to (6) is given by [6, 7]
$$ \boldsymbol{\psi}_{k,i}=\mathbf{P}_{k,i}\boldsymbol{\mathcal{H}}_{i}^{*}\boldsymbol{\mathcal{W}}_{k,i}\boldsymbol{\mathcal{Y}}_{i} $$
(7)
where P k,i can be expressed as
$$ \mathbf{P}_{k,i}=\left[\lambda^{i+1}\boldsymbol{\Pi}+\boldsymbol{\mathcal{H}}_{i}^{*}\boldsymbol{\mathcal{W}}_{k,i}\boldsymbol{\mathcal{H}}_{i}\right]^{-1}. $$
(8)
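To make the cost of the batch solution concrete, (7) and (8) can be evaluated directly by accumulating the weighted sums that appear in (6). The sketch below is one possible way to do this; the function name `closed_form_estimate` and the array layout are assumptions introduced for illustration.

```python
import numpy as np

def closed_form_estimate(D, U, C, noise_var, lam, delta, k):
    """Evaluate Eqs. (7)-(8) for node k from all data up to the last time instant.

    D: (T, N) measurements d_{l,j}; U: (T, N, M) regressors u_{l,j};
    C: (N, N) adaptation matrix; noise_var: (N,) noise variances.
    """
    T, N, M = U.shape
    i = T - 1
    # lambda^{i+1} * Pi with Pi = delta^{-1} * I_M
    P_inv = lam ** (i + 1) / delta * np.eye(M, dtype=complex)
    rhs = np.zeros(M, dtype=complex)
    for j in range(T):
        for l in range(N):
            weight = lam ** (i - j) * C[l, k] / noise_var[l]
            u = U[j, l]
            P_inv += weight * np.outer(u, u.conj())   # contribution to H^* W H
            rhs += weight * u * D[j, l]               # contribution to H^* W Y
    # psi = P * (H^* W Y); solve the linear system instead of forming P explicitly
    return np.linalg.solve(P_inv, rhs)
```

Every call revisits all past samples and solves an M×M system, which is the cost that the recursive form avoids.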

However, the closed-form solution in (7) requires matrix inversion, which is computationally expensive. Instead, the diffusion-based DRLS algorithm provides a recursive approach to solve (6), which can be implemented by the following two steps.

Step 1: Let us take the updates at time instant i as an example. We denote the iteration number within time instant i by the superscript \((\cdot)^{l}\), with l=0 representing the initial value. At the very start, we initialize the intermediate local estimate \(\boldsymbol{\psi}_{k,i}\) and the inverse matrix \(\mathbf{P}_{k,i}\) for each node k by using the updated results from time instant i−1, that is
$$ \begin{aligned} \boldsymbol{\psi}_{k,i}^{0}&=\mathbf{w}_{k,i-1}\\ \mathbf{P}_{k,i}^{0}&=\lambda^{-1}\mathbf{P}_{k,i-1} \end{aligned} $$
(9)
Then, for each node k, its data is updated incrementally among its neighbors, which is given by
$$\begin{array}{@{}rcl@{}} \boldsymbol{\psi}_{k,i}^{l}&{\longleftarrow}&\boldsymbol{\psi}_{k,i}^{l-1}+ \frac{C_{l,k}\mathbf{P}_{k,i}^{l-1}\mathbf{u}_{l,i}\left[d_{l,i}-\mathbf{u}_{l,i}^{*}\boldsymbol{\psi}_{k,i}^{l-1}\right]} {\sigma_{v,l}^{2}+C_{l,k}\mathbf{u}_{l,i}^{*}\mathbf{P}_{k,i}^{l-1}\mathbf{u}_{l,i}} \end{array} $$
(10)
$$\begin{array}{@{}rcl@{}} \mathbf{P}_{k,i}^{l}&{\longleftarrow}&\mathbf{P}_{k,i}^{l-1}- \frac{C_{l,k}\mathbf{P}_{k,i}^{l-1}\mathbf{u}_{l,i}\mathbf{u}_{l,i}^{*}\mathbf{P}_{k,i}^{l-1}}{\sigma_{v,l}^{2}+C_{l,k}\mathbf{u}_{l,i}^{*}\mathbf{P}_{k,i}^{l-1}\mathbf{u}_{l,i}} \end{array} $$
(11)
where the left arrow denotes the operation of assignment. Finally, each node k obtains its ultimate intermediate local estimate ψ k,i which can be expressed as
$$\begin{array}{@{}rcl@{}} \boldsymbol{\psi}_{k,i}&\longleftarrow&\boldsymbol{\psi}_{k,i}^{|\mathcal{N}_{k}|} \end{array} $$
(12)
Step 2: Each node k combines the ultimate intermediate local estimate of its own, i.e., ψ k,i , obtained in step 1 with that of its neighbors, i.e., ψ l,i , \(l\in \mathcal {N}_{k}\) by performing the following diffusion to obtain the local estimate w k,i :
$$ \boldsymbol{w}_{k,i}=\sum\limits_{l=1}^{N}A_{l,k}\boldsymbol{\psi}_{l,i} $$
(13)

where \(A_{l,k}\) denotes the (l,k)th element of the matrix A. The matrix A is the combination matrix for the diffusion-based DRLS algorithm and is chosen such that \(\mathbf {\mathbb {I}}^{T}\mathbf {A}=\mathbf {\mathbb {I}}^{T}\) [6].

Note that the steps (9)–(13) constitute the diffusion-based DRLS algorithm [6, 7].
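Steps (9)–(13) can be sketched for a whole network as a single function. The code below is a minimal illustration, not the authors' implementation: the name `drls_step`, the array layout, and the convention that a zero entry `C[l, k]` marks a non-neighbor are assumptions. The forgetting factor is passed as a per-node array so that a fixed value and a time-varying value can both be supplied.

```python
import numpy as np

def drls_step(W_prev, P_prev, d, U, C, A, noise_var, lam):
    """One time instant of diffusion RLS, Eqs. (9)-(13).

    W_prev: (N, M) estimates w_{k,i-1}; P_prev: (N, M, M) matrices P_{k,i-1};
    d: (N,) measurements d_{l,i}; U: (N, M) regressors u_{l,i} stored as rows;
    C, A: (N, N) adaptation and combination matrices; lam: (N,) forgetting factors.
    """
    N, M = W_prev.shape
    Psi = W_prev.astype(complex)                  # psi^0 = w_{k,i-1}, Eq. (9)
    P = np.empty_like(P_prev, dtype=complex)
    for k in range(N):
        P[k] = P_prev[k] / lam[k]                 # P^0 = lambda^{-1} P_{k,i-1}, Eq. (9)
        for l in range(N):
            if C[l, k] == 0:
                continue                          # assumed: zero weight means not a neighbor
            u = U[l]
            Pu = P[k] @ u                         # P u
            uP = u.conj() @ P[k]                  # u^* P
            # u^* P u is real when P is Hermitian, which the recursion preserves
            denom = noise_var[l] + C[l, k] * np.vdot(u, Pu).real
            err = d[l] - np.vdot(u, Psi[k])       # d - u^* psi
            Psi[k] = Psi[k] + C[l, k] * Pu * err / denom      # Eq. (10)
            P[k] = P[k] - C[l, k] * np.outer(Pu, uP) / denom  # Eq. (11)
    W = A.T @ Psi                                 # w_k = sum_l A_{l,k} psi_l, Eq. (13)
    return W, Psi, P
```

Each inner update is a rank-one correction, so no matrix inversion is needed inside the loop.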

3 Low-complexity variable forgetting factor mechanisms

In this section, we introduce the LTVFF mechanism and the LCTVFF mechanism that are employed by our proposed algorithms. Particularly, the analyses for the variable forgetting factor in terms of the steady-state properties of the first-order statistics are presented, and the LTVFF-DRLS algorithm that employs the LTVFF mechanism as well as the LCTVFF-DRLS algorithm that applies the LCTVFF mechanism are proposed. In the last part of this section, we analyze the computational complexity for these two VFF mechanisms as well as the proposed algorithms.

3.1 LTVFF mechanism

Motivated by the VSS mechanism [13, 14] for the diffusion LMS algorithm, the low-complexity VFF mechanisms are designed such that smaller forgetting factors are employed when the estimation errors are large in order to obtain a faster convergence speed, whereas the forgetting factor increases when the estimation errors become small so as to yield better steady-state performance. Based on the above idea, an effective rule to adapt the forgetting factor can be formulated as
$$ \lambda_{k}(i)=[1-\zeta_{k}(i)]_{\lambda_{-}}^{\lambda_{+}} $$
(14)

where the quantity \(\zeta_{k}(i)\), referred to as the adaptation component, is related to the estimation errors and varies inversely with the forgetting factor. The operator \([\cdot ]_{\lambda _{-}}^{\lambda _{+}}\) denotes the truncation of the forgetting factor to the limits of the range \([\lambda_{-},\lambda_{+}]\).

For the LTVFF mechanism, the adaptation component is given by
$$ \zeta_{k}(i)=\alpha\zeta_{k}(i-1)+\beta|e_{k}(i)|^{2} $$
(15)
with parameters 0<α<1 and β>0. Besides, α is chosen close to 1 and β is set to be a small value. The quantity \(e_{k}(i)\) denotes the a priori estimation error [19] of each node for the DRLS algorithm, which can be expressed as
$$ e_{k}(i)=d_{k,i}-\mathbf{u}_{k,i}^{*}\mathbf{w}_{k,i-1}. $$
(16)

That is to say, in the LTVFF mechanism, the adaptation component is updated based on the instantaneous estimation error.

The LTVFF mechanism is given by (14) and (15). The value of the forgetting factor λ k (i) is controlled by the parameters α and β. Particularly, the effects of α and β on the performance of our proposed algorithms are investigated in Section 5. As can be seen from (14) and (15), large estimation errors will cause an increase in the adaptation component ζ k (i), which yields a smaller forgetting factor and provides a faster tracking speed. Conversely, small estimation errors will lead to the decrease of the adaptation component ζ k (i), and thus, the forgetting factor λ k (i) will be increased to yield smaller steady-state misadjustment.
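The two update rules (14) and (15) translate almost directly into code. The sketch below is illustrative: the function name `ltvff_update` and the default values of `alpha`, `beta`, `lam_min`, and `lam_max` are placeholders chosen here, not values taken from the paper.

```python
import numpy as np

def ltvff_update(zeta_prev, e, alpha=0.99, beta=0.0015, lam_min=0.9, lam_max=0.9999):
    """LTVFF mechanism for one node: returns (lambda_k(i), zeta_k(i)).

    zeta_prev: zeta_k(i-1); e: a priori error e_k(i) from Eq. (16).
    The default parameter values are hypothetical.
    """
    zeta = alpha * zeta_prev + beta * np.abs(e) ** 2    # Eq. (15)
    lam = np.clip(1.0 - zeta, lam_min, lam_max)         # Eq. (14), truncation to [lam_min, lam_max]
    return lam, zeta
```

With these placeholder values, a large error drives the forgetting factor down toward `lam_min`, while a run of small errors lets `zeta` decay geometrically with rate `alpha` so that the forgetting factor returns toward `lam_max`.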

Next, we study the steady-state statistical properties of the adaptation component \(\zeta_{k}(i)\) and the forgetting factor \(\lambda_{k}(i)\). Based on (15), it is reasonable to assume that \(\zeta_{k}(i)\) and \(\zeta_{k}(i-1)\) are approximately equivalent when \(i\to\infty\). By taking expectations on both sides of (15) and letting i go to infinity, we obtain
$$ E[\zeta_{k}(\infty)]=\frac{\beta}{1-\alpha}E\left[\left|e_{k}(\infty)\right|^{2}\right]. $$
(17)
Then, we compute the quantity \(E[|e_{k}(\infty)|^{2}]\). Let us define the weight error vector for node k as
$$ \mathbf{\widetilde{w}}_{k,i}=\mathbf{w}_{k,i}-\mathbf{w}^{o}. $$
(18)
According to (16) and (18), we can rewrite \(E[|e_{k}(i)|^{2}]\) as
$$ \begin{aligned} E[|e_{k}(i)|^{2}]&=E\left[|d_{k,i}-\mathbf{u}_{k,i}^{*}(\mathbf{\widetilde{w}}_{k,i-1}+\mathbf{w}^{o})|^{2}\right]\\ &=E\left[|v_{k,i}-\mathbf{u}_{k,i}^{*}\mathbf{\widetilde{w}}_{k,i-1}|^{2}\right]\\ &={\sigma}_{v,k}^{2}+E\left[|\mathbf{u}_{k,i}^{*}\mathbf{\widetilde{w}}_{k,i-1}|^{2}\right] \end{aligned} $$
(19)
where the term \(E\left [\left |\mathbf {u}_{k,i}^{*}\mathbf {\widetilde {w}}_{k,i-1}\right |^{2}\right ]\) denotes the excess error. Since it is sufficiently small compared with the noise variance when \(i\to\infty\), it can be neglected. As a consequence, the following approximation holds
$$ E\left[\left|e_{k}(\infty)\right|^{2}\right]\approx\varepsilon_{\text{min}} $$
(20)
where ε min denotes the minimum mean-square error and can be expressed as
$$ \varepsilon_{\text{min}}=E\left[\left|d_{k,i}-\mathbf{u}_{k,i}^{*}\mathbf{w}^{o}\right|^{2}\right]={\sigma}_{v,k}^{2}. $$
(21)
Subsequently, by substituting (20) into (17), we can approximately write
$$ E[\zeta_{k}(\infty)]\approx\frac{\beta}{1-\alpha}\varepsilon_{\text{min}}. $$
(22)
According to (14), and assuming that the truncation is not active at steady state, we can deduce
$$ E[\lambda_{k}(\infty)]=1-E[\zeta_{k}(\infty)]. $$
(23)
By substituting (22) into (23), we can obtain the first-order statistics of the forgetting factor for the LTVFF mechanism:
$$ E[\lambda_{k}(\infty)]=1-\frac{\beta}{1-\alpha}\varepsilon_{\text{min}}. $$
(24)
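The approximation (22), and hence (24), can be checked numerically by driving recursion (15) with errors whose power equals the noise variance, which mirrors the assumption behind (20) that the excess error is negligible. The sketch below is a hedged illustration: the function name `simulate_mean_zeta` and all parameter values are assumptions made for the example.

```python
import numpy as np

def simulate_mean_zeta(alpha, beta, noise_var, num_steps=200_000, burn_in=5_000, seed=0):
    """Time-average of zeta_k(i) from Eq. (15) when e_k(i) is pure noise.

    Assumption: e_k(i) is circular complex Gaussian with variance noise_var,
    i.e. the excess error is neglected as in Eq. (20).
    """
    rng = np.random.default_rng(seed)
    e = np.sqrt(noise_var / 2) * (rng.standard_normal(num_steps)
                                  + 1j * rng.standard_normal(num_steps))
    zeta, acc = 0.0, 0.0
    for i in range(num_steps):
        zeta = alpha * zeta + beta * abs(e[i]) ** 2     # Eq. (15)
        if i >= burn_in:
            acc += zeta
    return acc / (num_steps - burn_in)

alpha, beta, noise_var = 0.99, 0.0015, 0.01             # hypothetical values
predicted_zeta = beta * noise_var / (1 - alpha)          # Eq. (22)
predicted_lambda = 1 - predicted_zeta                    # Eq. (24)
measured_zeta = simulate_mean_zeta(alpha, beta, noise_var)
print(predicted_zeta, measured_zeta, predicted_lambda)
```

The check only covers the regime in which the truncation in (14) is inactive, since (24) does not account for clipping at the bounds.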
By applying the LTVFF mechanism to the diffusion-based DRLS algorithm, we propose the LTVFF-DRLS algorithm, which is exhibited in the left column of Table 1.
Table 1 LTVFF-DRLS and LCTVFF-DRLS algorithms

| Step | LTVFF-DRLS algorithm | Step | LCTVFF-DRLS algorithm |
| --- | --- | --- | --- |
| 1 | For each node k=1,2,…,N do | 1 | For each node k=1,2,…,N do |
| 2 | Initialize \(\mathbf{w}_{k,-1}=\mathbf{0}\), \(\mathbf{P}_{k,-1}=\boldsymbol{\Pi}^{-1}\). | 2 | Initialize \(\mathbf{w}_{k,-1}=\mathbf{0}\), \(\mathbf{P}_{k,-1}=\boldsymbol{\Pi}^{-1}\). |
| 3 | For time instant i=1,2,… do | 3 | For time instant i=1,2,… do |
| 4 | \(\lambda _{k}(i)=[1-\zeta _{k}(i)]_{\lambda _{-}}^{\lambda _{+}}\). | 4 | \(\lambda _{k}(i)=[1-\zeta _{k}(i)]_{\lambda _{-}}^{\lambda _{+}}\). |
| 5 | \(\zeta_{k}(i)=\alpha\zeta_{k}(i-1)+\beta\lvert e_{k}(i)\rvert^{2}\). | 5 | \(\zeta_{k}(i)=\alpha\zeta_{k}(i-1)+\beta\lvert\rho_{k}(i)\rvert^{2}\). |
|  |  | 6 | \(\rho_{k}(i)=\gamma\rho_{k}(i-1)+(1-\gamma)\lvert e_{k}(i-1)\rvert\lvert e_{k}(i)\rvert\). |
| 6 | Set \(\boldsymbol{\psi}_{k,i}=\mathbf{w}_{k,i-1}\), \(\mathbf {P}_{k,i}=\lambda ^{-1}_{k}(i)\mathbf {P}_{k,i-1}\). | 7 | Set \(\boldsymbol{\psi}_{k,i}=\mathbf{w}_{k,i-1}\), \(\mathbf {P}_{k,i}=\lambda ^{-1}_{k}(i)\mathbf {P}_{k,i-1}\). |
| 7 | For \(l\in \mathcal {N}_{k}\) do | 8 | For \(l\in \mathcal {N}_{k}\) do |
| 8 | \(\boldsymbol {\psi }_{k,i}{\longleftarrow }\boldsymbol {\psi }_{k,i}+ \frac {C_{l,k}\mathbf {P}_{k,i}\mathbf {u}_{l,i}[d_{l,i}-\mathbf {u}_{l,i}^{*}\boldsymbol {\psi }_{k,i}]} {\sigma _{v,l}^{2}+C_{l,k}\mathbf {u}_{l,i}^{*}\mathbf {P}_{k,i}\mathbf {u}_{l,i}}\). | 9 | \(\boldsymbol {\psi }_{k,i}{\longleftarrow }\boldsymbol {\psi }_{k,i}+ \frac {C_{l,k}\mathbf {P}_{k,i}\mathbf {u}_{l,i}[d_{l,i}-\mathbf {u}_{l,i}^{*}\boldsymbol {\psi }_{k,i}]} {\sigma _{v,l}^{2}+C_{l,k}\mathbf {u}_{l,i}^{*}\mathbf {P}_{k,i}\mathbf {u}_{l,i}}\). |
| 9 | \(\mathbf {P}_{k,i}{\longleftarrow }\mathbf {P}_{k,i}- \frac {C_{l,k}\mathbf {P}_{k,i}\mathbf {u}_{l,i}\mathbf {u}_{l,i}^{*}\mathbf {P}_{k,i}}{\sigma _{v,l}^{2}+C_{l,k}\mathbf {u}_{l,i}^{*}\mathbf {P}_{k,i}\mathbf {u}_{l,i}}\). | 10 | \(\mathbf {P}_{k,i}{\longleftarrow }\mathbf {P}_{k,i}- \frac {C_{l,k}\mathbf {P}_{k,i}\mathbf {u}_{l,i}\mathbf {u}_{l,i}^{*}\mathbf {P}_{k,i}}{\sigma _{v,l}^{2}+C_{l,k}\mathbf {u}_{l,i}^{*}\mathbf {P}_{k,i}\mathbf {u}_{l,i}}\). |
| 10 | End | 11 | End |
| 11 | Generate the final estimate \(\mathbf {w}_{k,i}=\sum \limits _{l\in \mathcal {N}_{k}}A_{l,k}\boldsymbol {\psi }_{l,i}\). | 12 | Generate the final estimate \(\mathbf {w}_{k,i}=\sum \limits _{l\in \mathcal {N}_{k}}A_{l,k}\boldsymbol {\psi }_{l,i}\). |
| 12 | End | 13 | End |
| 13 | End | 14 | End |

3.2 LCTVFF mechanism

For the LCTVFF mechanism, the forgetting factor is again calculated through (14), while the adaptation component \(\zeta_{k}(i)\) is adjusted according to an alternative rule: the time-averaged estimate of the correlation of two consecutive estimation errors is used in the update equation of \(\zeta_{k}(i)\). The rule to update the adaptation component can therefore be described as
$$ \zeta_{k}(i)=\alpha\zeta_{k}(i-1)+\beta|\rho_{k}(i)|^{2} $$
(25)
where 0<α<1 and β>0. Particularly, α is set close to 1 and β is chosen to be slightly larger than 0. The quantity ρ k (i) denotes the time-averaged estimation of the correlation of two consecutive estimation errors, which is defined by
$$ \rho_{k}(i)=\gamma\rho_{k}(i-1)+(1-\gamma)|e_{k}(i-1)||e_{k}(i)| $$
(26)

where 0<γ<1 and γ is slightly smaller than 1. Note that the LCTVFF mechanism is given by (14), (25), and (26).
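Equations (14), (25), and (26) can be combined into one update function. The sketch below is illustrative only: the name `lctvff_update` and the default values of `alpha`, `beta`, `gamma`, `lam_min`, and `lam_max` are placeholders introduced here, not values from the paper.

```python
import numpy as np

def lctvff_update(zeta_prev, rho_prev, e, e_prev,
                  alpha=0.99, beta=0.0015, gamma=0.99, lam_min=0.9, lam_max=0.9999):
    """LCTVFF mechanism for one node: returns (lambda_k(i), zeta_k(i), rho_k(i)).

    e, e_prev: a priori errors e_k(i) and e_k(i-1). Defaults are hypothetical.
    """
    rho = gamma * rho_prev + (1.0 - gamma) * np.abs(e_prev) * np.abs(e)   # Eq. (26)
    zeta = alpha * zeta_prev + beta * rho ** 2                             # Eq. (25)
    lam = np.clip(1.0 - zeta, lam_min, lam_max)                            # Eq. (14)
    return lam, zeta, rho
```

Because (26) multiplies two consecutive error magnitudes, a single error spike that follows a zero error contributes nothing to `rho` at that instant; only errors that persist over consecutive samples raise the adaptation component.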

Next, we consider the steady-state statistical properties of \(\rho_{k}(i)\), \(\zeta_{k}(i)\), and \(\lambda_{k}(i)\) for the LCTVFF mechanism. As we will see in the simulation results, the proposed algorithm reaches the steady state after a sufficient number of iterations, and thus the values of \(\rho_{k}(i-1)\) and \(\rho_{k}(i)\), as well as those of \(|e_{k}(i-1)|\) and \(|e_{k}(i)|\), can be assumed to be approximately equivalent when i is large enough. Thus, we obtain \(E[|e_{k}(i-1)||e_{k}(i)|]\approx E[|e_{k}(i)|^{2}]\) and \(\rho_{k}(i-1)\approx\rho_{k}(i)\) when \(i\to\infty\). Then, by taking expectations on both sides of (26) and letting i go to infinity, we obtain the first-order statistical properties of \(\rho_{k}(i)\):
$$ E[\rho_{k}(\infty)]{\approx}\varepsilon_{\text{min}}. $$
(27)
To study the second-order statistical properties of ρ k (i), we consider the square of (26), which is given by
$$ \begin{aligned} \rho_{k}^{2}(i)&=\gamma^{2}\rho_{k}^{2}(i-1)+(1-\gamma)^{2}|e_{k}(i-1)|^{2}|e_{k}(i)|^{2}\\ &\quad+2\gamma(1-\gamma)\rho_{k}(i-1)|e_{k}(i-1)||e_{k}(i)|. \end{aligned} $$
(28)
Recall that \(|e_{k}(i-1)|\) and \(|e_{k}(i)|\) can be considered equivalent when \(i\to\infty\), and thus, we can rewrite (28) as
$$ \begin{aligned} \rho_{k}^{2}(i)&{\approx}\gamma^{2}\rho_{k}^{2}(i-1)+(1-\gamma)^{2}|e_{k}(i)|^{4} \\ &\quad+2\gamma(1-\gamma)\rho_{k}(i-1)|e_{k}(i)|^{2}. \end{aligned} $$
(29)
Since \((1-\gamma)^{2}|e_{k}(i)|^{4}\) is sufficiently small when compared with the other terms in (29), it can be neglected. Therefore, we can obtain
$$ \rho_{k}^{2}(i){\approx}\gamma^{2}\rho_{k}^{2}(i-1)+2\gamma(1-\gamma)\rho_{k}(i-1)|e_{k}(i)|^{2}. $$
(30)
According to (16) and (26), the quantities \(\rho_{k}(i-1)\) and \(|e_{k}(i)|^{2}\) can be considered uncorrelated at steady state, that is, \(E[\rho_{k}(i-1)|e_{k}(i)|^{2}]\approx E[\rho_{k}(i-1)]E[|e_{k}(i)|^{2}]\). The detailed derivation is presented in Appendix A (Proof of the uncorrelation of \(\rho_{k}(i-1)\) and \(|e_{k}(i)|^{2}\) in the steady state). Then, by taking expectations on both sides of (30), we can obtain the following result:
$$ E\left[\rho_{k}^{2}(\infty)\right]=\frac{2\gamma}{1+\gamma}E\left[\rho_{k}(\infty)\right] E\left[|e_{k}(\infty)|^{2}\right]. $$
(31)
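The step from (30) to (31) can be spelled out as follows. This is a sketch that relies on the uncorrelatedness stated above and on the additional steady-state assumption \(E[\rho_{k}^{2}(i)]\approx E[\rho_{k}^{2}(i-1)]\) as \(i\to\infty\), which the derivation appears to use implicitly:
$$ \begin{aligned} E\left[\rho_{k}^{2}(\infty)\right]&\approx\gamma^{2}E\left[\rho_{k}^{2}(\infty)\right]+2\gamma(1-\gamma)E\left[\rho_{k}(\infty)\right]E\left[|e_{k}(\infty)|^{2}\right]\\ (1-\gamma^{2})E\left[\rho_{k}^{2}(\infty)\right]&\approx 2\gamma(1-\gamma)E\left[\rho_{k}(\infty)\right]E\left[|e_{k}(\infty)|^{2}\right]. \end{aligned} $$
Dividing both sides by \(1-\gamma^{2}=(1-\gamma)(1+\gamma)\) gives the factor \(2\gamma/(1+\gamma)\) in (31).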
Substituting (20) and (27) into (31) results in
$$ E\left[\rho_{k}^{2}(\infty)\right]\approx\frac{2\gamma}{1+\gamma}\varepsilon^{2}_{\text{min}}. $$
(32)
To calculate the first-order statistics of the adaptation component \(\zeta_{k}(i)\), we take expectations on both sides of (25) and let i go to infinity, which yields
$$ E[\zeta_{k}(\infty)]=\frac{\beta}{1-\alpha}E[\rho_{k}^{2}(\infty)]. $$
(33)
Substituting (32) into (33) leads to
$$ E[\zeta_{k}(\infty)]=\frac{2\gamma\beta}{(1+\gamma)(1-\alpha)}\varepsilon^{2}_{\text{min}}. $$
(34)
Consequently, we have the first-order steady-state statistics of the forgetting factor for the LCTVFF mechanism as follows:
$$\begin{array}{@{}rcl@{}} E[\lambda_{k}(\infty)]=1-\frac{2\gamma\beta}{(1+\gamma)(1-\alpha)}\varepsilon^{2}_{\text{min}}. \end{array} $$
(35)

By employing the LCTVFF mechanism to the diffusion-based DRLS algorithm, we propose the LCTVFF-DRLS algorithm, which is presented in the right column of Table 1.

3.3 Computational complexity

In this part, we study the computational complexity of the proposed LTVFF and LCTVFF mechanisms in comparison with the GVFF mechanism. We count the number of arithmetic operations, in terms of complex additions and multiplications, for each node at each iteration. The results are shown in Tables 2 and 3. As shown in Table 3, the additional computational complexity of the proposed LTVFF and LCTVFF mechanisms amounts to a small, fixed number of operations for each node at each iteration. For the GVFF mechanism, however, the additional computational complexity for each node at each iteration increases with the size of the sensor network. The results in Table 3 clearly reveal that the proposed LTVFF and LCTVFF mechanisms greatly reduce the computational cost when compared with the GVFF mechanism.
Table 2 Computational complexity of the DRLS algorithm

Number of operations for each node at each iteration

|  | Multiplications | Additions |
| --- | --- | --- |
| DRLS (fixed forgetting factor) | \(M^{2}+N^{2}(4M^{2}+5M+1)+M\) | \(4N^{2}M^{2}+M-1\) |

Table 3 Additional computational complexity of the analyzed VFF mechanisms

Number of operations for each node at each iteration

|  | Multiplications | Additions |
| --- | --- | --- |
| GVFF | \(M^{2}+N^{2}(9M^{2}+4M)+2M+1\) | \(M^{2}+N^{2}(9M^{2}-3M-1)+2M-2\) |
| LTVFF | 3 | 2 |
| LCTVFF | 6 | 3 |

4 Performance analysis

In this section, we carry out the analyses in terms of mean and mean square error performance for the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms. In particular, we derive mathematical expressions to describe the steady-state behavior based on MSD and EMSE. In addition, we also perform transient analysis in a specialized case for the proposed algorithms in the last part of this section. To proceed with the analysis, we first introduce several assumptions, which have been widely adopted in the analysis for the RLS-type algorithms and have been verified by simulations [7, 27].

Assumption 1

To facilitate analytical studies, we assume that all the input vectors \(\mathbf{u}_{k,i}\), \(\forall k,i\), are independent of each other and that the correlation matrix of the input vector \(\mathbf{u}_{k,i}\) is invariant over time, which is defined as
$$ E[\mathbf{u}_{k,i}\mathbf{u}_{k,i}^{*}]=\mathbf{R}_{u_{k}}. $$
(36)

Assumption 2

For the proposed LTVFF and LCTVFF mechanisms, we assume that there exists a positive number \(N_{i}\) such that, for all \(i>N_{i}\), the forgetting factor \(\lambda_{k}(i)\) varies slowly around its mean value, that is
$$ E\{\lambda_{k}(N_{i})\}{\simeq}E\{\lambda_{k}(N_{i}+1)\}{\simeq}\ldots{\simeq}E\{\lambda_{k}(i)\}{\simeq}E\{\lambda_{k}(\infty)\}. $$
(37)

For the RLS-type algorithms with the fixed forgetting factor, we have the ergodicity assumption for P k,i [6, 7, 27], that is, the time average of a sequence of random variables can be replaced by its expected value so as to make the analysis for the performance of these algorithms tractable. Similarly, for the RLS-type algorithms with variable forgetting factors, we still have the ergodicity assumption:

Assumption 3

We assume that there exists a number \(N_{i}>0\) such that, for all \(i>N_{i}\), we can replace \(\mathbf {P}_{k,i}^{-1}\) by its expected value \(E\left [\mathbf {P}_{k,i}^{-1}\right ]\), which can be represented as
$$ {\lim}_{i\to\infty}\mathbf{P}_{k,i}^{-1}\approx{\lim}_{i\to\infty}E\left[\mathbf{P}_{k,i}^{-1}\right] $$
(38)
where \(\lim \limits _{i\to \infty }E\left [\mathbf {P}_{k,i}^{-1}\right ]\) can be calculated through
$$ {\lim}_{i\to\infty}E\left[\mathbf{P}_{k,i}^{-1}\right]=\frac{1}{1-E[\lambda_{k}(\infty)]}\sum_{l=1}^{N}\frac{C_{l,k}}{ \sigma_{v,l}^{2}}\mathbf{R}_{u_{l}}. $$
(39)

The derivation is presented in Appendix B. Since \(\lim \limits _{i\to \infty }E\left [\mathbf {P}_{k,i}^{-1}\right ]\) is independent of i, we can denote it by \(\mathbf {P}_{k}^{-1}\). Moreover, based on the ergodicity assumption, it is also common in the analysis of the performance of the RLS-type algorithms to replace the random matrix P k,i by P k when i is large enough.
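As a quick numerical illustration of (39), one can iterate a mean recursion of the form \(\mathbf{X}\leftarrow E[\lambda_{k}(\infty)]\mathbf{X}+\sum_{l}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{R}_{u_{l}}\) and compare its limit with the closed-form expression. This recursion is an assumption made here, obtained by taking expectations of the inverse-matrix update under Assumptions 1 and 2; it is not claimed to reproduce the derivation in Appendix B. The function names and parameter values below are likewise hypothetical.

```python
import numpy as np

def steady_state_P_inv(lam_mean, C, noise_var, R_u, k):
    """Right-hand side of Eq. (39) for node k.

    lam_mean: E[lambda_k(inf)]; R_u: (N, M, M) regressor correlation matrices.
    """
    S = sum(C[l, k] / noise_var[l] * R_u[l] for l in range(len(noise_var)))
    return S / (1.0 - lam_mean)

def iterate_mean_recursion(lam_mean, C, noise_var, R_u, k, delta=1.0, num_steps=2000):
    """Iterate X <- lam_mean * X + S (assumed mean recursion), starting from X = I / delta."""
    M = R_u.shape[1]
    S = sum(C[l, k] / noise_var[l] * R_u[l] for l in range(len(noise_var)))
    X = np.eye(M) / delta
    for _ in range(num_steps):
        X = lam_mean * X + S
    return X
```

The closer `lam_mean` is to 1, the larger the steady-state value of the inverse matrix and the more iterations the recursion needs to forget its initial value.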

4.1 Mean performance

In light of (1) and (13), the following relation holds [7] after the incremental update of \(\boldsymbol{\psi}_{l,i}\) is complete:
$$ \mathbf{P}^{-1}_{l,i}\boldsymbol{\psi}_{l,i}=\lambda_{l}(i)\mathbf{P}^{-1}_{l,i-1}\mathbf{w}_{l,i-1}+\sum\limits_{m=1}^{N}\frac{C_{m,l}}{ \sigma^{2}_{v,m} }\mathbf{u}_{m,i}d_{m,i}. $$
(40)
By substituting (1) and (18) into (40), we obtain the following equation:
$$ \mathbf{P}^{-1}_{l,i}(\boldsymbol{\psi}_{l,i}-\mathbf{w}^{o})=\lambda_{l}(i)\mathbf{P}^{-1}_{l,i-1}\widetilde{\mathbf{w}}_{l,i-1} +\sum_{m=1}^{N}\frac{C_{m,l}}{\sigma_{v,m}^{2}}\mathbf{u}_{m,i}v_{m,i}. $$
(41)
Next, let us define the intermediate weight error vector \(\widetilde {\boldsymbol {\psi }}_{k,i}\) for node k as
$$ \widetilde{\boldsymbol{\psi}}_{k,i}=\boldsymbol{\psi}_{k,i}-\mathbf{w}^{o}. $$
(42)
Substituting (42) into (41) and left-multiplying both sides by \(\mathbf{P}_{l,i}\) yields
$$ \widetilde{\boldsymbol{\psi}}_{l,i}=\lambda_{l}(i)\mathbf{P}_{l,i}\mathbf{P}^{-1}_{l,i-1}\widetilde{\mathbf{w}}_{l,i-1} +\mathbf{P}_{l,i}\sum\limits_{m=1}^{N}\frac{C_{m,l}}{\sigma_{v,m}^{2}}\mathbf{u}_{m,i}v_{m,i}. $$
(43)
Then, we construct \(\widetilde {\mathbf {w}}_{k,i}\) from \(\widetilde {\boldsymbol {\psi }}_{l,i}\) based on (13) and obtain
$$ \widetilde{\mathbf{w}}_{k,i}=\!\sum\limits_{l=1}^{N}A_{l,k}\!\left[\!\lambda_{l}(i)\mathbf{P}_{l,i}\mathbf{P}^{-1}_{l,i-1}\widetilde{\mathbf{w}}_{l,i-1}+\mathbf{P}_{l,i}\!\sum\limits_{m=1}^{N}\!\frac{C_{m,l}}{\sigma_{v,m}^{2}}\mathbf{u}_{m,i}v_{m,i}\!\right]\!. $$
(44)
Note that \(\mathbf{P}_{k,i}\) can be replaced by \(\mathbf{P}_{k}\) when i is large enough (cf. Assumption 3), and thus, it is reasonable to assume that \(\mathbf{P}_{k,i}\) converges as \(i\to\infty\). Therefore, we can approximately have
$$ \mathbf{P}_{k,i}{\approx}E[\mathbf{P}_{k,i}]. $$
(45)
Besides, in view of Assumption 3 and the Eq. (39), we can obtain
$$ {\begin{aligned} \mathbf{P}_{k,i}=\left(\mathbf{P}^{-1}_{k,i}\right)^{-1}{\approx}\left\{E\left[\mathbf{P}_{k,i}^{-1}\right]\right\}^{-1}{\approx}\left(1-E[\lambda_{k}(\infty)]\right)\left(\sum\limits_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{R}_{u_{l}}\right)^{-1}. \end{aligned}} $$
(46)
By combining (45) and (46), we have the following approximation:
$$ \mathbf{P}_{k,i}\mathbf{P}^{-1}_{k,i-1}{\approx}E^{-1}\left[\mathbf{P}_{k,i}^{-1}\right]E\left[\mathbf{P}_{k,i}^{-1}\right]=\mathbf{I}_{M}. $$
(47)
Then, substituting (47) into (44) yields the following result when i is sufficiently large:
$$ \widetilde{\mathbf{w}}_{k,i}=\sum\limits_{l=1}^{N}A_{l,k}\left[\lambda_{l}(i)\widetilde{\mathbf{w}}_{l,i-1}+\mathbf{P}_{l,i}\sum\limits_{m=1}^{N}\frac{C_{m,l}}{\sigma_{v,m}^{2}}\mathbf{u}_{m,i}v_{m,i}\right]. $$
(48)
Following this, two global matrices \(\widetilde {\mathbf {W}}_{i}\) and \(\boldsymbol {\mathcal {P}}\) are built in the following form in order to collect the weight error vectors \(\widetilde {\mathbf {w}}_{k,i},k=1,\cdots,N\) and matrices \(\mathbf{P}_{k},k=1,\cdots,N\), respectively:
$$\begin{array}{@{}rcl@{}} &&\widetilde{\mathbf{W}}_{i}=\text{row}\{\widetilde{\mathbf{w}}_{1,i},\widetilde{\mathbf{w}}_{2,i},\ldots,\widetilde{\mathbf{w}}_{N,i}\}\\ &&\boldsymbol{\mathcal{P}}=\text{row}\{\mathbf{P}_{1},\mathbf{P}_{2},\ldots,\mathbf{P}_{N}\}. \end{array} $$
(49)
In addition, we introduce a global diagonal matrix \(\boldsymbol{\Lambda}_{i}\) to collect the forgetting factors of all nodes at time instant i, which is given by
$$ \boldsymbol{\Lambda}_{i}=\text{diag}\{\lambda_{1}(i),\lambda_{2}(i),\ldots,\lambda_{N}(i)\}. $$
(50)
Using the vectors in (2), the term \(\sum \limits _{m=1}^{N}\frac {C_{m,l}}{\sigma _{v,m}^{2}}\mathbf {u}_{m,i}v_{m,i}\) in (44) can be rewritten as \(\mathbf {H}_{i}^{*}\mathbf {C}_{l}\mathbf {R}_{v}^{-1}\mathbf {v}_{i}\). By collecting the vectors \(\mathbf {H}_{i}^{*}\mathbf {C}_{l}\mathbf {R}_{v}^{-1}\mathbf {v}_{i}\), l=1,2,…,N, into a block diagonal matrix \(\mathbf{G}_{i}\), we obtain
$$ \mathbf{G}_{i}=\text{diag}\left\{\mathbf{H}_{i}^{*}\mathbf{C}_{1}\mathbf{R}_{v}^{-1}\mathbf{v}_{i},\mathbf{H}_{i}^{*} \mathbf{C}_{2}\mathbf{R}_{v}^{-1}\mathbf{v}_{i},\ldots,\mathbf{H}_{i}^{*}\mathbf{C}_{N}\mathbf{R}_{v}^{-1}\mathbf{v}_{i}\right\}. $$
(51)
To separate the noise vectors, we can rewrite (51) as
$${} \mathbf{G}_{i}=\text{diag}\left\{\mathbf{H}_{i}^{*}\mathbf{C}_{1}\mathbf{R}_{v}^{-1},\mathbf{H}_{i}^{*}\mathbf{C}_{2}\mathbf{R}_{v}^{-1},\ldots,\mathbf{H}_{i}^{*}\mathbf{C}_{N}\mathbf{R}_{v}^{-1}\right\}(\mathbf{I}_{N}\otimes\mathbf{v}_{i}). $$
(52)
where \(\otimes\) denotes the Kronecker product of two matrices [28]. Subsequently, we express (48) in a more compact way, which leads to the following updating equation for the global matrix \(\widetilde {\mathbf {W}}_{i}\):
$$ \widetilde{\mathbf{W}}_{i}=\widetilde{\mathbf{W}}_{i-1}\boldsymbol{\Lambda}_{i}\mathbf{A}+\boldsymbol{\mathcal{P}}\mathbf{G}_{i}\mathbf{A}. $$
(53)
In order to simplify the notation, we denote \(\boldsymbol{\Lambda}_{i}\mathbf{A}\) as \(\mathbf{F}(i)\), and thus, we can rewrite (53) as
$$ \widetilde{\mathbf{W}}_{i}=\widetilde{\mathbf{W}}_{i-1}\mathbf{F}(i)+\boldsymbol{\mathcal{P}}\mathbf{G}_{i}\mathbf{A}. $$
(54)
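The first term of (54) stacks the node-wise sums of (48) into one matrix product. The sketch below checks, for hypothetical values with M=2 and N=3, that column k of \(\widetilde{\mathbf{W}}_{i-1}\boldsymbol{\Lambda}_{i}\mathbf{A}\) equals \(\sum_{l}A_{l,k}\lambda_{l}(i)\widetilde{\mathbf{w}}_{l,i-1}\). The helper names and all numerical values are assumptions made for illustration.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def diag(values):
    """Diagonal matrix with the given entries."""
    n = len(values)
    return [[values[r] if r == c else 0.0 for c in range(n)] for r in range(n)]


# Hypothetical data: the columns of W_prev are the error vectors w~_{l,i-1}.
W_prev = [[1.0, -2.0, 0.5],
          [0.0, 3.0, 1.5]]
lams = [0.95, 0.98, 0.99]   # lambda_l(i), illustrative values
A = [[0.5, 0.25, 0.0],
     [0.5, 0.50, 0.5],
     [0.0, 0.25, 0.5]]      # columns sum to one

# Compact form: first term of (54).
compact = matmul(matmul(W_prev, diag(lams)), A)


def nodewise(k):
    """First term of (48) for node k: sum_l A_{l,k} lambda_l w~_{l,i-1}."""
    return [sum(A[l][k] * lams[l] * W_prev[row][l] for l in range(3))
            for row in range(2)]


print([row[0] for row in compact], nodewise(0))
```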
In order to facilitate analysis, we assume that \(\widetilde {\mathbf {W}}_{i-1}\) and \(\mathbf{F}(i)\) can be considered uncorrelated, that is, \(E[\widetilde {\mathbf {W}}_{i-1}\mathbf {F}(i)]\approx E[\widetilde {\mathbf {W}}_{i-1}]E[\mathbf {F}(i)]\). As we will see in the simulation results, the analysis based on this assumption matches the numerical results closely. By taking expectations on both sides of (54), we obtain the following result:
$$ E[\widetilde{\mathbf{W}}_{i}]=E[\widetilde{\mathbf{W}}_{i-1}]E[\mathbf{F}(i)]+\boldsymbol{\mathcal{P}}E[\mathbf{G}_{i}]\mathbf{A}. $$
(55)
Recalling (52), since the noise samples \(\mathbf{v}_{i}\) have zero mean, \(E[\mathbf{G}_{i}]\) equals zero; therefore, we can obtain
$$ E[\widetilde{\mathbf{W}}_{i}]=E[\widetilde{\mathbf{W}}_{i-1}]E[\mathbf{F}(i)]. $$
(56)
Following this, we assume that there exists a number \(N_{i}>0\) and iterate (56) from time instant i back to \(N_{i}\). As a result, we obtain
$$ E[\widetilde{\mathbf{W}}_{i}]=E[\widetilde{\mathbf{W}}_{N_{i}}]\prod_{j=N_{i}+1}^{i}E[\mathbf{F}(j)]. $$
(57)
Recalling that \(\mathbf{F}(i)=\boldsymbol{\Lambda}_{i}\mathbf{A}\), with \(\boldsymbol{\Lambda}_{i}\) a diagonal matrix, we have the following relation for each element in \(\mathbf{F}(i)\):
$$ \mathbf{F}_{m,n}(i)=\lambda_{m}(i)\mathbf{A}_{m,n}(i), \forall m,n\in \{1, 2, \cdots, N\} $$
(58)
where the subscript m,n represents the (m,n)-th element in the matrix. Given that the elements of \(\mathbf{A}\) are all between 0 and 1 and each element in the diagonal matrix \(\boldsymbol{\Lambda}_{i}\) does not exceed the upper bound \(\lambda_{+}\), which is smaller than unity, we have
$$ {{\begin{aligned} \mathbf{F}_{m,n}(i)=\lambda_{m}(i)\mathbf{A}_{m,n}(i)<\lambda_{+}\mathbf{A}_{m,n}(i)<1, \forall m,n\in \{1, 2, \cdots, N\} \end{aligned}}} $$
(59)

Each element in the product \(\prod \limits _{j=N_{i}+1}^{i}E[\mathbf {F}(j)]\) can be viewed as a polynomial of \(\mathbf{F}_{1,1}(i),\mathbf{F}_{1,2}(i),\ldots\), with an order of \(i-N_{i}+1\). When \(i\to\infty\), each element of this product approaches zero since \(\mathbf{F}_{1,1}(i),\mathbf{F}_{1,2}(i),\ldots\) are all smaller than unity. Now, assuming that all the elements of \(E[\widetilde {\mathbf {W}}_{N_{i}}]\) are bounded in absolute value by some finite constant, all the elements of \(E[\widetilde {\mathbf {W}}_{i}]\) converge to zero when \(i\to\infty\). As a result, we can conclude that the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms converge asymptotically in the mean when \(i\to\infty\).
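The decay of the product in (57) can be illustrated numerically. The sketch below assumes a hypothetical 3-node combination matrix whose columns sum to one and forgetting factors that cycle through a few illustrative values below an illustrative bound \(\lambda_{+}=0.99\); it does not implement the LTVFF or LCTVFF mechanisms. Under these assumptions the largest column sum of \(\mathbf{F}(j)=\boldsymbol{\Lambda}_{j}\mathbf{A}\) is at most \(\lambda_{+}\), so the largest column sum of the product of n such matrices is at most \(\lambda_{+}^{n}\).

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def max_column_sum(X):
    """Induced 1-norm: the largest absolute column sum."""
    return max(sum(abs(row[c]) for row in X) for c in range(len(X[0])))


# Hypothetical 3-node combination matrix whose columns sum to one.
A = [[0.5, 0.25, 0.0],
     [0.5, 0.50, 0.5],
     [0.0, 0.25, 0.5]]
LAMBDA_PLUS = 0.99           # illustrative upper bound on the forgetting factors
LEVELS = [0.95, 0.97, 0.99]  # illustrative values, all <= LAMBDA_PLUS


def F_at(j):
    """F(j) = Lambda_j A, with node m using LEVELS[(j + m) % 3]."""
    lams = [LEVELS[(j + m) % 3] for m in range(3)]
    return [[lams[m] * A[m][n] for n in range(3)] for m in range(3)]


n_steps = 2000
product = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
for j in range(n_steps):
    product = matmul(product, F_at(j))

norm_after = max_column_sum(product)
bound = LAMBDA_PLUS ** n_steps
print(norm_after, bound)
```

The bound follows from the submultiplicativity of the induced 1-norm, which gives a slightly more formal route to the same conclusion as the element-wise argument above.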

4.2 Mean-square error and deviation performances

In this part, we perform analyses for the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms based on mean square performance and derive expressions for the steady-state MSD and EMSE, which are defined as
$$ \begin{aligned} {MSD}_{k}^{ss}&={\lim}_{i\to\infty}E\left[\|\widetilde{\mathbf{w}}_{k,i}\|^{2}\right]\\ {EMSE}_{k}^{ss}&={\lim}_{i\to\infty}E\left[|\mathbf{u}_{k,i}^{*}\widetilde{\mathbf{w}}_{k,i-1}|^{2}\right]. \end{aligned} $$
(60)
We start with (54) and then operate recursively from time instant \(N_{i}\), which yields
$$ \widetilde{\mathbf{W}}_{i}=\widetilde{\mathbf{W}}_{N_{i}}\prod\limits_{j=N_{i}+1}^{i}\mathbf{F}(j)+\boldsymbol{\mathcal{P}}\sum\limits_{t=N_{i}+1}^{i}\mathbf{G}_{t}\mathbf{A}\prod\limits_{j=t+1}^{i}\mathbf{F}(j). $$
(61)
Then, the kth column of \(\widetilde {\mathbf {W}}_{i}\) is given by
$$ \widetilde{\mathbf{w}}_{k,i}=\widetilde{\mathbf{W}}_{N_{i}}\prod\limits_{j=N_{i}+1}^{i}\mathbf{F}(j)\mathbf{e}_{k} +\boldsymbol{\mathcal{P}}\sum\limits_{t=N_{i}+1}^{i}\mathbf{G}_{t}\mathbf{A}\prod_{j=t+1}^{i}\mathbf{F}(j)\mathbf{e}_{k} $$
(62)

where \(\mathbf{e}_{k}\) is a column vector of length N with unity for the kth element and zero for the others. Next, we write the squared Euclidean norm of the weight error vector \(\widetilde {\mathbf {w}}_{k,i}\), that is, \(\|\widetilde {\mathbf {w}}_{k,i}\|^{2}\), or equivalently, \(Tr\{\widetilde {\mathbf {w}}_{k,i}\widetilde {\mathbf {w}}_{k,i}^{*}\}\).

Since the elements of \(\mathbf{F}(i)\) are all bounded by zero and one, \(\prod \limits _{j=N_{i}+1}^{i}\mathbf {F}(j)\) vanishes when \(i\to\infty\), which leads to the expectation of the first term becoming zero. Moreover, since the cross terms incorporate the zero-mean vectors \(\mathbf{v}_{i}\), their expectations also become zero. As a result, we have the following expression:
$$ E\left[\|\widetilde{\mathbf{w}}_{k,i}\|^{2}\right]=E\left[\left\|\boldsymbol{\mathcal{P}}\sum_{t=N_{i}+1}^{i}\mathbf{G}_{t}\mathbf{A}\prod_{j=t+1}^{i}\mathbf{F}(j)\mathbf{e}_{k}\right\|^{2}\right] $$
(63)
which can be rewritten as
$${} \begin{aligned} E\left[\|\widetilde{\mathbf{w}}_{k,i}\|^{2}\right]&=E\left[Tr\left\{\boldsymbol{\mathcal{P}}\sum\limits_{t=N_{i}+1}^{i}\mathbf{G}_{t}\mathbf{A}\prod\limits_{j=t+1}^{i}\mathbf{F}(j)\mathbf{e}_{k}\mathbf{e}_{k}^{*}\sum\limits_{l=N_{i}+1}^{i}\right.\right.\\ &\quad\left.\left.\times\left(\prod\limits_{j=l+1}^{i}\mathbf{F}(j)\right)^{*}\mathbf{A}^{*}\mathbf{G}_ {l}^{*}\boldsymbol{\mathcal{P}}^{*}\right\}\right]. \end{aligned} $$
(64)
For simplicity, we have the following notation:
$$ \mathbf{J}^{t,l}(i)=\mathbf{A}\prod_{j=t+1}^{i}\mathbf{F}(j)\mathbf{e}_{k}\mathbf{e}_{k}^{*}\left(\prod\limits_{j=l+1}^{i}\mathbf{F}(j)\right)^{*}\mathbf{A}^{*} $$
(65)
where \(\mathbf{J}^{t,l}(i)\) is a matrix of size N×N. By combining (52), (64), and (65), let us first compute \((\mathbf {I}_{N}{\otimes }\mathbf {v}_{t})\mathbf {J}^{t,l}(i)(\mathbf {I}_{N}{\otimes }\mathbf {v}_{l}^{*})\). According to the properties of the Kronecker product, we have the following equality:
$$ (\mathbf{A}\otimes\mathbf{B})(\mathbf{C}\otimes\mathbf{D})=\mathbf{AC}\otimes\mathbf{BD}. $$
(66)
Therefore, \((\mathbf {I}_{N}{\otimes }\mathbf {v}_{t})\mathbf {J}^{t,l}(i)\left (\mathbf {I}_{N}{\otimes }\mathbf {v}_{l}^{*}\right)\) can be expressed as
$${} \begin{aligned} (\mathbf{I}_{N}{\otimes}\mathbf{v}_{t})\mathbf{J}^{t,l}(i)\left(\mathbf{I}_{N}{\otimes}\mathbf{v}_{l}^{*}\right)&=\left(\mathbf{I}_{N}{\otimes}\mathbf{v}_{t}\right)\left(\mathbf{J}^{t,l}(i){\otimes}1\right)\left(\mathbf{I}_{N}{\otimes}\mathbf{v}_{l}^{*}\right)\\ &=\mathbf{J}^{t,l}(i){\otimes}\left(\mathbf{v}_{t}\mathbf{v}_{l}^{*}\right). \end{aligned} $$
(67)
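The mixed-product property (66) and its use in (67) can be checked on a small example. The sketch below uses hypothetical integer matrices with N=2, so the comparison is exact; real values are used, so conjugation is trivial. The helper names `matmul` and `kron` are defined here for illustration.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def kron(X, Y):
    """Kronecker product of two lists of rows."""
    return [[X[i][j] * Y[k][l]
             for j in range(len(X[0])) for l in range(len(Y[0]))]
            for i in range(len(X)) for k in range(len(Y))]


I2 = [[1, 0], [0, 1]]
J = [[1, 2], [3, 4]]   # stand-in for J^{t,l}(i)
v_t = [[5], [6]]       # column vector, stand-in for v_t
v_l_row = [[7, 8]]     # row vector, stand-in for v_l^*

# Left-hand side of (67): (I (x) v_t) J (I (x) v_l^*)
lhs = matmul(matmul(kron(I2, v_t), J), kron(I2, v_l_row))
# Right-hand side of (67): J (x) (v_t v_l^*)
rhs = kron(J, matmul(v_t, v_l_row))
print(lhs == rhs)
```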
Note that, in light of (65), the matrix \(\mathbf{J}^{t,l}(i)\) and the noise vectors, whose covariance matrix is \(\mathbf{R}_{v}\), can be considered uncorrelated. Then, by taking expectations on both sides of (67), we have the following results:
$$ \begin{aligned} E\left[(\mathbf{I}_{N}{\otimes}\mathbf{v}_{t})\mathbf{J}^{t,l}(i)\left(\mathbf{I}_{N}{\otimes}\mathbf{v}_{l}^{*}\right)\right] &=E\left[\mathbf{J}^{t,l}(i)\right]{\otimes}E\left[(\mathbf{v}_{t}\mathbf{v}_{l}^{*})\right]\\ &=\left\{\begin{array}{ll} E[\mathbf{J}^{t}(i)]{\otimes}\mathbf{R}_{v}&t=l,\\ 0 &t\neq l, \end{array}\right. \end{aligned} $$
(68)
where we drop the index and denote \(\mathbf{J}^{t,t}(i)\) as \(\mathbf{J}^{t}(i)\). By substituting (68) into (64), we can obtain
$$ E\left[\|\widetilde{\mathbf{w}}_{k,i}\|^{2}\right]=E\left[Tr\left\{\boldsymbol{\mathcal{P}}\sum_{t=N_{i}+1}^{i}\mathbf{G}_{t}\mathbf{J}^{t}(i)\mathbf{G}_{t}^{*}\boldsymbol{\mathcal{P}}^{*}\right\}\right]. $$
(69)
Note that \(\mathbf{P}_{k}\), k=1,2,…,N, is Hermitian; therefore, we have the following expression:
$$ E\left[\|\widetilde{\mathbf{w}}_{k,i}\|^{2}\right]=E\left[Tr\left\{\boldsymbol{\mathcal{P}}\sum_{t=N_{i}+1}^{i}\mathbf{G}_{t} \mathbf{J}^{t}(i)\mathbf{G}_{t}^{*}\boldsymbol{\mathcal{P}}^{T}\right\}\right] $$
(70)
where \(\mathbf {G}_{t}\mathbf {J}^{t}(i)\mathbf {G}_{t}^{*}\) can be represented as a block matrix \(\mathbf{K}^{t}(i)\), which can be decomposed into N×N blocks of size M×M each. The (m,l)th block is given by
$$ \mathbf{K}_{m,l}^{t}(i)=\mathbf{H}_{t}^{*}\mathbf{C}_{m}\mathbf{R}_{v}^{-1}\mathbf{J}_{m,l}^{t}(i)\mathbf{v}_{t}\mathbf{v}^{*}_{t}\mathbf{C}_{l}\mathbf{R}_{v}^{-1}\mathbf{H}_{t}. $$
(71)
By taking expectations on both sides of (71), we obtain the following equality:
$$ E[\mathbf{K}_{m,l}^{t}(i)]=E\left[\mathbf{J}_{m,l}^{t}(i)\right]\sum_{n=1}^{N}\frac{C_{n,m}C_{n,l}}{\sigma_{v,n}^{2}}\mathbf{R}_{u_{n}}. $$
(72)
Substituting (65) and (72) into (70) yields the following result:
$$ {{\begin{aligned} E[\|\widetilde{\mathbf{w}}_{k,i}\|^{2}]&=Tr\left\{\sum_{t=N_{i}+1}^{i}\sum_{l=1}^{N}\sum_{m=1}^{N}\mathbf{P}_{m}E\left[\mathbf{K}_{m,l}^{t}(i)\right]\mathbf{P}_{l}\right\}\\ &=\sum_{t=N_{i}+1}^{i}\sum_{l=1}^{N}\sum_{m=1}^{N}\sum_{n=1}^{N}Tr\{\mathbf{P}_{m}\mathbf{R}_{u_{n}}\mathbf{P}_{l}\} \frac{C_{n,m}C_{n,l}}{\sigma_{v,n}^{2}}\\ &\quad\times\left\{\mathbf{A}\prod_{j=t+1}^{i}E[\mathbf{F}(j)]\right\}_{m,k}\left\{\mathbf{A}\prod_{j=t+1}^{i}E[\mathbf{F}(j)]\right\}_{l,k}. \end{aligned}}} $$
(73)
In view of Assumption 2, there exists a number \(N_{i}>0\) such that, for \(i>N_{i}\), \(\mathbf{F}(i)\) satisfies
$$ E[\mathbf{F}(N_{i})]{\simeq}E[\mathbf{F}(N_{i}+1)]{\simeq}\ldots{\simeq}E[\mathbf{F}(i)]{\simeq}E[\mathbf{F}(\infty)]. $$
(74)
Therefore, we replace \(E[\mathbf{F}(i)]\) with \(E[\mathbf{F}(\infty)]\) when \(i>N_{i}\) and then reformulate (73) as
$$ {{\begin{aligned} E[\|\widetilde{\mathbf{w}}_{k,i}\|^{2}]&\approx\sum_{t=N_{i}+1}^{i}\sum_{l=1}^{N}\sum_{m=1}^{N}\sum_{n=1}^{N}Tr\{\mathbf{P}_{m}\mathbf{R}_{u_{n}}\mathbf{P}_{l}\}\\ &\quad\times\frac{C_{n,m}C_{n,l}}{\sigma_{v,n}^{2}}\left\{\mathbf{A}E^{i-t}[\mathbf{F}(\infty)]\right\}_{m,k}\left\{\mathbf{A}E^{i-t}[\mathbf{F}(\infty)]\right\}_{l,k}. \end{aligned}}} $$
(75)
Subsequently, we replace \(i-t\) with t in (75) and then let i go to infinity. As a result, we can obtain the expression of the steady-state MSD for node k:
$${} \begin{aligned} {MSD}_{k}^{ss}&={\lim}_{i\to\infty}E[\|\widetilde{\mathbf{w}}_{k,i}\|^{2}]\\ &={\lim}_{i\to\infty}\sum_{t=0}^{i}\sum_{l=1}^{N}\sum_{m=1}^{N}\sum_{n=1}^{N}Tr\{\mathbf{P}_{m}\mathbf{R}_{u_{n}}\mathbf{P}_{l}\}\\ &\quad\times\frac{C_{n,m}C_{n,l}}{\sigma_{v,n}^{2}}\left\{\mathbf{A}E^{t}[\mathbf{F}(\infty)]\right\}_{m,k}\left\{\mathbf{A}E^{t}[\mathbf{F}(\infty)]\right\}_{l,k}. \end{aligned} $$
(76)
Next, we calculate the steady-state EMSE for node k. According to (60), the EMSE for node k can be expressed as follows
$$ \begin{aligned} E\left[|\boldsymbol{u}_{k,i}^{*}\widetilde{\mathbf{w}}_{k,i-1}|^{2}\right]&=E\left[Tr\left\{\widetilde{\mathbf{w}}_{k,i-1}^{*}\mathbf{u}_{k,i}\mathbf{u}_{k,i}^{*}\widetilde{\mathbf{w}}_{k,i-1}\right\}\right]\\ &=E[Tr\{\mathbf{u}_{k,i}\mathbf{u}_{k,i}^{*}\widetilde{\mathbf{w}}_{k,i-1}\widetilde{\mathbf{w}}_{k,i-1}^{*}\}]\\ &=Tr\left\{\mathbf{R}_{u_{k}}E\left[\widetilde{\mathbf{w}}_{k,i-1}\widetilde{\mathbf{w}}_{k,i-1}^{*}\right]\right\}. \end{aligned} $$
(77)
Note that \(\mathbf{u}_{k,i}\) is independent of \(\widetilde {\mathbf {w}}_{k,i-1}\). By substituting (76) into (77), we can obtain the expression of the steady-state EMSE for node k:
$${} \begin{aligned} {EMSE}_{k}^{ss}&={\lim}_{i\to\infty}\sum_{t=0}^{i}\sum_{l=1}^{N}\sum_{m=1}^{N}\sum_{n=1}^{N}Tr\left\{\mathbf{R}_{u_{k}}\mathbf{P}_{m}\mathbf{R}_{u_{n}}\mathbf{P}_{l}\right\}\\ &\quad\times\frac{C_{n,m}C_{n,l}}{\sigma_{v,n}^{2}}\left\{\mathbf{A}E^{t}[\mathbf{F}(\infty)]\right\}_{m,k}\left\{\mathbf{A}E^{t}[\mathbf{F}(\infty)]\right\}_{l,k}. \end{aligned} $$
(78)

Expressions (76) and (78) describe the steady-state behavior of the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms. By comparing the expressions (76) and (78) with the analytical results in [7], it is clear that the fixed matrix \(\lambda^{2}\mathbf{A}\) in the expressions for the conventional DRLS algorithms has been replaced by the matrix \(\mathbf{F}(i)\) in the expressions (76) and (78), which is weighted by the matrix \(\boldsymbol{\Lambda}_{i}\). Since \(\boldsymbol{\Lambda}_{i}\) varies from one iteration to the next, \(\mathbf{F}(i)\) varies for each iteration as well, which improves the tracking performance of the resulting algorithms. Furthermore, since all the elements in \(\mathbf{F}(i)\) are bounded by zero and unity, the steady-state MSD and EMSE given by (76) and (78) both remain small when i is large enough. Thus, we can verify that the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms both converge in the mean-square sense.
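Expressions (76) and (78) can be evaluated numerically by truncating the sum over t. The sketch below is a simplified illustration rather than the evaluation used for the figures: it assumes white inputs, \(\mathbf{R}_{u_{n}}=\sigma_{u,n}^{2}\mathbf{I}_{M}\), so that by (46) each \(\mathbf{P}_{m}\) is a scalar multiple of the identity, and it takes \(E[\mathbf{F}(\infty)]=\text{diag}\{E[\lambda_{k}(\infty)]\}\mathbf{A}\). The 3-node matrices, the mean forgetting factors, and the function name are hypothetical.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def steady_state_msd(A, C, lam_bar, sig_u2, sig_v2, M, terms=3000):
    """Truncated evaluation of (76), assuming R_{u_n} = sig_u2[n] * I_M."""
    N = len(A)
    # p[m]: scalar such that P_m = p[m] * I_M, from (46) with white inputs.
    p = [(1.0 - lam_bar[m])
         / sum(C[l][m] * sig_u2[l] / sig_v2[l] for l in range(N))
         for m in range(N)]
    # E[F(inf)] = diag(lam_bar) A
    F = [[lam_bar[m] * A[m][n] for n in range(N)] for m in range(N)]
    # g[m][l] = sum_n Tr{P_m R_{u_n} P_l} C_{n,m} C_{n,l} / sigma_{v,n}^2
    g = [[M * p[m] * p[l]
          * sum(sig_u2[n] * C[n][m] * C[n][l] / sig_v2[n] for n in range(N))
          for l in range(N)] for m in range(N)]
    B = [row[:] for row in A]  # B = A E^t[F(inf)], starting at t = 0
    msd = [0.0] * N
    for _ in range(terms):
        for k in range(N):
            msd[k] += sum(g[m][l] * B[m][k] * B[l][k]
                          for m in range(N) for l in range(N))
        B = matmul(B, F)
    return msd


# Hypothetical 3-node path network (values chosen for illustration only).
A = [[0.4, 2 / 7, 0.0], [0.6, 3 / 7, 0.6], [0.0, 2 / 7, 0.4]]
C = [[2 / 3, 1 / 3, 0.0], [1 / 3, 1 / 3, 1 / 3], [0.0, 1 / 3, 2 / 3]]
lam_bar = [0.98, 0.985, 0.99]
sig_u2 = [1.2, 1.5, 1.8]
sig_v2 = [0.10, 0.15, 0.20]

msd = steady_state_msd(A, C, lam_bar, sig_u2, sig_v2, M=5)
# With white inputs, (78) differs from (76) only by the factor sigma_{u,k}^2.
emse = [sig_u2[k] * msd[k] for k in range(3)]
print(msd, emse)
```

For a single node with \(A=C=1\), the sum in this sketch reduces to \(M(\sigma_{v}^{2}/\sigma_{u}^{2})(1-\lambda)/(1+\lambda)\), which gives a quick sanity check on the implementation.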

4.3 Transient analysis under spatial invariance assumption

In this subsection, we consider the special case in which the noise variances and input vector covariance matrices are the same for all the sensor nodes, and provide a transient analysis for this specific case. Particularly, we assume spatial invariance:
$$\begin{array}{*{20}l} \sigma_{v,1}^{2}&=\sigma_{v,2}^{2}=\cdots=\sigma_{v,N}^{2}=\sigma_{v}^{2} \end{array} $$
(79)
$$\begin{array}{*{20}l} \mathbf{R}_{u_{1}}&=\mathbf{R}_{u_{2}}=\cdots=\mathbf{R}_{u_{N}}=\mathbf{R}_{u}. \end{array} $$
(80)

In addition, to facilitate analysis, we assume that all elements of the adaptation matrix C are equal to \(\frac {1}{N}\).

We study the transient behavior through the learning curve, which is obtained by depicting the squared a priori estimation error, i.e., \(E\left [|\mathbf {u}_{k,i}^{*}(\mathbf {w}_{k,i}-\mathbf {w}^{o})|^{2}\right ]\) [29, 30], as a function of the iteration number i. We first rewrite this squared a priori estimation error in a more compact form:
$$ \begin{aligned} &E\left[|\mathbf{u}_{k,i}^{*}(\mathbf{w}_{k,i}-\mathbf{w}^{o})|^{2}\right]\\ =&E\left[|\mathbf{u}_{k,i}^{*}\tilde{\mathbf{w}}_{k,i}|^{2}\right]\\ =&E\left[\tilde{\mathbf{w}}_{k,i}^{*}\mathbf{u}_{k,i}\mathbf{u}_{k,i}^{*}\tilde{\mathbf{w}}_{k,i}\right]\\ =&E\left[\tilde{\mathbf{w}}_{k,i}^{*}\mathbf{R}_{u}\tilde{\mathbf{w}}_{k,i}\right]\\ =&E\left[\|\tilde{\mathbf{w}}_{k,i}\|^{2}_{\mathbf{R}_{u}}\right] \end{aligned} $$
(81)

where we use the representation \(\|\mathbf {t}\|_{\mathbf {A}}^{2}=\mathbf {t}^{*}\mathbf {A}\mathbf {t}\) in the last equality.

Then, we use the spatial invariance assumption to simplify (39) and (48). Particularly, by taking advantage of the assumption that the input vector covariance matrix is the same over all sensor nodes, we can derive the following expression from (39), when i is large enough:
$$ {{\begin{aligned} \mathbf{P}_{k,i}\approx E\left[\mathbf{P}_{k,i}^{-1}\right]^{-1}\approx (1-E[\lambda_{k}(i)])\sigma_{v}^{2}\mathbf{R}_{u}^{-1}\approx (1-\lambda_{k}(i))\sigma_{v}^{2}\mathbf{R}_{u}^{-1}. \end{aligned}}} $$
(82)
By substituting (82) into (48), we can arrive at
$${} \begin{aligned} \tilde{\mathbf{w}}_{k,i}&=\sum\limits_{l=1}^{N}A_{l,k}\left[\lambda_{l}(i)\tilde{\mathbf{w}}_{l,i-1}+(1-\lambda_{l}(i))\mathbf{R}_{u}^{-1}\sum\limits_{m=1}^{N}\mathbf{u}_{m,i}v_{m,i}\right]\\ &=\sum\limits_{l=1}^{N}A_{l,k}\lambda_{l}(i)\tilde{\mathbf{w}}_{l,i-1}+\sum\limits_{l=1}^{N}A_{l,k}(1-\lambda_{l}(i))\mathbf{R}_{u}^{-1}\mathbf{H}_{i}^{*}\mathbf{v}_{i}\\ &=\sum\limits_{l=1}^{N}A_{l,k}\lambda_{l}(i)\tilde{\mathbf{w}}_{l,i-1}+\sum\limits_{l=1}^{N}A_{l,k}(1-\lambda_{l}(i))\mathbf{s}_{i}\\ &=\sum\limits_{l=1}^{N}A_{l,k}\lambda_{l}(i)\tilde{\mathbf{w}}_{l,i-1}+\left(1-\sum\limits_{l=1}^{N}A_{l,k}\lambda_{l}(i)\right)\mathbf{s}_{i} \end{aligned} $$
(83)
where we use the column vector \(\mathbf{s}_{i}\) to denote the quantity \(\mathbf {R}_{u}^{-1}\mathbf {H}_{i}^{*}\mathbf {v}_{i}\) in the third equality, and we use the property of the combination matrix, i.e., \(\sum _{l=1}^{N}A_{l,k}=1, \forall k\in \{1, 2, \cdots, N\}\), to arrive at the fourth equality. Let us define
$$ \begin{aligned} \tilde{\boldsymbol{\mathcal{W}}}_{i}&=\text{col}\{\tilde{\mathbf{w}}_{1,i}, \tilde{\mathbf{w}}_{2,i},\cdots, \tilde{\mathbf{w}}_{N,i}\}\\ \boldsymbol{\lambda}_{i}&=\text{col}\{\lambda_{1}(i), \lambda_{2}(i), \cdots, \lambda_{N}(i)\}. \end{aligned} $$
(84)
Note that \(\boldsymbol{\Lambda}_{i}=\text{diag}\{\boldsymbol{\lambda}_{i}\}\). Then, we can write the recursive equation of type (83) for all sensor nodes in a more compact form as follows:
$$ \begin{aligned} \tilde{\boldsymbol{\mathcal{W}}}_{i}&=\mathbf{A}^{T}\boldsymbol{\Lambda}_{i}\tilde{\boldsymbol{\mathcal{W}}}_{i-1}+\left(\mathbf{\mathbb{I}}-\mathbf{A}^{T}\boldsymbol{\lambda}_{i}\right)\otimes\mathbf{s}_{i}\\ &=\mathbf{A}^{T}\boldsymbol{\Lambda}_{i}\tilde{\boldsymbol{\mathcal{W}}}_{i-1}+\mathbf{f}(i)\otimes\mathbf{s}_{i} \end{aligned} $$
(85)
where \(\mathbf {f}(i)=\mathbf {\mathbb {I}}-\mathbf {A}^{T}\boldsymbol {\lambda }_{i}\) in the second equality. Then, we have the following global squared a priori estimation error for all sensor nodes by using the last equality in (85):
$$ {{\begin{aligned} E[\|\tilde{\boldsymbol{\mathcal{W}}}_{i}\|^{2}_{\mathbf{R}_{u}}]&=E[\tilde{\boldsymbol{\mathcal{W}}}_{i}^{*}\mathbf{R}_{u}\tilde{\boldsymbol{\mathcal{W}}}_{i}]\\ &=E\left[\tilde{\boldsymbol{\mathcal{W}}}_{i-1}^{*}\boldsymbol{\Lambda}_{i}\mathbf{A}\mathbf{R}_{u}\mathbf{A}^{T}\boldsymbol{\Lambda}_{i}\tilde{\boldsymbol{\mathcal{W}}}_{i-1}+\left(\mathbf{f}(i)^{T}\otimes\mathbf{s}_{i}^{*}\right)\mathbf{R}_{u}\left(\mathbf{f}(i)\otimes\mathbf{s}_{i}\right)\right]\\ &=E\left[\|\tilde{\boldsymbol{\mathcal{W}}}_{i-1}\|^{2}_{\boldsymbol{\Sigma}}\right]+E\left[\left(\mathbf{f}(i)^{T}\otimes\mathbf{s}_{i}^{*}\right)(\mathbf{R}_{u}\otimes1)(\mathbf{f}(i)\otimes\mathbf{s}_{i})\right]\\ &=E\left[\|\tilde{\boldsymbol{\mathcal{W}}}_{i-1}\|^{2}_{\boldsymbol{\Sigma}}\right]+E\left[\left(\left(\mathbf{f}(i)^{T}\mathbf{R}_{u}\right)\otimes\mathbf{s}_{i}^{*}\right)(\mathbf{f}(i)\otimes\mathbf{s}_{i})\right]\\ &=E\left[\|\tilde{\boldsymbol{\mathcal{W}}}_{i-1}\|^{2}_{\boldsymbol{\Sigma}}\right]+E\left[\left(\mathbf{f}(i)^{T}\mathbf{R}_{u}\mathbf{f}(i)\right)\otimes(\mathbf{s}_{i}^{*}\mathbf{s}_{i})\right]\\ &=E\left[\|\tilde{\boldsymbol{\mathcal{W}}}_{i-1}\|^{2}_{\boldsymbol{\Sigma}}\right]+E\left[\mathbf{f}(i)^{T}\mathbf{R}_{u}\mathbf{f}(i)\right]E[\mathbf{s}_{i}^{*}\mathbf{s}_{i}] \end{aligned}}} $$
(86)
where \(\boldsymbol{\Sigma}=\boldsymbol{\Lambda}_{i}\mathbf{A}\mathbf{R}_{u}\mathbf{A}^{T}\boldsymbol{\Lambda}_{i}\), and we use the property of the Kronecker product, i.e., (66), in the fourth and fifth equalities, and the fact that both quantities \(\mathbf{f}(i)^{T}\mathbf{R}_{u}\mathbf{f}(i)\) and \(\mathbf {s}_{i}^{*}\mathbf {s}_{i}\) are scalar and independent of each other to arrive at the last equality. Particularly, \(E[\mathbf {s}_{i}^{*}\mathbf {s}_{i}]\) can be rewritten as
$$ \begin{aligned} &E[\mathbf{s}_{i}^{*}\mathbf{s}_{i}]\\ =&E\left[\text{Tr}\left(\mathbf{v}_{i}^{*}\mathbf{H}_{i}\left(\mathbf{R}_{u}^{-1}\right)^{*}\mathbf{R}_{u}^{-1}\mathbf{H}_{i}^{*}\mathbf{v}_{i}\right)\right]\\ =&E\left[\text{Tr}\left(\mathbf{H}_{i}\left(\mathbf{R}_{u}^{-1}\right)^{*}\mathbf{R}_{u}^{-1}\mathbf{H}_{i}^{*}\mathbf{v}_{i}\mathbf{v}_{i}^{*}\right)\right]\\ =&\sigma_{v}^{2}E\left[\text{Tr}\left(\left(\mathbf{R}_{u}^{-1}\right)^{*}\mathbf{R}_{u}^{-1}\mathbf{H}_{i}^{*}\mathbf{H}_{i}\right)\right]\\ =&N\sigma_{v}^{2}\text{Tr}\left(\left(\mathbf{R}_{u}^{-1}\right)^{*}\mathbf{R}_{u}^{-1}\mathbf{R}_{u}\right)\\ =&N\sigma_{v}^{2}\text{Tr}\left(\mathbf{R}_{u}^{-1}\right) \end{aligned} $$
(87)
where we use the spatial invariance assumption, i.e., \(E[\mathbf {v}_{i}\mathbf {v}_{i}^{*}]=\text {diag}\{\sigma _{v}^{2}, \sigma _{v}^{2}, \cdots, \sigma _{v}^{2}\}=\sigma _{v}^{2}\mathbf {I}_{N}\) and \(E[\mathbf {H}_{i}^{*}\mathbf {H}_{i}]=\sum _{m=1}^{N} E[\mathbf {u}_{m,i}\mathbf {u}_{m,i}^{*}]=\sum _{m=1}^{N}\mathbf {R}_{u_{m}}=N\mathbf {R}_{u}\), to arrive at the third and fourth equalities, respectively, and the symmetry of the input vector covariance matrix in the last equality. By plugging (87) back into (86), we have
$$ \begin{aligned} &E\left[\left\|\tilde{\boldsymbol{\mathcal{W}}}_{i}\right\|^{2}_{\mathbf{R}_{u}}\right]\\ =&E\left[\left\|\tilde{\boldsymbol{\mathcal{W}}}_{i-1}\right\|^{2}_{\boldsymbol{\Sigma}}\right]+N\sigma_{v}^{2}E\left[\text{Tr}\left(\mathbf{f}(i)^{T}\mathbf{R}_{u}\mathbf{f}(i)\mathbf{R}_{u}^{-1}\right)\right]\\ =&E\left[\left\|\tilde{\boldsymbol{\mathcal{W}}}_{i-1}\right\|^{2}_{E[\boldsymbol{\Sigma}]}\right]+N\sigma_{v}^{2}E\left[\text{Tr}\left(\mathbf{f}(i)^{T}\mathbf{R}_{u}\mathbf{f}(i)\mathbf{R}_{u}^{-1}\right)\right]. \end{aligned} $$
(88)
For convenience, we use the notation \(\|\mathbf {t}\|^{2}_{\text {vec}\{\mathbf {A}\}}\) to denote the weighted norm \(\|\mathbf {t}\|^{2}_{\mathbf {A}}\), where the symbol \(\text{vec}\{\mathbf{A}\}\) represents the vectorization of a matrix. Particularly, by using the equality \(\text{vec}\{\mathbf{A}\mathbf{B}\mathbf{C}\}=(\mathbf{C}^{T}\otimes\mathbf{A})\text{vec}\{\mathbf{B}\}\), we can vectorize the matrix \(\boldsymbol{\Sigma}=\boldsymbol{\Lambda}_{i}\mathbf{A}\mathbf{R}_{u}\mathbf{A}^{T}\boldsymbol{\Lambda}_{i}\) as follows
$$ \begin{aligned} &\text{vec}\{\boldsymbol{\Sigma}\}\\ =&\text{vec}\left\{\boldsymbol{\Lambda}_{i}\mathbf{A}\mathbf{R}_{u}\mathbf{A}^{T}\boldsymbol{\Lambda}_{i}\right\}\\ =&\left(\left(\mathbf{A}^{T}\boldsymbol{\Lambda}_{i}\right)^{T}\otimes(\boldsymbol{\Lambda}_{i}\mathbf{A})\right)\text{vec}\{\mathbf{R}_{u}\}\\ =&\left((\boldsymbol{\Lambda}_{i}\mathbf{A})\otimes(\boldsymbol{\Lambda}_{i}\mathbf{A})\right)\text{vec}\{\mathbf{R}_{u}\}\\ =&(\mathbf{F}(i)\otimes\mathbf{F}(i))\text{vec}\{\mathbf{R}_{u}\}\\ =&\boldsymbol{\mathcal{F}}_{i}\boldsymbol{\gamma} \end{aligned} $$
(89)
where \(\mathbf{F}(i)=\boldsymbol{\Lambda}_{i}\mathbf{A}\), \(\boldsymbol {\mathcal {F}}_{i}=\mathbf {F}(i)\otimes \mathbf {F}(i)\) and \(\boldsymbol{\gamma}=\text{vec}\{\mathbf{R}_{u}\}\). Ultimately, we have
$$ \begin{aligned} &E\left[\left\|\tilde{\boldsymbol{\mathcal{W}}}_{i}\right\|^{2}_{\boldsymbol{\gamma}}\right]\\ =&E\left[\left\|\tilde{\boldsymbol{\mathcal{W}}}_{i-1}\right\|^{2}_{E[\boldsymbol{\mathcal{F}}_{i}]\boldsymbol{\gamma}}\right]+N\sigma_{v}^{2}E\left[\text{Tr}\left(\mathbf{f}(i)^{T}\mathbf{R}_{u}\mathbf{f}(i)\mathbf{R}_{u}^{-1}\right)\right]. \end{aligned} $$
(90)

This recursive equation is stable and convergent if \(E[\boldsymbol {\mathcal {F}}_{i}]\) is stable [31].
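The vectorization identity \(\text{vec}\{\mathbf{A}\mathbf{B}\mathbf{C}\}=(\mathbf{C}^{T}\otimes\mathbf{A})\text{vec}\{\mathbf{B}\}\) used to obtain (89) can be checked on a small example. The sketch below uses hypothetical 2×2 integer matrices, with vec stacking the columns of its argument; the helper names are defined here for illustration.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[r][t] * Y[t][c] for t in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]


def kron(X, Y):
    """Kronecker product of two lists of rows."""
    return [[X[i][j] * Y[k][l]
             for j in range(len(X[0])) for l in range(len(Y[0]))]
            for i in range(len(X)) for k in range(len(Y))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def vec(X):
    """Stack the columns of X into a single column vector."""
    return [[X[r][c]] for c in range(len(X[0])) for r in range(len(X))]


A = [[1, 2], [3, 4]]
B = [[0, 1], [2, 3]]
C = [[1, -1], [2, 0]]

lhs = vec(matmul(matmul(A, B), C))             # vec{ABC}
rhs = matmul(kron(transpose(C), A), vec(B))    # (C^T (x) A) vec{B}
print(lhs, rhs)
```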

Particularly, the quantity \(\boldsymbol {\mathcal {F}}_{i}\) has a spectral radius smaller than unity and thus is stable. This can be proved as follows: If we replace each element in \(\boldsymbol{\Lambda}_{i}\) by its upper bound \(\lambda_{+}\), then \(\boldsymbol {\mathcal {F}}_{i}\) is replaced by \(\lambda _{+}^{2}\mathbf {A}\otimes \mathbf {A}\). Note that \(\mathbf{A}\) satisfies \(\mathbf {\mathbb {I}}^{T}\mathbf {A}=\mathbf {\mathbb {I}}^{T}\), and then, we can readily verify that each column of \(\mathbf{A}\otimes\mathbf{A}\) sums up to unity. Hence, the quantity \(\lambda _{+}^{2}\mathbf {A}\otimes \mathbf {A}\) has the spectral radius \(\lambda _{+}^{2}\), which is smaller than one. Given that each element in \(\boldsymbol{\Lambda}_{i}\) does not exceed \(\lambda_{+}\), the spectral radius of \(\boldsymbol {\mathcal {F}}_{i}\) does not exceed \(\lambda _{+}^{2}\) and is therefore smaller than unity. Therefore, for this specialized case, it can be verified theoretically that the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms are convergent in terms of the learning curve and that the convergence rate is related to the varying forgetting factors.

Also note that, since the convergence performance of the adaptive algorithms does not depend on the outside environment but relies on the network topology and the design of the algorithms, the analytical results in this specialized case also apply to the general case.

5 Simulation results

In this section, we present the simulation results for the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms when applied in two applications, that is, distributed parameter estimation and distributed spectrum estimation over sensor networks.

5.1 Distributed parameter estimation

In this part, we evaluate the performance of the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms when applied to distributed parameter estimation in comparison with the DRLS algorithm with the fixed forgetting factor and the GVFF-DRLS algorithm. In addition, we also verify the effectiveness of the proposed analytical expressions in (76) and (78) based on simulations.

We assume that there are N=10 nodes in the sensor network and the length of the unknown weight vector is M=5. The input vectors \(\mathbf{u}_{k,i}\), k=1,2,…,N, are assumed to be Gaussian with zero mean and variances \(\left \{\sigma _{u,k}^{2}\right \}\) chosen randomly between 1 and 2 for each node. The Gaussian noise samples \(v_{k,i}\), k=1,2,…,N, have variances \(\left \{\sigma _{v,k}^{2}\right \}\) that are chosen randomly between 0.1 and 0.2 for each node. We generate the measurements \(\{d_{k,i}\}\) according to (1). Simulation results are averaged over 100 experiments. The adaptation matrix \(\mathbf{C}\) is governed by the Metropolis rule, while the choice of the diffusion matrix \(\mathbf{A}\) follows the relative-degree rule [8]. The network topology used for the simulations is shown in Fig. 2.
Fig. 2 Network topology for the simulation results in Section 5.1
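The construction of the matrices \(\mathbf{C}\) and \(\mathbf{A}\) from a network topology can be sketched as follows. The sketch uses a hypothetical 5-node topology, since the topology of Fig. 2 is not reproduced here, and the commonly stated forms of the Metropolis and relative-degree rules. These forms, the convention that each neighbourhood includes the node itself, and all function names are assumptions of the sketch and are not spelled out in this section.

```python
def neighborhoods(edges, n):
    """Neighbour sets of an undirected graph; each node neighbours itself."""
    nbrs = [{k} for k in range(n)]
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return nbrs


def metropolis(nbrs):
    """Assumed Metropolis rule: C[l][k] = 1 / max(n_k, n_l) for linked
    l != k, with the diagonal chosen so that each column sums to one."""
    n = len(nbrs)
    C = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in nbrs[k]:
            if l != k:
                C[l][k] = 1.0 / max(len(nbrs[k]), len(nbrs[l]))
        C[k][k] = 1.0 - sum(C[l][k] for l in nbrs[k] if l != k)
    return C


def relative_degree(nbrs):
    """Assumed relative-degree rule: A[l][k] = n_l / sum_{m in N_k} n_m."""
    n = len(nbrs)
    A = [[0.0] * n for _ in range(n)]
    for k in range(n):
        total = sum(len(nbrs[m]) for m in nbrs[k])
        for l in nbrs[k]:
            A[l][k] = len(nbrs[l]) / total
    return A


# Hypothetical topology: a ring of five nodes with one extra link.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
nbrs = neighborhoods(EDGES, 5)
C = metropolis(nbrs)
A = relative_degree(nbrs)
print([sum(A[l][k] for l in range(5)) for k in range(5)])
```

Under these assumptions, the columns of \(\mathbf{A}\) sum to one, as required by the property \(\sum_{l=1}^{N}A_{l,k}=1\) used in Section 4.3, and \(\mathbf{C}\) is symmetric with unit row and column sums.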

5.1.1 Effects of α, β, and γ

In this subsection, we study the effects of the parameters α, β, and γ on the performance of the proposed LTVFF and LCTVFF mechanisms. For the LTVFF mechanism, we investigate the steady-state MSD values versus α for β=0.0015,0.002,0.0025,0.005. The simulation results are shown in Fig. 3. For the LCTVFF mechanism, we first depict the steady-state MSD values versus α for β=0.0025,0.005,0.0075,0.01 in Fig. 4. Then, the effects of γ are illustrated in Fig. 5 by investigating the steady-state MSD values against γ for different pairs of α and β.
Fig. 3 Steady-state MSD versus α for different values of β for the LTVFF mechanism

Fig. 4 Steady-state MSD versus α for different values of β for the LCTVFF mechanism when γ=0.95

Fig. 5 Steady-state MSD versus γ for different values of α and β for the LCTVFF mechanism

As can be seen from Figs. 3 and 4 for both the LTVFF and LCTVFF mechanisms, the optimal choice of α and β is not unique. Specifically, different pairs of α and β can yield the same steady-state MSD value. For example, for the LTVFF mechanism, the pairs α=0.91,β=0.0015, α=0.89,β=0.002, and α=0.87,β=0.0025 provide almost the same steady-state MSD performance. For the LCTVFF mechanism, when γ=0.95, the pairs α=0.93,β=0.0025, α=0.90,β=0.005, α=0.85,β=0.0075, and α=0.80,β=0.01 yield almost the same steady-state MSD value. In addition, it can also be observed that the steady-state performance degrades as α and β decrease. Furthermore, the result in Fig. 5 reveals that the steady-state MSD performance of the LCTVFF mechanism changes only slightly as γ varies for different pairs of α and β.

However, when we choose appropriate values for α, β, and γ, considering only the effects on the steady-state behavior is not enough. This is because the convergence speed is closely connected to the steady-state MSD values. That is to say, when the algorithm converges faster, the steady-state error floor rises; if the convergence speed is controlled to be slower, the steady-state performance improves. Figures 6 and 7 show the trade-off between convergence speed and steady-state performance by depicting learning curves against different values of α and β for the LTVFF-DRLS and LCTVFF-DRLS algorithms, respectively. Therefore, we need to keep a good balance between the steady-state behavior and the convergence speed in order to ensure good performance. In practical applications, the optimized values of α, β, and γ should be obtained through experiments and then stored for future use.
Fig. 6 Learning curves against different values of α and β for LTVFF-DRLS algorithm

Fig. 7 Learning curves against different values of α and β for LCTVFF-DRLS algorithm when γ=0.95
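The trade-off between convergence speed and steady-state error can be illustrated with a scalar, single-node version of (83) and a fixed forgetting factor. The sketch below assumes that \(s_{i}\) is zero-mean with variance \(\sigma_{s}^{2}\) and independent of the previous error, so that the mean-square error obeys \(e_{i}=\lambda^{2}e_{i-1}+(1-\lambda)^{2}\sigma_{s}^{2}\). It does not implement the LTVFF or LCTVFF mechanisms, and the function names, initial error, and target level are illustrative.

```python
def learning_curve(lam, sigma_s2=1.0, e0=10.0, iters=3000):
    """Mean-square error recursion e <- lam^2 e + (1 - lam)^2 sigma_s2."""
    e = e0
    curve = []
    for _ in range(iters):
        e = lam * lam * e + (1.0 - lam) ** 2 * sigma_s2
        curve.append(e)
    return curve


def iterations_to_reach(curve, level):
    """First iteration (1-based) at which the curve is at or below level."""
    for i, e in enumerate(curve, start=1):
        if e <= level:
            return i
    return None


fast = learning_curve(0.95)    # smaller forgetting factor
slow = learning_curve(0.995)   # larger forgetting factor

# Closed-form floor of the recursion: (1 - lam) / (1 + lam) * sigma_s2.
floor_fast = (1 - 0.95) / (1 + 0.95)
floor_slow = (1 - 0.995) / (1 + 0.995)

print(iterations_to_reach(fast, 0.1), iterations_to_reach(slow, 0.1))
print(fast[-1], slow[-1])
```

In this simplified setting, the smaller forgetting factor reaches a given error level in fewer iterations but settles at a higher floor, which is the behavior that a variable forgetting factor is intended to balance.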

5.1.2 MSD and EMSE performance

Figures 8 and 9 show the MSD curves against the number of iterations for the LTVFF-DRLS and LCTVFF-DRLS algorithms with different initial values for the forgetting factor in comparison with the conventional DRLS algorithm and the GVFF-DRLS algorithm, respectively. The parameters of the considered algorithms are listed in Table 4. From the results, the LTVFF-DRLS algorithm converges to almost the same error floor in the two scenarios where the variable forgetting factor is initialized to be small or large. This is also true for the LCTVFF-DRLS algorithm, which has a lower error floor and faster convergence speed than the LTVFF-DRLS algorithm. However, as shown in Fig. 8, for the conventional DRLS algorithm, both the convergence speed and the steady-state error floor change noticeably when the fixed forgetting factor increases. Specifically, when the fixed forgetting factor is small, the conventional DRLS algorithm converges faster but has a higher error floor than the LTVFF-DRLS algorithm; however, as the fixed forgetting factor increases, it converges to a lower error floor (still not as low as that of the LTVFF-DRLS algorithm) but has a slower convergence speed. Besides, from Fig. 9, the MSD performance of the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms is less sensitive to the initial values for the forgetting factor than that of the GVFF-DRLS algorithm. Therefore, by employing the LTVFF and LCTVFF mechanisms, the proposed algorithms can track the optimal performance regardless of the initial values for the forgetting factor and greatly reduce the difficulty in choosing the appropriate value for the forgetting factor.
Fig. 8. MSD performance against iterations for the proposed algorithms with different initial values for the forgetting factor compared with the DRLS algorithm with the fixed forgetting factor

Fig. 9. MSD performance against iterations for the proposed algorithms with different initial values for the forgetting factor compared with the GVFF-DRLS algorithm

Table 4. Optimized parameters for different algorithms considered in Figs. 8 and 9

Algorithm | Parameters
LTVFF-1 | α=0.91, β=0.0015, λ_0=0.995, λ_+=0.9998, λ_−=0.980
LTVFF-2 | α=0.91, β=0.0015, λ_0=0.950, λ_+=0.9998, λ_−=0.950
LCTVFF-1 | α=0.95, β=0.005, γ=0.95, λ_0=0.995, λ_+=0.9998, λ_−=0.950
LCTVFF-2 | α=0.95, β=0.005, γ=0.95, λ_0=0.950, λ_+=0.9998, λ_−=0.950
GVFF-1 | λ_0=0.995, μ=0.005, λ_+=0.9998, λ_−=0.990
GVFF-2 | λ_0=0.950, μ=0.005, λ_+=0.9998, λ_−=0.950
Fixed-1 | λ=0.998
Fixed-2 | λ=0.995
Fixed-3 | λ=0.950

In Figs. 10, 11, 12, and 13, we evaluate the MSD and EMSE behavior of the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms in comparison with the conventional DRLS algorithm with the fixed forgetting factor and the GVFF-DRLS algorithm. Specifically, the MSD and EMSE curves against the number of iterations are depicted in Figs. 10 and 11, respectively, while the steady-state MSD and EMSE values for each node are shown in Figs. 12 and 13, respectively. As can be seen from these results, both the LTVFF-DRLS and LCTVFF-DRLS algorithms converge and achieve lower steady-state MSD and EMSE values than the DRLS algorithm with the fixed forgetting factor and the GVFF-DRLS algorithm. We also depict in Figs. 10, 11, 12, and 13 the analytical results calculated through expressions (76) and (78). The analytical expressions agree closely with the simulated results. The parameters of the considered algorithms are shown in Table 5; they were tuned through experiments by referring to the investigation in Section 5.1.1.
Fig. 10. MSD curve against number of iterations for the proposed and existing algorithms

Fig. 11. EMSE curve against number of iterations for the proposed and existing algorithms

Fig. 12. Steady-state MSD value versus node for the proposed and existing algorithms

Fig. 13. Steady-state EMSE value versus node for the proposed and existing algorithms

Table 5. Optimized parameters for different algorithms considered in Figs. 10, 11, 12, and 13

Algorithm | Parameters
Fixed scheme | λ=0.990
GVFF | μ=0.005, λ_0=0.990, λ_+=0.9998, λ_−=0.990
LTVFF | α=0.91, β=0.0015, λ_0=0.990, λ_+=0.9998, λ_−=0.980
LCTVFF | α=0.95, β=0.005, γ=0.95, λ_0=0.990, λ_+=0.9998, λ_−=0.950

In Fig. 14, we test the considered algorithms in a non-stationary environment. To simulate such an environment, we let the topology of the sensor network vary over time: the total number of sensor nodes is set to 40 at the start, half of the nodes are switched off after 100 iterations, and another 10 nodes are switched off after 800 iterations. Figure 14 depicts the resulting MSD curves against the number of iterations for the proposed algorithms, the conventional DRLS algorithm with the fixed forgetting factor, and the GVFF-DRLS algorithm. As can be observed, switching off sensor nodes degrades the performance of all the algorithms. However, the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms still outperform the two existing algorithms in MSD performance. Besides, they exhibit better tracking properties, showing smaller and smoother variations in the MSD curves at the times when nodes are switched off.
Fig. 14. MSD performance against number of iterations for the proposed and existing algorithms in a non-stationary environment

Next, we discuss the numerical stability of the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms. By tuning the parameters α, β, γ, λ +, and λ − to different values, the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms can attain different convergence speeds and steady-state performance, but their MSD and EMSE curves always decrease to the steady state. Indeed, over a number of experiments, we have not encountered a case where they diverge. Hence, the proposed LTVFF and LCTVFF mechanisms do not worsen the numerical stability of the DRLS algorithm. Besides, the simulation results in Fig. 14 show that, after some nodes in the network are switched off, the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms still outperform the conventional DRLS algorithm and exhibit smoother MSD curves at the switching times, especially the LCTVFF-DRLS algorithm. This further suggests that the proposed algorithms improve rather than impair the numerical stability of the DRLS algorithm by tracking the variations better.

5.2 Distributed spectrum estimation

In this part, we extend the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms to the application of distributed spectrum estimation, for which we focus on estimating the parameter w 0 that is relevant to the unknown spectrum of a transmitted signal s. First of all, we characterize the system model of distributed spectrum estimation.

We denote the power spectral density (PSD) of the unknown spectrum of the transmitted signal s by Φ s (f), which can be well approximated by the following basis expansion model [32] with N b sufficiently large:
$$ \Phi_{s}(f)=\sum\limits_{m=1}^{N_{b}}b_{m}(f){w}_{0m}=\mathbf{b}_{0}^{T}(f)\mathbf{w}_{0} $$
(91)

where \(\mathbf {b}_{0}(f)=\text {col}\{b_{1}(f),b_{2}(f),\ldots,b_{N_{b}}(f)\}\) is the vector of basis functions [33, 34], \(\mathbf {w}_{0}=\text {col}\{{w}_{01},{w}_{02},\ldots,{w}_{0N_{b}}\}\) is the expansion parameter to be estimated, whose entries represent the power with which the signal s is transmitted over each basis function, and N b is the number of basis functions.

We assume H k (f,i) to be the channel transfer function between the source emitting the signal s and the receiver node k at time instant i. Based on (91), the PSD of the signal received by node k can be represented as
$$ \begin{aligned} \Phi_{r}(f)&=|H_{k}(f,i)|^{2}\Phi_{s}(f)+\sigma_{r,k}^{2}\\ &=\sum\limits_{m=1}^{N_{b}}|H_{k}(f,i)|^{2}b_{m}(f){w}_{0m}+\sigma_{r,k}^{2}\\ &=\mathbf{b}^{T}_{k,i}(f)\mathbf{w}_{0}+\sigma_{r,k}^{2} \end{aligned} $$
(92)

where \(\mathbf {b}_{k,i}(f)=\left [|H_{k}(f,i)|^{2}b_{m}(f)\right ]_{m=1}^{N_{b}}\in \mathbb {R}^{N_{b}}\) and \(\sigma _{r,k}^{2}\) denotes the receiver noise power at node k.

At each time instant i, by observing the received PSD described in (92) over N c frequency samples f j =f min :(f max −f min )/N c :f max , for j=1,2,…,N c , each node k takes measurements according to the following model:
$$ d_{k,i}^{j}=\mathbf{b}^{T}_{k,i}(f_{j})\mathbf{w}_{0}+\sigma_{r,k}^{2}+v_{k,i}^{j} $$
(93)
where \(v_{k,i}^{j}\) denotes the sampling noise at frequency f j with zero mean and variance \(\sigma _{n,j}^{2}\). The receiver noise power \(\sigma _{r,k}^{2}\) can be estimated with high accuracy preliminarily and then subtracted from (93) [35, 36]. Therefore, we can obtain
$$ d_{k,i}^{j}=\mathbf{b}^{T}_{k,i}(f_{j})\mathbf{w}_{0}+v_{k,i}^{j}. $$
(94)
By collecting the measurements over N c frequencies into a column vector d k,i , we obtain the following system model of distributed spectrum estimation:
$$ \mathbf{d}_{k,i}=\mathbf{B}_{k,i}\mathbf{w}_{0}+\mathbf{v}_{k,i}. $$
(95)

where \(\mathbf {d}_{k,i}=\left [d_{k,i}^{j}\right ]_{j=1}^{N_{c}}\in \mathbb {R}^{N_{c}}\), \(\mathbf {B}_{k,i}=\left [\mathbf {b}^{T}_{k,i}(f_{j})\right ]_{j=1}^{N_{c}}\in \mathbb {R}^{N_{c}{\times }N_{b}}\), with N c >N b , and \(\mathbf {v}_{k,i}=\left [v_{k,i}^{j}\right ]_{j=1}^{N_{c}}\in \mathbb {R}^{N_{c}}\).

Next, we carry out simulations to show the performance of the proposed algorithms when applied to distributed spectrum estimation. We consider a sensor network composed of N=20 nodes that estimates the unknown expansion parameter w 0. We use N b =50 non-overlapping rectangular basis functions with amplitude equal to one to approximate the PSD of the unknown spectrum. The nodes can scan N c =100 frequencies over the frequency axis, which is normalized between 0 and 1. In particular, we assume that only 8 entries of w 0 are non-zero, which implies that the unknown spectrum is transmitted over 8 basis functions. Thus, the sparsity ratio equals 8/50. We set the power transmitted over each basis function to 0.7 and the variance of the sampling noise to 0.004.
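The measurement model (95) with the simulation settings above can be illustrated for a single node and a single time instant by the following minimal sketch. Several choices here are assumptions made only for illustration and are not taken from the paper: the channel gain |H k (f,i)|² is set to one, the frequency grid uses the midpoints of N c equal bins on [0,1], the positions of the 8 non-zero entries are drawn at random, the noise is Gaussian, and the helper name `rectangular_basis` is hypothetical. A plain least-squares fit stands in for the adaptive algorithms, simply to show that w 0 is identifiable from one snapshot when N c >N b .

```python
import numpy as np


def rectangular_basis(freqs, n_basis):
    """Return a (len(freqs), n_basis) matrix whose column m is the m-th
    non-overlapping, unit-amplitude rectangle on the normalized axis [0, 1]."""
    # Each rectangle covers an interval of width 1/n_basis.
    idx = np.minimum((freqs * n_basis).astype(int), n_basis - 1)
    B = np.zeros((freqs.size, n_basis))
    B[np.arange(freqs.size), idx] = 1.0
    return B


rng = np.random.default_rng(0)

N_b, N_c = 50, 100                       # basis functions, scanned frequencies
freqs = (np.arange(N_c) + 0.5) / N_c     # assumption: midpoint grid on [0, 1]

# Assumption: |H_k(f, i)|^2 = 1, so B_{k,i} reduces to the sampled basis matrix.
B = rectangular_basis(freqs, N_b)

# Sparse expansion vector: 8 active basis functions, each with power 0.7.
w0 = np.zeros(N_b)
support = rng.choice(N_b, size=8, replace=False)   # assumption: random support
w0[support] = 0.7

# Sampling noise with variance 0.004 (assumption: Gaussian).
noise_var = 0.004
v = rng.normal(0.0, np.sqrt(noise_var), size=N_c)

# Eq. (95): receiver noise power is taken as already subtracted, as in (94).
d = B @ w0 + v

# Reference least-squares estimate from this single snapshot.
w_ls = np.linalg.lstsq(B, d, rcond=None)[0]
print("max abs estimation error:", np.max(np.abs(w_ls - w0)))
```

With this grid each rectangle is sampled at exactly two frequencies, so the least-squares estimate of each entry is just the average of two noisy measurements; the diffusion RLS algorithms improve on this by combining data over time and across neighboring nodes.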

In Fig. 15, we compare the MSD performance of different algorithms for distributed spectrum estimation. As can be seen, the proposed LTVFF-DRLS and LCTVFF-DRLS algorithms still outperform the conventional DRLS algorithm in steady-state performance. By tuning its parameters, the GVFF-DRLS algorithm can achieve convergence speed and steady-state MSD values similar to those of the proposed algorithms, but at a much higher computational cost. Table 6 lists the simulation time of running each algorithm for 600 iterations and 1 Monte Carlo experiment. As can be observed, the simulation time of the GVFF-DRLS algorithm is roughly 2.4 to 2.9 times that of the other algorithms. In Fig. 16, we take node 1 as an example to investigate the performance of different algorithms in estimating the true PSD. Although the different algorithms obtain similar estimates of the true PSD, the proposed LCTVFF-DRLS algorithm leads to visibly smaller side lobes in the PSD curve than the other three.
Fig. 15. MSD performance for different algorithms applied in distributed spectrum estimation

Fig. 16. PSD performance for different algorithms applied in distributed spectrum estimation

Table 6. Simulation time of running different algorithms in Figs. 15 and 16

Algorithm | Simulation time (seconds)
LTVFF | 61.11
LCTVFF | 66.35
GVFF | 176
Fixed | 73

6 Conclusions

In this paper, we have proposed two low-complexity VFF-DRLS algorithms for distributed estimation, namely the LTVFF-DRLS and LCTVFF-DRLS algorithms. For the LTVFF-DRLS algorithm, the forgetting factor is adjusted by the time-averaged cost function, while for the LCTVFF-DRLS algorithm, the forgetting factor is adjusted by the time average of the correlation of two successive estimation errors. We have also investigated the computational complexity of the low-complexity VFF mechanisms and of the proposed VFF-DRLS algorithms. In addition, we have carried out convergence and steady-state analyses for the proposed algorithms and derived analytical expressions for the steady-state MSD and EMSE. The simulation results have shown the superiority of the proposed algorithms over the conventional DRLS and GVFF-DRLS algorithms in distributed parameter estimation and distributed spectrum estimation and have verified the effectiveness of the proposed analytical expressions for the steady-state MSD and EMSE.

7 Appendices

7.1 A: Proof that ρ k (i−1) and |e k (i)|² are uncorrelated in the steady state

By multiplying both sides of (26) by |e k (i)|² and taking expectations, we have the following equation:
$$ {{\begin{aligned} E\left[\rho_{k}(i-1)|e_{k}(i)|^{2}\right]&={\gamma}E\left[\rho_{k}(i-2)|e_{k}(i)|^{2}\right]\!\\ &\quad+\!(1\,-\,\gamma)E\left[|e_{k}(i-2)||e_{k}(i-1)||e_{k}(i)|^{2}\right]. \end{aligned}}} $$
(96)
Recall that the values of e k (i−1) and e k (i) and the values of ρ k (i−1) and ρ k (i) can be considered approximately equivalent when i→∞; therefore, we have the following results:
$$ {{\begin{aligned} E\left[\rho_{k}(i-1)|e_{k}(i)|^{2}\right] \approx&{\gamma}E\left[\rho_{k}(i-2)|e_{k}(i)|^{2}\right]\\ &+(1-\gamma)E\left[|e_{k}(i-1)|^{2}\right]E\left[|e_{k}(i)|^{2}\right]\\ \approx&{\gamma}E\left[\rho_{k}(i-1)|e_{k}(i)|^{2}\right]\\ &+(1-\gamma)\varepsilon_{\text{min}}^{2}. \end{aligned}}} $$
(97)
By recalling (27), we can obtain
$$\begin{array}{*{20}l} E\left[\rho_{k}(i-1)|e_{k}(i)|^{2}\right]\approx\varepsilon_{\text{min}}^{2}{\approx}E\left[\rho_{k}(i-1)\right]E\left[|e_{k}(i)|^{2}\right] \end{array} $$
(98)

That is, we can conclude that ρ k (i−1) and |e k (i)|² are uncorrelated in the steady state.
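The step from (97) to (98) can be made explicit as follows. This is only a sketch under the approximations already used above, with \(\varepsilon_{\text{min}}\) denoting the steady-state value of \(E[|e_{k}(i)|^{2}]\) as in (97) and assuming 0<γ<1. Moving the γ term of (97) to the left-hand side gives
$$ \begin{aligned} (1-\gamma)E\left[\rho_{k}(i-1)|e_{k}(i)|^{2}\right]&\approx(1-\gamma)\varepsilon_{\text{min}}^{2}\\ \Rightarrow\quad E\left[\rho_{k}(i-1)|e_{k}(i)|^{2}\right]&\approx\varepsilon_{\text{min}}^{2}. \end{aligned} $$
Likewise, taking expectations of the recursion for ρ k in the steady state and using the same approximation e k (i−1)≈e k (i) gives
$$ (1-\gamma)E\left[\rho_{k}(i-1)\right]\approx(1-\gamma)E\left[|e_{k}(i)|^{2}\right]\quad\Rightarrow\quad E\left[\rho_{k}(i-1)\right]\approx\varepsilon_{\text{min}}, $$
so that the product \(E[\rho_{k}(i-1)]E[|e_{k}(i)|^{2}]\) is also approximately \(\varepsilon_{\text{min}}^{2}\), which is the second approximation in (98).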

7.2 B: Proof of (39)

According to (8), we can obtain the following equation:
$$ \mathbf{P}^{-1}_{k,i}=\prod\limits_{j=0}^{i}\lambda_{k}(j)\boldsymbol{\Pi}+\boldsymbol{\mathcal{H}}_{i}^{*}\boldsymbol{\mathcal{W}}_{k,i}\boldsymbol{\mathcal{H}}_{i} $$
(99)
where the matrices \(\boldsymbol {\mathcal {H}}_{i}\) and \(\boldsymbol {\mathcal {W}}_{k,i}\) can be expressed as follows
$$ \begin{aligned} \boldsymbol{\mathcal{H}}_{i}&= \left[\begin{array}{c} \mathbf{H}_{i}\\ \boldsymbol{\mathcal{H}}_{i-1} \end{array}\right]\\ \boldsymbol{\mathcal{W}}_{k,i}&= \left[\begin{array}{cc} \mathbf{R}^{-1}_{v}\mathbf{C}_{k}&{}\\ {}&\lambda_{k}(i)\boldsymbol{\mathcal{W}}_{k,i-1} \end{array}\right]. \end{aligned} $$
(100)
Therefore, (99) can be reformulated as
$$\begin{aligned} \mathbf{P}^{-1}_{k,i}&=\lambda_{k}(i)\left(\prod\limits_{j=0}^{i-1}\lambda_{k}(j)\boldsymbol{\Pi}+\boldsymbol{\mathcal{H}}_{i-1}^{*}\boldsymbol{\mathcal{W}}_{k,i-1}\boldsymbol{\mathcal{H}}_{i-1}\right)\\ & \quad +\mathbf{H}^{*}_{i}\mathbf{R}^{-1}_{v}\mathbf{C}_{k}\mathbf{H}_{i}. \end{aligned} $$
(101)
Substituting (2) into (101) yields the following recursion:
$$ \mathbf{P}^{-1}_{k,i}=\lambda_{k}(i)\mathbf{P}^{-1}_{k,i-1}+\sum_{m=1}^{N}\frac{C_{m,k}}{\sigma_{v,m}^{2}}\mathbf{u}_{m,i}\mathbf{u}_{m,i}^{*}. $$
(102)
By employing the iterative Eq. (102), we can write
$$ \begin{aligned} \mathbf{P}_{k,i}^{-1}&=\sum\limits_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{u}_{l,i}\mathbf{u}_{l,i}^{*}+{\lambda}_{k}(i)\sum_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{u}_{l,i-1}\mathbf{u}_{l,i-1}^{*}\\ &\quad+{\lambda}_{k}(i){\lambda}_{k}(i-1)\sum_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{u}_{l,i-2}\mathbf{u}_{l,i-2}^{*}+\ldots\\ &\quad+\prod\limits_{j=i}^{1}{\lambda}_{k}(j)\sum\limits_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{u}_{l,0}\mathbf{u}_{l,0}^{*}+\prod_{j=i}^{0}{\lambda}_{k}(j)\boldsymbol{\Pi}. \end{aligned} $$
(103)
Recalling Assumption 1, we know that the correlation matrix of the input vector is invariant over time. As a result, the correlation matrix \(\mathbf {R}_{u_{l,i}}\) can be represented as \(\mathbf {R}_{u_{l}}\). Therefore, by taking expectations on both sides of (103), we obtain the following result:
$$ {{\begin{aligned} E\left[\mathbf{P}_{k,i}^{-1}\right]&=\sum_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{R}_{u_{l}}+E[\lambda_{k}(i)]\sum\limits_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{R}_{u_{l}}+E\left[\lambda_{k}(i)\right.\\ &\quad\left.\times\lambda_{k}(i-1)\right]\sum\limits_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{R}_{u_{l}}\,+\,\ldots\,+\,E\left[\!\prod\limits_{j=i}^{1}\lambda_{k}(j)\right]\!\sum_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{R}_{u_{l}}\\ &\quad+E\left[\prod\limits_{j=i}^{0}\lambda_{k}(j)\right]\boldsymbol{\Pi}. \end{aligned}}} $$
(104)
In view of Assumption 2, (104) can be approximately rewritten as
$$ {{\begin{aligned} E\left[\mathbf{P}_{k,i}^{-1}\right]&\approx\left(1+E[\lambda_{k}(i)]+\ldots+E[\lambda_{k}(i)]^{i-N_{i}+1}\right)\sum\limits_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{R}_{u_{l}}\\ &\quad+E[\lambda_{k}(i)\lambda_{k}(i-1)\ldots\lambda_{k}(N_{i})]E\left[{\vphantom{\prod\limits_{j=N_{i}-1}^{1}}}\lambda_{k}(N_{i}-1)\right.\\ &\left.\quad+\lambda_{k}(N_{i}-1)\lambda_{k}(N_{i}-2)+\ldots+\prod\limits_{j=N_{i}-1}^{1}\lambda_{k}(j)\right]\sum\limits_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{R}_{u_{l}}\\ &\quad+E\left[\prod\limits_{j=i}^{N_{i}}\lambda_{k}(j)\right]E\left[\prod\limits_{j=N_{i}-1}^{0}\lambda_{k}(j)\right]\boldsymbol{\Pi}\\ &\approx\left(1+E[\lambda_{k}(i)]+\ldots+E[\lambda_{k}(i)]^{i-N_{i}+1}\right)\sum\limits_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{R}_{u_{l}}\\ &\quad+E[\lambda_{k}(i)]^{i-N_{i}+1}(\xi+\chi) \end{aligned}}} $$
(105)
where ξ and χ can be expressed as follows, respectively:
$$ \begin{aligned} \xi&=E\left[{\vphantom{\prod_{j=N_{i}-1}^{1}}}\lambda_{k}(N_{i}-1)+\lambda_{k}(N_{i}-1)\lambda_{k}(N_{i}-2)\right.\\ &\quad\left.+\ldots+\prod_{j=N_{i}-1}^{1}\lambda_{k}(j)\right]\sum_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{R}_{u_{l}} \end{aligned} $$
(106)
and
$$ \chi=E\left[\prod_{j=N_{i}-1}^{0}\lambda_{k}(j)\right]\boldsymbol{\Pi}. $$
(107)
Since N i is a finite positive number, ξ and χ are two deterministic values. In addition, note that λ k (i) does not exceed its upper bound λ +, which is smaller than but close to unity. Therefore, we have 0<E[λ k (i)]<λ +<1, and \(E[\lambda _{k} (i)]^{i-N_{i}+1}<\lambda _{+}^{i-N_{i}+1}\). When i is large enough, \(\lambda _{+}^{i-N_{i}+1}\) approaches zero, and hence \(E[\lambda _{k} (i)]^{i-N_{i}+1}\) also approaches zero. As a result, the last term in (105) vanishes, while the geometric series multiplying the first term converges to 1/(1−E[λ k (∞)]). Then, we obtain the following result:
$$ {\lim}_{i\to\infty}E\left[\mathbf{P}_{k,i}^{-1}\right]=\frac{1}{1-E[\lambda_{k}(\infty)]}\sum_{l=1}^{N}\frac{C_{l,k}}{\sigma_{v,l}^{2}}\mathbf{R}_{u_{l}} $$
(108)

where the value of λ k (∞) is given in (24) for the LTVFF mechanism and in (35) for the LCTVFF mechanism. Hence, we obtain (39). Note that, if appropriate truncation bounds are set for λ k (i), the steady-state value of the forgetting factor is not influenced by the truncation. Hence, the result (39) holds despite the truncation applied in the VFF mechanisms. Indeed, the truncation only plays a role during convergence. Once the algorithms reach the steady state, the values of the forgetting factor are no longer affected by the truncation mechanism.
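The limit in (108) is the sum of a geometric series, which can be checked numerically. The following sketch iterates the expected-value form of the recursion (102) with a constant forgetting factor and compares the result with R/(1−λ). The matrix `R` (standing in for the weighted sum of input correlation matrices in (108)), the initial regularization `Pi`, the value `lam = 0.98`, and the function name are illustrative assumptions, not values from the paper; using a constant λ is a simplification of the steady-state behavior of E[λ k (i)].

```python
import numpy as np


def iterate_inverse_P(lam, R, Pi, n_iter):
    """Iterate P_inv <- lam * P_inv + R, i.e. the expectation of (102)
    under the simplifying assumption of a constant forgetting factor."""
    P_inv = Pi.copy()
    for _ in range(n_iter):
        P_inv = lam * P_inv + R
    return P_inv


# Hypothetical stand-in for sum_l (C_{l,k} / sigma_{v,l}^2) R_{u_l}.
R = np.diag([1.0, 2.0, 0.5])
# Hypothetical initial regularization matrix Pi.
Pi = 1e3 * np.eye(3)
# Assumed constant forgetting factor, below one as required by the proof.
lam = 0.98

P_inv = iterate_inverse_P(lam, R, Pi, n_iter=2000)
limit = R / (1.0 - lam)          # right-hand side of (108)

print(np.allclose(P_inv, limit))
```

The contribution of `Pi` is scaled by λ at every iteration and therefore decays geometrically, which mirrors the argument that the last term in (105) vanishes as i grows.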

Declarations

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 61471319, the Scientific Research Project of Zhejiang Provincial Education Department under Grant Y201122655, and the Fundamental Research Funds for the Central Universities.

Authors’ contributions

YC and RCdL proposed the original idea. LZ carried out the experiment. In addition, LZ and YC wrote the paper. CL and RCdL supervised and reviewed the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

(1)
College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, People’s Republic of China
(2)
CETUC-PUC-Rio, Rio de Janeiro, Brazil

References

  1. P Corke, T Wark, R Jurdak, W Hu, P Valencia, D Moore, Environmental wireless sensor networks. Proc. IEEE. 98(11), 1903–1917 (2010).
  2. JG Ko, C Lu, MB Srivastava, JA Stankovic, A Terzis, M Welsh, Wireless sensor networks for healthcare. Proc. IEEE. 98(11), 1947–1960 (2010).
  3. R Abdolee, B Champagne, AH Sayed, in Proc. IEEE Statistical Signal Processing Workshop. Diffusion LMS for Source and Process Estimation in Sensor Networks (IEEE, Ann Arbor, 2012).
  4. R Abdolee, B Champagne, AH Sayed, in Proc. IEEE ICASSP. Diffusion LMS Localization and Tracking Algorithm for Wireless Cellular Networks (IEEE, Vancouver, 2013).
  5. R Abdolee, B Champagne, AH Sayed, Diffusion adaptation over multi-agent networks with wireless link impairments. IEEE Trans. Mob. Comput. 15(6), 1362–1376 (2016).
  6. FS Cattivelli, CG Lopes, AH Sayed, in Proc. IEEE Workshop Signal Process. Advances Wireless Commun. (SPAWC). A Diffusion RLS Scheme for Distributed Estimation over Adaptive Networks (IEEE, Helsinki, 2007), pp. 1–5.
  7. FS Cattivelli, CG Lopes, AH Sayed, Diffusion recursive least-squares for distributed estimation over adaptive networks. IEEE Trans. Signal Process. 56(5), 1865–1877 (2008).
  8. FS Cattivelli, AH Sayed, Diffusion LMS strategies for distributed estimation. IEEE Trans. Signal Process. 58(3), 1035–1048 (2010).
  9. CG Lopes, AH Sayed, Diffusion least-mean squares over distributed networks: formulation and performance analysis. IEEE Trans. Signal Process. 56(7), 3122–3136 (2008).
  10. Y Liu, C Li, Z Zhang, Diffusion sparse least-mean squares over networks. IEEE Trans. Signal Process. 60(8), 4480–4485 (2012).
  11. S Xu, RC de Lamare, HV Poor, in Proc. IEEE ICASSP. Adaptive link selection strategies for distributed estimation in diffusion wireless networks (IEEE, Vancouver, 2013).
  12. S Xu, RC de Lamare, HV Poor, Distributed compressed estimation based on compressive sensing. IEEE Signal Process. Lett. 22(9), 1311–1315 (2015).
  13. MOB Saeed, A Zerguine, SA Zummo, A variable step-size strategy for distributed estimation over adaptive networks. EURASIP J. Adv. Signal Process. 2013(1), 1–14 (2013).
  14. H Lee, S Kim, J Lee, W Song, A variable step-size diffusion LMS algorithm for distributed estimation. IEEE Trans. Signal Process. 63(7), 1808–1820 (2015).
  15. Z Liu, Y Liu, C Li, Distributed sparse recursive least-squares over networks. IEEE Trans. Signal Process. 62(6), 1386–1395 (2014).
  16. S Huang, C Li, Distributed sparse total least-squares over networks. IEEE Trans. Signal Process. 63(11), 2986–2998 (2015).
  17. C Li, P Shen, Y Liu, Z Zhang, Diffusion information theoretic learning for distributed estimation over network. IEEE Trans. Signal Process. 61(16), 4011–4024 (2013).
  18. Z Liu, C Li, Y Liu, Distributed censored regression over networks. IEEE Trans. Signal Process. 63(20), 5437–5449 (2015).
  19. S Haykin, Adaptive Filter Theory, 4th edn (Prentice-Hall, Englewood Cliffs, 2000).
  20. S Leung, CF So, Gradient-based variable forgetting factor RLS algorithm in time-varying environments. IEEE Trans. Signal Process. 53(8), 3141–3150 (2005).
  21. CF So, SH Leung, Variable forgetting factor RLS algorithm based on dynamic equation of gradient of mean square error. Electron. Lett. 37(3), 202–203 (2011).
  22. S Song, J Lim, S Baek, K Sung, Gauss Newton variable forgetting factor recursive least squares for time varying parameter tracking. Electron. Lett. 36(11), 988–990 (2000).
  23. S Song, J Lim, SJ Baek, K Sung, Variable forgetting factor linear least squares algorithm for frequency selective fading channel estimation. IEEE Trans. Veh. Technol. 51(3), 613–616 (2002).
  24. F Albu, in Proc. of ICARCV 2012. Improved Variable Forgetting Factor Recursive Least Square Algorithm (IEEE, Guangzhou, 2012).
  25. Y Cai, RC de Lamare, M Zhao, J Zhong, Low-complexity variable forgetting factor mechanisms for blind adaptive constrained constant modulus algorithms. IEEE Trans. Signal Process. 60(8), 3988–4002 (2012).
  26. L Qiu, Y Cai, M Zhao, Low-complexity variable forgetting factor mechanisms for adaptive linearly constrained minimum variance beamforming algorithms. IET Signal Process. 9(2), 154–165 (2015).
  27. R Arablouei, K Dogancay, S Werner, Y Huang, Adaptive distributed estimation based on recursive least-squares and partial diffusion. IET Signal Process. 62(14), 1198–1208 (2014).
  28. DS Tracy, RP Singh, A new matrix product and its applications in partitioned matrix differentiation. Statistica Neerlandica. 51(3), 639–652 (2003).
  29. H Shin, AH Sayed, in Proc. IEEE ICASSP. Transient Behavior of Affine Projection Algorithms (IEEE, Hong Kong, 2003).
  30. JH Husoy, MSE Abadi, in IEEE MELECON 2004. A Common Framework for Transient Analysis of Adaptive Filters (IEEE, Dubrovnik, 2004).
  31. AH Sayed, Adaptive filters (Wiley, 2011).
  32. JA Bazerque, GB Giannakis, Distributed spectrum sensing for cognitive radio networks by exploiting sparsity. IEEE Trans. Signal Process. 58(3), 1847–1862 (2010).
  33. S Chen, DL Donoho, MA Saunders, Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20, 33–61 (1998).
  34. Y Zakharov, T Tozer, J Adlard, Polynomial splines-approximation of Clarke’s model. IEEE Trans. Signal Process. 52(5), 1198–128 (2004).
  35. PD Lorenzo, S Barbarossa, A Sayed, Distributed spectrum estimation for small cell networks based on sparse diffusion adaptation. IEEE Signal Process. Lett. 20(123), 1261–1265 (2013).
  36. ID Schizas, G Mateos, GB Giannakis, Distributed LMS for consensus-based in-network adaptive processing. IEEE Trans. Signal Process. 57(6), 2365–2382 (2009).

Copyright

© The Author(s) 2017
