In its original definition, the nodal load observer uses a simple dynamic model for the deviations of the pseudo-measurements [14]:
$$\begin{array}{@{}rcl@{}} \Delta S(k+1) = \gamma \Delta S(k), \end{array} $$
(12)
with γ∈(0,1). However, as already mentioned in [11], more flexible models are required in order to improve the estimation quality of the nodal load observer. Autoregressive moving average (ARMA) models provide a versatile representation of stochastic processes in a sequential way, which is ideal for use with a Kalman-like filter method. Thus, we here propose their application with the NLO as a means to model the errors in the pseudo-measurements. The benefits of this approach are the possibility of incorporating correlations between buses and improved flexibility for the state estimator.
An ARMA (p, q) model consists of two parts, an autoregressive (AR) part of order p and a moving-average (MA) part of order q:
$$\begin{array}{@{}rcl@{}} x(k) = \mu + \sum_{i=1}^{p} \varphi_{i} x(k-i) + \sum_{i=1}^{q} \theta_{i} w(k-i) + w(k), \end{array} $$
(13)
where w(k) is a white noise stochastic process with known variance E[w^{2}(k)]=σ^{2}(k), μ is the expected value of x(k), and φ_{i}, θ_{i} are the parameters of the AR and MA parts, respectively. The original dynamic model in Eq. 12 is thus an ARMA (1,0) model, which can also be written as AR(1).
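As a concrete illustration, Eq. 13 can be simulated directly. The following Python sketch (with illustrative parameter values of our own choosing) generates an ARMA (p, q) path; the AR(1) model of Eq. 12 is recovered as the special case with a single AR coefficient and no MA part:

```python
import numpy as np

def simulate_arma(phi, theta, mu, sigma2, n, rng=None, burn=500):
    """Simulate an ARMA(p, q) process as in Eq. 13:
    x(k) = mu + sum_i phi_i x(k-i) + sum_i theta_i w(k-i) + w(k)."""
    rng = np.random.default_rng() if rng is None else rng
    p, q = len(phi), len(theta)
    w = rng.normal(0.0, np.sqrt(sigma2), n + burn)
    x = np.zeros(n + burn)
    for k in range(max(p, q, 1), n + burn):
        ar = sum(phi[i] * x[k - 1 - i] for i in range(p))
        ma = sum(theta[i] * w[k - 1 - i] for i in range(q))
        x[k] = mu + ar + ma + w[k]
    return x[burn:]  # discard the burn-in transient

# The AR(1) model of Eq. 12 corresponds to phi = [gamma], theta = []:
x = simulate_arma([0.9], [], 0.0, 0.01, 1000, rng=np.random.default_rng(1))
```

The burn-in period discards the transient from the zero initial condition so that the returned samples are approximately stationary.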
Integration of an AR model with the NLO is possible in a straightforward way as part of the state-space model. For instance, considering an AR(2) model in state space for the disturbance ΔS(k) and substituting it into the system model Eqs. 9 and 10 with an appropriate replacement, the following system equations are obtained:
$$\begin{array}{@{}rcl@{}} \left(\! \begin{array}{c} \Delta S(k+1)\\ \Delta S(k) \end{array} \!\right) &=& \left(\! \begin{array}{cc} \varphi_{1} & \varphi_{2}\\ 1 & 0 \end{array} \!\right) \left(\! \begin{array}{c} \Delta S(k)\\ \Delta S(k-1) \end{array} \!\right) + \\ &+ & \left(\! \begin{array}{c} \text{const}\\ 0 \end{array} \!\right) + \left(\! \begin{array}{c} w(k)\\ 0 \end{array} \!\right) \\ V(k) &=& h(\Delta S(k),S(k)) + v(k). \end{array} $$
(14)
It is well known that for linear state-space models, the Kalman filter produces optimal estimates for AR-like state evolution models [20]. In principle, the extended Kalman filter can be applied in the same way to nonlinear measurements with AR-like state evolutions. However, care has to be taken regarding the convergence of the resulting Kalman filter estimation, e.g., by carefully choosing suitable AR coefficients: for certain sets of AR coefficients, the state estimate can diverge. In practice, this may require repeated application of the extended Kalman filter with different choices of the AR coefficients. However, since the true value remains unknown, only large deviations from an optimal choice of coefficients could be detected in this way, which would make the practical application of the NLO with an AR dynamic model almost impossible. Therefore, we here propose to incorporate recently developed online learning algorithms for AR processes into the extended Kalman filter of the NLO.
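The divergence risk can be screened before running the filter: an AR(p) model is asymptotically stable exactly when all eigenvalues of its companion matrix (the system matrix of Eq. 14) lie strictly inside the unit circle. A minimal Python sketch, with illustrative coefficients of our own choosing:

```python
import numpy as np

def companion(phi):
    """Companion matrix of an AR(p) model, as used for Delta S in Eq. 14."""
    p = len(phi)
    A = np.zeros((p, p))
    A[0, :] = phi              # first row carries the AR coefficients
    A[1:, :-1] = np.eye(p - 1) # shift structure for the delayed states
    return A

def is_stationary(phi):
    """AR(p) is asymptotically stable iff all eigenvalues of the companion
    matrix lie strictly inside the unit circle."""
    return bool(np.all(np.abs(np.linalg.eigvals(companion(phi))) < 1.0))

print(is_stationary([0.5, 0.3]))   # True: both roots inside the unit circle
print(is_stationary([1.2, 0.3]))   # False: explosive dynamics
```

Checking candidate coefficients this way avoids running the extended Kalman filter on a state evolution model that is guaranteed to diverge.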
3.1 Online parameter learning
Online or recursive estimation methods for parametric stochastic processes appeared in the middle of the 20th century to replace the so-called offline methods, in which all data is collected first and the model parameters are estimated afterwards. Offline estimation is less efficient because of its computation time, power, and memory costs. This is why algorithms that estimate the model parameters as new data becomes available during operation, typically called online methods, are applied in many areas of engineering (see, for example, [21, 22]).
An investigation of various recursive estimation methods available from the literature showed the recursive maximum likelihood (RML) method originally presented by [23] to be most suitable for an application with the NLO. Several modifications of this method can be found in the literature [15, 24]. Our analysis is based on the algorithms introduced in [15], where also a proof of convergence is given.
Consider real-valued observations {x_{t}; t=1,⋯,N} of an ARMA (p, q) process as defined in Eq. 13 and let β=(φ_{1},⋯,φ_{p},θ_{1},⋯,θ_{q})^{T} be the vector of sought coefficients, with the corresponding estimate at time t being \({\beta _{t}} = (\hat {\varphi }_{1,t}, \cdots, \hat {\varphi }_{p,t}, \hat {\theta }_{1,t}, \cdots, \hat {\theta }_{q,t})^{T}\). For a given β_{t}, the forecast \(\hat {x}_{t}(\beta _{t})\) can be computed from the ARMA model definition by using the residuals \(\varepsilon _{t-1} = x_{t-1} - \hat {x}_{t-1}(\beta _{t-1})\) in place of the unobserved driving noise w_{t-1}.
Denote the negative derivative of the residuals ε_{t}(β_{t}) with respect to β_{t} by the vector \(\psi _{t}({\beta _{t}})=-\left [\frac {\partial \varepsilon _{t}({\beta _{t}})}{\partial {\beta _{t}}^{T}}\right ]^{T}\). The elements of this vector can be calculated analytically from the definitions of ε_{t} and β_{t} [15].
Adopting the algorithms from [15], we derive the algorithm for the estimation of β as follows:

1.
With \({\phi _{t}^{T}} = (x_{t}, \cdots, x_{t-p+1}, w_{t}, \cdots, w_{t-q+1})\) update the gradient by
$$\begin{array}{@{}rcl@{}} \psi_{t} &=& \sum_{k=1}^{q} \hat{\theta}_{k,t-1} \psi_{t-k} + \phi_{t-1}; \end{array} $$
(15)

2.
Calculate the forecasting error as
$$\begin{array}{@{}rcl@{}} \varepsilon_{t} &=& x_{t} - {\beta_{t-1}}^{T} \phi_{t-1} ; \end{array} $$
(16)

3.
Update the estimate of β_{t} using the quasiNewton step
$$\begin{array}{@{}rcl@{}} {\beta_{t}} &=& {\beta_{t-1}} + \gamma_{t} \hat{\sigma}_{t}^{-2}I\psi_{t}\varepsilon_{t}, \end{array} $$
(17)
with I denoting the identity matrix.

4.
Update the estimate of the ARMA noise process variance σ^{2} by
$$\begin{array}{@{}rcl@{}} \hat{\sigma}_{t+1}^{2} &=& \hat{\sigma}_{t}^{2} + \gamma_{t} ({\varepsilon_{t}^{2}} - \hat{\sigma}_{t}^{2}). \end{array} $$
(18)
Figure 1 illustrates the application of the above estimation algorithm for an AR(2) process with true parameters φ_{1}=1.15, φ_{2}=−0.15, and σ^{2}=0.01. For this result, we used the following initial values: \(\hat {\varphi }_{1,0}=1.5\), \(\hat {\varphi }_{2,0}=0.5\), and \(\hat {\sigma }^{2}_{0} = 10\). We generated 10,000 realizations of length 100 and computed estimates of φ with \(\gamma _{t} = \frac {1}{t}\) for each realization. The obtained results clearly illustrate the convergence of the method toward the true values, empirically corroborating the convergence proof from [15].
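For the pure AR case (q = 0), the gradient ψ_t of step 1 reduces to the regressor vector, and steps 2–4 can be sketched compactly. The Python sketch below uses illustrative stationary parameters of our own choosing rather than the exact Fig. 1 configuration, and scales the gradient step by the inverse of the current variance estimate (our reading of the quasi-Newton step):

```python
import numpy as np

def rml_ar2(x, beta0, sigma2_0):
    """Recursive ML estimation of AR(2) coefficients: steps 1-4 with q = 0,
    so the gradient psi_t reduces to the regressor phi_{t-1}."""
    beta = np.array(beta0, float)
    s2 = float(sigma2_0)
    for t in range(2, len(x)):
        gamma_t = 1.0 / t                          # damping sequence
        phi_reg = np.array([x[t - 1], x[t - 2]])   # regressor phi_{t-1}
        eps = x[t] - beta @ phi_reg                # forecasting error, Eq. 16
        beta = beta + gamma_t / s2 * phi_reg * eps # gradient step, Eq. 17
        s2 = s2 + gamma_t * (eps**2 - s2)          # variance update, Eq. 18
    return beta, s2

# Illustrative stationary truth (our own values, not those of Fig. 1):
rng = np.random.default_rng(0)
true_phi = np.array([0.6, 0.2])
n = 20000
x = np.zeros(n)
w = rng.normal(0.0, 1.0, n)
for k in range(2, n):
    x[k] = true_phi @ np.array([x[k - 1], x[k - 2]]) + w[k]

beta_hat, s2_hat = rml_ar2(x, beta0=[1.5, 0.5], sigma2_0=10.0)
```

Starting the variance estimate at a deliberately large value (here 10) keeps the early coefficient updates small, mirroring the initialization advice discussed later in this section.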
3.2 NLO with online learning technique
In this section, we present the final algorithm, which is obtained by combining the basic procedure of the extended Kalman filter, the nodal load observer for power distribution grids, and the considered online learning technique for AR process parameters.
Initial values \(\hat {x}(0)\), P(0), \(\hat {V}(0)\), β(0), and σ(0) are chosen, and the matrix C_{m} is set up with one entry equal to 1 in every row such that V_{m}=C_{m}V, where V is the vector of all nodal voltages in rectangular coordinates and V_{m} the corresponding measured values. The NLO algorithm [14] with the adaptive dynamic model [15] can then be written as follows.
The Kalman prediction step is given by
$$\begin{array}{@{}rcl@{}} \hat{x}(k,k-1)&=& \left(\begin{array}{cc} \varphi_{1} & \varphi_{2} \\ 1 & 0\end{array}\right) \left(\begin{array}{c} \hat{x}(k-1) \\ \hat{x}(k-2)\end{array}\right)\\ P(k,k-1) &=& \left(\begin{array}{cc} \varphi_{1} & \varphi_{2} \\ 1 & 0\end{array}\right)P(k-1) \left(\begin{array}{cc} \varphi_{1} & 1 \\ \varphi_{2} & 0\end{array}\right)+\\ &+&Q(k,\sigma). \end{array} $$
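The prediction step can be written out for a hypothetical small grid. The sketch below (made-up dimensions, and assuming the scalar AR coefficients act identically on every bus, which the compact 2×2 block notation above suggests) propagates the stacked state and covariance through the companion matrix:

```python
import numpy as np

def nlo_predict(x_prev, x_prev2, P, phi1, phi2, Q):
    """Kalman prediction under AR(2) companion dynamics: the stacked state
    holds the current and one-step-delayed pseudo-measurement errors."""
    n = len(x_prev)
    A = np.block([[phi1 * np.eye(n), phi2 * np.eye(n)],
                  [np.eye(n),        np.zeros((n, n))]])
    x_pred = A @ np.concatenate([x_prev, x_prev2])  # state prediction
    P_pred = A @ P @ A.T + Q                        # covariance prediction
    return x_pred, P_pred

# Hypothetical 3-bus example with a random but valid covariance:
rng = np.random.default_rng(2)
M = rng.normal(size=(6, 6))
P = M @ M.T                    # symmetric positive semidefinite P(k-1)
Q = 0.01 * np.eye(6)           # process noise Q(k, sigma)
x_pred, P_pred = nlo_predict(rng.normal(size=3), rng.normal(size=3),
                             P, phi1=0.9, phi2=0.05, Q=Q)
```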
With initial values \(\eta (0) = \hat {x}(k,k-1)\), \(\nu (0)=\hat {V}(k-1),\) and j=0 for the measurement update iteration, an estimate of nodal voltages is then obtained from the power flow (PF) calculation:
$$\nu(j+1)= \text{PF} (D_{m}S_{m}(k)+D_{nm}(S^{\text{pm}}(k)+\eta(j)), V(k), \nu(j)). $$
With H(k, j) given by
$$\begin{array}{@{}rcl@{}} H(k,j) =C_{m} \left(\frac{\partial}{\partial v} \text{diag}(v)Yv \vert_{v=\nu(j+1)}\right)^{-1} D_{nm} \end{array} $$
the Kalman gain is calculated as
$$\begin{aligned} K(k,j) &= P(k,k-1)H^{T}(k,j) \times \\ &\quad \times (H(k,j)P(k,k-1)H^{T}(k,j)+R(k))^{-1}, \end{aligned} $$
which yields the updated estimate of the errors in the pseudomeasurements as
$${\begin{aligned} \hat{x}(k) &= \hat{x}(k,k-1) + K(k,j) \times \\ &\quad \times [ V_{m}(k) - C_{m}\nu(j+1) - H(k,j) (\hat{x}(k,k-1) - \eta(j))], \end{aligned}} $$
with updated covariance matrix given by
$$\begin{array}{*{20}l} P(k) = (I-K(k,j-1)H(k,j-1))P(k,k-1). \end{array} $$
The updated estimates of bus power and nodal voltage are then calculated as
$$\begin{array}{@{}rcl@{}} \hat{S}(k)&=& D_{m}S_{m}(k)+D_{nm}(\hat{x}(k)+S^{\text{pm}}(k))\\ \hat{V}(k) &=& \nu(k). \end{array} $$
Utilizing the updated state estimates, the AR model parameters are then updated as follows:
$$\begin{array}{@{}rcl@{}} \psi(k) &=& (\hat{x}(k-1),\hat{x}(k-2))^{T} \\ \varepsilon(k) &=& \hat{x}(k) - \hat{\beta}(k-1)^{T} \left[\hat{x}(k-1),\hat{x}(k-2)\right]^{T}, \\ \hat{\beta}(k) &=& \hat{\beta}(k-1) + \gamma(k) \hat{\sigma}^{-2}(k)I_{2}\psi(k)\varepsilon(k),\\ \hat{\sigma}^{2}(k+1)&=& \hat{\sigma}^{2}(k) + \gamma(k) (\varepsilon^{2}(k) - \hat{\sigma}^{2}(k)), \end{array} $$
with damping sequence \(\gamma (k)=\frac {1}{k}\).
The selection of initial values for the online learning of the AR parameters is an important aspect of the estimation algorithm. For example, the initial value of the noise variance \({\sigma _{t}^{2}}\) should preferably be chosen larger than necessary, with the expectation that it converges to the innovation variance \(\frac {1}{t}\sum _{k=1}^{t}{\varepsilon _{k}^{2}}(\beta)\). The second significant issue is the choice of γ. Theoretically, it is recommended to choose \(\gamma _{t}=\frac {1}{t}\), although in practice it should be selected in a way that improves convergence. The authors of [15] proposed defining γ_{t} through the “forgetting” factor λ_{t} as follows:
$$\begin{array}{@{}rcl@{}} \gamma_{t} = \frac{\gamma_{t-1}}{\lambda_{t} + \gamma_{t-1}} \end{array} $$
(19)
with
$$\begin{array}{@{}rcl@{}} \lambda_{t} = \lambda^{0} \lambda_{t-1} +(1-\lambda^{0}), \end{array} $$
(20)
while different values of λ^{0} can be chosen. In the example shown in Section 4, γ_{0}=1.0 and λ_{0}=1.0 with λ^{0}=0.99 were identified as a good choice based on validation tests made for different φ values.
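The recursions in Eqs. 19 and 20 are cheap to tabulate. The sketch below shows that with the seed λ_{0}=1.0 the sequence reduces to essentially the theoretical 1/t schedule, while a smaller seed (e.g., the hypothetical λ_{0}=0.95) keeps the early gains larger:

```python
def gain_sequence(n, gamma0=1.0, lam_seed=1.0, lam_star=0.99):
    """Damping sequence gamma_t from the 'forgetting' factor recursion,
    Eqs. 19-20: lambda_t = lambda^0 * lambda_{t-1} + (1 - lambda^0)."""
    gammas, gamma, lam = [], gamma0, lam_seed
    for _ in range(n):
        lam = lam_star * lam + (1.0 - lam_star)  # Eq. 20
        gamma = gamma / (lam + gamma)            # Eq. 19
        gammas.append(gamma)
    return gammas

print(gain_sequence(3))                  # [0.5, 0.333..., 0.25]
print(gain_sequence(3, lam_seed=0.95))   # slightly larger early gains
```

Note that λ_{0}=1.0 is a fixed point of Eq. 20, so the forgetting factor then stays at 1 and γ_{t}=1/(t+1) exactly; seeds below 1 let λ_{t} approach 1 gradually, slowing the gain decay at the start.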
In order to obtain initial estimates of suitable AR parameters, we fitted an AR(2) process to the difference ΔS(k)=S^{true}(k)−S^{pm}(k) at a bus with measured bus power available.
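One simple way to carry out such an initial fit is ordinary least squares on the lagged series. The sketch below uses a synthetic stand-in for ΔS(k) with coefficients of our own choosing, since the real bus data is not reproduced here:

```python
import numpy as np

def fit_ar2_ols(d):
    """Offline least-squares AR(2) fit, usable as the initial estimate
    (phi1, phi2) and residual variance for the online learner."""
    y = d[2:]
    X = np.column_stack([d[1:-1], d[:-2]])   # lag-1 and lag-2 regressors
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ phi
    return phi, float(resid.var())

# Synthetic stand-in for Delta S(k) = S_true(k) - S_pm(k):
rng = np.random.default_rng(3)
n, true_phi = 5000, np.array([0.5, 0.3])
d = np.zeros(n)
for k in range(2, n):
    d[k] = true_phi[0] * d[k - 1] + true_phi[1] * d[k - 2] + rng.normal(0, 0.1)

phi0, s2_0 = fit_ar2_ols(d)
```

The returned coefficients and residual variance can serve as β(0) and \(\hat{\sigma}^{2}_{0}\) for the online learning scheme described above.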