Iterative Unbiased Fir State Estimation: a Review of Algorithms

In this paper, we develop in part and review various iterative unbiased finite impulse response (UFIR) algorithms (both direct and two-stage) for the filtering, smoothing, and prediction of time-varying and time-invariant discrete state-space models in white Gaussian noise environments. The distinctive property of UFIR algorithms is that noise statistics are completely ignored. Instead, an optimal window size is required for optimal performance. We show that the optimal window size can be determined via measurements with no reference. UFIR algorithms are computationally more demanding than Kalman filters, but this extra computational effort can be alleviated with parallel computing, and the extra memory that is required is not a problem for modern computers. Under real-world operating conditions with uncertainties, non-Gaussian noise, and unknown noise statistics, the UFIR estimator generally demonstrates better robustness than the Kalman filter, even with suboptimal window size. In applications requiring large window size, the UFIR estimator is also superior to the best previously known optimal FIR estimators. 1 Review 1.1 Introduction In optimal estimation theory, unbiasedness is a key condition that is used to derive linear and nonlinear estimators. A classical example is the ordinary least squares (OLS) estimator proposed by Gauss in 1795 [1]. The Gauss-Markov theorem states that if the noise is white and has the same variance at each time step, the OLS esti-mator is also the best linear unbiased estimator (BLUE) [2]. In convolution-based optimal filtering, the unbiased-ness constraint [2] leads to the unbiased finite impulse response (UFIR) estimator [3,4]. An extremely useful property of the BLUE and UFIR is that noise statistics are not required. Another example is the maximum likelihood estimator (MLE), which obtains the estimate at an extremum of the density function of the state conditioned on the measurements [5]. Like the BLUE and UFIR, the MLE is suboptimal for finite data. However, if the sample size (memory) increases to infinity, each of them are optimal. For finite data, unbiasedness does not guarantee the minimum mean square error (MSE), which is comprised of standard deviation and bias: MSE = Var + Bias 2 , (1) where 'Var' is the error variance. Since the minimum MSE is required by many applications, a minimization of (1) is often desired at the expense of a small increase in bias. That leads to different kinds of optimal solutions such as the minimum variance unbiased estimator (MVUE), the recursive Kalman filter [6], and the optimal FIR …


Introduction
In optimal estimation theory, unbiasedness is a key condition that is used to derive linear and nonlinear estimators. A classical example is the ordinary least squares (OLS) estimator proposed by Gauss in 1795 [1]. The Gauss-Markov theorem states that if the noise is white and has the same variance at each time step, the OLS estimator is also the best linear unbiased estimator (BLUE) [2]. In convolution-based optimal filtering, the unbiasedness constraint [2] leads to the unbiased finite impulse response (UFIR) estimator [3,4]. An extremely useful property of the BLUE and UFIR is that noise statistics are not required. Another example is the maximum likelihood estimator (MLE), which obtains the estimate at an extremum of the density function of the state conditioned on the measurements [5]. Like the BLUE and UFIR, the MLE is suboptimal for finite data. However, if the sample size (memory) increases to infinity, each of them are optimal.
where 'Var' is the error variance. Since the minimum MSE is required by many applications, a minimization of (1) is often desired at the expense of a small increase in bias. That leads to different kinds of optimal solutions such as the minimum variance unbiased estimator (MVUE), the recursive Kalman filter [6], and the optimal FIR (OFIR) filter [7,8]. The common disadvantage of these filters is that noise statistics and initial errors are required. In view of the fact that noise statistics and initial errors are commonly not well known, especially for time-variant models, theoretically optimal estimators end up being suboptimal in practical applications. In this regard, engineering experience says the following [9]: Practical implementation of the Kalman filter is often difficult due to the inability in getting a good estimate of the noise covariance matrices.
That means that due to insufficient knowledge about noise statistics, optimal estimators that minimize (1) may be less accurate than unbiased ones that are derived under http://asp.eurasipjournals.com/content/2013/1/113 the invariant E{x n −x n } = 0, which leads to the unbiasedness condition E{x n } = E{x n } , ( 2 ) where x n indicates a state variable at discrete time step n, x n its estimate, and E{x} is the expected value of x. Note that the cost of equipment that is required for the characterization of noise statistics cannot commonly be afforded by users, and methods for the estimation of noise covariance matrices via measurements are not well developed.
On the other hand, noise statistics are not always necessary to get a good estimate that we illustrate below based on an example.

Example 1.
A linear signal x n is measured as z n = x n + v n in the presence of zero mean white Gaussian noise v n having variance σ 2 v = 1. The p-shift ramp UFIR estimator matches this signal and is given by the convolution-based estimatê where the impulse response function is (eq. (89) of [17]) p = 0 corresponds to filtering, p < 0 corresponds to |p|lag smoothing, and p > 0 corresponds to p-step prediction.
The estimation variance is defined as σ 2 = G(N, p)σ 2 v = G(N, p), where G(N, p) = N−1 i=0 h 2 1i (N, p) is the noise power gain (NPG) [11], The estimation variance is sketched in Figure 1 as a function of N. Here, the 1/N bound is obtained by simple averaging that is optimal in the sense of error variance, although with a 50% bias for a linear x n . The case of p = 0 corresponds to the ramp UFIR filter with h 1n (N, 0) given by (4), and we notice that denoising is inherently less efficient in this case. If we set p = −N/2, the (N/2)-lag smoother estimation error variance rapidly converges to that of simple averaging. A similar effect can be observed in the one-step prediction error variance with p = 1. Here, a large error variance for small values of N reduces to that of the ramp filter as N increases. A common feature of these plots is that the error variances decrease with the reciprocal of N. That means that noise in UFIR estimators with large memory N 1 may be very low and the estimate may be almost optimal. This leads to the following statement: There is no need to use optimal estimators in many applications. UFIR structures that ignore noise statistics and initial estimation error statistics are able to produce acceptable suboptimal estimates. UFIR estimators have attracted researcher's attention for decades, beginning with the work of Johnson [12] and others, in which they extended the Wiener filter theory to discrete finite time. Further, the ability of UFIR estimators to produce nearly optimal estimates while ignoring noise statistics was greatly regarded in the development of estimators for polynomial signals [13][14][15]. Most recently UFIR methods were extended to state space in batch form [3,4,[16][17][18] and in an iterative Kalman-like form [19,20]. The latter has made the UFIR estimator a significant rival of the Kalman filter and its applications can be found in [10,[21][22][23][24]. Even so, UFIR estimators still remain somewhat beyond the typical range of traditional signal processing techniques.
The basic operating principles of the optimal Kalman and UFIR filters are summarized in Figure 2. At time n, the Kalman filter requires the noise statistics at time n − 1, such as the process and measurement noise covariance matrices Q n−1 and R n−1 respectively, as well as the estimation error covariance P n−1 . The optimal UFIR filter ignores these statistics. Instead, it requires the optimal averaging interval of N opt points in order to be optimal.
In this paper, we develop in part the results achieved in the field of UFIR filtering and review a family of iterative UFIR algorithms for filtering, smoothing, and prediction of time-varying (TV) and time-invariant (TI) discrete state-space models in white Gaussian noise environments. The following definitions will be used: UFIR estimator satisfies the unbiasedness condition (2), OFIR estimator minimizes the MSE (1), and Optimal UFIR (OUFIR) estimator minimizes the MSE in the UFIR estimator by using a window size N opt . http://asp.eurasipjournals.com/content/2013/1/113 Figure 2 Basic operating diagrams of the optimal Kalman and UFIR filters. Section 1.2 presents the linear state-space model, formulates the problem, and considers the batch p-shift UFIR estimator along with the generalized NPG. Section 1.3 presents two forms of the p-shift iterative UFIR algorithm. Section 1.4 discusses the estimation errors of the UFIR estimators. Sections 1.5, 1.6, and 1.7 give the reader a number of practical algorithms for filtering, smoothing, and prediction. Section 1.8 considers an extension to nonlinear systems. Section 1.9 discusses methods for the determination of the optimal memory size N opt . Finally, section 1.10 concludes with some useful generalizations.

Linear model and batch UFIR estimator
Consider a class of discrete TV linear models represented in state space with the state and observation equations as follows: where x n ∈ K and z n ∈ M are the state and observation vectors, respectively. Here, F n ∈ K ×K , B n ∈ K ×P , and H n ∈ M×K . Let us suppose that the state noise vector w n ∈ P and measurement noise vector v n ∈ M have zero mean white Gaussian components, E{w n } = 0 and E{v n } = 0. We also assume that these vectors are mutually uncorrelated, E{w i v T j } = 0, for all i and j, and have the following covariances: where Q n and R n may be unknown to the engineer. Now suppose that the p-shift estimate ax n+p|n of x n is provided at time n + p with the UFIR estimator proposed in [19,20]. We would like to modify this estimator and review engineering algorithms for different kinds of filtering, q-lag smoothing, and p-step prediction. We also wish to estimate the estimation errors and generalize the properties to facilitate a comparison with the OFIR and Kalman algorithms.

Time-variant models
In convolution-based filtering (3), we suppose that measurements z n are available on a time horizon of N points (memory b ), from time m = n − N + 1 to time n, that the estimator is causal, and that m 0. In order to findx n+p in state space, the batch p-shift UFIR estimator [8,20] can be applied. For TV models, the p-shift UFIR estimator was derived in [8], assuming that the negative shift p is no smaller than −N + 1. Below, we modify this estimator for arbitrary p, which is needed for one of the smoother forms.
Let p = −N + 1 and consider the estimate (eq. (21)) of [19]) at the initial point m that gives uŝ where H −1 n,m = (H T n,m H n,m ) −1 H T n,m is the generalized left inverse, and One can notice that (10) is reminiscent of the familiar OLS or BLUE, although the matrices are different.
To provide the estimate for any p, we find the state transition matrix B n,m (N, p) by writing (10) asx n+p|n = B n,m (N, p)x m|n . By combining the forward-time and backward-time solutions [26], B n,m (p) B n,m (N, p) becomes where N 1 = −N + 1. The most general batch form of the p-shift UFIR estimator for TV models is thuŝ where A n,m (p) is the UFIR estimator gain; and p can be arbitrary, −∞ p ∞. In the case of −N + 1 < p < 0, one may also use a particular form of (17b) shown in [19], (21) with B n,m (p) = F m+1 n+p,0 . If we now observe that the filter estimate with p = 0 iŝ then (17b) can alternatively be written aŝ This suggests that prediction and smoothing can be organized based on the filtering estimate (18) if we use an auxiliary p-shift gain matrix. We will show below that (19) plays an important part in the design of UFIR algorithms.

Time-invariant models
In the special TI case, we have B n,m (p) = F n−m+p = F N−1+p and the estimator becomeŝ Following (19), the estimate (20b) can alternatively be written as follows: where the TI filter estimate is given bŷ A distinctive feature of both TV and TI batch UFIR estimators is that they can be applied to models with noise having arbitrary distributions and covariances. They can also be represented in fast iterative Kalman-like forms using an auxiliary matrix called the generalized NPG (GNPG), which will be discussed next.

Generalized noise power gain
It follows from (5) that the NPG is a measure of how much the measurement noise is suppressed at the FIR estimator output. In state space, the NPG is defined via the MSE [27], with the assumption that B n = 0: where In view of the fact that the estimate is unbiased, two first-two terms in the brackets of (26) are zero by (2), which gives where E{V n,m V T n,m } is the measurement noise covariance on the averaging interval. A simplification follows instantly if one lets p = 0 and supposes that the model is one-state and time-invariant one. That leads to the estimation variance as follows: where the product A(N)A T (N) is known as the NPG [11]. More generally, the GNPG can thus be written as to characterize the noise strength at the estimator output.
In particular, if the GNPG is an identity matrix, then no noise reduction is provided by the estimator. If the GNPG has components that are equal to zero, then the noise is fully suppressed by the estimator. By transforming (31) and utilizing (17b), (18), and (19), one can find two equivalent GNPG forms corresponding to TV models: where the GNPG G n,m = G n,m (0) associated with filtering is given as follows: Similarly, using (20a), (24), and (25), the GNPG for TI models can be represented as follows: where the GNPG G(N) = G(N, 0) for filtering is Summarizing the generalizations provided for the batch UFIR estimator, we notice again that this estimator ignores noise statistics and initial errors in solving the problems of smoothing, filtering, and prediction in a unified scheme. Its important applied property is that the estimate becomes virtually optimal when N 1 [20]. On the other hand, large N leads to computational problems owing to the large dimensions of the augmented matrices and vectors. For fast computation, iterative Kalman-like UFIR forms can be used, which will be discussed next. http://asp.eurasipjournals.com/content/2013/1/113

Iterative Kalman-like UFIR estimation
Similar to the recursive OLS [28], the UFIR estimator can also be represented in a fast iterative form similar to the Kalman filter as shown in [19,20]. The iterative UFIR estimator requires that we start with initial values that are available from the batch algorithm, which typically requires matrix computations on the order of K × K dimensions, and then we iteratively update the estimator output. The state estimate is taken when an iterative variable reaches the current time n.

Time-varying models
For TV models, the estimates (17b) and (19) suggest two forms of iterative UFIR computation.
The direct form Following the derivations given in Appendices I and II of [20], the direct form of the iterative algorithm corresponding to (17b) is the following: where The bias correction gain K l K l,m (p) is given here as where the GNPG G l G l,m (p) is computed iteratively by The initial values,x s+p and G s , are computed in short batch forms aŝ where s = m + K − 1; and the iterative variable l ranges from m + K to n. The estimator output is taken when l = n. Since the UFIR estimate does not require initial conditions, one may approximately set (40) to zero and let (41) be the identity when N 1. However, this simplification may not always lead to good estimates for smaller values of N.
The estimate at time n + p appears in (36) from an iterative update beginning with time step m + K + p − 1. Its flaw is that |p| past points before the N-point estimator window are formally required for smoothing. This disadvantage is overcome in the two-stage form, which will be discussed next.
The two-stage form The batch estimate (19) suggests that another iterative UFIR form can be found if we first set p = 0 in (36) and find the filter estimate: in which and the initial values are given as follows: Here, s = m + K − 1 and l ranges from m + K to n, as before.
Givenx n from (42) with l = n, the p-shift estimate can then be computed utilizing (19) aŝ As can be seen, this second form available does not require extra data points before the filtering window. However, it requires two computational steps, unlike the direct form (36).

Time-invariant models
Employing (20b) and (24), the p-shift estimate for TI models can also be found in two equivalent iterative forms.

The direct form
If all of the model matrices are TI, we have Y l = F 1−p . Accordingly, (36) becomeŝ and the bias correction gain (38) attains the form where G l is computed iteratively as The initial values are computed aŝ where s = m + K − 1 and l ranges from m + K to n. The desired state estimate is taken at l = n.
The two-stage form Provided the TI filtering estimatê with and the initial conditionŝ where s = m + K − 1 and the iterative variable l ranges from m + K to n, the p-shift estimate can alternatively be computed by (24) as follows: One may conclude that the algorithm of (53) and (58) is very simple from a programming perspective. As was shown in [8,19,20] and in many other studies, the UFIR estimator is a strong rival to the Kalman filter if the noise covariances are not known exactly.

Estimation errors
Next, we discuss errors in UFIR estimators assuming white Gaussian noise. Given the instantaneous error where the estimatex n+p comes from either a TV or TI model, the MSE P n+p at time n + p can be defined as In spite of the fact that the UFIR estimator has two equivalent forms (batch and iterative), the MSE can rigorously be determined only via the batch form. Finding closed analytical solutions for the MSE via (19) and (25) implies a large mathematical burden and is still a challenging problem. On the other hand, a rigorous error computation may be unnecessary since estimation error covariances are not used in the UFIR algorithms, and so reasonable approximations can serve us well in practical applications. Such an approximation provided following [23] is given in the Appendix.
The MSE upper bound (UB) P UB n+p can be obtained from an iterative computation of (114) for the general TV model. Equation (114) implies that process noise covariances are accumulated at each iteration. Therefore, the predicted value from (114) is a bit larger than the actual estimation error covariance for small N and significantly larger for N 1. For the same reason, the estimate of (114) also diverges as p increases. The UB can thus be very useful for filtering (p = 0) when N is not large and for smoothing with small lags. In the case of prediction, the future noise is neglected in (114) so it can serve as a tight upper bound even for very large p.
The MSE lower bound (LB) can be found if we take into consideration the fact that if N N opt the UFIR estimator order fits the system order. Therefore the system noise can be neglected in (114) and the LB P LB n+p can be found by iterating until l reaches n. For TI models, (61) becomes Equations (61) and (62) correspond to the direct estimator forms of (36) and (48) respectively.
If one employs the two-stage forms of (47) and (58) and the MSE LB for filtering (p = 0), this is defined using (61) as follows: then the p-shift LB for TV and TI models can be computed, respectively, as follows: where P LB n is provided from (63) with l = n. Note that the LB is associated with the NPG and serves well in the three-sigma sense [27].

Filtering
Filtering is used when the state estimate is required at the current time point n. It can also be projected to the future (prediction), or smoothed by combining several filtering estimates from the past. By letting p = 0 in (36), the UFIR filtering estimate becomeŝ where the bias correction gain is and the GNPG G l is computed iteratively by The initial values for (66) and (67) are respectively specified in short batch forms as follows: where s = m + K − 1, F m+1 the iteration index l ranges from m + K to n, and the estimate of the current state is taken when l = n.
The MSE UB can be found for (66) by setting p = 0 in (114), which gives where the predicted (a priori) estimate covariance P − l P l|l−1 is given by with the initial value P LB l−1 specified as for the Kalman filter.
The LB appears from (72) by neglecting Q l , which gives where the initial value P LB l−1 can also be specified as in the Kalman filter.
It can easily be shown that (71) is the Kalman a posteriori estimate covariance, if we substitute K l with the Kalman gain. However, unlike the Kalman filter, (66) can be applied to deterministic models. If that is the case (R l = 0 and Q l = 0), then the estimation error is zero.
Several particular filtering solutions can now be discussed, which will be done in the following sections.

Fixed-horizon filtering
The fixed-horizon (fixed memory size N) iterative UFIR filtering algorithm is summarized for TV models in Table 1.
It is implied that measurements are available beginning at time index 0, and the time index n starts at N − 1. The initial valuesx s and G s are computed using (69) and (70), respectively. For each n, the iterative variable l increments from m + K to n, and the desired estimate is taken when l = n. Note that the estimation error computed by (71) is minimal if one sets N = N opt . A simplification for the TI model is straightforward. One must just let all of the matrices be TI in Table 1.

Full-horizon filtering
Full-horizon filtering can be applied to highly stable or highly predictable systems such as those in astronomy, precise clocks and oscillators [27], and others associated with near deterministic state-space models. The full-horizon TV algorithm is given in Table 2.
This algorithm is the most simple. It requires only the number of the states K since the filter memory window Table 1 Fixed-horizon TV UFIR filtering algorithm Update:

Instruction:
Use the estimate when l = n. Set:x K−1 by (69) and G K−1 by (70) for m = 0. Update: size changes with time; so, N = n+1. No additional information is needed, and the algorithm thus has extremely desirable engineering features. A natural extension of the TV algorithm (Table 2) to the TI case is provided by removing the time dependencies from the matrices. The MSE UB and LB can be computed by (71) and (73) if we substitute l with n. Note that the full-horizon UFIR filter may demonstrate substantial decrease in the output noise as n becomes large.

Tricky-horizon filtering
The tricky-horizon (time-variant memory size N) algorithm can be used in adaptive filtering [29,30] and whenever some reference information about the process behavior is available. It implies an individual N opt at each time index n. Such flexibility allows better system tracking with minimum residuals [19]. To implement tricky-horizon filtering, the algorithm (Table 1) can be used if we allow N to be variable.

Smoothing
Smoothing is commonly associated with a lag q > 0 relating the estimate at a given time index to measurements up to and including some past index. By combining 'future' and past estimates, it becomes possible to obtain better noise reduction for many practical applications. Note that an infinity of smoother solutions exists [31]. We will discuss two basic schemes for UFIR smoothers in this section.
The direct form By letting q = −p > 0 in (36), the smoothing estimate at n − q can be found iteratively as follows [23]: where Y l = F l−q l,0 = q i=0 F l−i and l ranges from m + K to n. The estimatex n−q is traditionally taken at l = n in each iterative cycle. The bias correction gain can be computed here using the following: l G l H T l , http://asp.eurasipjournals.com/content/2013/1/113 whereȲ l = Y l F −1 l−q and G l is given by (39b). The initial valuesx s and G l can be defined at s = m + K − 1 as, respectively, Following (114), the MSE UB for (74) can be found to be where P − l−q is given by (72) and The MSE LB is obtained by neglecting Q n as The two-stage form Provided the filtering estimate (66), the second form for the TV and TI UFIR smoothers become by (19) and (24) respectivelŷ whereB n,m (q) B n,m (N, q) is given by (77).
The relevant estimation error covariance LBs become, respectively, where P LB n is provided by (63) at l = n. As in filtering, here, the LB can serve well in the three-sigma sense [27].

Fixed-interval smoothing
Among various smoothing problems, the fixed-interval one is basic and often associated with smoothing [25,[32][33][34]. The fixed-interval UFIR smoother is intended to provide an estimatex n−q|n with any lag 0 < q < M utilizing measurement on a fixed interval of M points, from time index n − M + 1 to n. Although M may not be equal to N opt , UFIR smoothing is most efficient when M = N opt . In fact, If M > N opt , smoothing is inefficient when N opt < q < M, because q exceeds the length of the averaging interval and smoothing virtually provides the backward prediction, which has an estimation error larger than in filtering. On the other hand, N opt should not be larger than M, because M is commonly associated with an available database.
Provided M = N opt , two traditional forms can be suggested for fixed-interval UFIR smoothing. Table 3. A special peculiarity is that n starts at N − 1 + q in order for the smoother to process only nonnegative time indices. For TI models, the modification of this algorithm is straightforward.

The direct form This form implements (74) as listed in
The two-stage form The two-stage form implementing (82) can be used as shown in Table 4. To apply this algorithm to TI models, one must computex n−q = F −qx n . A common feature of this algorithm is that two stages are required: first filtering must be provided to get the estimate at time n, then the obtained filter estimate is projected to time n − q.

Fixed-lag smoothing
Fixed-lag smoothing is commonly used for denoising if a time delay of q points is allowed [31,32,35,36]. Two basic fixed-lag algorithms can be designed based on the UFIR technique.
Fixed-lag OUFIR smoothing Provided N opt , the fixed lag q can be specified based on the process properties to obtain the best denoising. Intuition indicates that smoothing is best if the estimation time is the center of the observation interval. This holds true if the polynomial describing the process Table 3 Direct fixed-interval TV OUFIR smoothing algorithm Stage Given: Set:x s by (75) and G s by (76) . Update: Set:x s by (75) and G s by (76) . Update: Usex n when l = n and computê behavior on the observation interval is of odd degree. Otherwise, if the degree is even, denoising may be better with shorter lags as shown in Figure eight in [17]. The fixed-lag OUFIR smoothing algorithm is listed in Table 4 if one sets N = N opt and q = constant. Its extension to the TI case can be provided by replacing thex n−q equation in Table 4 withx n−q = F −qx n .

Fixed-lag full-horizon UFIR smoothing
This approach implies that the filter window includes all the available data, but the lag is fixed. Examples can be found in highly predictable or quasi deterministic slow dynamics, for which the estimates at time n and n − q do not significantly vary from each other in terms of bias. The relevant algorithm for TV models is listed in Table 5. Its extension to the TI case can be obtained by replacing thex n−q equation in Table 5 withx n−q asx n−q = F −qx n .

Fixed-point smoothing
Fixed-point smoothing implies that measurements are available from time index 0 up to the current time point n, but the estimate is required at some fixed past Table 5 Fixed-lag full-horizon TV UFIR smoothing algorithm

Stage
Given: K, q = constant, n K.
Set:x K−1 by (75) and G K−1 by (76) for m = 0. Update: Computex n−q for n q aŝ where v is a constant [32,37]. The timevarying lag is thus q = n − v. In such a formulation, the UFIR smoother is always full-horizon as shown by the TV algorithm in Table 6. By replacing thex n−q equation witĥ x n−q = F −qx n , it becomes applicable for TI models. Probably the most interesting application for such algorithms is initial state estimation with v = 0. Note that the same problem can be solved using the fixed-interval smoother ( Table 4) if we set the initial interval point to zero.

Prediction
State prediction plays a key role in many applications. The one-step predictor is fundamental for digital feedback control systems [38]. It is also commonly provided when measurements are unavailable at some points [39] and when estimates of long-term future behavior are required [40]. Predictive estimation is necessary for global positioning system (GPS)-based applications when the GPS receiver temporarily fails or when a signal is temporarily unavailable [27]. Predictive estimation is required for holdover in digital communication networks [41], for maintaining normal functioning of certain systems during down time [42,43], and for astronomy and climate forecasting. The predictor goal is thus to compensate for unavailable information. In many cases, linear predictors do perform better than nonlinear ones [44].
To develop UFIR prediction, two algorithms [27] can be used as shown in Figure 3. It is supposed that measurements at each point are either available (•) or not (×). Utilizing N opt available points from the immediate past, the estimator provides a one-step ahead projection (•) from each point of this interval: from point 1 to 2a, from 2 to 3a, etc. At point 4, the measurement is unavailable. Therefore, the predicted value 4a is utilized at point 4. Further predictor equations can be organized either with fixed steps or variable steps in the direct and two-stage forms. It has been shown in [27] that the variable-step approach is more precise in the short term, and that there is not a large difference between the estimates in the long term.

Fixed-step prediction
In the fixed-step case shown in Figure 3a, p is often unity, but in general may be arbitrary (p > 0). With p = 1, prediction can permanently substitute for unavailable measurements with predicted values.
The direct form The one-step predictor appears from (36) by setting p = 1: where G l is computed iteratively by and the initial values are determined aŝ where F m+1 s+1,0 = K −1 i=0 F s+1−i , s = m + K − 1, and l ranges from m + K to n. The desired estimate is obtained when l = n.
For TI models, the one-step predictor becomes: Both predictors can be implemented in the algorithm (Figure 3a) to satisfy the following condition: if z n is unavailable at time n, then set z n = H nxn for a TV model and z n = Hx n for a TI one.
The two-stage form By (47) and (58), the second form of the one-step predictor for TV and TI models, respectively, are the following: wherex n is the filter estimate. This is the most widely used prediction scheme.

Variable-step prediction
In the variable-step case illustrated in Figure 3b, the predicted estimates still compensate for unavailable measurements (points 4, 5, 6), but they are not involved to produce predictions, which is unlike the case of Figure 3a. Instead, p continues to increment until the measurement becomes available. At point 7, all measured and predicted values on a horizon of N opt past points are used to produce a prediction at point 8a. There are no other differences between fixed-step and variable-step prediction, and the estimates (36), (47), (48), and (58) can be used in a straightforward manner, along with the relevant error bounds.

Nonlinear models and extended filtering
For many applications, nonlinear systems can be modeled in additive white Gaussian noise environments with the state and observation equations as follows: x n = f n (x n−1 ) + B n w n , where f n (x n−1 ) and h n (x n ) are time-varying nonlinear vector functions and all other notations are given in (6) and (7). If f n (x n−1 ) and h n (x n ) are smooth enough, then the first-order Taylor series expansion is often applied to make the model approximately linear between two neighboring points.
An expansion of f n (x n−1 ) aroundx n−1 and h n (x n ) around the prior estimatex − n =x n|n−1 leads to where F n = ∂f n ∂x x n−1 and H n = ∂h n ∂x x − n are Jacobians and ε − n = x n −x − n is the prior estimation error. Unbiasedness implies E{ε n } = 0, and a first-order approximation of the expectation of f n (x n−1 ) leads to the prior estimatê The expectation of the prior error can be written as With (96) and (97) linearized in this way, UFIR filtering can be applied as shown below.

Iterative EFIR filtering
Following the Kalman filter extension [45], the extended unbiased FIR (EUFIR) filter estimate is shown in [46] to bê where the prior estimate isx − n = f n (x n−1 ), the bias correction gain is K l = G l H T l , and G l is computed iteratively as The iterative EUFIR filtering algorithm is given in Table 7.
As can be seen, it has the same structure as Table 1, except Remark: Use the estimate when l = n.
for the nonlinear functions in the estimate. Although the EUFIR algorithm traditionally does not use noise statistics or initial error statistics, the estimation error covariance may be required to characterize the performance. An analysis of error covariances is given in [46]. Note that, in contrast to the first-order expansion, (98) and (99), the second-order expansion involves noise statistics. However, as in the extended Kalman filter [28], the higher order expansion typically does not lead to a definitive advantage [46].

Memory for OUFIR estimators
The window size N plays an important role in UFIR estimators. If N < N opt , denoising appears to be inefficient: the random error dominates, although bias is negligible. If N > N opt , the random error is small, but bias affects the estimate.
Estimation of N opt is still a challenging mathematical problem that requires finding the derivative of the estimate with respect to the convolution length N. Even so, there are several available approaches. For low-degree polynomial models, N opt was found analytically in [47]. A more general approach employing the bandlimited property of signals was developed in [20]. Finally, an efficient algorithm exploiting measurements was recently proposed in [48]. In any case, it is much simpler to estimate a scalar N opt , rather than accurately estimating matrices Q n and R n as is required in the Kalman filter.

Bandlimited signals
In real applications, a measured signal is causal and bandlimited with some maximum frequency W. By the Shannon theorem, the maximum sampling interval that prevents aliasing is T = 1/2W . If measurements are obtained with sampling interval T, then only N = K points are available for averaging by the K -state FIR estimator. If we use larger N, then the estimate will be biased. In order to avoid bias, we would need the model to be represented with a larger number of states, and such a model may not be acceptable or available.
Typically, measurements are provided at time steps τ < T or even τ T and N opt can be calculated as follows [20]: where x means the maximum integer that is less than or equal to x. A simple analysis shows that if N > N opt , aliasing causes a bias. If N < N opt , noise reduction is inefficient.

Known reference model
If the reference model for x n is known, then the fullhorizon UFIR filter (Table 2) with window size N = n + 1 http://asp.eurasipjournals.com/content/2013/1/113 can be applied to produce the estimatex n x n|n via measurements taken from time index 0 to n. Following (60), the N-variant MSE P n can thus be defined by where n = x n −x n . The MSE (105) will be minimal if we minimize it to obtain n opt and let N opt = n opt +1. In doing so, one can either minimize the trace tr(P n ) if N opt needs to be applied to all of the states, or the (kk)th component P (kk)n of P n corresponding to the kth state, respectively, It has been shown in [48] that by increasing n, the first minimum in both (106) and (107) corresponds to N opt . The problem, however, arises when the reference model x n is unknown, as it usually is.

Unknown reference model
The case of unknown model for x n is typical. In this case, we estimate N opt via the mean square value (MSV): in which z n andx n are both known. It has been shown in [48] that the estimateN opt of N opt can be found via (108) to minimize the estimation error of all of the states or the kth one as, respectively, where V (kk)n is the (kk)th component of V n . The minimization is performed by increasing n, starting with K −1, until the first minimum. To avoid ambiguities when minimizing these functions, the number of points used in the expected value must be sufficiently large, and smoothing of the objective function may be desirable.

Some generalizations and conclusions
Based on extensive investigations provided by many authors, now we provide some generalizations; compare the trade-off between the OUFIR, OFIR, and Kalman filters; and summarize the results in Table 8.

OUFIR vs. OFIR
Beginning with the early limited memory filter of Jazwinski [5], OFIR filtering has been under development for several decades. In control theory, fundamental progress was achieved by Kwon et al. and his followers [7,35,[49][50][51][52][53]. In signal processing, solutions were found by Shmaliy et al. [8,20,27]. It was shown in [52] that different kinds of limited memory filters [5,54] are equivalent to the OFIR one. The most serious flaws of this technique are high computational complexity and high memory consumption. With such poor engineering features, OFIR estimators still have not gained currency and their development remains mostly at a theoretical level.
On the other hand, OFIR estimators do not result in estimation errors that are substantially smaller than OUFIR ones, especially when N 1. The rule of thumb here is as shown in Figure 4: The error difference between the OFIR and OUFIR estimates diminishes as N increases. Note that the boundary value 10 . . . 30 in Figure 4 is flexible and depends on the model. However, recalling that FIR filters require a large order (window size N 1) to guarantee good performance, we obtain the following conclusion: Fast-and low-complexity iterative OUFIR algorithms that ignore noise statistics and initial error statistics are practically superior to the best-known OFIR ones.
Note that this deduction often holds even if N is small. But in some applications, OFIR filters can be more appropriate because of their better accuracy.

OUFIR vs. Kalman filter
The well-known features of the Kalman filter are optimality, fast computation, and low memory consumption. However, the Kalman filter requires a priori initial condition and noise statistics, and this is recognized as the most annoying flaw of the Kalman filter. Because of this requirement, the Kalman filter is suboptimal for all practical purposes. Moreover, its optimality is guaranteed only if the noise sources are white, which is not the case for many applications. Finally, the Kalman filter applies only to stochastic models.
In turn, the iterative OUFIR filter ignores noise statistics (except for the zero-mean assumption), allows  the noise to have any distribution and covariance, exhibits BIBO stability, and serves for both stochastic and deterministic models. However, it does not guarantee optimality, especially when N opt is small. It requires (N opt − 1)-times more computational time and needs about N opt times more memory than the Kalman filter.
The Kalman filter is thus best when the noise is white and its statistics are exactly known. Otherwise, one may follow the rule of thumb sketched in Figure 5. As can be seen, it is only within a narrow range around the actual noise covariances that the OUFIR filter falls a bit short of the Kalman filter. Otherwise, the OUFIR filter demonstrates smaller errors. The Kalman filter is also the best filter under the ideal conditions. Otherwise, its error grows more rapidly than the OUFIR, meaning that the latter is more robust in real-world applications ( Figure 6).
Note that the error difference between the two filters decreases with increasing N opt . These observations by diverse authors who have investigated uncertainties, different kinds of noise sources, and other irregular perturbations result in the following important inference: Under the real-world operating conditions, and when noise statistics and initial error statistics are not known exactly, the OUFIR estimator is able to outperform the Kalman filter even if N opt is not large.

Conclusions
The UFIR algorithms discussed in this paper cover many applied problems associated with filtering, smoothing, and prediction of discrete-time state-space models. The most general conclusions one may arrive at by analyzing these estimators are the following: 1) UFIR algorithms are able to provide nice suboptimal estimates that are acceptable for many applications; 2) The optimal window size N opt can easily be estimated experimentally; 3) The extra time required by the UFIR iterations can be alleviated with parallel computing; and 4) The extra memory required by the UFIR estimators is not a problem for modern computers. So, we conclude that UFIR algorithms are strong rivals to the Kalman filter for real-world applications. The iterative UFIR estimator commonly outperforms the OFIR one even if N opt is not large, and it is able to outperform the Kalman filter under real-world operating conditions and when the noise statistics are not known exactly. That makes UFIR algorithms highly attractive for applications. We see the following main trends in further developments of FIR estimators. Optimal FIR filtering and smoothing strongly require fast Kalman-like algorithms which are similar to those developed for UFIR estimators and considered in this paper. Such algorithms are required for small N opt . In turn, iterative UFIR algorithms need further optimization and robustification in non-Gaussian environments and under the uncertainties. Special attention should also be paid to fast algorithms for the determination of N opt . Provided such modifications, one may expect new efficient FIR solutions.
Endnotes ax n+p|n means the estimate at time n + p given measurements up to and including time n. Here, p = 0 corresponds to filtering, p > 0 corresponds to p-step prediction, and p < 0 corresponds to q-lag smoothing, where q = −p. We simplify notation by usinĝ x n+p x n+p|n . b In different applications, the FIR estimator memory is also called the receding horizon [53], sliding window [55], averaging interval [56], etc.