Our objective is to sequentially infer the evolution of a latent time-series with correlated innovations, as we acquire new observations. In Bayesian terminology, one is interested in updating the filtering density from \(f(x_{t}|y_{1:t})\) to \(f(x_{t+1}|y_{1:t+1})\) as new data are observed. We do so by using Eq. (2).
As previously pointed out, the analytical solution to this equation is intractable for the most interesting cases (models with nonlinearities and non-Gaussianities), and thus we resort to sequential Monte Carlo (SMC) methods. We briefly provide an overview of SMC methods in subsection 4.1, before explaining our proposed method in detail in subsection 4.2.
4.1 Sequential Monte Carlo
Monte Carlo methods are a class of computational algorithms that numerically approximate functions of interest by random sampling. In particular, sequential Monte Carlo methods recursively compute approximations to relevant probability densities, by replacing the true densities with discrete random probability measures
$$ f(x) \approx f^{M}(x) = \sum_{m=1}^{M} w^{(m)} \delta\left(x-x^{(m)}\right) \;, $$
(18)
where δ(·) is the Dirac delta function.
The points \(x^{(m)}\) represent the support of the random measure and are called particles. These particles are assigned weights \(w^{(m)}\), which are interpreted as probability masses. The random measure is thus a weighted sum of M Dirac deltas located at the particles.
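To make Eq. (18) concrete, the following minimal Python sketch (the toy target, proposal, and all variable names are our own, not from the paper) builds such a weighted random measure and uses it to approximate an expectation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy target (ours, for illustration): f(x) = N(x | 1, 0.5^2),
# approximated by a particle set drawn from a broad proposal N(x | 0, 2^2).
M = 5000
particles = rng.normal(0.0, 2.0, size=M)                  # x^(m)

log_f = -0.5 * ((particles - 1.0) / 0.5) ** 2 - np.log(0.5)
log_q = -0.5 * (particles / 2.0) ** 2 - np.log(2.0)
log_w = log_f - log_q                                     # importance ratios

w = np.exp(log_w - log_w.max())                           # stabilized exponentiation
w /= w.sum()                                              # weights sum to one

# Any expectation under f(x) is then a weighted sum over the particles.
print("E[x] ~", float(np.sum(w * particles)))             # close to 1.0
```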
The key to SMC methods is the sequential computation of Eq. (2), which is done by updating the approximating random measures at time instant t to the next time instant t+1. Let \(f^{M}(x_{t})\) be the approximation of \(f(x_{t}|y_{1:t})\) at time instant t. The update of \(f^{M}(x_{t})\) to \(f^{M}(x_{t+1})\) is done in two steps.
First, one propagates the particles \(x_{t}^{(m)}\) to \(x_{t+1}^{(m)}\) via a so-called proposal density π(·),
$$ x_{t+1}^{(m)}\sim \pi\left(x_{t+1}|x_{1:t}^{(m)},y_{1:t+1}\right), $$
(19)
where one may use all or part of the available information (that is, the history of observations and previous states). Then, one computes the weights of each candidate sample \(x_{t+1}^{(m)}\) according to
$$ w_{t+1}^{(m)} \propto w_{t}^{(m)} \frac{f\left(y_{t+1}|x_{t+1}^{(m)}\right)f\left(x_{t+1}^{(m)}|x_{1:t}^{(m)}\right)}{\pi\left(x_{t+1}^{(m)}|x_{1:t}^{(m)},y_{1:t+1}\right)}, $$
(20)
where \(f\left (y_{t+1}|x_{t+1}^{(m)}\right)\) is the likelihood of the new observation given sample \(x^{(m)}_{t+1}\), and \(f\left (x_{t+1}^{(m)}|x_{1:t}^{(m)}\right)\) is the transition density of the latent state. The computation of the weights is followed by their normalization so that they sum up to one and form a proper probability random measure.
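As an illustration of Eqs. (19) and (20), the sketch below runs one propagation and weighting step on a toy linear-Gaussian model of our own choosing; the proposal, transition, and likelihood are illustrative stand-ins, not the paper's model:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
M = 1000

# Toy model (ours): x_{t+1} = 0.9 x_t + u, u ~ N(0, 0.5^2);
#                   y_{t+1} = x_{t+1} + v, v ~ N(0, 0.2^2).
x_t = rng.normal(size=M)                   # particles at time t
w_t = np.full(M, 1.0 / M)                  # uniform weights
y_next = 0.3                               # newly received observation

# Eq. (19): a proposal that uses both the state history and y_{t+1}
# (here, a hand-tuned Gaussian pulled halfway towards the observation).
mu_prop = 0.5 * (0.9 * x_t + y_next)
x_next = rng.normal(loc=mu_prop, scale=0.4)

# Eq. (20): weight update = likelihood * transition / proposal.
w_next = w_t * (norm.pdf(y_next, loc=x_next, scale=0.2)
                * norm.pdf(x_next, loc=0.9 * x_t, scale=0.5)
                / norm.pdf(x_next, loc=mu_prop, scale=0.4))
w_next /= w_next.sum()                     # normalize to a proper measure
```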
SMC methods require an additional third step called resampling [40]. If one proceeds with propagation and weight computation only, the approximation \(f^{M}(x_{t})\) degenerates quickly, as only a few of the particles are assigned non-negligible weights. Resampling consists of deciding which particles to propagate by selecting those with higher probability, i.e., larger weights \(w_{t}^{(m)}\). One prevents the quick deterioration of the SMC method by resorting to resampling methods (see [40] for an overview of the most common techniques). These are often triggered based on the effective sample size of the SMC approximation at every time instant [41].
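For concreteness, a minimal sketch of the effective-sample-size trigger and of one common resampling scheme among those surveyed in [40] (systematic resampling; the paper does not commit to a particular scheme) might look as follows:

```python
import numpy as np

def effective_sample_size(w):
    """ESS of normalized weights [41]: equals M for uniform weights, 1 when degenerate."""
    return 1.0 / np.sum(w ** 2)

def systematic_resample(w, rng):
    """Systematic resampling: returns the indices of the particles that survive."""
    M = len(w)
    positions = (rng.random() + np.arange(M)) / M
    return np.searchsorted(np.cumsum(w), positions)

rng = np.random.default_rng(2)
w = rng.dirichlet(np.ones(100))            # some normalized weights
if effective_sample_size(w) < len(w) / 2:  # a typical trigger threshold
    idx = systematic_resample(w, rng)
    w = np.full(len(w), 1.0 / len(w))      # uniform weights after resampling
```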
The choice of the proposal density is critical for any SMC method. It has been shown that the optimal importance function is \(f(x_{t+1}|x_{t},y_{1:t+1})\), which minimizes the variance of the resulting random measure. However, this density is analytically intractable in our problem of interest. We adopt the simpler, yet effective, alternative known as Sequential Importance Resampling (SIR) [23].
In summary, we sample new particle candidates from the transition density of the latent states, \(f(x_{t+1}|x_{1:t})\). This proposal function entails that the weighting of the particles is proportional to their likelihood, i.e., \(w_{t+1}^{(m)} \propto w_{t}^{(m)} f\left(y_{t+1}|x_{t+1}^{(m)}\right)\). Details of the proposed SMC method for inference of latent ARMA models with correlated innovations follow.
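A hedged sketch of one such SIR update, with a toy Gaussian transition and likelihood standing in for the latent ARMA model considered in this paper (function and argument names are ours):

```python
import numpy as np
from scipy.stats import norm

def sir_step(x_t, w_t, y_next, rng):
    """One SIR update: proposal = transition density, weight prop. to likelihood.

    The Gaussian transition and likelihood below are illustrative stand-ins,
    not the paper's actual latent ARMA transition or observation model.
    """
    x_next = rng.normal(loc=0.9 * x_t, scale=0.5)           # draw from f(x_{t+1}|x_{1:t})
    w_next = w_t * norm.pdf(y_next, loc=x_next, scale=0.2)  # w_{t+1} prop. to w_t * likelihood
    return x_next, w_next / w_next.sum()
```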
4.2 Proposed SMC method
We first present an SMC method for inference of an ARMA process with correlated innovations, when the ARMA parameters, i.e., \(\theta=(a_{1}\ a_{2}\ \cdots\ a_{p}\ b_{1}\ b_{2}\ \cdots\ b_{q})^{\top}\), are known. We later relax the assumptions for the case when these parameters are unknown. In all cases, the normalized autocovariance values \(\rho_{u}(\tau)\), for lags τ=0,1,⋯,t−1, of the correlated innovation process must be known.
4.2.1 Proposed SMC method: known ARMA parameters
Let us consider at time instant t the following probability random measure approximation of the filtering density \(f(x_{t}|y_{1:t})\):
$$ f^{M}(x_{t}) = \sum_{m=1}^{M} w_{t}^{(m)} \delta\left(x_{t}-x_{t}^{(m)}\right). $$
(21)
Upon reception of a new observation \(y_{t+1}\), the algorithm proceeds as follows:
1. Compute the joint normalized covariance matrix \(\Sigma_{t+1}\) at time instant t+1:
$$ \Sigma_{t+1}=A_{t+1}^{-1}B_{t+1} R_{u_{t+1}} B_{t+1}^{\top} \left(A_{t+1}^{-1}\right)^{\top} = \left(\begin{array}{ll} h_{t+1} & \lambda_{t} \\ \lambda_{t}^{\top} & \Sigma_{t} \end{array}\right). $$
(22)
2. Perform resampling of the state's genealogical line by drawing from a categorical distribution defined by the random measure \(f^{M}(x_{t})\):
$$ \overline{x}_{1:t}^{(m)} \sim \left\{x_{t}^{(m)}, w_{t}^{(m)}\right\}, \text{where}\ m=1,\cdots, M. $$
(23)
3. Propagate the state particles by sampling from the transition density, conditioned on the available resampled streams \(\overline{x}_{1:t}^{(m)}\).
4. Compute the non-normalized weights for the drawn particles according to
$$ \widetilde{w}_{t+1}^{(m)} \propto f\left(y_{t+1}|x_{t+1}^{(m)}\right), $$
(26)
and normalize them to obtain a new random measure
$$ f^{M}(x_{t+1}) = \sum_{m=1}^{M} w_{t+1}^{(m)} \delta\left(x_{t+1}-x_{t+1}^{(m)}\right). $$
(27)
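To illustrate one full iteration of the method above, the sketch below assumes a zero-mean Gaussian latent state, so that step 3 reduces to the conditional Gaussian implied by the partition of Eq. (22); the construction of \(\Sigma_{t+1}\) (step 1) and the `loglik` callable are left as user-supplied inputs, and all names are ours, not the paper's:

```python
import numpy as np

def smc_known_params_step(x_hist, w, Sigma_joint, loglik, y_next, rng):
    """One iteration of the known-parameter method (steps 2-4; a sketch).

    x_hist      : (M, t) particle streams x_{1:t}^(m), newest state first
    w           : (M,) normalized weights at time t
    Sigma_joint : (t+1, t+1) joint covariance of (x_{t+1}, x_t, ..., x_1),
                  precomputed as in Eq. (22) (step 1 assumed done)
    loglik      : callable loglik(y, x) returning log f(y_{t+1} | x_{t+1})
    """
    M, t = x_hist.shape
    h = Sigma_joint[0, 0]                    # scalar block h_{t+1}
    lam = Sigma_joint[0, 1:]                 # cross-covariance lambda_t
    Sigma_t = Sigma_joint[1:, 1:]            # covariance block of x_{1:t}

    # Step 2: resample the genealogical lines (categorical draw, Eq. (23)).
    idx = rng.choice(M, size=M, p=w)
    x_bar = x_hist[idx]

    # Step 3: propagate from the conditional Gaussian implied by Eq. (22)
    # (our assumption of a zero-mean Gaussian state).
    gain = np.linalg.solve(Sigma_t, lam)     # Sigma_t^{-1} lambda_t
    cond_var = h - lam @ gain
    cond_mean = x_bar @ gain
    x_next = rng.normal(cond_mean, np.sqrt(cond_var))

    # Step 4: weight by the likelihood (Eq. (26)) and normalize (Eq. (27)).
    logw = loglik(y_next, x_next)
    w_new = np.exp(logw - logw.max())
    return np.column_stack([x_next, x_bar]), w_new / w_new.sum()
```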
For the above method to be applicable, one needs to have full knowledge of the parameters in the transition density. That is, the matrices \(A_{t+1}\), \(B_{t+1}\), and \(R_{u_{t+1}}\) must be known for the covariance matrix \(\Sigma_{t+1}\) to be computed for propagation of the state particles.
One can efficiently compute \(\Sigma_{t+1}\) by leveraging algebraic tricks prompted by the structural properties of the involved matrices (Toeplitz and upper triangular). On the one hand, the upper triangular nature of \(A_{t}\) and \(B_{t}\) reduces the number of computations considerably (the inverse of an upper triangular matrix is also upper triangular). The product \(A_{t}^{-1} B_{t}\) is a matrix with a structure similar to that of \(A_{t}\) and \(B_{t}\): an upper triangular matrix with the elements of its first row shifted to the right. On the other hand, due to the Toeplitz structure of the \(R_{u_{t}}\) matrix, one can resort to a Levinson-Durbin type technique [42] to recursively compute the necessary matrix product operations.
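As a hedged illustration of these structural savings, one can avoid forming \(A_{t}^{-1}\) explicitly and rely on triangular solves; the matrix contents below are placeholders of our own, and a full Levinson-Durbin recursion [42] would reduce the cost further:

```python
import numpy as np
from scipy.linalg import solve_triangular, toeplitz

t = 6

# Placeholder upper-triangular A_t and B_t (unit diagonal), and a Toeplitz
# R_{u_t} built from illustrative normalized autocovariances rho_u(tau).
A = np.eye(t) + 0.3 * np.eye(t, k=1)
B = np.eye(t) + 0.5 * np.eye(t, k=1)
rho = 0.8 ** np.arange(t)                 # stand-in for rho_u(tau)
R_u = toeplitz(rho)

# Sigma_t = A^{-1} B R_u B^T A^{-T}: a triangular solve replaces the inverse.
C = solve_triangular(A, B, lower=False)   # C = A^{-1} B, also upper triangular
Sigma = C @ R_u @ C.T
```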
The assumption that the parameters within \(A_{t+1}\) and \(B_{t+1}\) are known, however, is often not substantiated. Therefore, we resort to a parameter sampling scheme when the ARMA parameters are not known. We augment the state vector with the unknown parameters, \(\rho_{t}=(x_{t}\ \theta_{t})^{\top}\), similar to the work in [43, 44]. Note that the subscript t in \(\theta_{t}\) does not imply that the parameter evolves over time. It is there only to signify that we obtain samples of the unknowns at time t.
The full parameter posterior for the model in Eq. (1) is analytically intractable, and thus, we cannot draw samples from the true parameter posterior. Furthermore, as the parameters do not change over time, their particle propagation becomes troublesome and various methodologies have been suggested to overcome these challenges. Some include the use of artificial parameter evolution [23], while others resort to kernel smoothing [43] or density-assisted (DA) particle filtering techniques [45].
In this paper, we explore and compare two sampling alternatives, one based on the principles of DA-SMC methods and another where importance sampling (IS) of the parameters is carried out. In the former, one approximates the posterior of the unknown parameters with a density of choice; in the latter, one draws from a proposal density for the parameters and later adjusts by computing the appropriate weights.
These proposed methods are a first approach to dealing with unknown ARMA parameters. We acknowledge that any of the advanced SMC techniques that mitigate the challenges of estimating constant parameters (e.g., parameter smoothing [29, 46, 47] or nested SMC methods [30, 31]) can only improve the accuracy of the proposed SMC methods.
4.2.2 Proposed SMC method: DA-SMC for unknown ARMA parameters
We now propose an SMC method for the case where the parameters of the latent ARMA model, i.e., \(\theta=(a_{1}\ a_{2}\ \cdots\ a_{p}\ b_{1}\ b_{2}\ \cdots\ b_{q})^{\top}\), are unknown. This first alternative follows the principles of density-assisted SMC methods. Because the true posterior of the unknown parameters is analytically intractable, it approximates that posterior with a density of choice.
In particular, we propose to approximate the posterior of the unknown parameter θ, given the current time-series \(x_{1:t}\), with a Gaussian distribution, i.e.,
$$ f\left(\theta_{t+1}^{(m)}|x_{1:t}^{(m)}\right) \approx \mathcal{N}\left(\theta_{t+1}|\mu_{\theta_{t}}, \Sigma_{\theta_{t}}\right), $$
(28)
where the sufficient statistics are computed based on samples and weights available at this time instant
$$ \begin{array}{ll} \mu_{\theta_{t}} &= \sum_{m=1}^{M} w_{t}^{(m)} \theta^{(m)}_{t}, \\ \Sigma_{\theta_{t}} &= \sum_{m=1}^{M} w_{t}^{(m)}\left(\theta^{(m)}_{t} - \mu_{\theta_{t}}\right)\left(\theta^{(m)}_{t} - \mu_{\theta_{t}}\right)^{\top}. \end{array} $$
(29)
One uses this approximation to propagate parameter samples from this time instant to the next. As a result, the overall weight computation of the SMC method simplifies to
$$ \begin{array}{ll} \widetilde{w}_{t+1}^{(m)} &\propto f\left(y_{t+1}|x_{t+1}^{(m)}\right) \cdot \frac{f\left(x_{t+1}^{(m)}|x_{1:t}^{(m)},\theta_{t+1}^{(m)}\right)}{\pi(x_{t+1})} \cdot \frac{f\left(\theta_{t+1}^{(m)}|x_{1:t}^{(m)}\right)}{\pi(\theta_{t+1})} \\ &\propto f\left(y_{t+1}|x_{t+1}^{(m)}\right) \;. \end{array} $$
(30)
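A minimal sketch of this parameter propagation, computing the sufficient statistics of Eq. (29) and drawing from the Gaussian approximation of Eq. (28) (the jitter term is our own numerical safeguard, not part of the equations):

```python
import numpy as np

def da_parameter_step(theta, w, rng):
    """Propagate parameter particles per Eqs. (28)-(29); a sketch.

    theta : (M, d) parameter particles theta_t^(m)
    w     : (M,) normalized weights at time t
    """
    mu = w @ theta                                   # weighted mean, Eq. (29)
    centered = theta - mu
    Sigma = (centered * w[:, None]).T @ centered     # weighted covariance, Eq. (29)
    Sigma += 1e-9 * np.eye(theta.shape[1])           # small jitter (our addition)
    # Draw from the Gaussian approximation of the parameter posterior, Eq. (28).
    return rng.multivariate_normal(mu, Sigma, size=len(theta))
```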
In summary, the proposed DA-SMC for the unknown parameter case considers a joint state and parameter random measure at time instant t of the following form
$$ f^{M}(\rho_{t}) = \sum_{m=1}^{M}w_{t}^{(m)} \delta\left(\rho_{t}-\rho_{t}^{(m)}\right), $$
(31)
and, upon reception of a new observation \(y_{t+1}\), proceeds as follows:
1. Estimate the sample mean and covariance of the parameter vector \(\theta_{t}\):
$$ \left\{\begin{array}{l} \mu_{\theta_{t}} = \sum_{m=1}^{M} \theta_{t}^{(m)} w_{t}^{(m)}, \\ \Sigma_{\theta_{t}} = \sum_{m=1}^{M} \left(\theta_{t}^{(m)} - \mu_{\theta_{t}}\right)\left(\theta_{t}^{(m)} - \mu_{\theta_{t}}\right)^{\top} w_{t}^{(m)} \;. \end{array}\right. $$
(32)
2. Draw new parameter samples from the Gaussian approximation to the posterior density with the newly computed sufficient statistics:
$$ \theta_{t+1}^{(m)} \sim f\left(\theta_{t+1}|x_{1:t}^{(m)}\right) \approx \mathcal{N}\left(\theta_{t+1}|\mu_{\theta_{t}}, \Sigma_{\theta_{t}}\right) \;. $$
(33)
3. Compute the joint covariance matrix for each parameter sample \(\theta_{t+1}^{(m)}\):
$$ {}\Sigma_{t+1}^{(m)}=A_{t+1}^{(m)^{-1}}B_{t+1}^{(m)} R_{u_{t+1}} B_{t+1}^{(m)^{\top}} \left(A_{t+1}^{(m)^{-1}}\right)^{\top} = \left(\begin{array}{ll} h_{t+1}^{(m)} & \lambda_{t}^{(m)} \\ \lambda_{t}^{(m)^{\top}} & \Sigma_{t}^{(m)} \end{array}\right). $$
(34)
4. Perform resampling of the state's genealogical line by drawing from a categorical distribution defined by the random measure \(f^{M}(x_{t})\):
$$ \overline{x}_{1:t}^{(m)} \sim \left\{x_{t}^{(m)}, w_{t}^{(m)}\right\}, \text{where}\ m=1,\cdots, M. $$
(35)
5. Propagate the state particles by sampling from the transition density, conditioned on the available resampled streams \(\overline{x}_{1:t}^{(m)}\).
6. Compute the non-normalized weights for the drawn particles according to
$$ \widetilde{w}_{t+1}^{(m)} \propto f\left(y_{t+1}|x_{t+1}^{(m)}\right), $$
(38)
and normalize them to obtain a new probability random measure
$$ f^{M}(\rho_{t+1}) = \sum_{m=1}^{M}w_{t+1}^{(m)} \delta\left(\rho_{t+1}-\rho_{t+1}^{(m)}\right). $$
(39)
4.2.3 Proposed SMC method: IS-SMC for unknown ARMA parameters
We now propose an alternative SMC method for the unknown ARMA parameter case, based on importance sampling principles. Instead of approximating the analytically intractable parameter posterior, one can choose a proposal density and apply IS to jointly adjust the state and parameter samples.
Specifically, we use a Gaussian proposal density to draw samples for the unknown ARMA parameters θ. At every time instant, one propagates parameter particles by sampling from the proposal
$$ \pi(\theta_{t+1}) = \mathcal{N}\left(\theta_{t+1}|\mu_{\theta_{t}}, \Sigma_{\theta_{t}}\right), $$
(40)
with sufficient statistics as in Eq. (29). The corresponding weight computation results in
$$ \begin{array}{ll} \widetilde{w}_{t+1}^{(m)} &\propto f\left(y_{t+1}|x_{t+1}^{(m)}\right) \cdot \frac{f\left(x_{t+1}^{(m)}|x_{1:t}^{(m)},\theta_{t+1}^{(m)}\right)}{\pi(x_{t+1})} \cdot \frac{f\left(\theta_{t+1}^{(m)}|x_{1:t}^{(m)}\right)}{\pi(\theta_{t+1})} \\ &=f\left(y_{t+1}|x_{t+1}^{(m)}\right) \cdot \frac{f\left(\theta_{t+1}^{(m)}|x_{1:t}^{(m)}\right)}{\mathcal{N}\left(\theta_{t+1}^{(m)}|\mu_{\theta_{t}}, \Sigma_{\theta_{t}}\right)} \;. \end{array} $$
(41)
Since the posterior of the parameters is analytically intractable, we have
$$ {}f\left(\theta_{t+1}^{(m)}|x_{1:t}^{(m)}\right) = \frac{f\left(x_{1:t}^{(m)}|\theta_{t+1}^{(m)}\right)f\left(\theta_{t+1}^{(m)}\right)}{f\left(x_{1:t}^{(m)}\right)} \propto \frac{f\left(x_{1:t}^{(m)}|\theta_{t+1}^{(m)}\right)}{f\left(x_{1:t}^{(m)}\right)} \;, $$
(42)
which results in
$$ {}\left\{\begin{array}{ll} f\left(\theta_{t+1}^{(m)}|x_{1:t}^{(m)}\right) \propto \frac{\mathcal{N}\left(x_{1:t}^{(m)}\left|0, \sigma_{u}^{2}\Sigma_{t}^{(m)}\right.\right)}{\mathcal{N}\left(x_{1:t}^{(m)}\left|0, \sigma_{u}^{2}\Sigma_{t}^{\left(\mu_{\theta_{t}}\right)}\right.\right)}, &\text{if }\ \sigma_{u}^{2}\ \text{is known,}\\ f\left(\theta_{t+1}^{(m)}|x_{1:t}^{(m)}\right) \propto \frac{\mathcal{T}_{\nu_{0}}\left(x_{1:t}^{(m)}\left|0, \sigma_{0}^{2}\Sigma_{t}^{(m)}\right.\right)}{\mathcal{T}_{\nu_{0}}\left(x_{1:t}^{(m)}\left|0,\sigma_{0}^{2}\Sigma_{t}^{\left(\mu_{\theta_{t}}\right)}\right.\right)}, &\text{if }\ \sigma_{u}^{2}\ \text{is unknown.} \end{array}\right. $$
(43)
With \(\Sigma _{t}^{\left (\mu _{\theta _{t}}\right)}\), we describe the covariance matrix computed using the parameter estimates \(\mu _{\theta _{t}}\) as in Eq. (29), while with \(\Sigma _{t}^{(m)}\), we refer to the covariance matrix evaluated per drawn parameter sample \(\theta _{t+1}^{(m)}\).
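For the known-\(\sigma_{u}^{2}\) case, the parameter part of the log-weight in Eqs. (41) and (43) might be computed as follows (a sketch with our own function and argument names; the log-likelihood of \(y_{t+1}\) must still be added to obtain the full non-normalized log-weight):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def is_parameter_logweight(x_stream, theta_m, mu_theta, Sigma_theta,
                           Sigma_m, Sigma_mu, sigma_u2):
    """Log of f(theta|x) / pi(theta) in Eq. (41), using Eq. (43), known sigma_u^2.

    x_stream    : (t,) resampled stream x_{1:t}^(m)
    theta_m     : (d,) sampled parameters theta_{t+1}^(m)
    mu_theta,
    Sigma_theta : moments of the Gaussian proposal, Eq. (40)
    Sigma_m     : (t, t) covariance evaluated at theta_m
    Sigma_mu    : (t, t) covariance evaluated at mu_theta
    """
    zeros = np.zeros(len(x_stream))
    log_post = (mvn.logpdf(x_stream, zeros, sigma_u2 * Sigma_m)
                - mvn.logpdf(x_stream, zeros, sigma_u2 * Sigma_mu))  # Eq. (43)
    log_prop = mvn.logpdf(theta_m, mu_theta, Sigma_theta)            # Eq. (40)
    return log_post - log_prop
```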
Therefore, the proposed IS-SMC for the unknown parameter case at time instant t starts with a joint state and parameter random measure
$$ f^{M}(\rho_{t}) = \sum_{m=1}^{M}w_{t}^{(m)} \delta\left(\rho_{t}-\rho_{t}^{(m)}\right), $$
(44)
and, upon reception of a new observation \(y_{t+1}\), proceeds as follows:
1. Estimate the sample mean and covariance of the parameter vector \(\theta_{t}\):
$$ \begin{array}{l} \mu_{\theta_{t}} = \sum_{m=1}^{M} \theta_{t}^{(m)} w_{t}^{(m)}, \\ \Sigma_{\theta_{t}} = \sum_{m=1}^{M} \left(\theta_{t}^{(m)} - \mu_{\theta_{t}}\right)\left(\theta_{t}^{(m)} - \mu_{\theta_{t}}\right)^{\top} w_{t}^{(m)} \;. \end{array} $$
(45)
2. Draw new parameter samples from the Gaussian proposal with the newly computed sufficient statistics:
$$ \theta_{t+1}^{(m)} \sim \pi(\theta_{t+1}) = \mathcal{N}\left(\theta_{t+1}|\mu_{\theta_{t}}, \Sigma_{\theta_{t}}\right) \;. $$
(46)
3. Compute the joint normalized covariance matrix for each parameter sample \(\theta_{t+1}^{(m)}\):
$$ {}\Sigma_{t+1}^{(m)}=A_{t+1}^{(m)^{-1}}B_{t+1}^{(m)} R_{u_{t+1}} B_{t+1}^{(m)^{\top}} \left(A_{t+1}^{(m)^{-1}}\right)^{\top} = \left(\begin{array}{ll} h_{t+1}^{(m)} & \lambda_{t}^{(m)} \\ \lambda_{t}^{(m)^{\top}} & \Sigma_{t}^{(m)} \end{array}\right). $$
(47)
4. Perform resampling of the state's genealogical line by drawing from a categorical distribution defined by the random measure \(f^{M}(x_{t})\):
$$ \overline{x}_{1:t}^{(m)} \sim \left\{x_{t}^{(m)}, w_{t}^{(m)}\right\}, \text{where}\ m=1,\cdots, M. $$
(48)
5. Propagate the state particles by sampling from the transition density, conditioned on the available resampled streams \(\overline{x}_{1:t}^{(m)}\).
6. Compute the non-normalized weights for the drawn particles according to Eqs. (41) and (43), and normalize them to obtain a new probability random measure
$$ f^{M}(\rho_{t+1}) = \sum_{m=1}^{M}w_{t+1}^{(m)} \delta\left(\rho_{t+1}-\rho_{t+1}^{(m)}\right). $$
(53)