A Particle Filtering Approach to Change Detection for Nonlinear Systems

We present a change detection method for nonlinear stochastic systems based on particle ﬁltering. We assume that the parameters of the system before and after change are known. The statistic for this method is chosen in such a way that it can be calculated recursively while the computational complexity of the method remains constant with respect to time. We present simulation results that show the advantages of this method compared to linearization techniques.


INTRODUCTION
Page states the change detection problem as follows [1]: "Whenever observations are taken in order it can happen that the whole set of observations can be divided into subsets, each of which can be regarded as a random sample from a common distribution, each subset corresponding to a different parameter value of the distribution. The problems to be considered in this paper are concerned with the identification of the subsamples and the detection of changes in the parameter value. " We refer to a change or an abrupt change as any change in the parameters of the system that happens either instantaneously or much faster than any change that the nominal bandwidth of the system allows.
The key difficulty of all change detection methods is that of detecting intrinsic changes that are not necessarily directly observed but are measured together with other types of perturbations [2].
The change detection could be offline or online. In online change detection, we are only interested in detecting the change as quickly as possible (e.g., to minimize the detection delay with fixed mean time between false alarms), and the estimate of the time when the change occurs is not of importance. In offline change detection, we assume that the whole observation sequence is available at once and finding the estimate of the time of change could be one of the goals of the detection method. In this paper, we limit our concern to online detection of abrupt changes.
The change detection methods that we consider here can be classified under the general name of likelihood ratio (LR) methods. Cumulative sum (CUSUM) and generalized LR (GLR) tests are among these methods. CUSUM was first proposed by Page [1]. The most basic CUSUM algorithm assumes that the observation signal is a sequence of stochastic variables which are independent and identically distributed (i.i.d.) with known common probability density function before the change time, and i.i.d. with another known probability density after the change time. In the CUSUM algorithm, the log-likelihood ratio for the observation from time i to time k is calculated and its difference with its current minimum is compared with a certain threshold. If this difference exceeds the threshold, an alarm is issued. Properties of the CUSUM algorithm have been studied extensively. Its most important property is the asymptotic optimality, which was first proven in [3]. More precisely, CUSUM is optimal, with respect to the worst mean delay, when the mean time between false alarms goes to infinity. This asymptotic point of view is convenient in practice because a low rate of false alarms is always desirable.
In the case of unknown system parameters after change, the GLR algorithm can be used as a generalization of the CUSUM algorithm. Since, in this algorithm, the exact information of the change pattern is not known, the LR is maximized over all possible change patterns. 1 For stochastic systems with linear dynamics and linear observations, the observation sequence is not i.i.d. Therefore, the regular CUSUM algorithm cannot be applied for detection of changes in such systems. However, if such systems are driven by Gaussian noise, the innovation process associated with the system is known to be a sequence of independent random variables. The regular CUSUM algorithm or its more general counterpart, GLR, can be applied to this innovation process [2,4].
In this paper, we are interested in the change detection problem for stochastic systems with nonlinear dynamics and observations. We show that for such systems, the complexity of the CUSUM algorithm grows with respect to time. This growth in complexity cannot be tolerated in practical problems. Therefore, instead of the statistic used in the CUSUM algorithm, we introduce an alternative statistic. We show that with this statistic, the calculation of the LR can be done recursively and the computational complexity of the method stays constant with respect to time.
Unlike the linear case, change detection for nonlinear stochastic systems has not been investigated in any depth. In the cases where a nonlinear system experiences a sudden change, linearization and change detection methods for linear systems are the main tools for solving the change detection problem (see, e.g., [5]). The reason this subject has not been pursued is clear; even when there is no change, the estimation of the state of the system, given the observations, results in an infinite-dimensional nonlinear filter [6], and the change in the system can only make the estimation harder.
In the last decade, there has been an increasing interest in simulation-based nonlinear filtering methods. These filtering methods are based on a gridless approximation of the conditional density of the state, given the observations. Gridless simulation-based filtering, now known by many different names such as particle filtering (PAF) [7,8], the condensation algorithm [9], the sequential Monte Carlo (SMC) method [10], and Bayesian bootstrap filtering [11], was first introduced by Gordon et al. [11] and then it was rediscovered independently by Isard and Blake [9] and Kitagawa [12].
The theoretical results regarding the convergence of the approximate conditional density given by PAF to the true conditional density (in some proper sense) suggest that this method is a strong alternative for nonlinear filtering [7]. The advantage of this method over the nonlinear filter is that PAF is a finite-dimensional filter. The authors believe that PAF and its modifications are a starting point to study change detection for nonlinear stochastic systems. In this paper, we use the results in [13] and we develop a new change detection method for nonlinear stochastic systems.
In [13], we showed that when the number of satellites is below a critical number, linearization methods such as extended Kalman filtering (EKF) result in an unacceptable position error for an integrated inertial navigation system/global positioning system (INS/GPS). We also showed that the approximate nonlinear filtering methods, the projection particle filter [13], in particular, are capable of providing an acceptable estimate of the position in the same situation.
If the carrier phase is used for position information in an integrated INS/GPS, one sudden change that happens rather often is the cycle slip. A cycle slip happens when the phase of the received signal estimated by the phase lock loop in the receiver has a sudden jump. An integrated INS/GPS with carrier phase receiver is used as an application for the method introduced in this paper for detection of a cycle slip with known strength.
In Section 2, we state the approximate nonlinear filtering method used in this paper. In Section 3, we briefly define the change detection problem. In Section 4, we review the CUSUM algorithm for linear systems with additive changes. Then, in Sections 5 and 6, we present a new change detection method for nonlinear stochastic systems. In Section 7, we lay out the formulation for an integrated INS/GPS. In Section 8, we present some simulation results. In Section 9, we summarize the results and lay out the future work.

APPROXIMATE NONLINEAR FILTERING
Consider the dynamical system where the distribution of x 0 is given, x k ∈ R n , y k ∈ R d , and w k ∈ R q , and v k ∈ R d are white noise processes with known statistics, and the functions f k (·) and h k (·) and the matrix G k (·) have the proper dimensions. The noise processes w k , v k , k = 0, 1, . . . , and the initial condition x 0 are assumed independent. We assume the initial distribution for x 0 is given. The goal is to find the conditional distribution of the state, given the observation, that is, P k (dx k |Y k 1 ), where Y k 1 = {y 1 , y 2 , . . . , y k } is the observation up to and including time k. The propagation of the conditional distribution, at least conceptually, can be expressed as follows [6].
The conditional distribution given by the above steps is exact, but in general, it can be viewed as an infinitedimensional filter, thus not implementable. PAF, in brief, Step (1). Initialization.
Step (5). k ← k + 1; go to Step (2). is an approximation method that mimics the above calculations with a finite number of operations using the Monte Carlo method. Algorithm 1 shows one manifestation of PAF [7,11].
It is customary to call x 1 k , . . . , x N k particles. The key idea in PAF is to eliminate the particles that have low importance weights p(y k |x k ) and to multiply particles having high importance weights [11,14]. The surviving particles are thus approximately distributed according to P N k (dx). This automatically makes the approximation one of better resolution in the areas where the probability is higher.
In the simulations in this paper, we use a modified version of the classical PAF method called projection PAF. For completeness sake, we repeat the algorithm that was given in [13]. In projection PAF, we assume that the conditional density of the state of the system, given the observation, is close to an exponential family of densities S defined as follows. 2 Definition 1 (Brigo [15]). Let {c 1 (·), . . . , c p (·)} be affinely independent 3 scalar functions defined on k dimensional Euclidean space R k . Assume that where Θ ⊆ Θ 0 is open, is called an exponential family of probability densities.
With this definition for the exponential family of densities, the projection PAF algorithm is stated as in Algorithm 2.

CHANGE DETECTION: PROBLEM DEFINITION
Online detection of a change can be formulated as follows [2]. Let Y k 1 be a sequence of observed random variables with conditional densities p θ (y k |y k−1 , . . . , y 1 ). Before the unknown change time t 0 , the parameter of the conditional density θ is constant and equal to θ 0 . After the change, this parameter is equal to θ 1 . In online change detection, one is interested in detecting the occurrence of such a change. The exact time and the estimation of the parameters before and after the change are not required. In the case of multiple changes, we assume that the changes are detected fast enough so that in each time instance, only one change can be considered. Online change detection is performed by a stopping rule [2] where λ is a threshold, {g k } k≥1 is a family of functions, and t a is the alarm time, that is, the time when change is detected. If t a < t 0 , then a false alarm has occurred. The criterion for choosing the parameter λ and the family of functions {g k } k≥1 is to minimize the detection delay for the fixed mean time between false alarms.

ADDITIVE CHANGES IN LINEAR DYNAMICAL SYSTEMS
Consider the system where x k ∈ R n , y k ∈ R d , and w k ∈ R q and v k ∈ R d are white noise processes with known statistics. F k , G k , H K , Γ k , and Ξ k are matrices of proper dimensions, and Υ x (k, t 0 ) and Υ y (k, t 0 ) are the dynamic profiles of the assumed changes, of dimensionsñ ≤ n andd ≤ d, respectively. w k and v k are white Gaussian noise processes, independent of the initial condition x 0 . It is assumed that Υ x (k, t 0 ) = 0 and Υ y (k, t 0 ) = 0 for k < t 0 , but we do not necessarily have the exact knowledge of the dynamic profile and the gain matrices Γ k and Ξ k . For the case of known parameters before and after change, the CUSUM [2] algorithm can be used, and it is well known that the change detection method has the following form: where i is the innovation process calculated using Kalman filtering assuming that no change occurred, and ρ(i, j) is the mean of the innovation process at time j conditioned on the change occurring at the time i. p 0 and p ρ(·,·) are Gaussian densities with means 0 and ρ(·, ·), respectively. The covariance matrix for these two densities is the same and is calculated using Kalman filtering. S K j is the LR between two hypotheses: change occurrence at j and no change occurrence.
When the parameter after change is not known, GLR can be used to calculate g k [4]: The solution for (10) is well known and can be found in many references (see [2]). Similar to nonlinear filtering, change detection for nonlinear stochastic systems results in an algorithm that is infinite dimensional. Linearization techniques, whenever applicable, are the main approximation tool for studying the change detection problem for nonlinear systems. Although linearization techniques are computationally efficient, they are not always applicable. In the sections to come, we propose a new method based on nonlinear PAF that can be used for change detection for nonlinear stochastic systems.

NONLINEAR CHANGE DETECTION: PROBLEM SETUP
Consider the nonlinear system where and the functions f 0 k (·), f 1 k (·), h 0 k (·), h 1 k (·) and the matrices G 0 k (·), G 1 k (·) have the proper dimensions. The sudden change occurs when i k changes from 0 to 1.
In this setup, S k j can be written as follows: Writing (13) in a recursive form, we get where p(y i |Y i−1 1 , t 0 = j) can be written as follows: (15), one needs to find an approximation for the corresponding nonlinear filter. We assume that this approximation is done using either PAF or projection PAF [13].
To calculate the LR in (13), we must calculate the conditional densities of the state, given the observation for two hypotheses (change occurrence at j and change occurrence after k). This means that two nonlinear filters should be implemented just to compare these two hypotheses. Therefore, it is clear that to use an algorithm similar to (9), k parallel nonlinear filters should be implemented. In Figure 1, we see that the computational complexity of the CUSUM algorithm grows linearly with respect to time. In most applications, this growth is not desirable. One possible way to approximate the CUSUM algorithm is to truncate the branches that fork from the main branch in Figure 1. We will explain this truncation procedure and its technical difficulties in the next few lines.
Recall that the main branch (horizontal) and the branches forked from it in Figure 1 are representing a series of nonlinear filters with specific assumptions on the change time. The dynamic and observation equations for all forked branches are the same and the only difference is the initial density. If the conditional density of the state, given the observation for a nonlinear system, with the wrong initial density converges (in some meaningful way) to the true conditional density (initialized by the true initial density), we say that the corresponding nonlinear filter is asymptotically stable [17].
For asymptotically stable nonlinear filters, the forked branches in Figure 1 converge to a single branch, therefore, there is no need to implement several parallel nonlinear filters. In other words, after each branching, the independent nonlinear filter is used for a period of time and then this branch converges to the branches that have forked earlier, that is, joins them. The time needed for the branch of the independent nonlinear filter to join the other forked branches depends on the convergence rate and the target accuracy of the approximation. Although the procedure mentioned above can be used for asymptotically stable nonlinear filters, there are several problems associated with this method. The known theoretical results for identifying asymptotically stable filters are limited to either requiring ergodicity and the compactness of the state space [18,19,20], or very special cases of the observation equation [17]. The rate of convergence of the filters in different branches is another potential shortcoming of the mentioned procedure. If the convergence rate is low in comparison to the rate of parameter change in the system, then the algorithm cannot take advantage of this convergence.

NONLINEAR CHANGE DETECTION: NONGROWING COMPUTATIONAL COMPLEXITY
In this section, we introduce a new statistic to overcome the problem of growing computational complexity for the change detection method. We emphasize that the parameters of the system before and after change are assumed to be known. Therefore, the conditional density of the state of the system, given the observation, can be calculated using a nonlinear filter. We show that this statistic can be calculated recursively.
Consider the following statistic: The change detection algorithm based on statistic T k j can be presented as where j is the last time when T k j ≥ λ or T k j ≤ −α, and λ > 0 and α > 0 are chosen such that the detection delay is minimized for a fixed mean time between two false alarms.
For the rest of this paper, we assume that the probability of change at time i condition on no change before i is q, that is, Without loss of generality and for simplifying the notation, we assume that j = 1. To calculate the statistic T k 1 , it is sufficient to find P(dx k , t 0 ≤ k|Y k 1 ) and P(dx k , t 0 > k|Y k 1 ). Then T k 1 is given by where W 0 k = P(dx k , t 0 > k|Y k 1 ) and W 1 k = P(dx k , t 0 ≤ k|Y k 1 ). Therefore, to calculate T k 1 recursively, it is sufficient to calculate P(dx k , t 0 ≤ k|Y k 1 ) and P(dx k , t 0 > k|Y k 1 ) recursively. P(dx k+1 , t 0 ≤ k + 1|Y k+1 1 ) can be written as follows: where p 1 (y k+1 |x k+1 ) is the conditional density of the state x k+1 , given the observation y k+1 , under the hypothesis that change has occurred and p 0 (y k+1 |x k+1 ) is the same condi-tional density under the hypothesis that no change has occurred. Similarly, for P(dx k+1 , t 0 > k + 1|Y k+1 1 ), we have the following: Also, we have In (22) and (23), P k+1 0 (dx k+1 |x k ) and P k+1 1 (dx k+1 |x k ) are the Markov transition kernel under the hypothesis that no change has occurred before k + 1 and change has occurred before k, respectively. P(dx k+1 |x k , t 0 = k + 1) is the Markov transition kernel under the hypothesis that the change has occurred at k + 1. For the dynamics in (11), we have P(dx k+1 |x k , t 0 = k + 1) = P k+1 0 (dx k+1 |x k ). (23) show that the statistic T k 1 can be calculated recursively. They also show that in the prediction step of the nonlinear filter at each time instance, only two conditional distributions should be calculated, P(dx k+1 , t 0 ≤ k + 1|Y k 1 ) and P(dx k+1 , t 0 > k + 1|Y k 1 ). Therefore, if a PAF method is used to approximate (22) and (23), we only need two sets of particles to approximate these two conditional distributions. In the Bayes' rule update step and the resampling step of the PAF, (20) and (21) are used. One possible way of implementing (19) in (23) using a PAF method is as follows.

Equations (19)-
In Algorithm 3, the same number of particles is used to find the conditional distribution before and after change. This guarantees that always enough numbers of particles are available for approximating the conditional densities before and after change.
In the remaining sections, we use the introduced statistic T k j for detecting a sudden change in the phase measurement of an integrated INS/GPS.

INTEGRATED INS/GPS
GPS provides worldwide accurate positioning if four or more satellites are in view of the receiver. Although the satellite Step (1). Initialization.
Algorithm 3: Change detection using particle filtering. constellation guarantees availability of four or more satellites worldwide, natural or man-made obstacles can block the satellite signals easily. Integrating dead reckoning or INS with GPS [21,22,23,24] is a method to overcome this vulnerability. Here, INS or the dead reckoning provides positioning that is calibrated by the GPS. In this section, we consider the case of an integrated INS/GPS. In [25], we showed that using nonlinear filtering for positioning is essential. We compared the proposed PAF with regular PAF and EKF.  (a − b))/a Ellipsoid eccentricity Using carrier phase measurements enables the differential GPS to reach centimeter-level accuracy. A phase lock loop cannot measure the full-cycle part of the carrier phase. This unmeasured part is called integer ambiguity that requires to be resolved through other means. In this paper, we assume that the integer ambiguity is resolved (see, e.g., [26]). However, the measured phase can experience a sudden change undetected by the phase lock loop. This sudden change is called the cycle slip and if it is undetected by the integrated INS/GPS, it results in an error in the estimated position. We will use the method introduced in this paper to detect such changes.
We consider the observation equation provided by the ith GPS satellite at time k to have the form where [p x , p y , p z ] T and [b x , b y , b z ] T are the rover and (known) base coordinates at time k, respectively, ρ i (x 1 , x 2 , x 3 ) is the distance from point [x 1 , x 2 , x 3 ] T to satellite i, δ is a combination of the receiver clock bias in the base and the rover, c is the speed of light, v i k is the measurement noise for the ith satellite signal, t 0 is the unknown moment of the cycle slip, and n i (t, t 0 ) = 0 for t < t 0 and n i (t, t 0 ) = n i for t ≥ t 0 , where n i is the change in phase measurement due to the cycle slip.
The main goal in the simulations is to detect the change in the phase measurement as soon as it happens.
Here we point out that the nonlinearity in measurement is not solely due to the function ρ. Integrated INS/GPS requires coordinate transformations between INS parameters and GPS parameters, which contributes to the nonlinearity of the measurement.
Parameters of an integrated INS/GPS are expressed in different coordinate systems. The GPS measurements are given in an earth-centered earth-fixed (ECEF) frame [27,28]. The GPS position is given either in the rectangular coordinate system (see (24)) or in the geodetic coordinate system with the familiar latitude, longitude, and height coordinate vector [p λ , p φ , p h ] T . For the latter, the earth's geoid is approximated by an ellipsoid. Table 1 shows the defining parameters for the geoid according to the WGS84 reference frame. The parameters and measurements of INS are given in the local geographical frame or in the body frame system, where the velocity is given by the north-east-down velocity vec- The transformation matrix from the body frame to the local geographical frame is given by the matrix R b2g . In this paper, we assume the estimation problem for the gyro measurements is solved, hence R b2g is known.
The GPS clock drift and the INS equations constitute key dynamics in integrated INS/GPS. The INS dynamic equation can be expressed as follows (see [25] for details): where g = 9.780327 m/s 2 is the gravitational acceleration, R λ = a(1 − e 2 )/(1 − e 2 sin 2 (p λ )) 3/2 , R φ = a/(1 − e 2 sin 2 (p λ )) 1/2 , [ã u ,ã v ,ã w ] T and [b u , b v , b w ] T are the accelerometer measurement and the accelerometer measurement bias, respectively, both expressed in the body frame, and w v is a vector-valued Brownian motion process with zero mean and known covariance matrix. The measurement bias where w b t is a vector-valued Brownian motion with zero mean and known covariance matrix, and a b is a small positive constant. The receiver clock drift δ t is represented by the integration of an exponentially correlated random process ρ t [24]: where w ρ t is a zero-mean Brownian motion process with variance σ 2 ρ = 10 −24 . This dynamic model is typical for a quartz TCXO with frequency drift rate 10 −9 s/s [24].

SIMULATIONS AND RESULTS
The dimension of the dynamical system in the simulation is eleven. The state of the dynamical system x is given as follows: The differential equation describing the dynamics of the system is the combination of the differential equations in (25), (26), and (27). Here we assume that a b = 0.001 and that the covariance matrices for the Brownian motions in the INS dynamic equations Σ b and Σ v are diagonal. To be more specific, Σ b = 10 −6 I and Σ v = 10 −4 I, where I is the identity matrix of the proper size. The observation equation is given in (24), where y i is one component of the observation vector. The dimension of the observation vector is the same as the number of available satellites. In (24), the observation is given as a function of the position in the ECEF rectangular coordinate system that is then transformed to the geodetic coordinate system [25].
For the simulation, we simply chose an eleven-dimensional Gaussian density for the projection PAF. This choice of density makes the random vector generation easy and computationally affordable. To be able to use the projection PAF, we used the maximum likelihood criteria to estimate the parameters of the Gaussian density before and after Bayes' correction.
We used two Novatel GPS receivers to collect navigation data ( April 2, 2000). From this data, we extracted position information for the satellites, the pseudorange, and the carrier phase measurement noise powers for the L1 frequency. From the collected information we generated the pseudorange and the carrier phase data for one static and one moving receiver (base and rover, respectively). We assume that for the carrier phase measurement, the integer ambiguity problem is already solved (see [26] for details). The movement of the INS/GPS platform was simulation based and it was the basis for the measurement data measured by the accelerometers, the gyros, the GPS pseudorange, and the GPS carrier phase data.
As a precursor, we note that in the simulation in [13] we showed that for an integrated INS/GPS when the number of satellites is less than a critical number, projection PAF provides a very accurate estimate of the position, while the position solution given by EKF is unacceptable. In Figures 2  and 3, a comparison of the position estimation error in the rectangular coordinate system for one typical run of each method is shown. For that simulation, we assumed that the GPS receiver starts with six satellites. At time t = 100, the receiver loses the signal from three satellites, and it gains one satellite signal back at t = 400. From these two figures, it is clear that when the number of satellites in view is below a certain number (here four satellites), the EKF is unable to provide a reasonable estimate of the position for the integrated INS/GPS. Since the error of the position estimate of linearization methods is unacceptable even when no change in the phase measurement occurs, using these methods in the presence of an abrupt change is fruitless as well. To apply our method described in (17) for sudden phase change detection in an integrated INS/GPS, we use projection PAF as our nonlinear filtering method. We use the CUSUM algorithm to evaluate the proposed changed detection scheme. We compare the statistic in (17) with that of the CUSUM algorithm. We wish to emphasize that in the example given in this section, we assume that the parameter of change, before and after the change, is known and the only unknown parameter is the change time. In the future, we will address the more general problem of unknown change parameters.
For the simulation in this paper, we assumed that the phase lock loop associated to satellite one experiences a cycle slip at time t = 20 and the phase changes suddenly. The size of the change is one cycle. We assumed that the GPS receiver starts with six satellites. At time t = 15, the receiver loses three satellites (we eliminate them from the data). We used Algorithm 2 to calculate the statistic in the CUSUM algorithm g k and Algorithm 3 and projection PAF to calculate T k j . In Figure 4, we have plotted T k j with respect to time for 100 independent runs. In Figure 5, we have plotted the statistic g k for the same 100 independent runs. The figures show that there are sudden changes both in T k j and g k when a cycle slip occurs, and this is true for all 100 runs. These figures also show that for this simulation, the performance of the algorithm given in this paper is comparable to the performance of the CUSUM algorithm. This simulation shows that T k j can be used successfully to detect the cycle slip with known strength.

CONCLUSION AND FUTURE WORK
In this paper, we developed a new method for the detection of abrupt changes for known parameters after change. We showed that unlike the CUSUM algorithm, the statistic in this method can be calculated recursively for nonlinear stochastic systems. In the future, we intend to extend our results to the case where the parameters after change are unknown. The major obstacle in this extension is the complexity of the change detection method.