Optimal measurement budget allocation for Kalman prediction over a finite time horizon by genetic algorithms

In this paper, we address the problem of optimal measurement budget allocation to estimate the state of a linear discrete-time dynamical system over a finite horizon. More precisely, our aim is to select the measurement times in order to minimize the variance of the estimation error over a finite horizon. In addition, we investigate the closely related problem of finding a trade-off between the number of measurements and the signal-to-noise ratio. First, the optimal measurement budget allocation problem is reduced to a deterministic combinatorial program. Then, we propose a genetic algorithm implementing a count preserving crossover to solve it. On the theoretical side, we provide a one-dimensional analysis that indicates that the benefit of using irregular measurements grows when the system is unstable or when the process noise becomes important. Then, using the duality between estimation and control, we show that the problem of selecting optimal control times for a linear quadratic regulator can be reduced to our initial problem. Finally, numerical implementations demonstrate that using measurement times optimized by our genetic algorithm gives better estimates than regularly spaced measurements. Our method is applied to a discrete version of a continuous-time system and the impact of the discretization time step is studied. It reveals good convergence properties, showing that our method is well suited to both continuous-time and discrete-time setups.


Introduction
Kalman filtering [1] is an algorithm that provides state estimation of a stochastic linear system over time from noisy measurements. It has been applied to many estimation and prediction problems [2][3][4]. For example, a Kalman filter is used in [5] to estimate the position of a camera when the noise statistics are unknown. The authors propose an adaptive scheme (called adaptive non-linear Kalman filtering) to estimate the noise statistics via sampling the noises. They progressively adjust the filter parameters thanks to these samples. The estimation of the parameters improves with the number of samples. On the other hand, the computational complexity increases with the number of samples. The Kalman filter is also used for compressed sensing. In [6], the authors use an extended linearized Kalman filter and Steffensen's acceleration method to optimally reconstruct sparse data from noisy measurements. They show that their Kalman-based approach gives similar or better results than traditional methods for ℓ1-minimization such as the primal-dual algorithm or the Orthogonal Matching Pursuit method. The interest of their method is particularly important in the case of high dimensional signals. However, this method is restricted to the case where the signal is sufficiently sparse (the sparsity must be less than half the dimension of the measurements). In most of these applications, measurements are acquired in a regular way, i.e., measurements are equally spaced in time.
Several authors have studied the optimization of Kalman prediction in the presence of a variable rate of measurements or in the presence of unknown statistical properties of the measurement noise. For example, [7] have proposed solutions for the case of irregular sampling rates based on a track-to-track Kalman filter fusion method. They combine the information from two sensors: the first one is fast rate, regular, delay free but less accurate, and the second one is slow rate, irregular, delayed but more accurate. Two Kalman filters are used to estimate the states based on each type of measurement. Then, the estimates are fused. They show in simulations and in a laboratory experiment that the fused estimation is more precise than the individual ones. In these previous works, variable rate and unknown noise statistics were considered as constraints to be coped with while optimizing Kalman prediction. In this paper, on the contrary, we aim to optimally choose the measurement sampling times and their noise levels under a measurement budget in order to operate an optimal Kalman prediction. Sampling times and measurement noise levels are, in our case, parameters to be optimized.
Reducing the number of measurements often has significant advantages. The first one is to reduce the energy consumption that is linked to measurement acquisition. This is essential for mobile applications. A second advantage could be economic if, for example, each measurement acquisition is expensive. A third advantage is linked to acquisition safety issues. That is the reason for relying on Kalman prediction to manage the radiotherapy of mobile tumors, in which the goal is to target a tumor with ionizing radiation. In the case of lung tumors, the target is constantly moving due to the patient's breathing. One option is to track the tumor with imaging X-rays. Unfortunately, with this option each image acquisition also irradiates the patient, including healthy tissues [8]. The total number of X-ray acquisitions used for tumor tracking must thus be kept below a cutoff to prevent secondary cancer induction. In most cases, it is possible to optimize the measurement times in order to maximize healing while staying below the maximal total irradiation dose.
The claim of this paper is that when the number of measurements is restricted, Kalman prediction will provide better results if the measurements can be selected at the optimal times instead of a regular sampling. This paper addresses the problem of selecting the optimal measurement times to predict the state of a stochastic linear dynamical system from noisy measurements, over a finite time horizon, when the number of measurements is fixed. The optimal measurement times are obtained by minimizing the mean prediction error variance over the complete horizon. This paper focuses on prediction instead of filtering and addresses real-time applications. Indeed, in our targeted applications, prediction allows real-time tracking by compensating for the measurement delay.
However, the proposed method can be adapted straightforwardly for filtering or smoothing applications.
The optimal measurement times can be formulated as the solution of a combinatorial optimization problem. Evolutionary algorithms are a potentially effective approach for solving that kind of problem [9]. We propose different variants of genetic algorithms (GAs) to solve this problem and demonstrate their efficiency compared to a random trial approach. GAs provide an effective approach because they are able to sample the solution space broadly.

Related works
There is a quite large body of work in the literature focusing on measurement failures arriving randomly according to a Bernoulli distribution [10][11][12][13][14][15]. In other papers, the missing measurements can only arrive according to certain patterns, and the objective is to design an estimator that is robust to all these patterns [16]. Under these assumptions, different problems have been studied. For example, [11] is interested in estimating the state of a multi-rate multi-sensor system with random missing measurements. They assume that they do not know when a dropout has occurred (when a dropout occurs, the received measurement is pure noise). Consequently, they propose a method to detect dropouts. In [17], a system with multiple sensors is subject to denial-of-service (DoS) and false data injection (deception) attacks. The proposed method first detects the attack and isolates the concerned subsystems, then identifies the kind of attack, and uses a resilient observer subsystem.
In [12], a distributed Kalman filter is used in the case of large-scale power systems with random missing measurements. Kalman filtering with random missing measurements has also been studied when the noise variances are only approximately known [13,14]. Using a Lyapunov-based approach, they provide guarantees on the estimation variance.
In the above cited papers, measurement times (or equivalently, the dropout times) are not design variables: they are imposed rather than chosen. On the contrary, sensor scheduling problems consist in designing the behavior of the sensors. Typically, a set of sensors is available but only a few of them can be used at the same time. This problem has been studied in the case of an infinite horizon in both the discrete [18][19][20][21][22][23] and the continuous-time settings [24]. Most of the time, the objective is to minimize the average of the variance of the estimation error. In the continuous-time case, Ha et al. show in [24] that the infinite horizon optimal cost and optimal scheduling are independent of the initial error covariance. In addition, they prove that a periodic scheduling can approximate the cost of an optimal scheduling arbitrarily well. Similar results are obtained in the discrete-time case in [20,22]. Some works consider a cost associated with the use of the sensors [19,25]. Other works dealing with periodic scheduling associate a usage budget per period to each sensor [21,23].
The finite horizon problem has also received attention in both discrete [25,26] and continuous-time [27][28][29] settings. In 1972, Athans [27] studied the best prediction at the end of a horizon (and not on average over the horizon). In this framework, different sensors are available and a cost is associated with each of them. The goal is to select sensors (one at a time) in order to minimize a trade-off between the sensors' cost and the prediction error. In [28] and in the work of Lee [29] published in 2010, different sensors are available but only one can be chosen at a time; there is no budget constraint on the use of each measurement process. In 2012, Vitus et al. [26] proposed an efficient algorithm to choose a sensor among a given set at each discrete time step. No measurement cost or budget constraint on the number of uses of each sensor is considered. Due to the diverse potential applications mentioned above, several contributions related to the problem of optimal selection of the measurement times have appeared in different fields of the engineering literature. In 1970, Sano et al. [30] proposed a solution to the problem of selecting optimal measurement times in the continuous-time one-dimensional case. They proposed an explicit formula when there was only one measurement and proposed a numerical method when the number of measurements was greater than one. Still in the one-dimensional case, Tanaka et al. [31] selected the measurement times to minimize the maximum over time of the error variance. They provided an explicit formula for the measurement times. More recently, Aksenov et al. studied the Brownian motion under similar assumptions [32,33].

Contributions
Despite the attention paid to these related problems for decades, the problem of selecting a given number of measurement times to minimize the average over a finite horizon of the estimation error variance is, to the best of our knowledge, not resolved in the multivariate case. This paper covers that topic. To summarize, the contributions of this paper are the following: (i) three different GAs are proposed and compared to efficiently solve the combinatorial optimization problem; (ii) an analysis of the impact of the model (stability, process noise and measurement noise variances) on the obtained solutions in the one-dimensional case is proposed and interpreted; (iii) links are illustrated between the solutions of the problem and the rank of the observability matrix of the system; (iv) the discretization of a continuous-time system is considered and numerical experiments show that our method is well suited to both discrete-time and continuous-time frameworks; (v) a related problem concerning the trade-off between quality and quantity of measurements is mathematically formalized and illustrated by an example; and (vi) the optimal intermittent linear quadratic regulator (LQR) problem is handled through the duality between estimation and control.
In the conference proceedings [34], we proposed an initial method for optimal intermittent prediction. However, contributions (ii)-(vi) are completely new. In addition, we propose here two new GAs and one of them significantly outperforms the previous one. Finally, a more general definition of the model is used: measurements are not directly related to the quantity to be estimated.

Paper outline
The rest of the paper is organized as follows: Section 2 formalizes the problem mathematically (Sections 2.1 to 2.3) and presents different algorithms to solve it (Section 2.4). Then, a continuous-time version of the problem is presented (Section 2.5). The problem of optimal intermittent LQR is addressed and we show that it can be solved by reduction to the problem of the optimal intermittent Kalman predictor (Section 2.6). Finally, a related problem concerning the compromise between the quality and the quantity of measurements is presented (Section 2.7). Section 3 compares the different proposed optimization algorithms (Section 3.1); numerical examples are extensively studied (Section 3.2); links with the observability matrix are illustrated (Section 3.3); a continuous-time example is presented with a particular focus on the impact of the discretization time step (Section 3.4); then, the impact of the system characteristics (stability, variances of the process noise and the measurement noise) on the obtained solution is illustrated (Section 3.5); and the compromise between quality and quantity is illustrated by an example (Section 3.6). Finally, Section 4 presents our conclusions and proposes further avenues of work.

Problem description
We consider the discrete-time framework: t = 0, . . . , T. The set of measurement times is denoted M ⊂ {0, . . . , T − 1} and is constrained to contain only N measurements, i.e., |M| = N with N ≤ T. The evolution dynamics, the quantity to estimate and the measurements are described by the following equations:

x(t + 1) = A x(t) + b + G w(t),   (1)
y(t) = B x(t),   (2)
z(t) = C x(t) + d + v(t),  for t ∈ M,   (3)
x(0) ∼ N(x̂_0, P̂_0),   (4)

where w(t) ∼ N(0, Q) and v(t) ∼ N(0, R) are independent zero-mean white Gaussian noises with covariances Q and R. Equation (4) indicates that the initial state follows a known normal distribution with mean x̂_0 and covariance matrix P̂_0. Let us introduce the best a priori mean squared estimator of y(t) according to M,

ŷ_M(t|t − 1) := E[ y(t) | z(s) : s ∈ M, s < t ].

The set M is chosen to minimize the variance of the prediction error norm averaged for each t from 1 to T, i.e.,

min over M ⊂ {0, . . . , T − 1}, |M| = N of (1/T) Σ_{t=1}^{T} E[ ‖y(t) − ŷ_M(t|t − 1)‖² ],   (5)

where ‖·‖ is the Euclidean norm.
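As an illustration, model (1) through (4) can be simulated directly. The following sketch draws one trajectory and acquires noisy measurements only at the times in M; the function name `simulate` and the use of NumPy are our own choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(A, b, G, Q, B, C, d, R, x0_mean, P0, T, M):
    """Draw one realization of the model: states x(t), targets y(t) = B x(t),
    and noisy measurements z(t) = C x(t) + d + v(t) for t in the set M."""
    x = rng.multivariate_normal(x0_mean, P0)          # x(0) ~ N(x0, P0), Eq. (4)
    xs, ys, zs = [], [], {}
    for t in range(T + 1):
        xs.append(x)
        ys.append(B @ x)                              # Eq. (2)
        if t in M:                                    # Eq. (3)
            v = rng.multivariate_normal(np.zeros(R.shape[0]), R)
            zs[t] = C @ x + d + v
        if t < T:                                     # Eq. (1)
            w = rng.multivariate_normal(np.zeros(Q.shape[0]), Q)
            x = A @ x + b + G @ w
    return np.array(xs), np.array(ys), zs
```

The returned dictionary `zs` is indexed by the measurement times, mirroring the fact that z(t) is only defined for t ∈ M.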

Intermittent Kalman predictor
In this section, we show that problem (5) can be reformulated in a more explicit way, thanks to the Kalman formalism. The latter provides explicit formulas to compute, for a given measurement set M, the best mean squared estimators. Define x̂(t|t − 1) as the best a priori mean squared estimator of x(t) and x̂(t|t) as the best a posteriori mean squared estimator of x(t). In addition, define the a priori and the a posteriori covariance matrices as P(t|t − 1) and P(t|t), respectively. The classical Kalman filtering theory [1] states how to update these four quantities recursively in the case where a measurement is acquired at each time step, i.e., when N = T. In addition, by the linear relation (2) between x(t) and y(t), it holds that ŷ_M(t|t − 1) = B x̂(t|t − 1). We consider the intermittent case by replacing the measurement matrix C with the null matrix when no measurement is available, i.e., when t ∉ M. In view of Eq. (3), it models the fact that z(t) does not contain any information about the state x(t). Then the equations of the intermittent Kalman predictor are the time update equations, for t = 0, . . . , T − 1,

P(t + 1|t) = A P(t|t) Aᵀ + G Q Gᵀ,   (6)
x̂(t + 1|t) = A x̂(t|t) + b,   (7)
ŷ_M(t + 1|t) = B x̂(t + 1|t).   (8)

In addition, the measurement update equations are, for t = 0, . . . , T,

K(t) = P(t|t − 1) Cᵀ (C P(t|t − 1) Cᵀ + R)⁻¹ if t ∈ M, and K(t) = 0 otherwise,   (9)
P(t|t) = (I − K(t) C) P(t|t − 1),   (10)
x̂(t|t) = x̂(t|t − 1) + K(t) (z(t) − C x̂(t|t − 1) − d).   (11)

Note that when t ∉ M, z(t) is not defined in Eq. (11) but can be set to any arbitrary value because it is multiplied by K(t) = 0. In addition, one can see from (9)-(11) that when no measurement is available, i.e., when t ∉ M, each a posteriori quantity is simply the a priori one, which could have been expected intuitively. The initialization of these recurrence equations is

x̂(0|−1) = x̂_0,   (12)
P(0|−1) = P̂_0.   (13)

The intermittent Kalman predictor is summarized by Eqs. (6) to (13). We call a Kalman predictor for which the N measurement times are selected as equally spaced as possible a Regular Kalman predictor. It is

M_REG = { Round[kT/N] : k = 0, . . . , N − 1 },   (14)

where Round[·] is the rounding operator.
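The intermittent recursion described above can be sketched in a few lines; `intermittent_kalman` is a hypothetical helper name, and the code assumes the matrices are given as NumPy arrays and the measurements as a dictionary indexed by the times in M.

```python
import numpy as np

def intermittent_kalman(A, b, G, Q, B, C, d, R, x0, P0, T, M, z):
    """Intermittent Kalman predictor: the gain K(t) is set to zero when t is
    not in M, i.e., the (undefined) measurement carries no information."""
    n = A.shape[0]
    x_prior = np.asarray(x0, float).copy()           # x_hat(0|-1)
    P_prior = np.asarray(P0, float).copy()           # P(0|-1)
    y_pred, P_priors = [], []
    for t in range(T + 1):
        y_pred.append(B @ x_prior)                   # y_hat(t|t-1) = B x_hat(t|t-1)
        P_priors.append(P_prior.copy())
        if t in M:                                   # measurement update
            K = P_prior @ C.T @ np.linalg.inv(C @ P_prior @ C.T + R)
            x_post = x_prior + K @ (z[t] - C @ x_prior - d)
            P_post = (np.eye(n) - K @ C) @ P_prior
        else:                                        # no measurement: posterior = prior
            x_post, P_post = x_prior, P_prior
        x_prior = A @ x_post + b                     # time update
        P_prior = A @ P_post @ A.T + G @ Q @ G.T
    return y_pred, P_priors
```

For a scalar system with unit matrices, a measurement at t = 1 shrinks the predicted variance from 2 to 5/3 at t = 2, which the recursion reproduces.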

Optimal intermittent Kalman predictor
The optimal intermittent Kalman predictor is the intermittent Kalman predictor for which the set of measurement times M is chosen to minimize the mean error variance, i.e., it is the solution of problem (5). Let us denote the a priori error ỹ(t) := y(t) − ŷ_M(t|t − 1). Its covariance matrix can be written

E[ ỹ(t) ỹ(t)ᵀ ] = B P(t|t − 1) Bᵀ.

Then, the variance of the prediction error norm ‖ỹ(t)‖ can be written in terms of P(t|t − 1), and problem (5) can be rewritten as

min over M ⊂ {0, . . . , T − 1}, |M| = N of (1/T) Σ_{t=1}^{T} Tr[ B P(t|t − 1) Bᵀ ],   (15)

where Tr[·] is the trace operator.

Remark 1
The equations that govern covariance matrices, i.e., Eqs. (6), (9) and (10), are independent of the measurements z(t). Consequently, the optimization problem (15) can be solved before measurements are made, i.e., offline. In addition, they are independent of b and d.
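Following this remark, the cost of a candidate set M can be evaluated offline with a covariance-only recursion in which no measurement values, nor b and d, appear. A minimal sketch (the function name `cost` is ours):

```python
import numpy as np

def cost(M, A, G, Q, B, C, R, P0, T):
    """Offline evaluation of the objective of problem (15):
    (1/T) * sum_{t=1}^{T} Tr[B P(t|t-1) B^T]."""
    n = A.shape[0]
    P = np.asarray(P0, float).copy()                 # P(0|-1)
    total = 0.0
    for t in range(T):                               # t = 0, ..., T-1
        if t in M:                                   # covariance measurement update
            K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
            P = (np.eye(n) - K @ C) @ P
        P = A @ P @ A.T + G @ Q @ G.T                # time update: P becomes P(t+1|t)
        total += np.trace(B @ P @ B.T)
    return total / T
```

This is the deterministic function minimized by the algorithms of Section 2.4; adding a distinct measurement can only decrease its value.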

Remark 2 Contrary to Eqs. (1) through (4), which are stochastic, problem (15) is deterministic.
In addition, the equations involving x̂_M and ŷ_M can be ignored during the selection of the measurement times, i.e., during the resolution of problem (15), and used only for prediction. Consequently, using the optimal intermittent Kalman predictor consists in firstly computing the optimal measurement times offline and then doing online prediction.

Remark 3 The quantity to estimate, y(t), is related to the state x(t) by Eq. (2). However, an optimal set of measurement times for estimating y(t), i.e., an optimal solution of problem (15), is not necessarily optimal for estimating the state x(t). Indeed, the objective function of the problem depends on the matrix B that connects y(t) to x(t).

Remark 4 It is possible to compute the distribution of the squared norm of the prediction error ‖ỹ(t)‖². First, consider the eigendecomposition S(t) = U(t) Λ(t) U(t)ᵀ, where Λ(t) is a diagonal matrix whose diagonal entries are the eigenvalues λ_1(t), . . . , λ_n(t) of the symmetric matrix S(t) = B P(t|t − 1) Bᵀ, and where U(t) is a unitary matrix, i.e., U(t)ᵀ U(t) = I.
The random variable ‖ỹ(t)‖² can then be expressed as ‖ỹ(t)‖² = Σ_i λ_i(t) ζ_i², where all the ζ_i² follow independent chi-squared distributions with one degree of freedom, i.e., ζ_i² ∼ χ²₁. It shows that ‖ỹ(t)‖² is a linear combination with non-negative coefficients of independent random variables following a chi-squared distribution with one degree of freedom. Then, a closed form formula can be obtained for the cumulative distribution function P[ ‖ỹ(t)‖² < ξ ] using the theorem of [35]. From a practical point of view, this quantity can be efficiently estimated using the method proposed in [36].
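For illustration, this cumulative distribution function can also be approximated by plain Monte Carlo sampling of the weighted chi-squared combination; the sketch below is a naive estimator, not the efficient method of [36].

```python
import numpy as np

def cdf_weighted_chi2(lams, xi, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P[ sum_i lam_i * zeta_i^2 < xi ],
    where the zeta_i^2 are i.i.d. chi-squared with one degree of freedom."""
    rng = np.random.default_rng(seed)
    zeta = rng.standard_normal((n_samples, len(lams)))   # zeta_i ~ N(0, 1)
    q = (zeta ** 2) @ np.asarray(lams, float)
    return float(np.mean(q < xi))
```

With a single unit eigenvalue, the estimate should approach P[ζ² < 1] ≈ 0.6827.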

Optimization algorithms
Solving the combinatorial optimization problem (15) by means of an exhaustive search would require evaluating the cost function T! / (N! (T − N)!) times, which is computationally intractable. In this section, we propose a random trial (RT) algorithm and various genetic algorithms (GAs) to tackle this problem.
The RT algorithm randomly samples a given number of admissible solutions M, computes their cost and keeps the best one.
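A minimal sketch of the RT algorithm follows; the callable `cost` is assumed to evaluate the objective of problem (15) for a given set M, and the function name is ours.

```python
import numpy as np

def random_trial(T, N, cost, n_trials=1000, seed=0):
    """Sample random admissible sets M (N distinct times in {0,...,T-1})
    and keep the one with the lowest cost."""
    rng = np.random.default_rng(seed)
    best_M, best_c = None, np.inf
    for _ in range(n_trials):
        M = frozenset(rng.choice(T, size=N, replace=False).tolist())
        c = cost(M)
        if c < best_c:
            best_M, best_c = M, c
    return set(best_M), best_c
```

With a toy cost such as the sum of the selected times, enough trials recover the obvious optimum.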
In the GA nomenclature, a feasible solution M = {t 1 , . . . , t N } of the optimization problem is called an individual and its corresponding measurement times t i are called its genes. A set of several individuals is called a population.
The GA [37] implements the following steps.
1. Initialization: An initial population of individuals is randomly generated.
2. Evaluation: The cost of each individual is computed.
3. Selection: Pairs of parents are selected on the basis of their fitness.
4. Crossover: Each pair of parents produces offspring. Three types of crossover are considered: the shuffle crossover (SC), the replace crossover (RC) and the count preserving crossover (CPC). They are described below.
5. Mutation: Each gene of each individual mutates according to a probability. When a mutation occurs, some measurement times are replaced by random times. This step could create duplicates, i.e., an individual could have repeated times t_i = t_j with i ≠ j. To avoid this situation, the random times are selected uniformly in {0, . . . , T − 1}\M.
6. Repeat: Come back to step 2 until a convergence criterion is reached.
One pass of steps 2 to 5 is called a generation. Let us present the three different crossover operators. Each of them gives a variant of the genetic algorithm. They are described and a brief example is given.

Shuffle crossover (SC)
For each parent pair, the SC [38] method randomly picks the genes to exchange (each gene has the same probability of being selected). For example, consider two parents (that is, two sets of measurement times M),

P_1 = {0, 1, 3, 5, 6, 7} and P_2 = {0, 1, 2, 3, 5, 8},   (16)

and denote the obtained offspring O_1 and O_2. First, one can observe that O_1 may contain a duplication of 3, which corresponds to choosing the same measurement time twice. That is not allowed in problem (15) and has to be considered a wasted measurement. In other words, fewer measurements are acquired, which is suboptimal, because adding a new distinct measurement can only decrease the cost.
A second observation is that the offspring O_2 may not contain a 3, even though both parents P_1 and P_2 contain one. That means that common heritage can be lost during the crossover, which possibly deteriorates the convergence of the algorithm.
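The SC on ordered gene vectors can be sketched as follows; note that nothing prevents the offspring from containing duplicated times.

```python
import numpy as np

def shuffle_crossover(p1, p2, rng):
    """SC on ordered gene vectors: each position is swapped with probability 1/2.
    Duplicated genes may appear in the offspring."""
    o1, o2 = list(p1), list(p2)
    for i in range(len(o1)):
        if rng.random() < 0.5:
            o1[i], o2[i] = o2[i], o1[i]
    return o1, o2
```

The combined multiset of genes is preserved, but an individual offspring can repeat a time, which wastes a measurement.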
These two observations motivate a careful choice of the crossover method and leads naturally to the next two crossover methods.

Replace crossover (RC)
The RC method is an SC followed by a random replacement of duplicates, taking care to avoid creating new duplicates. Duplicates are replaced by picking genes uniformly at random in {0, . . . , T − 1}\M, which yields the offspring (17). With this crossover, the obtained offspring have no duplicates. However, an offspring may not contain a 3 even though this gene is common to both parents. This illustrates that, like the SC, the RC can also suffer from a loss of common heritage.
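A sketch of the RC, under the assumption that a duplicate is replaced by a time unused by both the already processed genes and the remaining ones (the exact replacement rule is an implementation choice):

```python
import numpy as np

def replace_crossover(p1, p2, T, rng):
    """SC followed by replacement of duplicated genes with fresh unused times."""
    o1, o2 = list(p1), list(p2)
    for i in range(len(o1)):                 # shuffle crossover step
        if rng.random() < 0.5:
            o1[i], o2[i] = o2[i], o1[i]
    def dedupe(o):
        seen, out = set(), []
        for g in o:
            if g in seen:                    # duplicate: draw a new unused time
                g = int(rng.choice(sorted(set(range(T)) - seen - set(o))))
            seen.add(g)
            out.append(g)
        return out
    return dedupe(o1), dedupe(o2)
```

The offspring are duplicate-free, but a gene common to both parents can still disappear from one offspring.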

Count preserving crossover (CPC)
The CPC [38,39] represents each individual by a set (unordered) instead of a vector (ordered). It implements an SC between the subsets P_1\P_2 and P_2\P_1. This allows the transmission of the common measurement times P_1 ∩ P_2 to both offspring.

Fig. 1 Venn diagram representing the count preserving crossover on parents P_1 and P_2 described in (16). The crossover allows gene exchanges between the two filled regions. The underlined genes are exchanged; the result is given in (18)

For the parents in example (16), the SC is executed between the subsets P_1\P_2 = {6, 7} and P_2\P_1 = {2, 8} (see Fig. 1). Possible offspring are then

O_1 = {0, 1, 2, 3, 5, 7} and O_2 = {0, 1, 3, 5, 6, 8},   (18)

where genes 6 and 2 have been exchanged. This crossover method makes it possible both to transmit the common genes to all the offspring and to avoid duplicates.
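A sketch of the CPC on unordered sets; genes of P_1\P_2 and P_2\P_1 are paired (here in sorted order, an implementation choice not specified above) and each pair is swapped with probability 1/2.

```python
import numpy as np

def count_preserving_crossover(p1, p2, rng):
    """CPC: genes common to both parents are kept by both offspring; an SC is
    applied between p1\\p2 and p2\\p1, which have equal size since |p1| = |p2|."""
    p1, p2 = set(p1), set(p2)
    common = p1 & p2
    o1, o2 = set(common), set(common)
    for x, y in zip(sorted(p1 - p2), sorted(p2 - p1)):
        if rng.random() < 0.5:               # swap this pair of genes
            x, y = y, x
        o1.add(x)
        o2.add(y)
    return o1, o2
```

By construction the offspring have no duplicates, contain all common genes, and together cover exactly the union of the parents' genes.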
All GA implementations use sigma scaling [37] with the sigma factor equal to 1 (standard value). The crossover operation is applied systematically and mutation occurs with a per-gene probability of 0.003 (standard value). The population size is set to 100 individuals and the algorithm stops after 100 generations.
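Putting the pieces together, the GA loop can be sketched as follows with sigma-scaled selection, the count preserving crossover, and per-gene mutation. This is our own compact sketch of the procedure described above, not the authors' exact implementation.

```python
import numpy as np

def ga(T, N, cost, pop_size=100, gens=100, mut_p=0.003, seed=0):
    """GA sketch for problem (15): sigma scaling (factor 1), CPC, mutation."""
    rng = np.random.default_rng(seed)
    pop = [frozenset(rng.choice(T, N, replace=False).tolist()) for _ in range(pop_size)]
    best, best_c = None, np.inf
    for _ in range(gens):
        costs = np.array([cost(m) for m in pop], dtype=float)
        i = int(np.argmin(costs))
        if costs[i] < best_c:
            best, best_c = pop[i], costs[i]
        f = -costs                                   # fitness: lower cost is better
        s = f.std()
        w = np.maximum(1 + (f - f.mean()) / s, 0) if s > 0 else np.ones_like(f)
        w = w / w.sum()
        nxt = []
        while len(nxt) < pop_size:                   # selection + CPC
            j, k = rng.choice(pop_size, size=2, p=w)
            p1, p2 = set(pop[j]), set(pop[k])
            o1, o2 = set(p1 & p2), set(p1 & p2)
            for x, y in zip(sorted(p1 - p2), sorted(p2 - p1)):
                if rng.random() < 0.5:
                    x, y = y, x
                o1.add(x); o2.add(y)
            nxt += [o1, o2]
        pop = []
        for m in nxt[:pop_size]:                     # mutation: fresh unused time
            for g in list(m):
                if rng.random() < mut_p:
                    m.remove(g)
                    m.add(int(rng.choice(sorted(set(range(T)) - m))))
            pop.append(frozenset(m))
    return set(best), best_c
```

On a toy cost (the sum of the selected times) the loop converges to near-minimal sets within the default budget.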

Discretization of a continuous-time system
Many real-world problems are modeled as continuous-time systems. For this reason, we study the continuous-time equivalent of problem (5), in which an infimum replaces the minimum.¹ Here, w^c(τ) ∼ N(0, Q^c) and v(τ_k) ∼ N(0, R) are independent Gaussian white noises and ŷ(τ) = E[ y(τ) | z(τ_k) : τ_k < τ ] is the best mean squared estimator of y(τ). The exponent c emphasizes that time is continuous. Note that the dynamics is a stochastic differential equation for which derivatives have a non-classical sense; the reader is referred to [40] for details. This problem can be exactly discretized at a time step δ = τ̄/T to fit the formalism of Eqs. (1) through (5). The corresponding quantities are [41]

A = e^{A^c δ},  b = ∫_0^δ e^{A^c s} b^c ds,  Q = ∫_0^δ e^{A^c s} Q^c e^{(A^c)ᵀ s} ds,

where e^(·) is the matrix exponential operator. In Section 3.4, we study the impact of the discretization time step on the obtained solution.

¹ An infimum is used because the admissible set of the optimization problem is not compact due to the constraint τ_k < τ_{k+1}. This constraint models the fact that two measurements cannot be acquired at the same time. When time is discretized at time step δ, this constraint becomes τ_k + δ ≤ τ_{k+1}, which makes the set compact. Then, the corresponding infimum exists and is a minimum.
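The exact discretization of A and Q can be computed with Van Loan's block-matrix trick. The sketch below hand-rolls a truncated-Taylor matrix exponential to stay dependency-free; in practice one would use `scipy.linalg.expm`, and the helper names are ours.

```python
import numpy as np

def expm(M, terms=25):
    """Matrix exponential via scaling and squaring of a truncated Taylor series."""
    nrm = np.linalg.norm(M, np.inf)
    s = max(0, int(np.ceil(np.log2(nrm))) + 1) if nrm > 0 else 0
    A = M / 2.0 ** s
    E, term = np.eye(M.shape[0]), np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ A / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def discretize(Ac, Qc, delta):
    """Van Loan's method: A = exp(Ac*delta) and
    Q = int_0^delta exp(Ac s) Qc exp(Ac^T s) ds."""
    n = Ac.shape[0]
    F = np.zeros((2 * n, 2 * n))
    F[:n, :n], F[:n, n:], F[n:, n:] = -Ac, Qc, Ac.T
    G = expm(F * delta)
    A = G[n:, n:].T          # lower-right block transposed gives exp(Ac*delta)
    Q = A @ G[:n, n:]        # A times the upper-right block gives the integral
    return A, Q
```

For a scalar system with A^c = −1 and Q^c = 2, the discretized pair at δ = 0.5 is A = e^{−0.5} and Q = 1 − e^{−1}, which the sketch reproduces.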

Optimal intermittent linear quadratic regulator
Duality between estimation and control problems has been known for decades [42]. Roughly, it states that instead of solving a control problem directly, one can solve a related estimation problem and deduce the solution to the control problem from the solution of the estimation problem. The converse is also possible.
In this section, we formalize the problem of optimal intermittent LQR and use duality to show that it can be handled by computing an optimal intermittent Kalman predictor.
Let us consider a controlled linear system with known initial state, described by Eqs. (23) and (24), where σ̃(t) ∈ {0, 1}. This control signal σ̃(·) determines the times at which the system is controlled. Note that Ã and B̃ can be time-dependent. The optimal intermittent LQR problem consists of the minimization (25), subject to (23) and (24), where Q̃ and R̃ can be time-dependent. This problem has been studied in [43], where a closed form formula is provided under some restrictive conditions. The following theorem formalizes how to compute this optimal LQR by computing an optimal intermittent Kalman predictor. Note that to each optimal set of measurement times M can be associated a signal σ(·) such that σ(t) = 1 if t ∈ M and 0 otherwise.
Proof We want to use Theorem 2.1 from [42]. However, these authors' formalism has two differences from ours.
First, they do not consider missing control times (or missing measurement times). However, this is not restrictive because, for a given σ̃(t), Eq. (23) can be reduced to the form of [42] by considering σ̃(t)B̃ as a known time-varying control matrix B̃(t). Similarly, the measurement matrix C in Eq. (3) can be seen as the null matrix when no measurement is acquired, i.e., when σ(t) = 0 or, equivalently, t ∉ M. The second difference with [42] is that they limit the estimation problem to the case where B = I, i.e., y(t) = x(t). However, as shown by Eqs. (6), (9), and (10), the prediction error covariance matrix P(t|t − 1) does not depend on B. Consequently, the results of Theorem 2.1 from [42] that concern P(t|t − 1) can be used.
Then, under transformation (28), the objective function of the estimation problem (15) coincides with the objective function of the control problem (25), where the trace operator has been removed because its argument is one-dimensional. Taking the minimum over all admissible σ(·) (on the left-hand side) and the corresponding σ̃(·) (on the right-hand side) gives the equality between problems (15) and (25).
Once the signal σ̃*(·) is fixed, finding the optimal command u(t) is a classic LQR problem. Theorem 2.1 from [42] gives the relation (29).

Quality or quantity of measurements
Instead of considering a fixed number of measurements, an alternative problem is to reach a trade-off between many low-quality measurements and a small number of high-quality measurements. This problem can be handled in the presented formalism. Allowing the choice of the quantity of measurements corresponds to making the number of measurements N a variable to optimize and no longer a given of the problem. However, with only this modification, the optimal solution will always be to take the maximum number of measurements, i.e., N = T. To model the compromise between measurement quality and quantity, we assume that an increase in the number of measurements N implies an increase in the measurement noise, i.e., an increase in R.
In other words, the measurement covariance matrix R must depend on N. We assume R = f(N) for a given function f defined on {1, . . . , T} and mapping the number of measurements to a measurement noise covariance matrix, and N becomes an optimization variable of problem (15), where, as previously, constraint (6) holds for t = 0, . . . , T − 1 and constraints (9) and (10) hold for t = 0, . . . , T. This problem is studied in Section 3.6.
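As an illustration of this trade-off, the sketch below sweeps N for a scalar system with R = f(N), using regularly spaced times (in the spirit of Eq. (14)) as a cheap stand-in for the GA solution; the function names and the scalar setting are our own simplifications.

```python
import numpy as np

def cost_scalar(M, a, q, r, P0, T):
    """Objective of (15) for a scalar system x(t+1) = a x(t) + w(t), z(t) = x(t) + v(t)."""
    P, total = float(P0), 0.0
    for t in range(T):
        if t in M:
            P = P - P * P / (P + r)        # scalar measurement update
        P = a * a * P + q                  # time update
        total += P
    return total / T

def quality_vs_quantity(T, a, q, P0, f):
    """Pick the number of measurements N minimizing the cost when R = f(N)."""
    regular = lambda N: {round(k * T / N) for k in range(N)}
    costs = {N: cost_scalar(regular(N), a, q, f(N), P0, T) for N in range(1, T + 1)}
    return min(costs, key=costs.get)
```

When f is essentially constant and small, quantity wins and the sweep returns N = T; a steeply increasing f shifts the optimum toward fewer, cleaner measurements.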

Results and discussion
Several results will be illustrated on a discretized spring-mass system. The derivation of its equations will be detailed in Section 3.4. For readability, we anticipate the system finally obtained: it is defined by Eqs. (31) and (32), where δ is the discretization time step.

Comparison of optimization methods
In this section, the different optimization algorithms presented in Section 2.4 are compared: the random trial (RT) method and the three variants of genetic algorithms (GAs), i.e., shuffle crossover (SC), replace crossover (RC), and count preserving crossover (CPC). Figure 2 presents the mean cost and the minimum cost obtained by the four algorithms with respect to the number of cost function evaluations in the case of the spring-mass system described in Section 3.4 by Eqs. (31) and (32) with a discretization time step δ = 0.1 [s]. The number of measurements is set to N = 70 and the number of time steps is T = 100. The mean and minimum costs obtained with RT are also shown, as well as the cost for regularly spaced measurement times.
Firstly, one can observe that the minimum and average costs of the GA with SC have bad convergence behavior (the average cost quickly rises out of the figure). This confirms our previous comments in Section 2.4 about the flaws of this crossover operator (creation of duplicates and loss of common heritage). All other algorithms found a solution better than the regular one in a few generations. The regular cost is close to the average RT cost, meaning that regular measurement times are 'typical' random measurement times in this example. The GAs with RC and CPC outperform the RT algorithm. Finally, the GA with CPC quickly outperforms all other proposed algorithms. In addition, the average cost converges to the minimal cost only for the CPC. This means that the entire population converges to a single individual-assumed to be optimal-which is the desired behavior for a GA. Table 1 reports the averages and standard deviations of the optimal costs found by the different algorithms over 100 resolutions. The advantage of the CPC crossover is confirmed and the small standard deviations indicate the reproducibility of the results. In the following, all experiments use the GA that implements CPC.

Numerical examples
To continue the analysis of our method, predictions using our method are compared with the ones obtained with a Kalman predictor with regularly spaced measurement times (see Eq. (14)). After an analysis on the spring-mass system, we will apply our method to a 50-dimensional system. To conclude this section, some links with the observability matrix will be highlighted.

Table 1
Average and standard deviation of the optimal costs found by the different algorithms on 100 resolutions. The algorithms are the random trial (RT) method, the genetic algorithm (GA) with shuffle crossover (SC), the GA with replace crossover (RC), the GA with count preserving crossover (CPC). The cost obtained with regularly spaced measurements (see (14)) is also indicated

Spring-mass system
The set of measurement times found by the genetic algorithm is denoted M_GA and the set of regularly spaced measurements is denoted M_REG. The corresponding mean squared tracking errors are denoted MSE(M_GA) and MSE(M_REG), respectively. For the same system as in the previous section, i.e., Eqs. (31) and (32) with a discretization time step δ = 0.1 [s] and with T = 100 time steps and N = 5 measurements, Fig. 3a presents the real y(t) and the predictions obtained with the two predictors. In addition, the regular measurement times M_REG and the GA measurement times M_GA are printed. One can see that the main difference occurs at the beginning of the tracking. Indeed, our method converges faster. Intuitively, this can be understood by the fact that, for the intermittent predictor, all measurement times are selected close to the beginning. Over the complete time horizon, the regular approach has a mean squared error of MSE(M_REG) = 0.46 whereas the mean squared error with our method is MSE(M_GA) = 0.36. This is a significant improvement. However, these results are valid only for this particular realization of the dynamical system.
On the contrary, Fig. 3b presents the squared error with respect to time over 100,000 realizations. The mean and the 95% quantile are indicated for the regular Kalman predictor and for our optimal intermittent Kalman predictor method. One can see that the remarks about Fig. 3a hold for the mean and the 95% quantile behavior: our method rapidly produces a better estimate than the Kalman predictor with regular measurements.
Let us introduce the benefit B = MSE(M_REG) − MSE(M_GA); a positive benefit indicates that our method outperforms the regular Kalman predictor. Figure 4 presents the histogram of the benefit B computed over 100,000 realizations. The mean benefit is 0.12 and the benefit is positive in 64% of the cases, i.e., our method outperforms the regular Kalman predictor in 64% of the realizations. The results are summarized in the second column of Table 2. Note that the optimal costs correspond to the average mean squared errors.
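The optimal costs reported here are values of the deterministic objective obtained by propagating the prediction error covariance through the intermittent Riccati recursion, so no Monte Carlo simulation is needed to evaluate a candidate schedule. A minimal sketch of this evaluation, assuming for simplicity that the output projection matrix B of the paper is the identity, could look as follows:

```python
import numpy as np

def intermittent_prediction_cost(A, C, Q, R, P0, meas_times, T):
    """Average trace of the one-step prediction error covariance over the
    horizon when measurements are acquired only at the times in `meas_times`.
    Sketch of the deterministic objective reduced from the stochastic
    problem; the output projection matrix B of the paper is taken as I."""
    P = P0.copy()
    cost = 0.0
    for t in range(T):
        if t in meas_times:  # measurement update, only at selected times
            K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
            P = P - K @ C @ P
        P = A @ P @ A.T + Q  # time update (prediction)
        cost += np.trace(P)
    return cost / T
```

The GA only needs this function as a black-box fitness: each candidate set of measurement times is scored by one deterministic covariance sweep over the horizon.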
Fig. 3 a Evolution of a particular realization of y(t) and its estimates with both the regular Kalman predictor (Regular) and the optimal intermittent Kalman predictor (GA). b Mean squared prediction error with respect to time and 95% quantile over 100,000 realizations for the regular Kalman predictor (Regular) and for our optimal intermittent Kalman prediction method (GA). Regular measurement times and GA measurement times are also shown on both graphs. Simulations are realized on the system described by Eqs. (31) and (32) with T = 100 time steps and N = 5 measurements

A large dimensional system
We propose here to apply our method to a 50-dimensional system. To this end, we generate a random system as follows: each entry of the matrix A ∈ R^{50×50} is picked at random independently from a Gaussian distribution with standard deviation 0.25. We consider b = 0 and G = Q = I. Ten components of the state x(t) are uniformly picked at random to form the objective vector z(t). In other words, each row of the matrix B ∈ {0, 1}^{10×50} contains exactly one 1 and all rows are different. The measurement matrix C ∈ {0, 1}^{10×50} is picked at random in the same way, but independently of B. We consider d = 0 and the covariance matrix of the measurement noise is the identity, i.e., R = I. Finally, we consider x̂_0 = 0 and P_0 = I. This system is studied over T = 50 time steps and N = 25 measurements are allowed.
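The random system described above can be generated in a few lines. The sketch below follows the matrix names of the paper and uses an arbitrary seed (an assumption, for reproducibility only):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
n, p = 50, 10

# Dynamics: i.i.d. Gaussian entries with standard deviation 0.25.
A = 0.25 * rng.standard_normal((n, n))

# B selects ten distinct state components as the objective vector z(t):
# each row contains a single 1 and all rows are different.
B = np.zeros((p, n))
B[np.arange(p), rng.choice(n, size=p, replace=False)] = 1.0

# C is drawn the same way, independently of B.
C = np.zeros((p, n))
C[np.arange(p), rng.choice(n, size=p, replace=False)] = 1.0

Q = np.eye(n)   # process noise covariance (G = I)
R = np.eye(p)   # measurement noise covariance
P0 = np.eye(n)  # initial error covariance
```

Drawing the row indices with `replace=False` guarantees that the ten selected components are distinct, i.e., that all rows of B (and of C) are different.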

Table 2
Results for the spring-mass system (Section 3.2.1) and the random system (Section 3.2.2). The cost using regularly spaced measurements and the optimal cost found by the GA are indicated. The mean squared errors are indicated (mean and standard deviation over 100,000 realizations)

The eigenvalues of the randomly sampled matrix A are depicted in Fig. 5a. Among the 50 eigenvalues, 17 are stable and 33 are unstable. Their moduli vary from 0.12 to 2.09.
The optimal measurement times for this problem are computed using the GA with the CPC. Then, the mean squared prediction error using the optimal intermittent Kalman predictor is compared to the one obtained with regular measurements over 100,000 realizations. The average and standard deviation of the mean squared error in the regular case MSE(M_REG) are 10,374.91 ± 1655.75. Using the optimal measurement times, the average and standard deviation of MSE(M_GA) are 8947.08 ± 1785.55. The benefit B = MSE(M_REG) − MSE(M_GA) is computed and its distribution is depicted in Fig. 5b. The mean benefit is 1427.83 and the benefit is positive for 76% of the realizations. These results are summarized in the third column of Table 2.

Links with observability matrix
To make the results more intuitive, let us present a simple system defined by

x(t + 1) = [ 0 −1 ; 1 0 ] x(t) + w(t),  y(t) = [ 1 0 ] x(t) + v(t),  (33)

and consider T = 20 time steps and N = 10 measurements. Jungers et al. show in [44] that the observability matrix of a linear system with missing measurements is given by

O(M) = [ σ(0) C ; σ(1) CA ; . . . ; σ(T − 1) CA^{T−1} ],

where σ(t) = 1 if t ∈ M and 0 otherwise. For system (33), because the transition matrix A is a 90-degree rotation matrix, the products CA^t cycle with period four, so the observability matrix has the following structure: its nonzero rows are of the form ±[ 1 0 ] when t is even and ±[ 0 1 ] when t is odd. (34)

From definition (14), a regular measurement set is M_REG = {0, 2, . . . , 18}, which leads to an observability matrix (35) whose nonzero rows are all of the form ±[ 1 0 ]. With such a regular set of measurement times, the observability matrix is rank deficient. That means that even in the absence of any noise, the state cannot be completely determined.
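The rank-deficiency argument can be checked numerically. The sketch below assumes the 90-degree rotation dynamics with a scalar measurement of the first state component (our reading of system (33)), and builds the observability matrix with missing measurements row by row:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])  # 90-degree rotation
C = np.array([[1.0, 0.0]])   # assumed: only the first component is measured

def obs_matrix(meas_times, T=20):
    """Observability matrix with missing measurements: the row for time t
    is sigma(t) * C A^t, with sigma(t) = 1 iff t is a measurement time."""
    return np.vstack([
        (1.0 if t in meas_times else 0.0) * C @ np.linalg.matrix_power(A, t)
        for t in range(T)
    ])

M_reg = set(range(0, 20, 2))                  # regular: {0, 2, ..., 18}
M_pairs = {0, 1, 4, 5, 8, 9, 12, 13, 16, 17}  # five pairs of successive times

print(np.linalg.matrix_rank(obs_matrix(M_reg)))    # → 1 (rank deficient)
print(np.linalg.matrix_rank(obs_matrix(M_pairs)))  # → 2 (full rank)
```

With even measurement times only, every nonzero row is ±[1 0], hence rank 1; a pair of successive times contributes one even and one odd row, restoring full rank.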
Bearing in mind that the trace of a matrix is the sum of its eigenvalues, the objective function of problem (15) can be rewritten as (1/T) Σ_{t=1}^{T} (λ_1(t) + λ_2(t)). Figure 6 presents the eigenvalues λ_1(t) > λ_2(t) of the prediction error covariance matrix S(t) = B P(t|t − 1) B^T with respect to time. They are presented for the regular measurements M_REG and for the optimal measurements found by the genetic algorithm, but also when a measurement is acquired at each time, i.e., M = {0, . . . , T − 1}, and when no measurement is acquired, i.e., M = ∅. In addition, the measurement times are shown for both the regular and the intermittent cases.
One can observe that the largest eigenvalue λ_1 has the same evolution with regular measurements as when no measurements are used: the regular measurements have no effect on the largest eigenvalue λ_1. This observation is consistent with the rank deficiency of the observability matrix (35). On the contrary, with the measurement times found by the genetic algorithm, both eigenvalues remain significantly smaller than without measurements. In addition, the measurement times selected by the GA are composed of five pairs of successive measurements; the structure of the observability matrix (34) ensures that each such pair contributes two linearly independent rows.

Fig. 6 The two eigenvalues λ_1 > λ_2 of the covariance matrix of the prediction error S(t) with respect to time t when no measurement is acquired, i.e., M = ∅ (No); when regular measurement times M_REG are used (Regular); when optimal measurement times M_GA are used (GA); and when a measurement is acquired at each time, i.e., M = {0, . . . , T − 1} (All). Regular measurement times and GA measurement times are also shown. The considered system is described by (33) with T = 20 time steps and N = 10 measurements for the regular and GA cases

Continuous-time example
This subsection presents an example of the continuous-time case presented in Section 2.5. More precisely, we are interested in the impact of the number of discretization steps T (or equivalently, the impact of the discretization step size δ) on the obtained solution.
We want to solve the problem defined by Eqs. (19) and (20) for the spring-mass system subject to a random force w_c. The corresponding discrete system can be obtained thanks to (21) and (22) and, in the particular case k = m, it gives Eq. (31) together with a process noise covariance matrix proportional to

[ δ − sin δ cos δ    sin² δ ; sin² δ    δ + sin δ cos δ ].
We set m = 40 [kg] and k = 40 [N/m] and finally obtain Eqs. (31) and (32), which had been anticipated at the beginning of Section 3. In this subsection, the time horizon is τ = 100 [s] and the number of measurements is fixed to N = 5.
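A standard way to obtain the discrete pair (A, Q) from the continuous model, consistent in spirit with the discretization formulas (21) and (22), is Van Loan's matrix-exponential construction. Here is a sketch for the unit-natural-frequency spring-mass case (k = m, noise intensity normalized to 1, both assumptions for illustration):

```python
import numpy as np
from scipy.linalg import expm

def van_loan_discretize(Ac, Qc, delta):
    """Exact discretization of dx = Ac x dt + noise of intensity Qc over a
    step delta, via Van Loan's construction: Ad = exp(Ac * delta) and
    Qd = integral over [0, delta] of exp(Ac s) Qc exp(Ac' s) ds."""
    n = Ac.shape[0]
    M = delta * np.block([[-Ac, Qc], [np.zeros((n, n)), Ac.T]])
    Phi = expm(M)
    Ad = Phi[n:, n:].T     # discrete transition matrix
    Qd = Ad @ Phi[:n, n:]  # discrete process noise covariance
    return Ad, Qd

# Unit-frequency spring-mass (k = m), noise entering the velocity equation.
Ac = np.array([[0.0, 1.0], [-1.0, 0.0]])
Qc = np.array([[0.0, 0.0], [0.0, 1.0]])
Ad, Qd = van_loan_discretize(Ac, Qc, 0.1)
```

For this system, Ad is the rotation matrix [cos δ, sin δ ; −sin δ, cos δ] and Qd matches the closed-form covariance of Eq. (32) up to the noise intensity factor, which makes the construction easy to validate.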
The measurement times are computed by solving the discretized problem using the GA. Then, the quality of the obtained solution is assessed by evaluating the cost function of the continuous-time problem (19), thanks to the continuous-discrete Kalman filter [45]. The results are presented in Fig. 7.

Fig. 7 For the discrete-time system corresponding to the continuous system (36)-(39), graph a presents the regular measurement times M_REG given by (14) and the times M_GA found by the genetic algorithm (GA) with respect to the number of discretization steps T. Graph b presents the corresponding cost function values (19)

Firstly, the measurement times found by the GA seem to converge when the number of discretization steps T increases (or equivalently, when δ decreases). In addition, they are more concentrated at the beginning of the horizon. Furthermore, one can observe that even for the regular Kalman predictor, the measurement times vary with the number of discretization steps T. This is due to the fact that the measurement times are constrained to belong to the discrete-time set; algebraically, it is the Round operator in Eq. (14). When the number of discretization steps T increases, the rounding effect decreases. Figure 7b depicts the corresponding cost values for the continuous-time problem (19). It shows that the cost for optimal intermittent measurements is systematically smaller than in the regular case. This is expected because the regular measurement times belong to the admissible domain of problem (19)-(20): if the regular measurements had a smaller cost function value, the GA would probably have selected them. The roughly decreasing and converging behavior of the cost function with respect to the number of discretization steps shows that our method can be used for continuous-time systems with an appropriate discretization scheme.

Impact of stability, process noise variance and measurement noise variance
In this section, the impact of the system's stability, the process noise variance Q, and the measurement noise variance R on the solutions is studied in the one-dimensional case. More precisely, we compare the intermittent Kalman predictor to the regular Kalman predictor for different values of Q and R. The experiments are restricted to one-dimensional systems, covering the stable case |A| < 1, the marginally stable case |A| = 1, and the unstable case |A| > 1. Finally, we comment on the performance in the different cases.
The studied system is a simple version of (1)-(4) where all quantities are one-dimensional and with b = 0, G = 1, B = 1, C = 1, d = 0. This problem is studied for all A ∈ {1/2, 1, 2} and for Q and R in the range [0.01, 100]. Results are presented in Fig. 8: for graphs a-c, A = 1/2; for graphs d-f, A = 1; and for graphs g-i, A = 2. In graphs a, d, and g, both Q and R vary. In graphs b, e, and h, R = 50 and Q varies. Finally, in graphs c, f, and i, Q = 50 and R varies. In all cases, the optimal intermittent Kalman predictor gives a smaller cost than regular measurements. For the stable system, i.e., A = 1/2 in graphs a-c, the intermittent Kalman predictor gives results very similar to the regular Kalman predictor. In the other cases, i.e., A ∈ {1, 2} in graphs d-i, the improvement due to the use of intermittent measurements instead of regular ones is greater, and it increases when R increases. The unstable case, i.e., A = 2 in graphs g-i, is the one in which the improvement is the most significant: there, the improvement increases when either Q or R increases.
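In the one-dimensional case, the fact that optimized measurement times can only improve on regular ones can be verified by brute force for a small horizon. The sketch below uses a scalar intermittent Riccati recursion and a plausible regular schedule (the exact Round formula of Eq. (14) may differ):

```python
import itertools

def scalar_cost(A, Q, R, P0, meas, T):
    """Average prediction error variance of a scalar intermittent Kalman
    predictor (C = 1, b = 0) with measurement times `meas`."""
    P, cost = P0, 0.0
    for t in range(T):
        if t in meas:
            P -= P * P / (P + R)  # scalar measurement update
        P = A * A * P + Q         # prediction step
        cost += P
    return cost / T

T, N = 12, 3
A, Q, R, P0 = 2.0, 50.0, 50.0, 1.0
regular = {round(i * T / N) for i in range(N)}  # plausible analogue of Eq. (14)
best = min(scalar_cost(A, Q, R, P0, set(m), T)
           for m in itertools.combinations(range(T), N))
print(best <= scalar_cost(A, Q, R, P0, regular, T))  # → True
```

The regular schedule is one of the C(T, N) admissible schedules enumerated by `itertools.combinations`, so the brute-force optimum is never worse; for an unstable A the gap is typically strict.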

Quality or quantity of measurements: an example
In this section, we study the compromise between a few precise measurements and many noisier measurements, presented in Section 2.7. Problem (30) is studied for the dynamical system (31) and (32) with T = 100 time steps. The measurement noise variance is related to the number of measurements by R = f(N) = N^α for different α > 0: the larger α, the more an additional measurement increases the measurement noise. Figure 9 presents the cost (30) for all N ∈ {1, . . . , T}, for the regular measurements (14) and for the irregular measurements found by the genetic algorithm. Results are presented for several values of α.

Fig. 9 Cost (30) with respect to N for the regular measurements (14) and the irregular measurements found by the genetic algorithm (GA). The minimum cost with respect to N is depicted in each case. The considered dynamic system is (31) and (32)

For the tested values of α, the optimal number of measurement times for the regular measurements is N = 1 when α ≥ 1.25 and N = T when α ≤ 1. In comparison, for the intermittent measurements, the optimal number of measurements is N = 1 only for α ≥ 2 and increases progressively when α decreases. These observations illustrate that when measurements are expensive, i.e., α is high, only a few measurements are acquired. Conversely, when measurements are cheap, i.e., α is small, many measurements are acquired.
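The qualitative behavior of this trade-off can be reproduced with a scalar sketch (a hypothetical scalar system with A = 2, not the spring-mass model of the paper, so the thresholds on α differ): the optimal budget shrinks as α grows.

```python
def cost_with_budget(N, alpha, A=2.0, Q=1.0, P0=1.0, T=100):
    """Cost of a regular schedule of N measurements when the measurement
    noise grows with the budget as R = N**alpha (scalar sketch of (30))."""
    R = float(N) ** alpha
    meas = {round(i * T / N) for i in range(N)}
    P, cost = P0, 0.0
    for t in range(T):
        if t in meas:
            P -= P * P / (P + R)  # measurement update
        P = A * A * P + Q         # prediction step
        cost += P
    return cost / T

# Cheap measurements (small alpha) favor a large budget; costly ones a small budget.
n_cheap = min(range(1, 101), key=lambda N: cost_with_budget(N, alpha=0.5))
n_costly = min(range(1, 101), key=lambda N: cost_with_budget(N, alpha=3.0))
print(n_cheap, n_costly)
```

Minimizing the cost over N ∈ {1, . . . , T} for a small and a large α exhibits the same monotone trend as in Fig. 9: the cheaper an additional measurement, the larger the optimal budget.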
Another observation is that the cost difference between the regular measurements and the intermittent ones varies with N. When N = T, the regular and intermittent measurement sets coincide because the only admissible set is the one containing all measurement times, i.e., M = {0, . . . , T − 1}. More generally, the admissible domain of the optimization problem (30) is smaller when N is close to 1 or to T. Using intermittent measurements is thus most beneficial when N is between 10 and 60.

Conclusion
This paper addresses the problem of selecting the optimal measurement times for Kalman prediction over a finite time horizon. A random trial algorithm and three variants of genetic algorithms are proposed to solve it, and the genetic algorithm implementing a count preserving crossover is shown to outperform the others. The tracking performances are extensively demonstrated on a numerical example, showing an improvement in 64% of the cases in comparison with regularly spaced measurement times. Then, the case of continuous-time systems is considered through a spring-mass system. The