2.1 Problem description
We consider the discrete-time framework \(t=0,\dots,T\). The set of measurement times is denoted \(\mathcal {M}\subset \{0,\dots,T-1\}\) and is constrained to contain exactly N measurements, i.e., \(|\mathcal {M}|=N\) with N≤T. The evolution dynamics, the quantity to estimate and the measurements are described by the following equations,
$$\begin{array}{*{20}l} x(t+1) &=\ Ax(t)+b+Gw(t) \qquad t=0,\dots,T-1, \end{array} $$
(1)
$$\begin{array}{*{20}l} y(t) &=\ Bx(t) \qquad\qquad\qquad\quad\,\,\, t=0,\dots,T, \end{array} $$
(2)
$$\begin{array}{*{20}l} z(t) &=\ Cx(t)+d+v(t) \ t\in\mathcal{M}, \end{array} $$
(3)
$$\begin{array}{*{20}l} x(0) &\sim \ \mathcal{N}(\bar{x}_{0},\bar{P}_{0}), \end{array} $$
(4)
where w(t) and v(t) are independent zero-mean white Gaussian noises with covariances \(\mathbb {E}[w(t_{1})w(t_{2})^{\top }]=Q\delta (t_{1}-t_{2})\) and \(\mathbb {E}[v(t_{1})v(t_{2})^{\top }]=R\delta (t_{1}-t_{2})\), where δ(·) is Kronecker’s delta. The column vectors x(t), y(t), z(t) and w(t) have sizes m, n, p and q, respectively; the dimensions of the other quantities follow by compatibility. The matrices and vectors A, b, G, B, C, d, \(\bar {x}_{0},\bar {P}_{0}\), Q and R are known and may be time-dependent. Equation (1) describes the dynamics of the internal state x, Eq. (2) gives the quantity to estimate y and Eq. (3) expresses the measurement z. Note that the latter holds only for \(t\in \mathcal {M}\), the set of measurement times. Relation (4) indicates that the initial state follows a known normal distribution with mean \(\bar {x}_{0}\) and covariance matrix \(\bar {P}_{0}\).
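For concreteness, the following sketch simulates one trajectory of the model (1)-(4) for a given measurement set \(\mathcal {M}\); all numerical values below (matrices, horizon, measurement times) are illustrative placeholders and are not taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative placeholder parameters (not from the paper)
T = 50
A = np.array([[1.0, 0.1], [0.0, 1.0]])
b = np.zeros(2)
G = np.eye(2)
B = np.array([[1.0, 0.0]])          # quantity to estimate: first state component
C = np.array([[1.0, 0.0]])
d = np.zeros(1)
Q = 0.01 * np.eye(2)                # process noise covariance
R = 0.1 * np.eye(1)                 # measurement noise covariance
x0_bar, P0_bar = np.zeros(2), np.eye(2)
M = {0, 10, 20, 30, 40}             # measurement times, |M| = N

x = rng.multivariate_normal(x0_bar, P0_bar)                    # Eq. (4)
ys, zs = [], {}
for t in range(T + 1):
    ys.append(B @ x)                                           # Eq. (2)
    if t in M:                                                 # Eq. (3)
        zs[t] = C @ x + d + rng.multivariate_normal(np.zeros(1), R)
    if t < T:                                                  # Eq. (1)
        x = A @ x + b + G @ rng.multivariate_normal(np.zeros(2), Q)
```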
Let us introduce the best a priori mean squared estimator of y(t) according to \(\mathcal {M}\), namely, \(\hat {y}_{\mathcal {M}}(t|t-1) := \mathbb {E}[ y(t) | z(\tau): \tau \in \mathcal {M},\tau < t]\). The set \(\mathcal {M}\) is chosen to minimize the variance of the prediction error norm averaged over t from 1 to T, i.e.,
$$\begin{array}{*{20}l} \min_{\mathcal{M}\subset\{0,\cdots,T-1\}}\frac{1}{T}\sum_{t=1}^{T} \mathbb{E} \left[ \| y(t)-\hat{y}_{\mathcal{M}}(t|t-1) \|^{2} \right] \ \text{subject to}\ |\mathcal{M}|=N, \end{array} $$
(5)
where ∥·∥ is the Euclidean norm.
2.2 Intermittent Kalman predictor
In this section, we show that problem (5) can be reformulated in a more explicit way thanks to the Kalman formalism. The latter provides explicit formulas to compute, for a given measurement set \(\mathcal {M}\), the best mean squared estimator \(\hat {x}_{\mathcal {M}}(t|t-1)\) of x(t), from which one can compute \(\hat {y}_{\mathcal {M}}(t|t-1)\).
Let us introduce \(\hat {x}_{\mathcal {M}}(t|t-1):=\mathbb {E}[x(t)|z(\tau): \tau \in \mathcal {M},\tau < t]\) as the best a priori mean squared estimator of x(t) and \(\hat {x}_{\mathcal {M}}(t|t):=\mathbb {E}[x(t)|z(\tau): \tau \in \mathcal {M},\tau \leq t]\) as the best a posteriori mean squared estimator of x(t). In addition, define the a priori and the a posteriori covariance matrices as \(P(t|t-1) := \mathbb {E}[(x(t)-\hat {x}_{\mathcal {M}}(t|t-1))(x(t)-\hat {x}_{\mathcal {M}}(t|t-1))^{\top }]\) and \(P(t|t) := \mathbb {E}[(x(t)-\hat {x}_{\mathcal {M}}(t|t))(x(t)-\hat {x}_{\mathcal {M}}(t|t))^{\top }]\), respectively. The classical Kalman filtering theory [1] states how to update these four quantities recursively in the case where a measurement is acquired at each time step, i.e., when N=T. In addition, by the linear relation (2) between x(t) and y(t), it holds that
$$\begin{array}{*{20}l} \hat{y}_{\mathcal{M}}(t|t-1)=B\hat{x}_{\mathcal{M}}(t|t-1). \end{array} $$
We handle the intermittent case by replacing the measurement matrix C with the null matrix when no measurement is available, i.e., when \(t\notin \mathcal {M}\). In view of Eq. (3), this models the fact that z(t) does not contain any information about the state x(t). The equations of the intermittent Kalman predictor are then the time update equations
$$\begin{array}{*{20}l} P(t+1|t) &= AP(t|t)A^{\top} + GQG^{\top}, \end{array} $$
(6)
$$\begin{array}{*{20}l} \hat{x}_{\mathcal{M}}(t+1|t) &= A\hat{x}_{\mathcal{M}}(t|t) + b, \end{array} $$
(7)
$$\begin{array}{*{20}l} \hat{y}_{\mathcal{M}}(t+1|t) &= B\hat{x}_{\mathcal{M}}(t+1|t), \end{array} $$
(8)
for \(t=0,\dots,T-1\). In addition, the measurement update equations are
$$\begin{array}{*{20}l} K(t) &= \left\{\begin{array}{ll} P(t|t-1)C^{\top}[ CP(t|t-1)C^{\top}+R ]^{-1} & \text{if}\ t\in\mathcal{M} \\ 0 & \text{else}, \end{array}\right. \end{array} $$
(9)
$$\begin{array}{*{20}l} P(t|t) &= (I-K(t)C)P(t|t-1), \end{array} $$
(10)
$$\begin{array}{*{20}l} \hat{x}_{\mathcal{M}}(t|t) &= \hat{x}_{\mathcal{M}}(t|t-1)+K(t)[ z(t)-C\hat{x}_{\mathcal{M}}(t|t-1)-d ], \end{array} $$
(11)
$$\begin{array}{*{20}l} \hat{y}_{\mathcal{M}}(t|t) &= B\hat{x}_{\mathcal{M}}(t|t), \end{array} $$
(12)
for \(t=0,\dots,T\). Note that when \(t\notin \mathcal {M}\), z(t) is not defined in Eq. (11), but it can be set to any arbitrary value because it is multiplied by K(t)=0. In addition, one can see from (9)-(11) that when no measurement is available, i.e., when \(t\notin \mathcal {M}\), each a posteriori quantity is simply the a priori one, as could be expected intuitively. The initialization of these recurrence equations is
$$\begin{array}{*{20}l} \begin{array}{ccc} P(0|-1)=\bar{P}_{0}, & \hat{x}_{\mathcal{M}}(0|-1)=\bar{x}_{0}, & \hat{y}_{\mathcal{M}}(0|-1)=B\bar{x}_{0}. \end{array} \end{array} $$
(13)
The intermittent Kalman predictor is summarized by Eqs. (6) to (13).
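A minimal implementation sketch of the recursion (6)-(13) follows; the matrices are assumed time-invariant for brevity, and z is a dictionary mapping each measurement time in \(\mathcal {M}\) to its observed value.

```python
import numpy as np

def intermittent_kalman_predictor(A, b, G, B, C, d, Q, R, x0_bar, P0_bar, M, z, T):
    m = A.shape[0]
    x_pred, P_pred = x0_bar.copy(), P0_bar.copy()          # initialization, Eq. (13)
    y_pred = [B @ x_pred]                                   # \hat{y}_M(0|-1)
    for t in range(T + 1):
        # measurement update, Eqs. (9)-(12)
        if t in M:
            K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)   # Eq. (9)
            P_post = (np.eye(m) - K @ C) @ P_pred                    # Eq. (10)
            x_post = x_pred + K @ (z[t] - C @ x_pred - d)            # Eq. (11)
        else:                                   # K(t) = 0: a posteriori = a priori
            P_post, x_post = P_pred, x_pred
        # time update, Eqs. (6)-(8)
        if t < T:
            P_pred = A @ P_post @ A.T + G @ Q @ G.T         # Eq. (6)
            x_pred = A @ x_post + b                         # Eq. (7)
            y_pred.append(B @ x_pred)                       # Eq. (8)
    return y_pred                                # \hat{y}_M(t|t-1) for t = 0,...,T
```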
We call a Kalman predictor for which the N measurement times are selected to be as equally spaced as possible a Regular Kalman predictor, i.e.,
$$\begin{array}{*{20}l} \mathcal{M}_{\text{REG}}:=\left\{\left. \text{Round}\left[\frac{kT}{N}\right] \right| k=0,\dots,N-1 \right\}, \end{array} $$
(14)
where Round[·] is the rounding operator.
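In code, the set (14) can be generated as in the short sketch below; note that Python's built-in round uses banker's rounding, which may differ from Round[·] at exact ties.

```python
def regular_measurement_times(T, N):
    # Regular Kalman predictor measurement set, Eq. (14)
    return {round(k * T / N) for k in range(N)}

# e.g. regular_measurement_times(10, 4) -> {0, 2, 5, 8}
```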
2.3 Optimal intermittent Kalman predictor
The optimal intermittent Kalman predictor is the intermittent Kalman predictor for which the set of measurement times \(\mathcal {M}\) is chosen to minimize the mean error variance, i.e., it is the solution of problem (5).
Let us denote the a priori error \(\tilde {y}(t):=y(t)-\hat {y}_{\mathcal {M}}(t|t-1)\). Its covariance matrix can be written
$$\begin{array}{*{20}l} S(t)&:= \mathbb{E}\left[\tilde{y}(t)\tilde{y}(t)^{\top}\right] \\ &=\mathbb{E}\left[ (y(t)-\hat{y}_{\mathcal{M}}(t|t-1)) (y(t)-\hat{y}_{\mathcal{M}}(t|t-1))^{\top}\right] \\ &=\mathbb{E}\left[ (Bx(t)-B\hat{x}_{\mathcal{M}}(t|t-1)) (Bx(t)-B\hat{x}_{\mathcal{M}}(t|t-1))^{\top} \right]\\ &=B\mathbb{E}\left[ (x(t)-\hat{x}_{\mathcal{M}}(t|t-1)) (x(t)-\hat{x}_{\mathcal{M}}(t|t-1))^{\top} \right]B^{\top}\\ &=BP(t|t-1)B^{\top}. \end{array} $$
Then, the variance of the prediction error norm \({\|\tilde {y}(t)\|}\) can be written in terms of P(t|t−1) as
$$\begin{array}{*{20}l} \mathbb{E}[\| \tilde{y}(t)\|^{2}] &=\text{Tr}[S(t)] = \text{Tr}[BP(t|t-1)B^{\top}], \end{array} $$
where Tr[·] is the trace operator.
Thanks to the previous equations, problem (5) can be reformulated as
$$\begin{array}{*{20}l} &\min_{\mathcal{M}\subset \{0,\dots,T-1\}}\frac{1}{T} \sum_{t=1}^{T}\text{Tr}[BP(t|t-1)B^{\top}] \ \text{subject to} \\ &|\mathcal{M}| = N, \text{Eqs.~(6),\ (9) and (10) with}\ P(0|-1)=\bar{P}_{0}, \end{array} $$
(15)
where constraint (6) holds for \(t=0,\dots,T-1\) and constraints (9) and (10) hold for \(t=0,\dots,T\).
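Since the objective of (15) involves only the covariance recursions (6), (9) and (10), the cost of a candidate set \(\mathcal {M}\) can be evaluated without any measurement data, as in the following sketch (time-invariant matrices assumed).

```python
import numpy as np

def prediction_cost(A, G, B, C, Q, R, P0_bar, M, T):
    m = A.shape[0]
    P_pred = P0_bar.copy()                                # P(0|-1)
    cost = 0.0
    for t in range(T):
        if t in M:                                        # Eqs. (9)-(10)
            K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
            P_post = (np.eye(m) - K @ C) @ P_pred
        else:
            P_post = P_pred
        P_pred = A @ P_post @ A.T + G @ Q @ G.T           # Eq. (6): P(t+1|t)
        cost += np.trace(B @ P_pred @ B.T)                # term t+1 of the sum in (15)
    return cost / T
```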
Remark 1
The equations that govern covariance matrices, i.e., Eqs. (6), (9) and (10), are independent of the measurements z(t). Consequently, the optimization problem (15) can be solved before measurements are made, i.e., offline. In addition, they are independent of b and d.
Remark 2
Contrary to Eqs. (1) through (4) which are stochastic, problem (15) is deterministic.
In addition, the equations involving \(\hat {x}_{\mathcal {M}}\) and \(\hat {y}_{\mathcal {M}}\) can be ignored during the selection of the measurement times, i.e., the resolution of problem (15), and are used only for prediction. Consequently, using the optimal intermittent Kalman predictor consists of first computing the optimal measurement times offline and then performing online prediction.
Remark 3
Note that x(t) and y(t) are linearly related by relation (2). However, a set of measurement times that is optimal for estimating y(t), i.e., an optimal solution of problem (15), is not necessarily optimal for estimating the state x(t). Indeed, the objective function of the problem depends on the matrix B that connects y(t) to x(t).
Remark 4
It is possible to compute the distribution of the squared prediction error norm \(\|\tilde {y}(t)\|^{2}\). First, consider the eigendecomposition S(t)=ΣΛΣ⊤, where Λ is a diagonal matrix whose diagonal entries are the eigenvalues \(\lambda _{1}(t),\dots,\lambda _{n}(t)\) of the symmetric matrix S(t)=BP(t|t−1)B⊤, and where Σ is an orthogonal matrix, i.e., Σ⊤Σ=I.
Define \(\zeta :=\Lambda ^{-1/2}\Sigma ^{\top }\tilde {y}(t)\). Note that Σ, Λ and ζ depend on t, but this dependency is omitted for clarity. The random vector ζ follows a standard normal distribution \(\mathcal {N}(0,I)\). Indeed, it is a linear transformation of a centered Gaussian random variable, so it is also a centered Gaussian random variable. In addition, its covariance matrix is the identity, which can be computed using the commutativity of diagonal matrices and the fact that Σ is orthogonal:
$$\begin{array}{*{20}l} \mathbb{E}\left[\zeta \zeta^{\top}\right]&=\mathbb{E}\left[ \left(\Lambda^{-1/2}\Sigma^{\top} \tilde{y}(t) \right) \left(\Lambda^{-1/2}\Sigma^{\top} \tilde{y}(t) \right)^{\top}\right]\\ &= \Lambda^{-1/2}\Sigma^{\top} \mathbb{E}\left[ \tilde{y}(t) \tilde{y}(t)^{\top}\right] \Sigma \Lambda^{-1/2}\\ &= \Lambda^{-1/2}\Sigma^{\top} \Sigma \Lambda\Sigma^{\top} \Sigma \Lambda^{-1/2}\\ &=I. \end{array} $$
The random variable \(\|\tilde {y}(t)\|^{2}\) can be expressed in terms of ζ, yielding
$$\begin{array}{*{20}l} \|\tilde{y}(t)\|^{2}&=\tilde{y}(t)^{\top}\tilde{y}(t)\\ &=(\Sigma\Lambda^{1/2}\zeta)^{\top}(\Sigma\Lambda^{1/2}\zeta)\\ &=\zeta^{\top}\Lambda^{1/2}\Sigma^{\top}\Sigma\Lambda^{1/2}\zeta\\ &=\zeta^{\top}\Lambda\zeta\\ &=\sum_{i=1}^{n}\lambda_{i}(t)\zeta_{i}^{2}, \end{array} $$
where the \(\zeta _{i}^{2}\) are independent and each follows a chi-squared distribution with one degree of freedom, i.e., \(\zeta _{i}^{2}\sim \chi ^{2}\). This shows that \(\|\tilde {y}(t)\|^{2}\) is a linear combination, with non-negative coefficients, of independent chi-squared random variables with one degree of freedom. A closed-form formula for the cumulative distribution function \(\mathbb {P}[\|\tilde {y}(t)\|^{2}<\xi ]\) can then be obtained using the theorem of [35]. From a practical point of view, this quantity can be efficiently estimated using the method proposed in [36].
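As a simple illustration, this cumulative distribution function can also be approximated by Monte Carlo sampling of the weighted chi-squared sum above, given the eigenvalues λi(t); this sketch is only a naive alternative, not the method of [35, 36].

```python
import numpy as np

def squared_error_cdf(lam, xi, n_samples=100_000, seed=0):
    # Monte Carlo estimate of P[ sum_i lam_i * zeta_i^2 < xi ]
    rng = np.random.default_rng(seed)
    lam = np.asarray(lam)
    zeta2 = rng.standard_normal((n_samples, lam.size)) ** 2   # chi-squared(1) draws
    return np.mean(zeta2 @ lam < xi)
```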
2.4 Optimization algorithms
Solving the combinatorial optimization problem (15) by means of an exhaustive search would require evaluating the cost function T!/(N!(T−N)!) times, which is computationally intractable. In this section, we propose a random trial (RT) algorithm and several genetic algorithms (GAs) to tackle this problem.
The RT algorithm randomly samples a given number of admissible solutions \(\mathcal {M}\), computes their cost and keeps the best one.
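A sketch of the RT algorithm is given below; cost can be any function evaluating the objective of problem (15) for a set \(\mathcal {M}\), such as the prediction_cost sketch above.

```python
import numpy as np

def random_trial(cost, T, N, n_trials=1000, seed=0):
    rng = np.random.default_rng(seed)
    best_M, best_cost = None, np.inf
    for _ in range(n_trials):
        M = set(rng.choice(T, size=N, replace=False))   # admissible subset of {0,...,T-1}
        c = cost(M)
        if c < best_cost:
            best_M, best_cost = M, c
    return best_M, best_cost
```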
In the GA nomenclature, a feasible solution \(\mathcal {M}=\{t_{1},\dots,t_{N}\}\) of the optimization problem is called an individual and its corresponding measurement times ti are called its genes. A set of several individuals is called a population.
The GA [37] implements the following steps.
1. Initialization: An initial population is uniformly sampled on the admissible set of problem (15).
2. Evaluation: For each individual in the population, the cost is evaluated according to the objective function of problem (15).
3. Selection: Individuals with low costs are preferentially selected according to stochastic universal sampling [37].
4. Crossover: Selected individuals are mixed to produce new individuals called the offspring. This reconstitutes a complete population. Three different types of crossover are considered: the shuffle crossover (SC), the replace crossover (RC) and the count preserving crossover (CPC). They are described below.
5. Mutation: Each gene of each individual mutates with a given probability. When a mutation occurs, the corresponding measurement time is replaced by a random time. This step could create duplicates, i.e., an individual could have repeated times ti=tj with i≠j. To avoid this situation, the random times are selected uniformly in \(\{0,\dots,T-1\}\backslash \mathcal {M}\).
6. Repeat: Return to step 2 until a convergence criterion is reached.
One pass of steps 2 to 5 is called a generation.
Let us now present the three crossover operators, each of which defines a variant of the genetic algorithm. They are described below and illustrated with a brief example.
Shuffle crossover (SC) For each parent pair, the SC method [38] randomly picks the genes to exchange (each gene has the same probability of being selected). For example, consider two parents (that is, two sets of measurement times \(\mathcal {M}\)),
$$ P_{1}=0\ \underline{1}\ 3\ \underline{5}\ \underline{6}\ 7\ \text{and}\ P_{2}=0\ \underline{1}\ 2\ \underline{3}\ \underline{5}\ 8, $$
(16)
where \(\underline {\ }\) indicates the location of genes selected for the crossover. The obtained offspring are
$$ O_{1}=0\ 1\ 3\ 3\ 5\ 7\ \text{and}\ O_{2}=0\ 1\ 2\ 5\ 6\ 8. $$
(17)
First, one can observe that O1 contains a duplication of 3, which corresponds to choosing the same measurement time twice. That is not allowed in problem (15) and has to be considered a wasted measurement. In other words, fewer measurements are acquired, which is suboptimal, because adding a new distinct measurement can only decrease the cost.
A second observation is that the second offspring O2 does not contain a 3, while both parents P1 and P2 contain one. That means that common heritage is lost during the crossover, which possibly deteriorates the convergence of the algorithm.
These two observations motivate a careful choice of the crossover method and lead naturally to the next two crossover methods.
Replace crossover (RC) The RC method is an SC followed by a random replacement of duplicates, taking care to avoid new duplicates. Duplicates are replaced by picking genes uniformly at random in \(\{0,\dots,T-1\}\backslash \mathcal {M}\). The offspring (17) become, for example
$$O_{1}=0\ 1\ 3\ 5\ 7\ \underline{8}\ \text{and}\ O_{2}=0\ 1\ 2\ 5\ 6\ 8, $$
where \(\underline {\ }\) indicates the new gene. Note that the second offspring remains unchanged because it does not contain duplicates.
With this crossover, the obtained offspring have no duplicates. However, the second offspring O2 still does not contain a 3, although this gene was common to both parents. This illustrates that, like SC, RC can also suffer from a loss of common heritage.
Count preserving crossover (CPC) The CPC [38, 39] represents each individual by an unordered set instead of an ordered vector. It implements an SC between the subsets P1∖P2 and P2∖P1, which allows the common measurement times P1∩P2 to be transmitted to both offspring. For the parents in example (16), the SC is executed between the subsets P1∖P2={6,7} and P2∖P1={2,8} (see Fig. 1). Possible offspring are then
$$ O_{1}=0\ 1\ \underline{2}\ 3\ 5\ 6\ \text{and}\ O_{2}=0\ 1\ 3\ 5\ \underline{7}\ 8. $$
(18)
This crossover method makes it possible both to transmit the common genes to all the offspring and to avoid duplicates.
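The following sketch implements CPC on set-represented individuals; the pairing of exchanged genes (here, by sorted position within the symmetric difference) is one possible convention, left unspecified above.

```python
import numpy as np

def cpc_crossover(P1, P2, rng):
    P1, P2 = set(P1), set(P2)
    common = P1 & P2                                   # transmitted to both offspring
    only1, only2 = sorted(P1 - P2), sorted(P2 - P1)    # same size since |P1| = |P2| = N
    swap = rng.random(len(only1)) < 0.5                # genes selected for exchange
    O1, O2 = set(common), set(common)
    for g1, g2, s in zip(only1, only2, swap):
        if s:
            O1.add(g2); O2.add(g1)
        else:
            O1.add(g1); O2.add(g2)
    return O1, O2

# With P1 = {0,1,3,5,6,7} and P2 = {0,1,2,3,5,8}, the exchange acts only on
# {6,7} and {2,8}, so the common genes 0, 1, 3, 5 appear in both offspring
# and no duplicates can arise.
```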
All GA implementations use sigma scaling [37] with the sigma factor equal to 1 (standard value). The crossover operation is applied systematically and each gene mutates with probability 0.003 (standard value). The population size is set to 100 individuals and the algorithm stops after 100 generations.
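For orientation, a simplified skeleton of the GA loop is sketched below, assuming a cost function and a crossover operator with the signatures used above; selection is reduced to rank-based sampling, whereas our implementation uses sigma scaling with stochastic universal sampling [37].

```python
import numpy as np

def genetic_algorithm(cost, crossover, T, N, pop_size=100, generations=100,
                      p_mut=0.003, seed=0):
    rng = np.random.default_rng(seed)
    # 1. initialization: uniform sampling of admissible sets
    pop = [set(rng.choice(T, size=N, replace=False)) for _ in range(pop_size)]
    for _ in range(generations):
        # 2. evaluation and 3. selection (simplified to rank-based sampling)
        costs = np.array([cost(M) for M in pop])
        order = np.argsort(costs)                      # best (lowest cost) first
        weights = np.linspace(2.0, 0.0, pop_size)
        weights /= weights.sum()
        parents = [pop[order[i]] for i in rng.choice(pop_size, size=pop_size, p=weights)]
        # 4. crossover on consecutive parent pairs
        offspring = []
        for P1, P2 in zip(parents[0::2], parents[1::2]):
            offspring.extend(crossover(P1, P2, rng))
        # 5. mutation: replace a gene by an unused time with probability p_mut
        for M in offspring:
            for g in list(M):
                if rng.random() < p_mut:
                    candidates = list(set(range(T)) - M)
                    if candidates:
                        M.remove(g)
                        M.add(int(rng.choice(candidates)))
        pop = offspring
    costs = np.array([cost(M) for M in pop])
    return pop[int(np.argmin(costs))]
```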
2.5 Discretization of a continuous-time system
Many real-world problems are modeled as continuous-time systems. For this reason, we study the continuous-time equivalent of problem (5), namely,
$$\begin{array}{*{20}l} \inf_{0\leq \tau_{0}<\tau_{1}<\cdots<\tau_{N-1}\leq\bar{\tau}} \frac{1}{\bar{\tau}} \int_{0}^{\bar{\tau}} \mathbb{E}[\| y(\tau')-\hat{y}(\tau') \|^{2}]d\tau', \end{array} $$
(19)
such that
$$\begin{array}{*{20}l} \begin{array}{rll} \frac{dx}{d\tau}(\tau) &= A^{c}x(\tau)+b^{c}+G^{c}w^{c}(\tau) & \tau\in[0,\bar{\tau}], \\ y(\tau) &= B^{c}x(\tau) & \tau\in[0,\bar{\tau}], \\ z(\tau_{k}) &= Cx(\tau_{k})+d+v(\tau_{k}) &k=0,\dots,N-1,\\ x(0) &\sim \mathcal{N}(\bar{x}_{0},\bar{P}_{0}),&\end{array} \end{array} $$
(20)
where \(w^{c}(\tau)\sim \mathcal {N}(0,Q^{c})\) and \(v(\tau _{k})\sim \mathcal {N}(0,R)\) are independent Gaussian white noises, and \(\hat {y}(\tau)=\mathbb {E}[y(\tau)|z(\tau _{k}): \tau _{k}<\tau ]\) is the best mean squared estimator of y(τ). The superscript c emphasizes that time is continuous. Note that (20) is a stochastic differential equation, for which derivatives have a non-classical sense; the reader is referred to [40] for details.
This problem can be discretized exactly with time step \(\delta =\bar {\tau }/T\) to fit the formalism of Eqs. (1) through (5). The corresponding quantities are [41],
$$\begin{array}{*{20}l} A&=e^{A^{c}\delta},\ B=B^{c},\ b=\left(\int_{0}^{\delta} e^{A^{c}\tau}d\tau\right) b^{c}, \end{array} $$
(21)
$$\begin{array}{*{20}l} Q&=\int_{0}^{\delta} e^{A^{c}\tau} G^{c}Q^{c}(G^{c})^{\top} e^{(A^{c})^{\top} \tau }d\tau,\ G=I, \end{array} $$
(22)
where \(e^{(\cdot)}\) is the matrix exponential operator.
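The quantities (21)-(22) can be evaluated with matrix exponentials only, e.g., via the block-matrix construction of Van Loan sketched below; this is one standard way to compute the integrals, not necessarily the implementation of [41].

```python
import numpy as np
from scipy.linalg import expm

def discretize(Ac, bc, Gc, Qc, Bc, delta):
    m = Ac.shape[0]
    # A = e^{Ac*delta} and int_0^delta e^{Ac*tau} dtau from one augmented exponential
    M = np.zeros((2 * m, 2 * m))
    M[:m, :m], M[:m, m:] = Ac, np.eye(m)
    E = expm(M * delta)
    A, intA = E[:m, :m], E[:m, m:]
    b = intA @ bc                                        # Eq. (21)
    # Q = int_0^delta e^{Ac*tau} Gc Qc Gc^T e^{Ac^T*tau} dtau (Van Loan's trick)
    V = np.zeros((2 * m, 2 * m))
    V[:m, :m], V[:m, m:], V[m:, m:] = -Ac, Gc @ Qc @ Gc.T, Ac.T
    F = expm(V * delta)
    Q = A @ F[:m, m:]                                    # Eq. (22)
    return A, b, Bc, Q, np.eye(m)                        # B = Bc and G = I
```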
In Section 3.4, we study the impact of the discretization time step on the obtained solution.
2.6 Optimal intermittent linear quadratic regulator
Duality between estimation and control problems has been known for decades [42]. Roughly, it states that instead of solving a control problem directly, one can solve a related estimation problem and deduce the solution to the control problem from the solution of the estimation problem. The converse is also possible.
In this section, we formalize the problem of optimal intermittent LQR and use duality to show that it can be handled by computing an optimal intermittent Kalman predictor.
Let us consider a controlled linear system with known initial state,
$$\begin{array}{*{20}l} x(t+1)&=\tilde{A}x(t)+\tilde{\sigma}(t)\tilde{B}u(t),\ \text{for}\ t=0,\dots,T-1, \end{array} $$
(23)
$$\begin{array}{*{20}l} x(0)&=\tilde{x}_{0}, \end{array} $$
(24)
where \(\tilde {\sigma }(t)\in \{0,1\}\). This control signal \(\tilde {\sigma }(\cdot)\) determines the times at which the system is controlled. Note that \(\tilde {A}\) and \(\tilde {B}\) can be time-dependent.
The optimal intermittent LQR problem consists of the following minimization,
$$\begin{array}{*{20}l} &\tilde{\sigma}^{*}(\cdot)=\text{arg}\min_{\tilde{\sigma}(\cdot)}V(\tilde{\sigma}(\cdot)) \text{\ such\ that}\ \sum_{t=0}^{T-1}\tilde{\sigma}(t)=N, \end{array} $$
(25)
with
$$\begin{array}{*{20}l} V(\tilde{\sigma}(\cdot))=\min_{u(\cdot)}\left\{x(T)^{\top} \tilde{Q}_{f}x(T)+\sum_{t=0}^{T-1}\left[ x(t)^{\top} \tilde{Q}x(t) + u(t)^{\top} \tilde{R}u(t) \right]\right\}\\ \text{subject\ to\ (23)\ and\ (24)}, \end{array} $$
where \(\tilde {Q}\) and \(\tilde {R}\) can be time-dependent. This problem has been studied in [43] where a closed form formula is provided under some restrictive conditions.
The following theorem formalizes how to compute this optimal LQR by computing an optimal intermittent Kalman predictor. Note that to each set of measurement times \(\mathcal {M}\) one can associate a signal σ(·) such that σ(t)=1 if \(t\in \mathcal {M}\) and σ(t)=0 otherwise.
Theorem 1
The optimal solution \(\tilde {\sigma }^{*}(\cdot)\) of problem (25) is given by \(\tilde {\sigma }^{*}(t)=\sigma ^{*}(T-t)\), where σ∗(·) is the optimal solution of the estimation problem (15) solved with the parameters
$$\begin{array}{*{20}l} &A(t)=\tilde{A}(T-t)^{\top},\ Q(t)=\tilde{Q}(T-t),\ R(t)=\tilde{R}(T-t), \end{array} $$
(26)
$$\begin{array}{*{20}l} &b(t)=0,\ G=I,\ C(t)=\tilde{B}(T-t)^{\top},\ \bar{P}_{0}=\tilde{Q}_{f},\ d(t)=0,\ \text{and} \end{array} $$
(27)
$$\begin{array}{*{20}l} &B(t)=\delta(T-t)\tilde{x}_{0}^{\top}, \end{array} $$
(28)
where δ(T−t) is 1 if t=T, and 0 otherwise.
In addition, the optimal control law is
$$\begin{array}{*{20}l} u(t)=-K_{\sigma^{*}(\cdot)}(t)^{\top} A(t)^{\top} x(t), \end{array} $$
(29)
where \(\phantom {\dot {i}\!}K_{\sigma ^{*}(\cdot)}(t)\) is given by (9).
Proof
We want to use Theorem 2.1 from [42]. However, these authors’ formalism has two differences from ours.
First, they do not consider missing control times (or missing measurement times). However, this is not restrictive because for a given \(\tilde {\sigma }(t)\), Eq. (23) can reduce to the form of [42] by considering \(\tilde {\sigma }(t)\tilde {B}\) as a known time-varying control matrix \(\bar {B}(t)\). Similarly, the measurement matrix C in Eq. (3) can be seen as the null matrix when no measurement is acquired, i.e., when σ(t)=0 or equivalently \(t\notin \mathcal {M}\).
The second difference with [42] is that they limit the estimation problem to the case where B=I, i.e., y(t)=x(t). However, as shown by Eqs. (6), (9), and (10), the prediction error covariance matrix P(t|t−1) does not depend on B. Consequently, results from Theorem 2.1 from [42] that concern P(t|t−1) can be used.
Thus, for a given signal \(\tilde {\sigma }(\cdot)\), Theorem 2.1 from [42] states that \(V(\tilde {\sigma }(\cdot)) = \tilde {x}_{0}^{\top } \tilde {P}_{\tilde {\sigma }(\cdot)}\tilde {x}_{0}\) where \(\tilde {P}_{\tilde {\sigma }(\cdot)} = P_{\sigma (\cdot)}(T|T-1)\) and the last is given by Eqs. (6), (9), and (10), under transformations \(\sigma (t)=\tilde {\sigma }(T-t)\), (26) and (27).
Then, under transformation (28), the objective function of the estimation problem (15) is
$$\begin{array}{*{20}l} \sum_{t=1}^{T} \text{Tr}[B(t)P_{\sigma(\cdot)}(t|t-1)B(t)^{\top}] &=\sum_{t=1}^{T} \text{Tr}[\delta(T-t)\tilde{x}_{0}^{\top} P_{\sigma(\cdot)}(t|t-1)\tilde{x}_{0}]\\ &=\tilde{x}_{0}^{\top} P_{\sigma(\cdot)}(T|T-1)\tilde{x}_{0}\\ &=\tilde{x}_{0}^{\top} \tilde{P}_{\tilde{\sigma}(\cdot)}(0)\tilde{x}_{0}\\ &=V(\tilde{\sigma}(\cdot)), \end{array} $$
where the trace operator has been removed because its argument is one-dimensional. Taking the minimum over all admissible σ(·) (on the left-hand side) and the corresponding \(\tilde {\sigma }(\cdot)\) (on the right-hand side) gives equality between problems (15) and (25).
Once the signal \(\tilde {\sigma }^{*}(\cdot)\) is fixed, finding the optimal command u(t) is a classic LQR problem. Theorem 2.1 from [42] gives relation (29). □
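As an illustration, the parameter mapping (26)-(28) of Theorem 1 reads as follows in the time-invariant case, where the time reversal leaves Ã, B̃, Q̃ and R̃ unchanged; this is only a sketch of the construction, with B(t) kept as a function of t because of the Kronecker delta in (28).

```python
import numpy as np

def dual_estimation_parameters(A_tilde, B_tilde, Q_tilde, R_tilde, Qf_tilde, x0_tilde, T):
    m = A_tilde.shape[0]
    A = A_tilde.T                                   # Eq. (26)
    Q, R = Q_tilde, R_tilde                         # Eq. (26)
    C = B_tilde.T                                   # Eq. (27)
    b, d, G = np.zeros(m), np.zeros(B_tilde.shape[1]), np.eye(m)   # Eq. (27)
    P0_bar = Qf_tilde                               # Eq. (27)
    # Eq. (28): B(t) = delta(T - t) * x0_tilde^T, i.e., nonzero only at t = T
    def B(t):
        return x0_tilde.reshape(1, -1) if t == T else np.zeros((1, m))
    return A, b, G, B, C, d, Q, R, P0_bar
```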
2.7 Quality or quantity of measurements
Instead of considering a fixed number of measurements, an alternative problem is to reach a trade-off between many low-quality measurements and a small number of high-quality measurements. This problem can be handled in the presented formalism.
Allowing the quantity of measurements to be chosen corresponds to making the number of measurements N a variable to optimize rather than a given of the problem. However, with only this modification, the optimal solution would always be to take the maximum number of measurements, i.e., N=T. To model the compromise between measurement quality and quantity, we assume that increasing the number of measurements N increases the measurement noise, i.e., increases R.
In other words, the measurement covariance matrix R must depend on N. We assume R=f(N) for a given function \(f:\{1,\dots,T\}\rightarrow \mathcal {S}^{p\times p}_{+}\), where \(\mathcal {S}^{p\times p}_{+}\) is the set of p×p positive semi-definite matrices. The problem can then be formalized similarly to (15), giving
$$\begin{array}{*{20}l} \min_{1\leq N\leq T}\min_{\mathcal{M}\subset \{0,\dots,T-1\}}\frac{1}{T} \sum_{t=1}^{T}\text{Tr}[BP(t|t-1)B^{\top}] \ \text{subject to} \\ |\mathcal{M}| = N, \text{Eqs.~(6),\ (9) and (10),}\ P(0|-1)=\bar{P}_{0}\ \text{and}\ R=f(N), \end{array} $$
(30)
where, as previously, constraint (6) holds for \(t=0,\dots,T-1\) and constraints (9) and (10) hold for \(t=0,\dots,T\).
This problem is studied in Section 3.6.
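A sketch of an outer search over N for problem (30) is given below; it assumes a factory make_cost(R) returning the offline cost function of problem (15) for a given measurement covariance R, and an inner optimizer with signature optimizer(cost, T, N) returning a pair (set, cost), such as the random_trial sketch above.

```python
import numpy as np

def quality_quantity_tradeoff(make_cost, f, T, optimizer):
    best = (np.inf, None, None)
    for N in range(1, T + 1):
        cost_N = make_cost(f(N))                   # R = f(N)
        M, c = optimizer(cost_N, T, N)             # inner problem over M with |M| = N
        if c < best[0]:
            best = (c, N, M)
    return best                                    # (optimal cost, N*, M*)
```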