In this section, we describe the improved sensor selection method. We formulate sensor selection as a POMDP whose objective is the CRLB of the target localization error, which addresses the problem that the observers (e.g., sensor nodes) cannot reliably identify the actual underlying target state. Our method extends the POMDP framework by integrating the T-FoT approach to handle an unknown target dynamic model.
3.1 POMDP framework based on T-FoT
The core idea of the POMDP is to choose the optimal selection command by minimizing a cost function or maximizing a reward function. At time step k, the POMDP can be defined as
$$\begin{aligned} \psi = \{S,F(\cdot ;{C_{k}}),{Z^s},g(\cdot |{X_k},s),{\mu }(s;\cdot )\} \end{aligned}$$
(8)
where S is a finite set of sensor selection commands, \({Z^s}\) is a finite set of observations under the command set S, \(g(\cdot |{X_k},s)\) is the measurement model conditioned on the command \(s\in S\) and the target state, \(F(\cdot ;{C_{k}})\) is the estimated T-FoT at time k, and \({\mu }(s;\cdot )\) is the objective function associated with executing the action command \(s\in S\).
At the core of our POMDP framework, the objective function \({\mu }(s;\cdot )\) is defined as the CRLB \(u_\text {lb}(s_{k};{\hat{X}}_{k+1})\) of the pseudo-localization error of the target conditioned on the measurements from the activated sensors, which in turn depends on the selection command s (see Sect. 3.2). Here, the estimated/predicted state \(\hat{X}_{k+1} = F(k+1;{C_{k}})\) is obtained from the estimated T-FoT [19] (see Sect. 3.3) rather than from a Markov-jump model, which is indispensable prior information in traditional methods. This is the key difference between our approach and existing POMDP approaches [6, 16].
Typically, sensor selection must satisfy a specific constraint. In this paper, we consider two practical constraints: either the number of sensors to be selected is fixed, or the minimum number of sensors is selected such that a prescribed CRLB is met. For these two cases, the optimal selection command is given by (9) and (10), respectively.
$$\begin{aligned} &s_{k}^{*} = \mathop {\arg \min }\limits_{{s_{k} \in S_{k} }} u_{{{\text{lb}}}} (s_{k} ;\hat{X}_{{k + 1}} ) \\& {\text{s}}.{\text{t}}.\left| {s_{k}^{*} } \right| = n_{s} \\ \end{aligned}$$
(9)
where \(S_k \subseteq S\) denotes the candidate sensor set at time k, \(|{s_{k}^*}|\) denotes the number of selected sensors, and \(n_s\) is the specified number of sensors to be selected.
$$\begin{aligned} &s_{k}^{*} = \mathop {\arg \min }\limits_{{s_{k} \in S_{k} }} \left| {s_{k} } \right| \hfill \\ &{\text{s}}.{\text{t}}.u_{{{\text{lb}}}} (s_{k} ;\hat{X}_{{k + 1}} ) \le T_{{{\text{lb}}}} \hfill \\ \end{aligned}$$
(10)
where \(T_\text {lb}\) is the CRLB threshold that the selected sensors must meet.
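The two constrained problems (9) and (10) can be illustrated with a minimal sketch. It assumes an exhaustive search over the candidate set, which the text does not prescribe and which is only practical for small candidate sets. The function names, the tie-breaking rule in the second function, and the toy objective `toy_crlb` are hypothetical and serve illustration only; any callable that maps a sensor subset to a scalar CRLB can be passed in.

```python
from itertools import combinations

def select_fixed_size(candidates, n_s, crlb):
    """Eq. (9): the size-n_s subset of `candidates` with the smallest CRLB."""
    return min(combinations(candidates, n_s), key=crlb)

def select_min_count(candidates, t_lb, crlb):
    """Eq. (10): the smallest subset whose CRLB does not exceed t_lb.

    Assumption: among feasible subsets of the smallest size, the one with the
    lowest CRLB is returned. None means that no subset meets the threshold.
    """
    for size in range(1, len(candidates) + 1):
        feasible = [s for s in combinations(candidates, size) if crlb(s) <= t_lb]
        if feasible:
            return min(feasible, key=crlb)
    return None

# Toy objective (illustration only): each sensor adds a fixed amount of
# information and the bound is the reciprocal of the total.
info = {"a": 1.0, "b": 4.0, "c": 2.0, "d": 0.5}

def toy_crlb(subset):
    return 1.0 / sum(info[i] for i in subset)

best_pair = select_fixed_size(list(info), 2, toy_crlb)    # constraint |s| = 2
smallest = select_min_count(list(info), 0.3, toy_crlb)    # constraint CRLB <= 0.3
infeasible = select_min_count(list(info), 0.01, toy_crlb)
```

With this toy objective, the fixed-size search picks the two most informative sensors, while the threshold search stops at the first subset size that admits a feasible solution.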
3.2 CRLB with regard to DOA
The CRLB provides a lower bound on the variance of unbiased estimators of a deterministic parameter under specific measurement conditions, and can therefore be used to evaluate the detection capability of different sensor node subsets.
For an unbiased estimator \({\hat{X}}_{k}(Z_{k})\) of a parameter vector \(X_{k}\) based on the measurement vector \(Z_{k}\), the CRLB for the error covariance matrix is defined to be the inverse of the Fisher Information Matrix (FIM), denoted by J, as follows
$$\begin{aligned} E\{ [{\hat{X}}_{k}(Z_{k}) - X_{k}]{[{\hat{X}}_{k}(Z_{k}) - X_{k}]^\text {T}}\} \ge {J_{k}^{ - 1}} \triangleq u_\text {lb}({\hat{X}}_{k}) \end{aligned}$$
(11)
where E denotes the expectation and the inequality (11) means that the difference between the error covariance matrix on the left-hand side and \({J_{k}^{ - 1}}\) is positive semi-definite.
Now, consider the predicted target state \(\hat{X}_{k} = F(k;C_{k-1})\) obtained from the estimated T-FoT. An n-sensor extension of the DOA measurement function in Eq. (1) is
$$\begin{aligned} \hat{Z}_{k} = H(\hat{X}_{k}) + V_{k} \end{aligned}$$
(12)
where
$$\begin{aligned} H(\hat{X}_{k}) = \left[ \begin{array}{ccc} {\tan ^{-1}}\left( \frac{{\hat{y}}_k-y_1}{{\hat{x}}_k-x_1} \right) \\ {\tan ^{-1}}\left( \frac{{\hat{y}}_k-y_2}{{\hat{x}}_k-x_2} \right) \\ \vdots \\ {\tan ^{-1}}\left( \frac{{\hat{y}}_k-y_n}{{\hat{x}}_k-x_n} \right) \\ \end{array} \right] \triangleq {\varvec{\Theta }}. \end{aligned}$$
(13)
Under the premise that \(z_{k}^{1}, \cdots , z_{k}^{n}\) are conditionally independent of each other, the PDF of the collected measurements \(Z_{k} = \left[ z_{k}^{i}\right] _{i=1}^{n} \sim {\mathcal {N}}(\theta ,R_{k})\) can be expressed as
$$\begin{aligned} p(Z_{k}) = \frac{1}{{{{(2\pi )}^{n/2}}{{\left| R_{k} \right| }^{1/2}}}}\exp \left[ { - \frac{{{{(Z_{k} - \theta )}^\text {T}}{R_{k}^{ - 1}}(Z_{k} -\theta )}}{2}} \right] \end{aligned}$$
(14)
where \(\theta\) is the mean value of measurements \({\varvec{\Theta }}\) and \(R_{k} = \mathrm {diag}(R_{k}^{1},R_{k}^{2},\dots ,R _{k}^{n})\).
Then, compute the second-order derivatives of the logarithm of the measurement PDF with respect to \({\hat{X}}_{k}\)
$$\begin{aligned} J({\hat{X}}_{k}) \triangleq -E\left\{ {\frac{{\partial ^{2} \log p(Z_{k})}}{{\partial {\hat{X}}_{k}}{\partial {\hat{X}}_{k}}^\text {T}}} \right\}. \end{aligned}$$
(15)
Substituting Eq. (14) into Eq. (15), the FIM based on DOA measurements \(J({\hat{X}}_{k})\) can be shown to be
$$\begin{aligned} J({\hat{X}}_{k}) = \bigg [\frac{\partial H({\hat{X}}_{k})}{\partial {\hat{X}}_{k}} \bigg ]^\text {T} R_k^{-1} \bigg [\frac{\partial H({\hat{X}}_{k})}{\partial {\hat{X}}_{k}} \bigg ]. \end{aligned}$$
(16)
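The step from (15) to (16) can be sketched as follows. The sketch assumes the standard convention in which the FIM is the negative expected Hessian of the log-likelihood, takes the mean in (14) to be \(\theta = H({\hat{X}}_{k})\), and assumes that \(R_{k}\) does not depend on \({\hat{X}}_{k}\):

$$\begin{aligned} \log p(Z_{k}) &= c - \frac{1}{2}{[Z_{k} - H({\hat{X}}_{k})]^\text {T}}R_{k}^{-1}[Z_{k} - H({\hat{X}}_{k})], \\ \frac{\partial \log p(Z_{k})}{\partial {\hat{X}}_{k}} &= \bigg [\frac{\partial H({\hat{X}}_{k})}{\partial {\hat{X}}_{k}} \bigg ]^\text {T} R_{k}^{-1}[Z_{k} - H({\hat{X}}_{k})], \\ -E\left\{ \frac{\partial ^{2} \log p(Z_{k})}{\partial {\hat{X}}_{k}\partial {\hat{X}}_{k}^\text {T}} \right\} &= \bigg [\frac{\partial H({\hat{X}}_{k})}{\partial {\hat{X}}_{k}} \bigg ]^\text {T} R_{k}^{-1} \bigg [\frac{\partial H({\hat{X}}_{k})}{\partial {\hat{X}}_{k}} \bigg ] \end{aligned}$$

where c collects the terms that do not depend on \({\hat{X}}_{k}\). The remaining terms of the Hessian are proportional to \(Z_{k} - H({\hat{X}}_{k})\) and vanish in expectation because \(E\{Z_{k}\} = H({\hat{X}}_{k})\).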
Expanding \(H({\hat{X}}_{k})\) and taking the first-order partial derivatives with respect to \(\hat{X}_{k}\) yields
$$\begin{aligned} \bigg [\frac{\partial H({\hat{X}}_{k})}{\partial \hat{X}_k} \bigg ] = \left[ \begin{array}{ccc} -\frac{{\hat{y}}_k-y_1^{s_k}}{d_1^2} &{} \frac{{\hat{x}}_k-x_1^{s_k}}{d_1^2} \\ -\frac{{\hat{y}}_k-y_2^{s_k}}{d_2^2} &{} \frac{{\hat{x}}_k-x_2^{s_k}}{d_2^2} \\ \vdots &{} \vdots \\ -\frac{{\hat{y}}_k-y_n^{s_k}}{d_n^2} &{} \frac{{\hat{x}}_k-x_n^{s_k}}{d_n^2} \end{array} \right] \end{aligned}$$
(17)
where \((x_i^{s_k},y_i^{s_k})\) are the position coordinates of sensor i in the sensor set selected by command \(s_k\), and \(d_i = \sqrt{{{({\hat{x}}_k - { x_i^{s_k}})}^2} + {{({\hat{y}}_k - {y_i^{s_k}})}^2}}\) is the distance between sensor i and the predicted target position. Thus, \(J({\hat{X}}_{k})\) can be computed as
$$\begin{aligned} \begin{aligned} J({\hat{X}}_k) = \left[ \begin{array}{ccc} \sum \limits _{i=1}^{n}\frac{({\hat{y}}_k-y_i^{s_k})^2}{R_k^i d_i^4} &{} \sum \limits _{i=1}^{n}\frac{-({\hat{x}}_k-x_i^{s_k})({\hat{y}}_k-y_i^{s_k})}{R_k^i d_i^4} \\ \sum \limits _{i=1}^{n}\frac{-({\hat{y}}_k-y_i^{s_k})({\hat{x}}_k-x_i^{s_k})}{R_k^i d_i^4} &{} \sum \limits _{i=1}^{n}\frac{({\hat{x}}_k-x_i^{s_k})^2}{R_k^i d_i^4} \end{array} \right] \triangleq \left[ \begin{array}{ccc} J_{xx} &{} J_{xy}\\ J_{yx} &{} J_{yy} \end{array} \right]. \end{aligned} \end{aligned}$$
(18)
Finally, taking the trace of \(J^{-1}({\hat{X}}_{k})\), the scalar CRLB is given as
$$\begin{aligned} u_\text {lb}({\hat{X}}_{k}) = \frac{J_{yy}+J_{xx}}{J_{xx}J_{yy}-J_{xy}J_{yx}}. \end{aligned}$$
(19)
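As a minimal numerical sketch of Eqs. (16), (17) and (19), the following NumPy function builds the bearing Jacobian row by row, forms the FIM with a diagonal \(R_k\), and returns the trace of its inverse. The function name `doa_crlb`, its argument layout, and the choice to return infinity for a degenerate geometry (for example a single sensor, where the FIM is singular) are assumptions made for illustration.

```python
import numpy as np

def doa_crlb(target_xy, sensor_xy, noise_var):
    """Scalar CRLB (trace of the inverse FIM) for DOA-only localization.

    target_xy : predicted target position (x_hat, y_hat)
    sensor_xy : (n, 2) positions of the selected sensors
    noise_var : (n,) bearing-noise variances R_k^i
    """
    target_xy = np.asarray(target_xy, dtype=float)
    sensor_xy = np.asarray(sensor_xy, dtype=float).reshape(-1, 2)
    noise_var = np.asarray(noise_var, dtype=float)

    dx = target_xy[0] - sensor_xy[:, 0]
    dy = target_xy[1] - sensor_xy[:, 1]
    d2 = dx**2 + dy**2                      # squared ranges d_i^2

    # Jacobian of the bearing function, one row per sensor, as in Eq. (17)
    jac = np.column_stack((-dy / d2, dx / d2))

    # FIM of Eq. (16): jac^T R^{-1} jac with diagonal R
    fim = jac.T @ (jac / noise_var[:, None])

    det = fim[0, 0] * fim[1, 1] - fim[0, 1] * fim[1, 0]
    if det <= 1e-12 * fim[0, 0] * fim[1, 1]:
        return np.inf                       # (near-)singular FIM: unobservable
    return (fim[0, 0] + fim[1, 1]) / det    # Eq. (19)

# Two sensors at right angles around the target, unit noise variance
crlb_two = doa_crlb((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0)], [1.0, 1.0])
# A single bearing cannot localize the target in two dimensions
crlb_one = doa_crlb((0.0, 0.0), [(1.0, 0.0)], [1.0])
```

In the two-sensor case the FIM is the identity matrix, so the bound equals 2. The single-sensor case returns infinity.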
3.3 T-FoT for tracking and prediction
As mentioned before, the T-FoT fits the time series measurements in a sliding time-window up to the current time k, denoted as \([{k'},k] \triangleq \{k', k'+1, ..., k\}\), where \({k'} = \max {(1,k - T)}\) and T is the length of the time-window. Disregarding false and missing data issues temporarily, the parameter of the T-FoT at time k can be estimated in the LS sense as
$$\begin{aligned} {{\hat{C}}_k} = \mathop {\arg \min }\limits _{C} \sum \limits _{t = k'}^k {\left\| {X_t} - {F_k(t;C)} \right\| _{\Sigma _{e_{t}}^{-1}} ^2}. \end{aligned}$$
(20)
Here, \({X_t}\) denotes the position of the target at time t, and the Mahalanobis distance is used, i.e.,
$$\begin{aligned} \left\| {{X_t} - {{{\hat{X}}}_t}} \right\| _{\Sigma _{e_{t}}^{-1}} ^2 = {({X_t} - {{\hat{X}}_t})^\text {T}}{\Sigma _{e_{t}}^{-1}}({X_t} - {{\hat{X}}_t}) \end{aligned}$$
(21)
where \({{\hat{X}}_t}\) denotes the estimate of the target state at time t, \({e_t}={X_t}-{\hat{X}_t}\) is the fitting error, and \(\Sigma _{e_t}\) is the covariance of the fitting error, cf. (5).
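A minimal sketch of the sliding-window LS fit in (20) is given below. It rests on two assumptions that the text does not state: the trajectory function is a polynomial in time fitted independently per coordinate, and the error covariance is isotropic, \(\Sigma _{e_t} = \sigma _t^2 I\), so that the Mahalanobis weighting reduces to scaling each row by \(1/\sigma _t\). The names `fit_tfot` and `predict_tfot` are hypothetical.

```python
import numpy as np

def fit_tfot(times, positions, order=2, sigma=None):
    """Weighted LS fit of a polynomial trajectory over a time window.

    times     : (m,) time stamps in the window [k', k]
    positions : (m, 2) position data X_t
    sigma     : optional (m,) standard deviations (assumed isotropic errors)
    """
    times = np.asarray(times, dtype=float)
    positions = np.asarray(positions, dtype=float)
    w = np.ones_like(times) if sigma is None else 1.0 / np.asarray(sigma, dtype=float)
    t0 = times[0]                                   # shift time for conditioning
    design = np.vander(times - t0, order + 1, increasing=True)  # 1, t, t^2, ...
    coeffs, *_ = np.linalg.lstsq(design * w[:, None], positions * w[:, None],
                                 rcond=None)
    return t0, coeffs                               # coeffs has shape (order+1, 2)

def predict_tfot(model, t):
    """Evaluate the fitted trajectory at time t (may lie beyond the window)."""
    t0, coeffs = model
    powers = (t - t0) ** np.arange(coeffs.shape[0])
    return powers @ coeffs

# Window [k', k] with k' = max(1, k - T) on a noise-free straight-line track
k, T = 6, 4
window = np.arange(max(1, k - T), k + 1)
track = np.column_stack((1 + 2 * window, 3 - window))
model = fit_tfot(window, track, order=1)
x_next = predict_tfot(model, k + 1)                 # one-step-ahead prediction
```

Because the example track is exactly linear and noise-free, the first-order fit reproduces it and the prediction for time 7 is close to (15, -4).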
3.4 Algorithm summary
The proposed sensor selection algorithm is summarized in Algorithm 1. Based on the POMDP framework, the interaction of the tracked target with the sensor selection strategy can be described by the following three steps (see also Fig. 2):
1. At any time step k, the estimated T-FoT has parameters \(C_k\), which can be used to predict the target state \(\hat{X}_{k+1} = F(k+1;{C_{k}})\) for time \(k+1\).
2. The sensor network perceives pseudo-observations \(\hat{Z}_{k+1}\) through the known stochastic observation model \(g(\cdot |{X_{k}},s_{k})\) and the predicted target state \(\hat{X}_{k+1}\). This yields the objective function \(u_\text {lb}(s_{k};{\hat{X}}_{k+1})\).
3. Find the optimal selection command \(s_k^*\) from the candidate set \(S_k \subseteq S\) by optimizing the objective function \(u_\text {lb}(s_{k};{\hat{X}}_{k+1})\) subject to the applicable constraint. Here, the candidate set can be defined as the subset of all sensors that lie within a limited distance of the target.
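The three steps can be strung together in a short sketch for the fixed-size constraint. This is not Algorithm 1 itself: it assumes a polynomial trajectory function fitted by unweighted LS, an exhaustive search over candidate subsets, and a distance gate `max_range` on the candidates. All function and parameter names are hypothetical.

```python
import numpy as np
from itertools import combinations

def predict_next(times, positions, t_next, order=1):
    """Step 1: LS polynomial fit per coordinate, evaluated at t_next."""
    t = np.asarray(times, dtype=float)
    design = np.vander(t - t[0], order + 1, increasing=True)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(positions, dtype=float),
                                 rcond=None)
    return ((t_next - t[0]) ** np.arange(order + 1)) @ coeffs

def crlb_trace(target, sensors, noise_var):
    """Step 2: trace of the inverse DOA FIM at the predicted state."""
    dx, dy = target[0] - sensors[:, 0], target[1] - sensors[:, 1]
    d2 = dx**2 + dy**2
    jac = np.column_stack((-dy / d2, dx / d2))
    fim = jac.T @ (jac / noise_var[:, None])
    det = fim[0, 0] * fim[1, 1] - fim[0, 1] * fim[1, 0]
    if det <= 1e-12 * fim[0, 0] * fim[1, 1]:
        return np.inf                      # degenerate geometry
    return (fim[0, 0] + fim[1, 1]) / det

def select_step(times, positions, sensors, noise_var, n_s, max_range):
    """Step 3: best size-n_s subset among sensors near the predicted state."""
    x_pred = predict_next(times, positions, times[-1] + 1)
    dist = np.linalg.norm(sensors - x_pred, axis=1)
    candidates = np.flatnonzero(dist <= max_range)
    if len(candidates) < n_s:
        raise ValueError("fewer candidate sensors than n_s")
    best = min(combinations(candidates, n_s),
               key=lambda s: crlb_trace(x_pred, sensors[list(s)],
                                        noise_var[list(s)]))
    return x_pred, tuple(int(i) for i in best)

# Target moving along the x-axis; the fourth sensor is out of range
times = [1, 2, 3, 4]
positions = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
sensors = np.array([[5.0, 1.0], [6.0, 0.0], [5.0, -2.0], [50.0, 50.0]])
noise_var = np.ones(4)
x_pred, chosen = select_step(times, positions, sensors, noise_var,
                             n_s=2, max_range=5.0)
```

In this example the predicted position is close to (5, 0), and the selected pair is the one that views it from perpendicular directions at short range. The pair lying on a common line through the prediction is rejected because its FIM is (nearly) singular.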