A robust power allocation strategy based on benefit–cost ratio for multiple target guidance in the C-MIMO radar system under blanket jamming

How to utilize the limited power budget to accurately track more targets plays a critical role for the radar system in air defense applications, especially in the weapon guidance application. In this paper, we propose a robust power allocation (RPA) strategy in the collocated multiple-input and multiple-output (C-MIMO) radar system for multiple target guidance (MTG) under blanket jamming. The optimization model is established with the aim of increasing the number of effective tracking targets (ETTs) and improving the overall tracking accuracy among those targets. Since the mutual information (MI) quantifies the parameter estimation performance and can be predicted, the MI under blanket jamming is derived and utilized as the optimization criterion. We then propose a two-step optimization algorithm based on benefit–cost ratio (BCR) to solve the non-convex problem. Finally, numerical results are provided to demonstrate the effectiveness of the proposed algorithm.

When combat missions are urgent, the guidance radar system usually operates in full-power mode in order to guide more weapons to attack enemy objects accurately. In this case, the aforesaid PA strategies cannot be directly applied. In addition, for the guidance radar, both the weapon launching conditions and sufficient guidance accuracy should be satisfied [15]. Hence, the number of effective tracking targets (ETTs) and the corresponding target tracking accuracy are both important performance indicators. Herein, an ETT denotes a target whose tracking accuracy satisfies the given requirement. However, to the best of our knowledge, studies on robust power allocation (RPA) strategies that consider both of these optimization objectives are very limited. Moreover, most of the existing studies on PA in the MIMO radar system are carried out under ideal detection conditions, which are not commonly seen on modern battlefields. Since the posterior Cramer-Rao lower bound (PCRLB) can provide a tight lower bound for any unbiased estimator [16][17][18], the PCRLB is usually used as the optimization metric in the resource allocation problems for MTT [5][6][7][8][9][10][11][12][13]. However, under electromagnetic interference, the process of target parameter estimation can be very complex, so that the PCRLB may fail to quantify the tracking accuracy accurately. The MI is an information-theoretic criterion: it has been proved that a larger MI between the radar echo and the target impulse response implies a better capability of the radar to estimate the target parameters [19]. Accordingly, the MI has been used in [20][21][22] for power allocation among MIMO radars and jammers.
In this paper, the RPA strategy in the C-MIMO radar for the multiple target guidance (MTG) application is studied under the blanket jamming environment. The MI between the reflected target signal and the path gain matrix is adopted as the performance criterion for target parameter estimation. Based on the predicted MI, the RPA model is established to consider both the number of ETTs and their corresponding tracking accuracy, which is formulated as a non-convex problem. In order to solve this problem, a two-step optimization algorithm based on benefit-cost ratio (BCR) [23] is proposed.
The rest of this paper is organized as follows. The system model of the C-MIMO radar, along with the target motion model, is presented in Sect. 2. Section 3 derives the MI in terms of the PA and establishes a cognitive tracking scheme. In Sect. 4, the optimization model is formulated and an efficient solving algorithm is proposed. The numerical results and analysis are given in Sect. 5. Section 6 concludes this paper.

System model
Consider that a narrowband C-MIMO radar is located at $(x_0, y_0)$. In order to track $Q$ point-like enemy objects that carry active oppressive jammers, a set of orthogonal and coherent pulse train signals is transmitted. In order to simplify the model analysis, we make the following assumptions: (1) The number and initial positions of the tracked targets are known in advance as a priori knowledge; (2) Each target carries a self-defense jammer that continuously transmits a jamming signal to the radar, which is modeled as Gaussian white noise; (3) The C-MIMO radar works in the SM pattern and simultaneously transmits multiple orthogonal beams to track targets.

Radar signal model
Consider that the transmit signal for the $q$th target at the $k$th sample interval is normalized as $s_{k,q}(t)$, and the number of transmit pulse trains in one measurement is $L$. Thus, the $l$th pulse is [5]

$$s_{k,q,l}(t) = s_{k,q}\big(t' + (l-1)T_p\big) \tag{1}$$

where $T_p$ is the pulse repetition period and $t'$ is the slow time. Moreover, the $l$th pulse in the received baseband signal is given by

$$r_{k,q,l}(t) = \delta_{k,q}\, h_{k,q,l} \sqrt{P_{k,q}}\; s_{k,q}\big(t-(l-1)T_p-\tau_{k,q}\big)\, e^{-\mathrm{j}2\pi f^{d}_{k,q} t} + n_{k,q,l}(t) + j_{k,q,l}(t) \tag{2}$$

where $\delta_{k,q}$ denotes the attenuation of signal strength due to the path loss. $h_{k,q,l} = h^{R}_{k,q,l} + \mathrm{j}h^{I}_{k,q,l}$ is the radar cross-section (RCS), modeled as a zero-mean white complex noise with variance $\sigma^2_{k,q}$, denoted as $h_{k,q,l} \sim \mathcal{CN}(0, \sigma^2_{k,q})$. $P_{k,q}$ is the transient transmit power, and $\tau_{k,q}$ and $f^{d}_{k,q}$ are the time delay and the Doppler frequency, respectively. $n_{k,q,l}(t)$ represents the inherent environmental noise, modeled as $n_{k,q,l}(t) \sim \mathcal{CN}(0, \alpha^2_{k,q})$, and $j_{k,q,l}(t)$ denotes the oppressive jamming noise imposed by the jammer, distributed as $j_{k,q,l}(t) \sim \mathcal{CN}(0, \beta^2_{k,q})$. To improve the echo signal-to-noise ratio (SNR), the coherent pulse accumulation technique is applied to the echo signal processing. Thus, the sampled signals of $r_{k,q,l}(t)$ are

$$\hat{\mathbf{r}}_{k,q,l} = \hat{\mathbf{s}}_{k,q,l}\, g_{k,q,l} + \hat{\mathbf{n}}_{k,q,l} + \hat{\mathbf{j}}_{k,q,l} \tag{3}$$

where $\hat{\mathbf{s}}_{k,q,l} \in \mathbb{C}^{M\times 1}$ denotes the sampling of $s_{k,q,l}(t)$, and $M$ indicates the sampling length. $\hat{\mathbf{n}}_{k,q,l} \in \mathbb{C}^{M\times 1}$ represents the sampling of $n_{k,q,l}(t)$, $\hat{\mathbf{j}}_{k,q,l} \in \mathbb{C}^{M\times 1}$ is the sampling of $j_{k,q,l}(t)$, and $g_{k,q,l}$ denotes the path gain coefficient, distributed as $g_{k,q,l} \sim \mathcal{N}(0, \gamma^2_{k,q})$, with [7]

$$\gamma^2_{k,q} \propto P_{k,q}\,\sigma^2_{k,q} / R^4_{k,q} \tag{4}$$

where $\propto$ is the proportional notation, and $R_{k,q}$ is the distance from the $q$th target to the radar center at sample interval $k$.
Suppose that the C-MIMO radar continuously transmits $N \le L$ pulses to the $q$th target at the $k$th sample interval, which means that the number of coherent accumulation pulses in one detection is $N$. Then, the relevant echo signal can be denoted as

$$\hat{\mathbf{R}}_{k,q} = \big[\hat{\mathbf{r}}_{k,q,1}, \hat{\mathbf{r}}_{k,q,2}, \ldots, \hat{\mathbf{r}}_{k,q,N}\big] = \hat{\mathbf{S}}_{k,q}\hat{\mathbf{G}}_{k,q} + \hat{\mathbf{N}}_{k,q} + \hat{\mathbf{J}}_{k,q} \tag{5}$$

In addition, to simplify the signal model, we assume that the transmit pulse waveforms are exactly the same. Thus, we have $\hat{\mathbf{s}}_{k,q,1} = \hat{\mathbf{s}}_{k,q,2} = \cdots = \hat{\mathbf{s}}_{k,q,N} = \hat{\mathbf{s}}_{k,q}$.
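As a rough illustration of the sampled echo model in (3) and (5), the following sketch draws the path gains, the environmental noise and the jamming noise and stacks the $N$ pulse returns into one matrix. It is a sketch under stated assumptions rather than the simulation used in this paper: the function name `simulate_echo`, the waveform `s_hat` and all numerical values are hypothetical, and the path gains are drawn as circular complex Gaussian variables.

```python
import numpy as np

def simulate_echo(s_hat, gamma2, alpha2, beta2, n_pulses, rng):
    """Draw one echo matrix of shape (M, N): s_hat * G + N + J.

    Assumption: gains, noise and jamming are independent circular complex
    Gaussian with variances gamma2, alpha2 and beta2, respectively.
    """
    M = s_hat.size

    def cn(var, size):
        # Circular complex Gaussian samples with total variance `var`.
        return np.sqrt(var / 2.0) * (
            rng.standard_normal(size) + 1j * rng.standard_normal(size)
        )

    G = cn(gamma2, (1, n_pulses))      # one path gain per pulse
    noise = cn(alpha2, (M, n_pulses))  # environmental noise samples
    jam = cn(beta2, (M, n_pulses))     # blanket jamming samples
    return s_hat[:, None] @ G + noise + jam

# Hypothetical values, chosen only to exercise the model.
rng = np.random.default_rng(0)
s_hat = np.ones(4, dtype=complex) / 2.0   # unit-energy waveform, M = 4
R_hat = simulate_echo(s_hat, gamma2=2.0, alpha2=1.0, beta2=3.0,
                      n_pulses=20000, rng=rng)

# Per-sample power should be close to gamma2*|s_m|^2 + alpha2 + beta2 = 4.5.
avg_power = np.mean(np.abs(R_hat) ** 2)
```

Because $\gamma^2_{k,q}$ scales with $P_{k,q}$ in (4), raising the transmit power in this sketch amounts to raising `gamma2` while `alpha2` and `beta2` stay fixed.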

Target motion model
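Without loss of generality, we assume that the target motion can be described by the constant velocity (CV) model [6]. In this case, the $q$th target state is denoted by $\mathbf{x}_{k,q} = [x_{k,q}, \dot{x}_{k,q}, y_{k,q}, \dot{y}_{k,q}]^T$, where $[x_{k,q}, y_{k,q}]^T$ and $[\dot{x}_{k,q}, \dot{y}_{k,q}]^T$ represent the position and velocity at the $k$th sample interval in Cartesian coordinates. The target state transition model can be expressed by [7]

$$\mathbf{x}_{k,q} = \mathbf{F}\,\mathbf{x}_{k-1,q} + \mathbf{w}_{k-1,q} \tag{7}$$

where $\mathbf{F}$ denotes the state transition matrix of the CV model. The term $\mathbf{w}_{k,q}$ represents an uncorrelated process noise sequence and is assumed to be zero-mean Gaussian noise with covariance matrix $\mathbf{Q}_{k,q}$. Herein, $\mathbf{F}$ and $\mathbf{Q}_{k,q}$ are given by [8]

$$\mathbf{F} = \mathbf{I}_2 \otimes \begin{bmatrix} 1 & T_s \\ 0 & 1 \end{bmatrix} \tag{8}$$

and

$$\mathbf{Q}_{k,q} = m_{k,q}\, \mathbf{I}_2 \otimes \begin{bmatrix} T_s^3/3 & T_s^2/2 \\ T_s^2/2 & T_s \end{bmatrix} \tag{9}$$

where $T_s$ is the sample interval, $\mathbf{I}_2$ denotes the second-order identity matrix, $\otimes$ represents the Kronecker product operator, and $m_{k,q}$ is the relevant process noise intensity [8].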
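The CV model can be sketched numerically as follows. The matrices use the standard white-noise-acceleration form of the CV model, which is assumed here to match the definitions of $\mathbf{F}$ and $\mathbf{Q}_{k,q}$; the function name `cv_model` and the numerical values are hypothetical.

```python
import numpy as np

def cv_model(T_s, m_q):
    """Return (F, Q) for the state ordering [x, x_dot, y, y_dot]."""
    F_1d = np.array([[1.0, T_s],
                     [0.0, 1.0]])
    Q_1d = np.array([[T_s**3 / 3.0, T_s**2 / 2.0],
                     [T_s**2 / 2.0, T_s]])
    # The Kronecker product with I_2 repeats the 1-D block for x and y.
    F = np.kron(np.eye(2), F_1d)
    Q = m_q * np.kron(np.eye(2), Q_1d)
    return F, Q

# Hypothetical sample interval and process noise intensity.
F, Q = cv_model(T_s=2.0, m_q=1.0)

# Noise-free one-step prediction of a target at (1000, 2000) m
# moving with velocity (10, -5) m/s.
x0 = np.array([1000.0, 10.0, 2000.0, -5.0])
x_pred = F @ x0
```

With this state ordering the position components advance by velocity times $T_s$ and the velocity components are unchanged, which is the one-step prediction reused later when the predicted MI is computed.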

Measurement model
According to the receive signal model in (2) and (5), the conditional probability density function (PDF) of the received signal is $p(\hat{\mathbf{R}}_{k,q}|\boldsymbol{\xi}_{k,q})$, where $\boldsymbol{\xi}_{k,q} = [R_{k,q}, f^{d}_{k,q}, \theta_{k,q}]^T$. By adopting the maximum likelihood (ML) estimation method [24], the ML estimate of $\boldsymbol{\xi}_{k,q}$ can be calculated from this PDF. Therefore, the target information, e.g., the time delay, the Doppler frequency and the bearing angle, can be extracted from the received signal. The measurement model is built on the nonlinear function $h(\mathbf{x}_{k,q}) = [R_{k,q}, \dot{R}_{k,q}, \theta_{k,q}]^T$, where $R_{k,q}$, $\dot{R}_{k,q}$ and $\theta_{k,q}$ denote the range, radial velocity, and bearing angle of the $q$th target at the $k$th sample interval, respectively. In addition, the elements of the measurement covariance $\Re_{k,q}$ are given in terms of the effective bandwidth $\beta_{k,q}$, the effective time width $T_{k,q}$, and the null-to-null beam width $B_w$ of the receive antennas [9], [25]. Moreover, it should be noted that all the elements in (14) are inversely proportional to $P_{k,q}$ [12], and thus the measurement covariance can be rewritten as $\Re_{k,q} = P^{-1}_{k,q}\boldsymbol{\chi}_{k,q}$. In this case, it is theoretically possible to achieve higher tracking accuracy for a certain target by increasing the transmit power allocated to it.
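The measurement function and the power scaling of the measurement covariance can be illustrated as below. This is a minimal sketch that assumes the usual geometric definitions of range, radial velocity and bearing for a radar at `radar_pos`; the helper names `h` and `measurement_cov`, the four-quadrant arctangent convention and the numbers are assumptions for illustration.

```python
import numpy as np

def h(x, radar_pos=(0.0, 0.0)):
    """Map a CV state [x, x_dot, y, y_dot] to [range, range rate, bearing]."""
    px, vx, py, vy = x
    dx, dy = px - radar_pos[0], py - radar_pos[1]
    rng_ = np.hypot(dx, dy)                  # range
    rng_rate = (vx * dx + vy * dy) / rng_    # velocity projected on line of sight
    bearing = np.arctan2(dy, dx)             # assumed angle convention
    return np.array([rng_, rng_rate, bearing])

def measurement_cov(P, chi):
    """Measurement covariance chi / P, inversely proportional to power P."""
    return chi / P

# Hypothetical target state and power-independent covariance factor chi.
z = h(np.array([3000.0, 10.0, 4000.0, 0.0]))
chi = np.diag([100.0, 4.0, 1e-4])
cov_low = measurement_cov(1.0, chi)
cov_high = measurement_cov(2.0, chi)
```

Doubling the allocated power halves every entry of the covariance in this sketch, which is the mechanism through which the allocation influences tracking accuracy.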

Multi-beam target tracking with PSCKF
In the tracking process, the C-MIMO radar works in multi-beam mode and each beam operates in the "focused transmit focused receive" (FTFR) manner [13]. Therefore, the MTT task can be divided into a series of single-target tracking tasks [7]. In view of the nonlinear Gaussian state space model given in Sects. 2.2 and 2.3, the parallel square-root cubature Kalman filter (PSCKF) is adopted for target state estimation.
From the Bayesian perspective, the posterior PDF of the target state is obtained recursively from the known initial state probability density and the system measurements. After the state transition model and the measurement model are expressed in probabilistic form, the state estimation of the nonlinear discrete system model (7) and (12) in the Gaussian domain can be summarized as in (15), where $P_{k+1|k+1,q}$ is the state error covariance matrix, $P_{k+1|k,q}$ is the one-step predictive covariance matrix, $L_{k+1,q}$ is the Kalman gain, $P^{zz}_{k+1,q}$ is the measurement autocorrelation covariance matrix, and $P^{xz}_{k+1,q}$ is the state-measurement cross-correlation covariance matrix. To compute all the parameters in (15), the recursive calculation of the PSCKF is given below.
(1) Time Update
Assume that the state error covariance matrix $P_{k|k,q}$ is known, and that the posterior density at the $k$th sample interval satisfies $p(x_{k,q}|z_{k,q}) \sim \mathcal{N}(x_{k|k,q}, P_{k|k,q})$. Then, we have the following steps.
Step 1.2: Calculate each cubature point and make the one-step prediction,
where $i = 1, 2, \ldots, m$, $m = 2n$, and $n$ is the dimension of the state vector. Herein, the $i$th basic cubature point is given by
Step 1.3: Estimate the square-root coefficient of the covariance matrix of the state prediction value and the state prediction error,
where Tria($\cdot$) and Chol($\cdot$) denote the QR decomposition operator and the Cholesky decomposition operator, respectively.
(2) Measurement Update
Step 2.1: Estimate each cubature point and the square-root coefficient of the new covariance.
Step 2.2: Calculate the new information covariance matrix and the cross-covariance matrix of state and measurement.
Step 2.3: Update the filter gain and compute the posterior state estimation.
Step 2.4: Recursion. Update the square-root coefficient of the state error covariance matrix. Then, let $k = k + 1$ and return to Step 1.1.
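The time and measurement updates above can be sketched with a plain cubature Kalman filter. Note that this is the ordinary covariance form, swapped in for readability, and not the parallel square-root variant (PSCKF) adopted in this paper; the function names are hypothetical and the wrapping of the bearing angle is ignored.

```python
import numpy as np

def cubature_points(mean, cov):
    """Return the 2n cubature points (as columns) of N(mean, cov)."""
    n = mean.size
    S = np.linalg.cholesky(cov)
    # Basic points sqrt(n) * [+e_i, -e_i], i.e. sqrt(m/2) with m = 2n.
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])
    return mean[:, None] + S @ xi

def ckf_step(x, P, z, F, Q, h, R_meas):
    """One CKF recursion: linear time update, nonlinear measurement update."""
    m = 2 * x.size
    # Time update: propagate the cubature points through the motion model.
    X_prop = F @ cubature_points(x, P)
    x_pred = X_prop.mean(axis=1)
    dX = X_prop - x_pred[:, None]
    P_pred = dX @ dX.T / m + Q
    # Measurement update: redraw points from the predicted density.
    X2 = cubature_points(x_pred, P_pred)
    Z = np.column_stack([h(X2[:, i]) for i in range(m)])
    z_pred = Z.mean(axis=1)
    dZ = Z - z_pred[:, None]
    dX2 = X2 - x_pred[:, None]
    P_zz = dZ @ dZ.T / m + R_meas          # innovation covariance
    P_xz = dX2 @ dZ.T / m                  # state-measurement cross-covariance
    L = P_xz @ np.linalg.inv(P_zz)         # filter gain
    x_new = x_pred + L @ (z - z_pred)
    P_new = P_pred - L @ P_zz @ L.T
    return x_new, P_new

# Linear sanity check with hypothetical values: the CKF must reproduce
# the ordinary Kalman filter result when h is linear.
x_new, P_new = ckf_step(
    x=np.zeros(2), P=np.eye(2), z=np.array([2.0, -2.0]),
    F=np.eye(2), Q=np.zeros((2, 2)), h=lambda s: s, R_meas=np.eye(2),
)
```

In the linear check the prior and the measurement are equally uncertain, so the update lands halfway between them and the covariance is halved. A square-root implementation would propagate triangular factors through the Tria($\cdot$) operator instead of forming the covariances explicitly.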

MI derivation under blanket jamming
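Assume that the transmit signal $\hat{\mathbf{s}}_{k,q}$ is known, so that the pulse matrix $\hat{\mathbf{S}}_{k,q}$ can be obtained. Then, the MI between $\hat{\mathbf{R}}_{k,q}$ and $\hat{\mathbf{G}}_{k,q}$ can be expressed as [26]

$$I(\hat{\mathbf{R}}_{k,q};\hat{\mathbf{G}}_{k,q}|\hat{\mathbf{S}}_{k,q}) = H(\hat{\mathbf{R}}_{k,q}|\hat{\mathbf{S}}_{k,q}) - H(\hat{\mathbf{N}}_{k,q}+\hat{\mathbf{J}}_{k,q}) \tag{24}$$

where $I(\hat{\mathbf{R}}_{k,q};\hat{\mathbf{G}}_{k,q}|\hat{\mathbf{S}}_{k,q})$ denotes the MI for the $q$th target. The term $H(\cdot)$ represents the differential entropy operator, i.e., $H(x) = -\int p(x)\log p(x)\,dx$ and $H(x|y) = -\iint p(x|y)\log p(x|y)\,dx\,dy$. Herein, $p(x)$ and $p(x|y)$ are the PDF of $x$ and the conditional PDF of $x$ with respect to $y$, respectively. From the PDFs $p(\hat{\mathbf{R}}_{k,q}|\hat{\mathbf{S}}_{k,q})$ and $p(\hat{\mathbf{N}}_{k,q}+\hat{\mathbf{J}}_{k,q})$, the entropies $H(\hat{\mathbf{R}}_{k,q}|\hat{\mathbf{S}}_{k,q})$ and $H(\hat{\mathbf{N}}_{k,q}+\hat{\mathbf{J}}_{k,q})$ can be calculated, and combining (24), (27) and (28) yields the MI in (29). It can be seen from (4) and (29) that $P_{k,q}$ is the only variable of $\mathrm{MI}_{k,q}$, provided that the electromagnetic environment is stable and the pulse number is constant.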
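To make the dependence of the MI on the transmit power concrete, the sketch below evaluates a log-determinant form of the MI. It is an assumption-laden illustration and not necessarily the closed form of (29): if the $N$ pulse returns are independent, identically distributed circular complex Gaussian vectors, the entropy difference reduces to $N \log\big(1 + \gamma^2 \lVert\hat{\mathbf{s}}\rVert^2 / (\alpha^2 + \beta^2)\big)$. The constant `kappa`, which lumps $\sigma^2\lVert\hat{\mathbf{s}}\rVert^2/R^4$ and the proportionality factor of (4), and both function names are hypothetical.

```python
import math

def predicted_mi(P, kappa, noise_var, jam_var, n_pulses):
    """MI in nats under the assumed log(1 + SJNR) form.

    kappa * P plays the role of gamma^2 * ||s_hat||^2 in (4).
    """
    sjnr = kappa * P / (noise_var + jam_var)
    return n_pulses * math.log1p(sjnr)

def threshold_power(mi0, kappa, noise_var, jam_var, n_pulses):
    """Smallest power whose predicted MI reaches mi0 (inverse of the above)."""
    return math.expm1(mi0 / n_pulses) * (noise_var + jam_var) / kappa

# Hypothetical numbers: stronger jamming lowers the MI at equal power
# and raises the power needed to reach the same MI threshold.
mi_weak_jam = predicted_mi(1.0, kappa=2.0, noise_var=1.0, jam_var=3.0, n_pulses=4)
mi_strong_jam = predicted_mi(1.0, kappa=2.0, noise_var=1.0, jam_var=15.0, n_pulses=4)
p_thr = threshold_power(0.6, kappa=2.0, noise_var=1.0, jam_var=3.0, n_pulses=4)
```

Under this form the MI is increasing and concave in the power, so the threshold power for a given MI level is unique and can be computed in closed form.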

Cognitive tracking scheme based on transmit power
After obtaining the measurement information of all the tracked targets, the power-limited C-MIMO radar can calculate the power allocation results for the next tracking period according to the above steps. Hence, by feeding back the power allocation results to the C-MIMO radar transmitter, a closed-loop feedback scheme for power allocation is established. The established cognitive tracking scheme is shown in Fig. 1.

Optimization model establishment and solution
In this section, we formulate the optimization problem of the RPA based on the MI. In order to solve the optimization problem, a two-step optimization scheme based on the idea of BCR is proposed.

Problem formulation
Since the MI between the echo signals and the path gain matrix is closely related to the accuracy of parameter estimation [27,28] and has an inverse relationship with the mean square error (MSE) [29], we adopt the MI as the criterion of target tracking accuracy. Moreover, in order to establish a closed-loop tracking recursive cycle, the predicted MI should be calculated to guide the PA. To be specific, by adopting the PSCKF, the target state estimate $\mathbf{x}_{k-1,q}$ can be obtained. Then, the predicted MI is obtained from the predicted target state $\mathbf{x}_{k|k-1,q} = \mathbf{F}\mathbf{x}_{k-1,q}$. We seek a suboptimal RPA that accounts for both the number of ETTs and the target tracking accuracy. Hence, the problem is formulated as (30), where $\mathrm{MI}_0$ is the predetermined MI threshold above which a target is regarded as effectively tracked, $P_{total}$ denotes the predefined total power budget, and the transient power bound $[P_{min}, P_{max}]$ is set to keep the transmitter within an endurable operating interval.

Two-step optimization algorithm based on BCR
Owing to the binary variable $u_{k,q}$, (30) is a non-convex optimization problem. In order to solve (30), we propose a two-step optimization algorithm based on BCR.
Step 1: Determine the targets that meet the ETT condition in order of BCR. Firstly, the threshold power $P^{min}_{k,q}$ is calculated, which satisfies $\mathrm{MI}_{k,q}(P^{min}_{k,q}) = \mathrm{MI}_0$. Then, following the idea of BCR [23], the ratio of the relative MI to the virtual power of each target is sorted in descending order, which yields $IX_k$, the permutation vector of the BCR in terms of power over all the moving targets. Herein, $E_{k,q}$ denotes the BCR in power, which is expressed as

$$E_{k,q} = \frac{\mathrm{MI}_{k,q}\big(P^{min}_{k,q} + P_{set}\big)}{P^{min}_{k,q} + P_{set}} \tag{32}$$

where $P_{set}$ is a preset constant. Finally, the maximum $N_k$ satisfying the condition in (33) is calculated, where $cel_k$ records the indices of the targets in $IX_k$.
Step 2: Maximize the MI for the given ETT information. After obtaining the quantity $N_k$ and the indices $cel_k$ of all the effective tracking targets, and combining them with the threshold power $P^{min}_{k,q}$, (30) can be converted into (34), where $P^{reset}_{k,q}$ denotes the share of the remaining transmit power allocated to the $q$th target at sample interval $k$. Technically, (34) can be solved by particle swarm optimization (PSO) [30], and the final solution is the suboptimal power set that satisfies the robust tracking of each target.
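The two steps can be sketched end to end as follows. Several parts are assumptions made only for illustration: the MI model `mi` is a generic concave $\log(1 + cP)$ form with a hypothetical per-target coefficient `c`; the selection rule keeps the longest BCR-ordered prefix whose threshold powers fit the budget; the Step 2 objective is taken to be the sum of MIs; and a greedy marginal-gain search replaces the PSO solver. Monitoring power for abandoned targets is not modeled.

```python
import math

def mi(P, c, n_pulses=1):
    """Assumed MI model (nats); c lumps RCS, range and interference level."""
    return n_pulses * math.log1p(c * P)

def two_step_bcr(c, mi0, P_total, P_set, P_low, P_max, n_pulses=1, step=1e-3):
    Q = len(c)
    # Step 1a: threshold power with MI(P_thr) = MI_0, floored at P_low.
    P_thr = [max(P_low, math.expm1(mi0 / n_pulses) / cq) for cq in c]
    # Step 1b: benefit-cost ratio of each target that can reach MI_0 at all.
    feasible = [q for q in range(Q) if P_thr[q] <= P_max]
    bcr = {q: mi(P_thr[q] + P_set, c[q], n_pulses) / (P_thr[q] + P_set)
           for q in feasible}
    order = sorted(feasible, key=lambda q: bcr[q], reverse=True)
    # Step 1c: longest prefix of the ordering that fits the power budget.
    selected, used = [], 0.0
    for q in order:
        if used + P_thr[q] > P_total:
            break
        selected.append(q)
        used += P_thr[q]
    # Step 2: hand out the remaining power in small steps, each time to the
    # target with the largest MI gain (greedy stand-in for PSO).
    P = {q: P_thr[q] for q in selected}
    remaining = P_total - used
    while remaining >= step:
        cands = [q for q in selected if P[q] + step <= P_max]
        if not cands:
            break
        best = max(cands, key=lambda q: mi(P[q] + step, c[q], n_pulses)
                   - mi(P[q], c[q], n_pulses))
        P[best] += step
        remaining -= step
    return selected, P

# Hypothetical scenario with four targets and a normalized budget.
c = [8.0, 4.0, 2.0, 0.5]
mi0, P_total = 0.6, 1.0
selected, P = two_step_bcr(c, mi0, P_total, P_set=0.1, P_low=0.05, P_max=0.8)
```

In this example the last target would need more than `P_max` to reach the threshold and is abandoned, while the others are kept at or above their threshold powers. Because the assumed MI model is concave and separable, the greedy search approaches the optimum as `step` shrinks; a non-concave MI model would call for a global solver such as PSO.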

Parameter settings
In this section, numerical results are presented to demonstrate the effectiveness of the proposed RPA strategy. Consider that a C-MIMO radar is located at the origin, and $Q = 8$ widely separated targets follow the CV model, with the initial motion parameters shown in Table 1. Moreover, the joint configuration of the C-MIMO radar and the tracked targets is shown in Fig. 2. The number of coherent accumulation pulses in one illumination is set as $L = 200$, and the pulse repetition period is $T_p = 1$ ms. The lower and upper bounds of the power constraints for the tracked targets are $P_{min} = 0.05P_{total}$ and $P_{max} = 0.8P_{total}$, respectively. In addition, the targets not assigned for tracking are monitored by the radar with allocated power $P_{mon} = 0.01P_{total}$. The preset power compensation in the BCR of (32), $E_{k,q} = \mathrm{MI}_{k,q}(P^{min}_{k,q} + P_{set})/(P^{min}_{k,q} + P_{set})$, is $P_{set} = 0.1P_{total}$, and the threshold of MI is set as $\mathrm{MI}_0 = 0.6$ nats. It is assumed that the RCS of all targets follows the Swerling I model [31] with unit mean and remains constant over the measurement interval. Suppose that all targets carry self-defense jammers. In this scenario, the interference signal intensity of each jammer is assumed to be constant and the corresponding jam-to-signal ratio (JSR) is set as 12 dB. Hence, since the target SNRs are only related to the radar-target distances and the transmit power results [32,33], the distance factor becomes the major contributor in the PA problem.

1) Scenario 1: Effect of Distance
In order to demonstrate the effectiveness of the proposed algorithm, two benchmarks are used for comparison: 1) uniform allocation; 2) sum-max optimized allocation. In the uniform allocation scheme, all power resources are evenly distributed among the $Q = 8$ targets. In the sum-max optimized allocation scheme, the optimization objective is to maximize the sum of the MIs of all targets; its optimization model is expressed as (36), which is solved by the PSO algorithm. Figure 3 compares the effective tracking quantity of the three resource allocation strategies. Overall, the proposed RPA strategy performs best among the adopted methods. In addition, although the effective tracking quantity obtained by the sum-max optimized allocation scheme is smaller than that obtained by the uniform allocation method in the initial period, this situation changes as the overall target tracking accuracy improves. Figure 4 further shows the average PA results obtained by the proposed RPA strategy in scenario 1. Herein, the grid colors denote the ratio of power allocated to the different moving targets, averaged over the trials, where $P^{j}_{k,q}$ denotes the PA result in the $j$th trial. Moreover, the pinkish-white color indicates that the ratio is zero, which means that the corresponding target is not tracked at this tracking interval. It should be noted that, owing to its larger tracking error, a target far from the radar often cannot meet the requirements of effective tracking under the given power budget, so that less power is allocated to it. Moreover, among the targets whose tracking error meets the effective tracking condition, those farther from the radar tend to be allocated more power. Figure 5 shows the targets that meet the effective tracking condition at each frame for the three PA schemes. Target 4 and target 5 can be tracked more easily by the radar because they are closer to it.
In addition, the proposed RPA strategy shows good robustness as the tracking time increases.

2) Scenario 2: Effect of Interference Intensity
In this scenario, we further study the influence of interference intensity on the PA results. Therefore, we consider a time-varying JSR model. In the model, the JSR levels of target $q$ ($q = 2, 3, 4, 5$) remain at 12 dB, which is consistent with scenario 1. In addition, it is assumed that the JSR levels of the remaining moving targets are time-varying, as shown in Fig. 6. As such, in addition to the distance factor, the interference intensity also affects the PA results.
The effective tracking quantities of the three PA schemes in scenario 2 are compared in Fig. 7. It can be seen from Fig. 7 that the proposed RPA strategy still performs best among the three algorithms. Owing to the stronger interference intensity, the effective tracking quantity is smaller in scenario 2 than in scenario 1. Moreover, in the long run, the sum-max optimized allocation strategy outperforms the uniform allocation method in terms of tracking performance in scenario 2. Figures 8 and 9 show the average PA results obtained by the proposed RPA strategy and the indices of the effective tracking targets for the three PA strategies, respectively. Compared with scenario 1, less power is allocated to the targets with stronger interference intensity (target 1 and target 8). In addition, owing to the dual influence of its longer radial distance and higher interference intensity, target 1 is abandoned during the tracking process because of the limited system power budget.

Conclusions
In this paper, we proposed an RPA strategy in the C-MIMO radar system for the MTG application under the blanket jamming environment. By deriving and calculating the predicted MI and then setting a predetermined MI threshold, we formulated the RPA strategy as a non-convex optimization problem. To tackle the difficulty of solving this problem, a two-step optimization algorithm based on the BCR was utilized. Numerical results showed that both the distance from the target to the radar and the interference intensity affect the PA results. Additionally, in the proposed RPA strategy, targets with low tracking accuracy tend to be abandoned in exchange for higher tracking accuracy of the other targets.