### 3.1 Compressive sensing

Compressive sensing (CS) has shown its strength in reconstructing sparse signals from far fewer samples than required by the Nyquist sampling theorem [16]. It requires a transform domain that provides a sparse representation of the observed signal, and its sensing structure, i.e., the measurement transform matrix, has to satisfy the *restricted isometry property* (RIP) [17]. The transform domain with the low-dimensional representation is called the sparse domain. A signal having \(\textit{k}\) nonzero coefficients in the sparse domain is called \(\textit{k}\)-sparse. Generally, the sparsity of the signal \(\mathbf {y}\) is measured by the \(l_{0}\) pseudo-norm of its representation vector \(\mathbf{x}\), where the \(l_{0}\) pseudo-norm denotes the cardinality of the support of \(\mathbf{x}\) [18]:

$$\begin{aligned} \left\| \mathbf{x} \right\| _{0} = \text {card}\{\text {supp}(\mathbf{x} )\} = \textit{k}. \end{aligned}$$

(8)
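As a minimal numerical sketch of (8) (illustrative NumPy code, not part of the original formulation), the \(l_0\) pseudo-norm of a representation vector is simply its number of nonzero coefficients:

```python
import numpy as np

# l0 pseudo-norm: the sparsity k equals the number of nonzero
# coefficients, i.e. the cardinality of supp(x).
def l0_pseudo_norm(x):
    return int(np.count_nonzero(x))

x = np.zeros(16, dtype=complex)
x[[2, 7, 11]] = [1.0 + 0.5j, -2.0, 0.3j]   # a 3-sparse representation vector
k = l0_pseudo_norm(x)
```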

In most automotive radar application scenarios, the representation of the beat signal is sparse in the RD spectrum. Hence, the interference-pruned discrete beat signal can be seen as a beat signal with a reduced sampling rate, and its sparse representation can be restored by a CS algorithm. The 1D signal model in vector form can be directly connected to the CS framework. The measurement transform matrix can be written as \({\tilde{\mathbf{W }}}={\tilde{\mathbf{W }}_\mathbf{M }}\otimes {\tilde{\mathbf{W }}_\mathbf{N }}\in \mathbb{C}^{\textit{MN}\times \textit{MN}}\).

The dimensions of the inverse DFT (IDFT) matrices \({\tilde{\mathbf{W }}_\mathbf{N }}\) and \({\tilde{\mathbf{W }}_\mathbf{M }}\) are \(N\times N\) and \(M\times M\) (see Appendix A), respectively.

Assume that the number of all undisturbed samples across all chirps in (5) is *q*. Thus, the resulting beat signal vector is given by \({\tilde{\mathbf{y }}} = ({y}_{{i}_{0}},...,{y}_{{i}_{{q-1}}})^{T}\) with \(0<q<MN\) and \(\{i_0,...,i_{q-1}\}\subset \{0,...,MN -1 \}\), and \({\tilde{\varvec{\Psi }}} = \left( {\varvec{\psi }}_{i_0}^{{T}},...,{\varvec{\psi }}_{i_{q-1}}^{{T}}\right) ^{{T}}\), where \(\varvec{\psi }_{i_{j}}\) denotes the \(i_j\)-th row vector of the matrix \({\tilde{\mathbf{W }}}\). The radar interference mitigation problem can then be rephrased as the reconstruction of the sparse vector \(\mathbf{x}\) from the noisy measurement \({\tilde{\mathbf{y }}}={\tilde{\varvec{\Psi }}}{} \mathbf{x}\), which is equivalent to solving an underdetermined set of linear equations. An illustration of how the 1D CS framework is utilized for interference mitigation of the automotive radar signal is presented in Fig. 2.
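The construction of the pruned measurement transform matrix can be sketched as follows; the small sizes \(N=8\), \(M=4\), \(q=24\) and the unitary normalization of the IDFT matrices are illustrative assumptions:

```python
import numpy as np

N, M = 8, 4

def idft_matrix(n):
    # unitary IDFT matrix of size n x n
    k = np.arange(n)
    return np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

W_N, W_M = idft_matrix(N), idft_matrix(M)
W = np.kron(W_M, W_N)                      # W = W_M kron W_N, size MN x MN

# keep only the rows corresponding to interference-free samples
rng = np.random.default_rng(0)
keep = np.sort(rng.choice(N * M, size=24, replace=False))   # q = 24 < MN
Psi = W[keep, :]                           # pruned matrix Psi, size q x MN

# consistency check: y_tilde = Psi @ x equals selecting entries of W @ x
x = rng.standard_normal(N * M) + 1j * rng.standard_normal(N * M)
y_tilde = Psi @ x
```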

Under the condition that \(\mathbf{x}\) is sparse, the problem can be reduced to a minimization problem:

$$\begin{aligned} \text {P}_{0}: \hat{\mathbf{x }}=\mathop {{{\,\mathrm{argmin}\,}}}\limits _\mathbf{x \in \mathbb{C}^{MN}} \left\{ \frac{1}{2} ||{\tilde{\mathbf{y }}}-{\tilde{\varvec{\Psi }}} \mathbf{x} ||_2^2 + \nu ||\mathbf{x} ||_0 \right\} , \end{aligned}$$

(9)

with the Lagrange multiplier \(\nu\) [18]. However, due to the discrete and discontinuous nature of the \(l_{0}\) pseudo-norm, the \(l_{0}\)-minimization is NP-hard in general [18]; thus, \(\text {P}_{0}\) is computationally intractable. The \(l_{1}\)-minimization, or basis pursuit [19], can be interpreted as a convex relaxation of the \(l_{0}\)-minimization and reads

$$\begin{aligned} \text {P}_{1}: \hat{\mathbf{x }}=\mathop {{{\,\mathrm{argmin}\,}}}\limits _\mathbf{x \in \mathbb{C}^{MN}} \left\{ \frac{1}{2} ||{\tilde{\mathbf{y }}}-{\tilde{\varvec{\Psi }}} \mathbf{x} ||_2^2 + \nu ||\mathbf{x} ||_1 \right\} . \end{aligned}$$

(10)

Since \(\text {P}_{1}\) is convex, efficient solvers can be used, such as iterative shrinkage-thresholding algorithms [20]. Alternative reconstruction algorithms include greedy methods such as OMP [21, 22], as well as thresholding-based methods [23,24,25] and the AMP algorithm [26]. These algorithms can be easily integrated into the framework shown in Fig. 2. However, the efficiency of this framework is limited, as it vectorizes the 2D signal measurement of the automotive radar system into a 1D vector of dimension *MN*.

### 3.2 Restricted isometry property

In order to successfully recover a good estimate of signal \(\mathbf{x}\), the measurement transform matrix \({\tilde{\varvec{\Psi }}}\) in (10) should satisfy the *restricted isometry property* [17].

The selection of the measurement transform matrix has been analyzed in [18, 27], where it is shown that a random partial Fourier matrix satisfies a near-optimal RIP with high probability [27, 28]. In this work, a theoretical analysis of the RIP of the measurement transform matrix \({\tilde{\varvec{\Psi }}}\) is conducted in Lemma 1 and Theorem 1 in Appendix B. Theorem 1 shows that \({\tilde{\varvec{\Psi }}}\) satisfies the RIP with high probability.

### 3.3 Structure of 1D algorithms for RD spectrum recovery

In this subsection, we introduce the novel prior-model-based iterative sparsity-promoting algorithms employed in conjunction with the 1D formulation of our interference mitigation approach.

#### 3.3.1 Prior-model-based iterative thresholding

For solving large-scale systems of linear equations, iterative gradient descent methods are often employed rather than Gaussian elimination [29]. The basic form of the update step of such an iterative algorithm for solving underdetermined systems like (9) and (10) is given by

$$\begin{aligned} \begin{aligned} \mathbf{r} _t&= {\tilde{\mathbf{y }}}- {\tilde{\varvec{\Psi }}}{} \mathbf{x} _t, \\ \mathbf{x} _{t+1}&= \mathcal {T}_{\lambda }(\mathbf{x} _t+\vartheta {\tilde{\varvec{\Psi }}}^T \mathbf{r} _t) . \end{aligned} \end{aligned}$$

(11)

Here \({\tilde{\varvec{\Psi }}}^T \mathbf{r} _t ={\tilde{\varvec{\Psi }}}^T({\tilde{\mathbf{y }}}- {\tilde{\varvec{\Psi }}} \mathbf{x} _t)\) represents the gradient of the approximation error (residual \(\mathbf{r} _t\)) in the *t*-th iteration of the algorithm. \(\mathcal {T}_{\lambda }(\cdot )\) denotes a nonlinear function that promotes the sparsity of the solution, and the parameter \(0<\vartheta \le 1\) influences the convergence speed. Hard thresholding and soft thresholding are two common choices for the nonlinear function \(\mathcal {T}_{\lambda }(\cdot )\), where \(\lambda\) denotes the threshold value.
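The update step (11) with soft thresholding can be sketched as follows. This is an illustrative implementation, not the authors' code: the sensing matrix is a random partial unitary DFT matrix, the Hermitian transpose is used for the complex-valued gradient, and the parameter values (\(\lambda =0.05\), \(\vartheta =1\), 300 iterations) are our assumptions.

```python
import numpy as np

def soft(v, lam):
    # complex soft thresholding: shrink the magnitude by lam, keep the phase
    mag = np.abs(v)
    return np.where(mag > lam, (1 - lam / np.maximum(mag, 1e-12)) * v, 0)

def ist(y, Psi, lam=0.05, theta=1.0, iters=300):
    # iterative soft thresholding for min 0.5*||y - Psi x||_2^2 + lam*||x||_1
    x = np.zeros(Psi.shape[1], dtype=complex)
    for _ in range(iters):
        r = y - Psi @ x                                # residual r_t
        x = soft(x + theta * Psi.conj().T @ r, lam)    # gradient step + shrink
    return x

# illustrative setup: q = 48 rows of a 64 x 64 unitary DFT matrix, 3-sparse x
rng = np.random.default_rng(1)
n, q = 64, 48
F = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n) / np.sqrt(n)
Psi = F[np.sort(rng.choice(n, q, replace=False)), :]
x_true = np.zeros(n, dtype=complex)
x_true[[5, 20, 41]] = [1.0, -1.0 + 0.5j, 0.8j]
x_hat = ist(Psi @ x_true, Psi)
```

With this well-conditioned setup the three dominant entries of `x_hat` coincide with the true support; the soft threshold introduces a small magnitude bias, as expected for the \(l_1\) relaxation.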

For the algorithms to run as efficiently as possible, the choice of the threshold value \(\lambda\) is crucial. It can remain constant for all iterations, decrease by a fixed multiplicative factor in each iteration, or be adaptively adjusted to the signal properties in each iteration. An adaptive adjustment can be derived by considering the gradient term \({\tilde{\varvec{\Psi }}}^T\mathbf {r}_t\). Assuming that the values in this term follow a Gaussian distribution with zero mean and standard deviation \({{\,\mathrm{std}\,}}({\tilde{\varvec{\Psi }}}^T \mathbf{r} _t)\), the threshold is given by

$$\begin{aligned} \lambda _t=\beta \cdot {{\,\mathrm{std}\,}}({\tilde{\varvec{\Psi }}}^T \mathbf{r} _t), \end{aligned}$$

(12)

where \(\beta\) is the threshold control parameter, typically in the range \(2<\beta <4\) [30]. This effectively suppresses the noise in the representation by treating small signal values as part of the noise.
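The adaptive rule (12) can be realized in a few lines; the Gaussian-fit assumption enters only through the use of the empirical standard deviation (illustrative code; the sizes, \(\beta\) values, and the use of the Hermitian transpose for complex data are our choices):

```python
import numpy as np

def adaptive_threshold(Psi, r, beta=3.0):
    # lambda_t = beta * std(Psi^H r_t): values below a few standard
    # deviations of the gradient term are treated as noise
    return beta * np.std(Psi.conj().T @ r)

rng = np.random.default_rng(0)
Psi = (rng.standard_normal((32, 64)) + 1j * rng.standard_normal((32, 64))) / np.sqrt(64)
r = rng.standard_normal(32) + 1j * rng.standard_normal(32)
lam = adaptive_threshold(Psi, r)
```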

Since the typical measurement cycle of an automotive radar sensor is around 50 milliseconds [12], the observed movements of the targets in the RD spectra of successive measurement cycles are rather small in most application scenarios. Therefore, the prior information about the positions of the targets can be utilized to expedite the update steps of the iterative thresholding algorithms described in (11).

Instead of using the standard IST [16, 31] and IHT [32], we incorporate prior information into the thresholding process for solving (9) and (10). The prior-model-based soft thresholding function is defined as:

$$\begin{aligned} \begin{aligned} \mathcal {S}_{\lambda }(\theta _i)&={{\,\mathrm{sign}\,}}(\theta _i)\max (|\theta _i|-(1-\zeta (p_{i}))\cdot \lambda ,0), \\&\quad \text {for}~ i=0,...,MN-1, \end{aligned} \end{aligned}$$

(13)

where *i* denotes the index of the elements of vector \(\varvec{\theta }\). The prior probability \(p_{i}\) ensures that, if a sparse maximum is likely at the *i*-th position, the threshold \(\lambda\) is scaled down by the factor \((1-\zeta (p_{i}))\). This reduces the number of iterations needed to find an optimal estimate of \(\mathbf{x}\) and facilitates the detection of local maxima, thereby reducing the reconstruction error. The mapping function

$$\begin{aligned} \zeta :\mathbb{R}\rightarrow \mathbb{R},~\zeta (p_{i})=\frac{a\cdot p_{i}+b}{e} \end{aligned}$$

(14)

is introduced for regulating the prior model, where \(a,b,e\in \mathbb{R}\) are the control parameters.

Similarly, the prior-model-based hard thresholding function is defined as:

$$\begin{aligned} \begin{aligned} \mathcal {H}_\lambda (\theta _i)={\left\{ \begin{array}{ll} \theta _i, ~~~&{} |\theta _i| \ge (1-\zeta (p_{i}))\cdot \lambda \\ 0, &{} |\theta _i| <(1-\zeta (p_{i}))\cdot \lambda \\ \end{array}\right. }, \\ \text {for}~ i=0,...,MN-1, \end{aligned} \end{aligned}$$

(15)

where *i* denotes the index of the elements of vector \(\varvec{\theta }\).
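The thresholding rules (13) and (15) together with the mapping (14) can be sketched as follows. The control parameters \(a=1\), \(b=0\), \(e=2\) are illustrative choices, not values from this work; with them, \(\zeta\) maps priors in \([0,1]\) to \([0,0.5]\), so the scaled threshold stays positive.

```python
import numpy as np

def zeta(p, a=1.0, b=0.0, e=2.0):
    # mapping function (14); here it sends p in [0, 1] to [0, 0.5]
    return (a * p + b) / e

def pm_soft(theta, lam, p):
    # prior-model-based soft thresholding (13): the threshold is lowered
    # by (1 - zeta(p_i)) where a target peak is likely (large p_i)
    scaled = (1.0 - zeta(p)) * lam
    mag = np.abs(theta)
    return np.where(mag > scaled, (1 - scaled / np.maximum(mag, 1e-12)) * theta, 0)

def pm_hard(theta, lam, p):
    # prior-model-based hard thresholding (15)
    return np.where(np.abs(theta) >= (1.0 - zeta(p)) * lam, theta, 0)

theta = np.array([0.7, 0.7])        # two equal coefficients ...
p = np.array([0.0, 0.8])            # ... but only the second has a prior
s, h = pm_soft(theta, 1.0, p), pm_hard(theta, 1.0, p)
```

Only the coefficient supported by the prior survives: the soft rule shrinks it to \(0.7-0.6=0.1\), the hard rule keeps it unchanged, and the unsupported coefficient is set to zero by both.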

#### 3.3.2 Determination of the prior probability

The prior of the presence of a target at the \(\textit{i}\)-th position of \(\mathbf{x}\) is assumed to follow a normal distribution whose expected value \(\mu _{\textit{i}}\) and variance \(\sigma ^2_{\textit{i}}\) are equal to the empirical mean and variance of the peak values at this position in the latest *Q* measurements:

$$\begin{aligned} p(x_{\textit{i}}) \sim \mathcal {N}(\mu _{\textit{i}},\sigma ^2_{\textit{i}}), \textit{i} \in \xi , \end{aligned}$$

(16)

where \(\xi = \xi _0 \cup \xi _1 \cup ... \cup \xi _{Q-1}\) represents the set of positions of target peaks detected by a cell averaging constant false alarm rate (CA-CFAR) algorithm in the (original or recovered) RD spectra^{Footnote 1} of the latest *Q* measurements. \(\xi _0\) and \(\xi _{Q-1}\) denote the sets of detected positions of target peaks in the RD spectrum of the current measurement \(\mathbf{x} _{\eta }\) and of the measurement at time \(\eta -Q+1\), respectively. The prior probability for the presence of a target at the *i*-th position (\(\textit{i} \in \xi\)) of the next measurement \(\mathbf{x} _{\eta +1}\) is determined by \(x_{\textit{i}}\) in \(\mathbf{x} _{\eta }\). Because the presence of target peaks at other positions (\(\textit{i} \notin \xi\)) was not observed in the latest *Q* measurements, the prior probability of these positions is initially set to zero. Then, a prior probability matrix \({\tilde{\mathbf{P }}}\in \mathbb{R}^{\textit{N}\times \textit{M}}\) can be constructed. However, since the targets may move slightly in the next measurement cycle, the prior probability should optimally be propagated from the historical target positions to the neighboring positions surrounding them. The new prior probability matrix is then recalculated as \(\mathbf{P} = \mathbf{G} \circledast {\tilde{\mathbf{P }}}\), where \(\mathbf{G}\) represents a 2D window function and \(\circledast\) denotes the convolution operator. \(p_i\) is then the *i*-th element of \(\text {vec}(\mathbf{P} )\).
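The propagation \(\mathbf{P} = \mathbf{G} \circledast {\tilde{\mathbf{P }}}\) can be sketched with a small zero-padded "same"-size 2D convolution; the \(3\times 3\) window and the matrix sizes are illustrative, not from this work:

```python
import numpy as np

def conv2d_same(P, G):
    # zero-padded "same"-size 2D convolution, sufficient for small windows
    n, m = P.shape
    gn, gm = G.shape
    pad = np.zeros((n + gn - 1, m + gm - 1))
    pad[gn // 2: gn // 2 + n, gm // 2: gm // 2 + m] = P
    out = np.zeros_like(P, dtype=float)
    for i in range(n):
        for j in range(m):
            out[i, j] = np.sum(pad[i:i + gn, j:j + gm] * G[::-1, ::-1])
    return out

P_tilde = np.zeros((8, 8))
P_tilde[3, 4] = 1.0                     # one historical target peak
G = np.array([[0.0, 0.25, 0.0],
              [0.25, 1.0, 0.25],
              [0.0, 0.25, 0.0]])        # spreads the prior to neighbors
P = conv2d_same(P_tilde, G)
p = P.flatten(order="F")                # vec(P), column-major
```

After the convolution, the historical peak position keeps its full prior while its four neighbors receive a reduced prior, modeling small target movement between measurement cycles.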

### 3.4 Integration of 2D masked residual updates

Since the multiplication of a vector with the Fourier matrix can utilize the fast Fourier transform (FFT), it can significantly improve the efficiency of recovery algorithms. The microcontrollers of most automotive radar sensors have an accelerator for FFT processing with reduced computational latency. However, the previously discussed radar interference mitigation framework that vectorizes the radar measurement to match the general CS framework cannot take advantage of this benefit. More precisely, recalling the framework illustrated in Fig. 2, by removing the interference-contaminated measurement samples in \(\mathbf{y}\), the corresponding rows of the measurement transform matrix \({\tilde{\mathbf{W }}}\) are pruned. Then, the remaining measurement signal \({\tilde{\mathbf{y }}}\) and the pruned measurement transform matrix \({\tilde{\varvec{\Psi }}}\) are used with different CS solvers to compute a sparse solution of \(\hat{\mathbf{x }}\). Therefore, the FFT operator cannot be directly incorporated. In order to utilize the computational advantage of the FFT, a 2D masked residual updates framework is proposed that can be easily incorporated into existing CS solvers.

Recalling the residual updates in (11), the choice of \({\tilde{\varvec{\Psi }}}\in \mathbb{C}^{q\times MN}\) and \(\mathbf{r} _{t}\in \mathbb{C}^q\) has a clear dependency on the *q* rows of interference-free samples. As the positions of interference-contaminated samples can be random [33], the size of \({\tilde{\varvec{\Psi }}}\) can vary with the interference scenario. However, the residual updates always correspond to the remaining interference-free samples. Therefore, it is possible to control the residual updates using a mask and to keep the size of the measurement transform matrix fixed. In other words, by tracking the residual updates at the exact positions of the interference-free samples in \(\mathbf{y}\) with a mask, the measurement transform matrix always retains the original size of \({\tilde{\mathbf{W }}}\in \mathbb{C}^{MN\times MN}\). The advantage of a fixed size of the measurement transform matrix is that the matrix-vector multiplications for Fourier transforms can be replaced by FFT or inverse FFT (IFFT) operations on a 2D signal matrix.

After the detection of distorted samples, the positions of interference-free samples in the *m*-th column of the discrete beat signal matrix \(\mathbf{Y}\) are stored in an index vector \(\mathbf{b} _m\in \{0,1\}^{N}\), where the value one indicates the position of an interference-free sample. Then, by grouping the individual index vectors \(\mathbf{b} _m\), a mask matrix \(\mathbf{B} =[\mathbf{b} _1 \cdots \mathbf{b} _M]\in \{0,1\}^{N\times M}\) is created. The notation \(\mathbf{Y} [\mathbf{B} =1]\) then describes the selection of all elements from \(\mathbf{Y}\) whose places in \(\mathbf{B}\) have the value one. We use the notation \(\mathbf{Y} [\mathbf{B} =1]\xrightarrow {p\cdot M}{\tilde{\mathbf{Y }}}\) for drawing *p* elements from each of the *M* columns in \(\mathbf{Y}\), whose index is equal to one in \(\mathbf{B}\), and storing them in \({\tilde{\mathbf{Y }}}\in \mathbb{C}^{p\times M}\). Since the interference may generate extended segments of disturbed data in various positions of different chirps, this operation guarantees the matrix form of the computation during the pursuit of the sparse solution, and it also adds additional randomness to the measurement transform matrix. Correspondingly, we use the notation \(\mathbf{Z} [\mathbf{B} =1]\xleftarrow {p\cdot M}{\tilde{\mathbf{Y }}}\) for indicating the mapping of elements from each of the *M* columns in \({\tilde{\mathbf{Y }}}\in \mathbb{C}^{p\times M}\) to the positions in a zero matrix \(\mathbf{Z} \in \mathbb{C}^{N\times M}\) whose indices in \(\mathbf{B}\) have the value one. The mask matrix \(\mathbf{B}\) is used for tracking the residual updates at the exact positions of interference-free samples. Consequently, the update step in (11) becomes
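The gather and scatter notations \(\mathbf{Y} [\mathbf{B} =1]\xrightarrow {p\cdot M}{\tilde{\mathbf{Y }}}\) and \(\mathbf{Z} [\mathbf{B} =1]\xleftarrow {p\cdot M}{\tilde{\mathbf{Y }}}\) can be illustrated as follows; the sizes and the drawing of exactly \(p=4\) interference-free samples per chirp are our choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, p = 6, 4, 4
Y = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))

B = np.zeros((N, M), dtype=int)
for m in range(M):                      # mark p interference-free rows per chirp
    B[np.sort(rng.choice(N, p, replace=False)), m] = 1

# Y[B=1] --(p*M)--> Y_tilde : column-wise gather into a p x M matrix
Y_tilde = np.stack([Y[B[:, m] == 1, m] for m in range(M)], axis=1)

# Z[B=1] <--(p*M)-- Y_tilde : scatter back into a zero matrix
Z = np.zeros((N, M), dtype=complex)
for m in range(M):
    Z[B[:, m] == 1, m] = Y_tilde[:, m]
```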

$$\begin{aligned} \begin{aligned} \mathbf{R} _t&= {\tilde{\mathbf{Y }}} - \left( \mathbf{Y} _{\rm{rec}} [\mathbf{B} =1]\xrightarrow {p\cdot M}{} \mathbf{Y} _t\right) ,\\ \mathbf{X} _{t+1}&= \mathcal {T}_{\lambda }\left( \mathbf{X} _t+\vartheta \cdot \mathbf{W} _\mathbf{N} \cdot \mathbf{R} _{\rm{rec}}\cdot \mathbf{W} _\mathbf{M} ^{\textit{T}}\right) , \end{aligned} \end{aligned}$$

(17)

where \(\mathbf{Y} _{\rm{rec}} = {\tilde{\mathbf{W }}_\mathbf{N} }\cdot \mathbf{X} _t\cdot {\tilde{\mathbf{W }}_\mathbf{M} }^{\textit{T}}\) and \(\mathbf{R} _{\rm{rec}}=\mathbf{Z} [\mathbf{B} =1]\xleftarrow {p\cdot M}{} \mathbf{R} _t.\) We refer to this residual updates process as 2D masked residual updates.
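One iteration of (17) can be sketched as below. Two simplifications are ours: the gather followed by the scatter into a zero matrix amounts to an elementwise mask on the residual, which is how it is realized here, and the mask is drawn i.i.d. instead of with exactly *p* samples per chirp. Orthonormally scaled FFT2/IFFT2 calls replace the matrix products with the (I)DFT matrices, and soft thresholding stands in for the generic \(\mathcal {T}_{\lambda }\).

```python
import numpy as np

def mru_step(X, Y_masked, B, theta=1.0, lam=0.05):
    # one 2D masked residual update: recover Y from X, mask the residual
    # at interference-free positions, transform back, and shrink
    Y_rec = np.fft.ifft2(X, norm="ortho")           # W_N . X . W_M^T
    R = np.where(B == 1, Y_masked - Y_rec, 0)       # masked residual (R_rec)
    G = np.fft.fft2(R, norm="ortho")                # gradient in the RD domain
    V = X + theta * G
    mag = np.abs(V)
    return np.where(mag > lam, (1 - lam / np.maximum(mag, 1e-12)) * V, 0)

# illustrative setup: 2-sparse RD spectrum, ~80% interference-free samples
rng = np.random.default_rng(2)
N, M = 32, 16
X_true = np.zeros((N, M), dtype=complex)
X_true[4, 3] = 2.0
X_true[17, 9] = -1.5j
Y = np.fft.ifft2(X_true, norm="ortho")
B = (rng.random((N, M)) < 0.8).astype(int)          # i.i.d. mask (our choice)
Y_masked = np.where(B == 1, Y, 0)

X = np.zeros_like(X_true)
for _ in range(150):
    X = mru_step(X, Y_masked, B)
```

The two dominant entries of the recovered spectrum coincide with the true target positions; all matrix products are realized by FFT calls, which is the computational point of the 2D MRU formulation.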

From Theorem 1 (see Appendix B), it is known that a random row sub-matrix of \({\tilde{\mathbf{W }}}\) satisfies a near-optimal RIP with high probability. For the analysis of the RIP condition of the 2D MRU framework, we consider the equation \(\mathbf{Y} _{\rm{rec}} = {\tilde{\mathbf{W }}_\mathbf{N} }\cdot \mathbf{X} _\textit{t}\cdot {\tilde{\mathbf{W }}_\mathbf{M} }^{\textit{T}}\) instead of the 1D formulation \(\mathbf{y} _{\rm{rec}} = {\tilde{\mathbf{W }}}\cdot \mathbf{x} _t\). The theoretical guarantee of successful RD spectrum recovery discussed in Theorem 1 also applies to 2D algorithms, where *q* is now substituted by \(p\cdot M\). More concretely, with the help of the mask matrix \(\mathbf{B}\), \(\mathbf{Y} _{t}\) obtains values from \(\mathbf{Y} _{\rm{rec}}\) only at the selected *p* positions in each chirp, meaning that the valid updates of \(\mathbf{Y} _{\rm{rec}}\) are preserved in these \(p\cdot M\) entries (the same for \(\mathbf{y} _{\rm{rec}}\)). In this way, the rows with the same indices as the indices of these \(p\cdot M\) entries are “subsampled” from \({\tilde{\mathbf{W }}}\) in this particular manner for the 2D case. The advantage of incorporating 2D MRU is that the matrix-vector multiplications for Fourier transforms can be solved quickly with hardware acceleration in the automotive radar system. Algorithm 1 describes the incorporation of 2D MRU with prior-model-based IST/IHT (PM-IHT/PM-IST) for RD spectrum recovery, where \(\epsilon\) is the threshold parameter of the relative residual update.

The proposed 2D MRU framework can be easily integrated with other well-known CS solvers, e.g., fast iterative shrinkage-thresholding algorithm (FISTA) [20], OMP [21], compressive sampling matching pursuit (CoSaMP) [22], YALL1 [34], AMP [26], and generalized AMP (GAMP) [35], since these solvers make use of residual updates as well.

#### 3.4.1 Computational complexity

The largest part of the computational effort of the 2D algorithms is taken by the nested FFT in each case. Since an FFT of length *N* is performed along each of the *M* columns of the matrix and an FFT of length *M* along each of the *N* rows, the total computational effort of 2D MRU in each iteration loop can be described by

$$\begin{aligned} & O\left( {MN\log (N) + NM\log (M)} \right) \\ & \quad = O\left( {MN(\log (N) + \log (M))} \right) \\ & \quad = O\left( {MN(\log (NM))} \right). \\ \end{aligned}$$

(18)

The computational effort of 1D solvers in each iteration loop can be described by \(\textit{O}\left( q MN\right)\), where *q* denotes the number of all undisturbed samples across all chirps. With the proposed approach, the per-iteration complexity is thus reduced considerably, by the factor \(\textit{O}\left( \log (NM)/q\right)\). Considering a discrete beat signal \(\mathbf{Y} \in \mathbb{C}^{128\times 256}\) where 10% of the samples are disturbed by interference (i.e., *q* equals 29491), \(\log (NM)/q \approx 1/6531\).
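The quoted numbers can be checked directly; note that the value \(1/6531\) is reproduced by the base-10 logarithm (our reading, since neither \(\log _2\) nor \(\ln\) yields it):

```python
import math

# complexity-ratio check for N = 128, M = 256 with 10% disturbed samples;
# the quoted 1/6531 corresponds to the base-10 logarithm (our reading)
N, M = 128, 256
q = int(0.9 * N * M)                 # 29491 undisturbed samples
ratio = math.log10(N * M) / q        # per-iteration cost: 2D MRU vs 1D solver
```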