The signal model of grid-based CS is introduced in Section 2.1. We combine the greedy idea of MP methods with the constrained total least squares (CTLS) technique [20] to produce AMP-CTLS, which alleviates the off-grid problem. In AMP-CTLS, the grid is cast as an unknown parameter and is jointly estimated together with x. In Section 2.2, the framework of AMP-CTLS is given. Section 2.3 is dedicated to the iterative joint estimator (IJE) algorithm, which is implemented in AMP-CTLS. The CTLS technique used in the IJE algorithm is presented in Section 2.4. Section 2.5 summarizes the entire procedure of AMP-CTLS. In Section 2.6, the convergence of IJE is analyzed.

### 2.1 Grid-based CS

CS promises efficient recovery of sparse signals. In many applications, signals are sparse in a continuous parameter space rather than in a finite set of discrete atoms. Usually, we divide the continuous parameter space into discrete grid points and cast the problem as a grid-based CS model:

\mathbf{y}=\mathbf{\Phi}\left(\mathbf{g}\right)\mathbf{x}+\mathbf{w},

(1)

where **y** ∈ ℂ^{M × 1} and **w** ∈ ℂ^{M × 1} are the measurement vector and the white Gaussian noise (WGN) vector, respectively. **x** ∈ ℂ^{N × 1} is to be learned. **g** ∈ ℂ^{N × 1} collects the discrete grid points, **g** = [*g*_{1}, *g*_{2}, . . . , *g*_{N}]. **Φ**(**g**) ∈ ℂ^{M × N} is built from **g**, **Φ**(**g**) = [*ϕ*(*g*_{1}), *ϕ*(*g*_{2}), . . . , *ϕ*(*g*_{N})], and the mapping **g** → **Φ** is known. For example, to recover a frequency-sparse signal, we grid the frequency space into discrete frequency points \mathbf{g}={\left[0,\,\frac{1}{N},\,\frac{2}{N},\,\dots,\,\frac{N-1}{N}\right]}^{\mathsf{T}}. **Φ** is then a DFT matrix whose *m*th-row, *n*th-column element is \exp\left(j2\pi \frac{n}{N}m\right). However, the signal is sparse in the DFT atoms only if all of the sinusoids lie exactly at the pre-defined grid points [13]. In some cases, no matter how densely we grid the frequency space, the sinusoids can be off-grid, which degrades the performance of CS methods [13].
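To make the off-grid effect concrete, the following sketch (ours, not from the paper) builds the DFT dictionary described above and compares the correlations of an on-grid and an off-grid sinusoid; names such as `make_dictionary` are illustrative:

```python
import numpy as np

# Minimal sketch of the grid-based model (1) for frequency-sparse signals.
# The dictionary Phi(g) has columns phi(g_n) = exp(j*2*pi*g_n*m), m = 0..M-1.

def make_dictionary(g, M):
    """Columns are sampled complex sinusoids at the grid frequencies g."""
    m = np.arange(M)[:, None]            # sample index, M x 1
    return np.exp(1j * 2 * np.pi * m * g[None, :])

M = N = 32
g = np.arange(N) / N                     # uniform DFT grid [0, 1/N, ..., (N-1)/N]
Phi = make_dictionary(g, M)

# On-grid sinusoid: exactly one nonzero coefficient after projection.
y_on = Phi[:, 5]
c_on = Phi.conj().T @ y_on / M           # normalized correlations

# Off-grid sinusoid (frequency between grid points 5 and 6): energy leaks
# across many atoms, so the signal is no longer 1-sparse in this dictionary.
y_off = np.exp(1j * 2 * np.pi * np.arange(M) * (5.5 / N))
c_off = Phi.conj().T @ y_off / M

print(np.sum(np.abs(c_on) > 0.1))        # 1 atom above threshold
print(np.sum(np.abs(c_off) > 0.1))       # several atoms above threshold
```

However densely the grid is refined, a half-bin offset of this kind can always occur, which is the motivation for treating the grid itself as an unknown.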

### 2.2 Main idea of AMP-CTLS

The off-grid problem usually emerges because we rarely have enough prior knowledge to generate a perfect grid on which all of the signals exactly lie. Thus, we cast the grid as an unknown parameter, and search for the best grid **g** as well as the sparsest **x** by solving the optimization problem:

\widehat{\mathbf{x}},\widehat{\mathbf{g}}=\underset{\mathbf{x},\mathbf{g}}{\operatorname{arg\,min}}\ {\left\Vert \mathbf{x}\right\Vert}_{0},\quad \text{s.t.}\ {\left\Vert \mathbf{y}-\mathbf{\Phi}\left(\mathbf{g}\right)\mathbf{x}\right\Vert}_{2}^{2}\le \eta ,

(2)

where *η* is the noise power. Equation (2) is similar to that used in traditional MP methods [2–7], except that we recover **x** *and* simultaneously estimate the grid. In most cases, (2) is a complex nonlinear optimization problem. In this article, an iterative method is introduced.

AMP-CTLS inherits the greedy idea from MP methods, which use correlations to iteratively find the support set. In each iteration, one or more atoms are added to the support set. Suppose the support set Λ^{(k)} is obtained after the *k*th iteration, and denote the corresponding grid points by {\widehat{\mathbf{g}}}_{\Lambda}^{\left(k\right)}. In traditional MP methods [2–7], **x**_{Λ} is estimated by solving a least squares problem. In AMP-CTLS, considering the off-grid problem, we jointly search for **x**_{Λ} and the best grid points in the continuous region neighboring {\widehat{\mathbf{g}}}_{\Lambda}^{\left(k\right)} via (3), in which we minimize the norm of the *residual error*, defined as **r** = **y** - **Φ**(**g**_{Λ})**x**_{Λ}.

{\widehat{\mathbf{x}}}_{\Lambda}^{\left(k+1\right)},{\widehat{\mathbf{g}}}_{\Lambda}^{\left(k+1\right)}=\underset{{\mathbf{x}}_{\Lambda},{\mathbf{g}}_{\Lambda}}{\operatorname{arg\,min}}\ {\left\Vert \mathbf{y}-\mathbf{\Phi}\left({\mathbf{g}}_{\Lambda}\right){\mathbf{x}}_{\Lambda}\right\Vert}_{2}^{2}

(3)

We develop the iterative joint estimator (IJE) algorithm to solve (3), which is detailed in the ensuing section.

### 2.3 IJE algorithm

It is difficult to find an analytical solution to (3). The IJE algorithm is devised to seek a numerical solution. Given initial grid points {\widehat{\mathbf{g}}}_{\Lambda}\left(0\right), IJE searches for the best grid points **g**_{Λ} in the neighborhood of {\widehat{\mathbf{g}}}_{\Lambda}\left(0\right). The mismatch of the grid is denoted as \Delta{\mathbf{g}}_{\Lambda}={\mathbf{g}}_{\Lambda}-{\widehat{\mathbf{g}}}_{\Lambda}\left(0\right)={\left[\Delta{g}_{1},\dots,\Delta{g}_{\left|\Lambda\right|}\right]}^{\mathsf{T}}. IJE consists of three steps: estimate the mismatch {\widehat{\Delta\mathbf{g}}}_{\Lambda}; update the grid with {\widehat{\Delta\mathbf{g}}}_{\Lambda}; and estimate **x**_{Λ} by projection onto the new grid points. These three steps are executed iteratively to pursue more accurate results. To distinguish these loops from the iterations that build the support set in (3), we denote by *l* the counter of loops in IJE; thus, IJE is expressed as follows:

{\widehat{\Delta\mathbf{g}}}_{\Lambda}\left(l\right),{\mathbf{x}}_{\mathsf{CTLS}}=\underset{\Delta{\mathbf{g}}_{\Lambda},{\mathbf{x}}_{\Lambda}}{\operatorname{arg\,min}}\ {C}_{\mathsf{CTLS}},

(4)

{\widehat{\mathbf{g}}}_{\Lambda}\left(l+1\right)={\widehat{\mathbf{g}}}_{\Lambda}\left(l\right)+{\widehat{\Delta\mathbf{g}}}_{\Lambda}\left(l\right),

(5)

{\widehat{\mathbf{x}}}_{\Lambda}\left(l+1\right)=\underset{{\mathbf{x}}_{\Lambda}}{\operatorname{arg\,min}}\ {\left\Vert \mathbf{y}-\mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l+1\right)\right){\mathbf{x}}_{\Lambda}\right\Vert}_{2}^{2}.

(6)

In (4), the CTLS technique is applied to simultaneously search for the mismatch Δ**g**_{Λ} and **x**_{Λ}; {\widehat{\Delta\mathbf{g}}}_{\Lambda}\left(l\right) and **x**_{CTLS} are the results. *C*_{CTLS} denotes the penalty function of CTLS, which is detailed in Section 2.4. Since (6) is a linear least squares problem, its closed-form solution is

{\widehat{\mathbf{x}}}_{\Lambda}\left(l+1\right)={\left(\mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l+1\right)\right)\right)}^{\dagger}\mathbf{y}={\left({\mathbf{\Phi}}^{\mathsf{H}}\mathbf{\Phi}\right)}^{-1}{\mathbf{\Phi}}^{\mathsf{H}}\mathbf{y},\quad \mathbf{\Phi}=\mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l+1\right)\right).

(7)

The loop is terminated when the norm of the residual error is no longer appreciably reduced.
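The IJE loop (4)-(6) can be sketched as follows for the sinusoid dictionary of Section 2.1. This is a simplified stand-in: step (4) is approximated here by a linearized least-squares estimate of the mismatch rather than by the paper's CTLS solver of Section 2.4, and all function names are ours:

```python
import numpy as np

def atoms(g, M):
    """Sinusoid atoms phi(g_i) as in Section 2.1 (columns of Phi)."""
    m = np.arange(M)[:, None]
    return np.exp(1j * 2 * np.pi * m * g[None, :])

def ije(y, g0, M, n_loops=20):
    g = g0.copy()
    for _ in range(n_loops):
        Phi = atoms(g, M)
        x = np.linalg.lstsq(Phi, y, rcond=None)[0]            # step (6)
        r = y - Phi @ x                                       # residual error
        m = np.arange(M)[:, None]
        J = (1j * 2 * np.pi * m) * Phi * x[None, :]           # d(Phi x)/dg
        J = J - Phi @ np.linalg.lstsq(Phi, J, rcond=None)[0]  # project out Phi
        dg = np.linalg.lstsq(J, r, rcond=None)[0].real        # stand-in for (4)
        g = g + dg                                            # step (5)
    x = np.linalg.lstsq(atoms(g, M), y, rcond=None)[0]
    return x, g

M = 64
g_true = np.array([0.1037])                  # off-grid frequency
y = atoms(g_true, M) @ np.array([1.0 + 0.5j])
x_hat, g_hat = ije(y, np.array([0.1]), M)    # initialize at nearest grid point
print(abs(g_hat[0] - g_true[0]))             # grid point refined off the grid
```

The structure mirrors (4)-(6): each loop alternates a mismatch estimate, a grid update, and a least-squares projection onto the updated atoms.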

### 2.4 CTLS technique

Traditional MP methods [2–7] apply least squares to calculate the amplitudes **x**_{Λ} after finding the support set. When there are off-grid signals, mismatches occur in the dictionary; thus, we replace the least squares model with the total least squares (TLS) criterion, which is appropriate for fitting problems in which perturbations exist both in the measurement vector and in the dictionary [21]. Since the dictionary mismatches are constrained by the errors of the grid points, we introduce the constrained total least squares (CTLS) technique [20] into AMP-CTLS to jointly estimate the grid-point errors and **x**_{Λ}, i.e., to solve (4). It has been proved that CTLS is a constrained space state maximum likelihood estimator [20].

Suppose that we obtain the estimate of the grid points as {\widehat{\mathbf{g}}}_{\Lambda}\left(l\right) after the *l*th IJE iteration. Assume that the mismatch Δ**g**_{Λ} is sufficiently small; then we can approximate the perfect dictionary **Φ**(**g**_{Λ}) as a linear function of the mismatch Δ**g**_{Λ} via a Taylor expansion:

\mathbf{\Phi}\left({\mathbf{g}}_{\Lambda}\right)=\mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l\right)\right)+\sum_{i=1}^{\left|\Lambda\right|}{\mathbf{R}}_{i}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l\right)\right)\Delta{g}_{i}+\sum_{i=1}^{\left|\Lambda\right|}o\left(\Delta{g}_{i}^{2}\right),

(8)

where **R**_{i} ∈ ℂ^{M × |Λ|} is

{\mathbf{R}}_{i}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l\right)\right)={\left.\frac{\partial \mathbf{\Phi}\left({\mathbf{g}}_{\Lambda}\right)}{\partial {g}_{i}}\right|}_{{\mathbf{g}}_{\Lambda}={\widehat{\mathbf{g}}}_{\Lambda}\left(l\right)}

(9)

and *o*(·) denotes higher-order terms. For simplicity, in this section we omit the iteration counter in the notation, and {\mathbf{R}}_{i}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l\right)\right) and \mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l\right)\right) are abbreviated as **R**_{i} and **Φ**_{Λ}, respectively. Neglecting o\left(\Delta{g}_{i}^{2}\right), the signal model in (1) is replaced by:

\mathbf{y}=\left({\mathbf{\Phi}}_{\text{\Lambda}}+\sum _{i=1}^{\left|\text{\Lambda}\right|}{\mathbf{R}}_{i}\text{\Delta}{g}_{i}\right){\mathbf{x}}_{\text{\Lambda}}+\mathbf{w}.

(10)
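A quick numerical check (ours, for a single sinusoid atom) of the first-order expansion (8): halving the mismatch should shrink the linearization error by roughly a factor of four, confirming that the neglected terms are O(Δg²):

```python
import numpy as np

# Verify that phi(g + dg) - phi(g) - phi'(g)*dg is second order in dg,
# where phi'(g) is the derivative that fills the columns of the R_i matrices.

M = 32
m = np.arange(M)

def phi(g):                       # single sinusoid atom
    return np.exp(1j * 2 * np.pi * m * g)

def dphi(g):                      # d phi / d g
    return 1j * 2 * np.pi * m * phi(g)

g0, dg = 0.2, 1e-3
err1 = np.linalg.norm(phi(g0 + dg) - phi(g0) - dphi(g0) * dg)
err2 = np.linalg.norm(phi(g0 + dg / 2) - phi(g0) - dphi(g0) * (dg / 2))
print(err1 / err2)                # close to 4: error is O(dg^2)
```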

CTLS models Δ**g**_{Λ} as an unknown random perturbation vector. The grid misalignment and the noise vector are combined into an (*M* + |Λ|)-dimensional vector \mathbf{v}={\left[{\left(\Delta{\mathbf{g}}_{\Lambda}\right)}^{\mathsf{T}},{\mathbf{w}}^{\mathsf{T}}\right]}^{\mathsf{T}}, and CTLS aims at minimizing {\left\Vert \mathbf{v}\right\Vert}_{2}^{2}. It has been proved that CTLS is a constrained space state maximum likelihood estimator if **v** is a WGN vector [20]. Thus, we first whiten **v**. Assume that Δ**g**_{Λ} is independent of **w**. The covariance matrix of Δ**g**_{Λ} is {\mathbf{C}}_{\mathbf{g}}=E\left[\Delta{\mathbf{g}}_{\Lambda}{\left(\Delta{\mathbf{g}}_{\Lambda}\right)}^{\mathsf{H}}\right]\in {\mathbb{C}}^{\left|\Lambda\right|\times\left|\Lambda\right|}, and **D** ∈ ℂ^{|Λ|×|Λ|} satisfies {\mathbf{C}}_{\mathbf{g}}^{-1}={\mathbf{D}}^{\mathsf{H}}\mathbf{D}. The variance of the white noise **w** is {\sigma}_{\mathbf{w}}^{2}. We define the normalized vector **u** ∈ ℂ^{(M+|Λ|) × 1} as in (11); then **u** is a WGN vector.

\mathbf{u}=\left[\begin{array}{c}\mathbf{D}\,\Delta{\mathbf{g}}_{\Lambda}\\ \frac{1}{{\sigma}_{\mathbf{w}}}\mathbf{w}\end{array}\right]

(11)

Minimizing the penalty function {C}_{\mathsf{CTLS}}={\left\Vert \mathbf{u}\right\Vert}_{2}^{2}, (4) is detailed as follows:

\widehat{\mathbf{u}},{\mathbf{x}}_{\mathsf{CTLS}}=\underset{\mathbf{u},{\mathbf{x}}_{\Lambda}}{\operatorname{arg\,min}}\ {\left\Vert \mathbf{u}\right\Vert}_{2}^{2},

(12)

\text{s.t.}\ -\mathbf{y}+\left({\mathbf{\Phi}}_{\Lambda}+\sum_{i=1}^{\left|\Lambda\right|}{\mathbf{R}}_{i}\,\Delta{g}_{i}\right){\mathbf{x}}_{\Lambda}+\mathbf{w}=0.

(13)

The constraint condition (13) can be rewritten as:

\text{s.t.}\ -\mathbf{y}+{\mathbf{\Phi}}_{\Lambda}{\mathbf{x}}_{\Lambda}+{\mathbf{W}}_{\mathbf{x}}\mathbf{u}=0,

(14)

where **W**_{x} = [**H**  *σ*_{w}**I**_{M}] ∈ ℂ^{M × (|Λ|+M)}, and **H** ∈ ℂ^{M × |Λ|} is defined as

\mathbf{H}=\mathbf{G}\left({\mathbf{D}}^{-1}\otimes {\mathbf{I}}_{\left|\Lambda\right|}\right)\left({\mathbf{I}}_{\left|\Lambda\right|}\otimes {\mathbf{x}}_{\Lambda}\right),

(15)

where \mathbf{G}=\left[{\mathbf{R}}_{1},\dots,{\mathbf{R}}_{\left|\Lambda\right|}\right]\in {\mathbb{C}}^{M\times{\left|\Lambda\right|}^{2}}. The equivalence between (13) and (14) is proved as follows:

\begin{aligned}\left(\sum_{i=1}^{\left|\Lambda\right|}{\mathbf{R}}_{i}\,\Delta{g}_{i}\right){\mathbf{x}}_{\Lambda}&=\mathbf{G}\left(\Delta{\mathbf{g}}_{\Lambda}\otimes {\mathbf{I}}_{\left|\Lambda\right|}\right){\mathbf{x}}_{\Lambda}\\ &=\mathbf{G}\left({\mathbf{D}}^{-1}\otimes {\mathbf{I}}_{\left|\Lambda\right|}\right)\left(\mathbf{D}\,\Delta{\mathbf{g}}_{\Lambda}\otimes {\mathbf{I}}_{\left|\Lambda\right|}\right){\mathbf{x}}_{\Lambda}\\ &=\mathbf{G}\left({\mathbf{D}}^{-1}\otimes {\mathbf{I}}_{\left|\Lambda\right|}\right)\left({\mathbf{I}}_{\left|\Lambda\right|}\otimes {\mathbf{x}}_{\Lambda}\right)\mathbf{D}\,\Delta{\mathbf{g}}_{\Lambda}\\ &=\mathbf{H}\mathbf{D}\,\Delta{\mathbf{g}}_{\Lambda}.\end{aligned}

(16)
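The chain of identities in (16) can be verified numerically with random matrices; the sketch below (ours, with `k` standing for |Λ|) checks that G(Δg ⊗ I)x equals HDΔg with H built as in (15):

```python
import numpy as np

# Numerical check of the Kronecker-product identity used in (16):
#   G (dg ⊗ I) x = G (D^{-1} ⊗ I)(I ⊗ x) D dg = H D dg.

rng = np.random.default_rng(0)
M, k = 8, 3
G = rng.standard_normal((M, k * k)) + 1j * rng.standard_normal((M, k * k))
D = rng.standard_normal((k, k)) + k * np.eye(k)     # well-conditioned whitener
x = rng.standard_normal(k) + 1j * rng.standard_normal(k)
dg = rng.standard_normal(k)

I = np.eye(k)
lhs = G @ np.kron(dg[:, None], I) @ x               # G (dg ⊗ I) x
H = G @ np.kron(np.linalg.inv(D), I) @ np.kron(I, x[:, None])   # (15)
rhs = H @ (D @ dg)                                  # H D dg
print(np.allclose(lhs, rhs))                        # True
```

The key step is the identity (**a** ⊗ **I**)**x** = (**I** ⊗ **x**)**a**, which moves the perturbation vector to the right so that it can be whitened by **D**.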

When **W**_{x} has full row rank, the optimization problem (12), (14) is equivalent to (17)-(19), as proved in [20].

{\mathbf{x}}_{\mathsf{CTLS}}=\underset{{\mathbf{x}}_{\Lambda}}{\operatorname{arg\,min}}\ {\left\Vert {\mathbf{W}}_{\mathbf{x}}^{\dagger}\left(\mathbf{y}-{\mathbf{\Phi}}_{\Lambda}{\mathbf{x}}_{\Lambda}\right)\right\Vert}_{2}^{2}

(17)

\widehat{\mathbf{u}}={\left.{\mathbf{W}}_{\mathbf{x}}^{\dagger}\left(\mathbf{y}-{\mathbf{\Phi}}_{\Lambda}{\mathbf{x}}_{\Lambda}\right)\right|}_{{\mathbf{x}}_{\Lambda}={\mathbf{x}}_{\mathsf{CTLS}}}

(18)

{\mathbf{W}}_{\mathbf{x}}^{\dagger}={\mathbf{W}}_{\mathbf{x}}^{\mathsf{H}}{\left({\mathbf{W}}_{\mathbf{x}}{\mathbf{W}}_{\mathbf{x}}^{\mathsf{H}}\right)}^{-1}

(19)

It is quite difficult to obtain an analytical solution to (17). A complex version of Newton's method is developed in [20], which is presented in Appendix 1. The initial value of **x**_{Λ} required by Newton's method for (17) can be given as:

{\mathbf{x}}_{\mathsf{ini}}={\mathbf{\Phi}}_{\Lambda}^{\dagger}\mathbf{y}={\left({\mathbf{\Phi}}_{\Lambda}^{\mathsf{H}}{\mathbf{\Phi}}_{\Lambda}\right)}^{-1}{\mathbf{\Phi}}_{\Lambda}^{\mathsf{H}}\mathbf{y}.

(20)

{\widehat{\Delta\mathbf{g}}}_{\Lambda} is extracted from **û** via {\widehat{\Delta\mathbf{g}}}_{\Lambda}=\left[{\mathbf{D}}^{-1}\ {\mathbf{0}}_{\left|\Lambda\right|\times M}\right]\widehat{\mathbf{u}}; thus (4) is solved. A sketch of CTLS is given in Algorithm 1. To the best of the authors' knowledge, convergence guarantees for this Newton method remain an open question.
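The pseudo-inverse machinery of (14)-(19) can be illustrated with random data. The sketch below (ours, with illustrative sizes and a random stand-in for **H**) checks that û = W_x^†(y − Φ_Λ x_Λ) satisfies the constraint (14) exactly whenever W_x has full row rank:

```python
import numpy as np

# Build W_x = [H, sigma*I] as in (14), its right pseudo-inverse as in (19),
# and u_hat as in (18); then W_x W_x^† = I, so the constraint (14) holds
# for any candidate x, not only at x_CTLS.

rng = np.random.default_rng(1)
M, k, sigma = 10, 2, 0.5
Phi = rng.standard_normal((M, k)) + 1j * rng.standard_normal((M, k))
H = rng.standard_normal((M, k)) + 1j * rng.standard_normal((M, k))
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
x = rng.standard_normal(k) + 1j * rng.standard_normal(k)

Wx = np.hstack([H, sigma * np.eye(M)])                     # full row rank
Wx_pinv = Wx.conj().T @ np.linalg.inv(Wx @ Wx.conj().T)    # (19)
u_hat = Wx_pinv @ (y - Phi @ x)                            # (18)
print(np.allclose(-y + Phi @ x + Wx @ u_hat, 0))           # constraint (14)
```

Among all **u** satisfying (14), this û has minimum norm, which is why (17) reduces the joint problem to a search over **x**_{Λ} alone.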

### 2.5 Sketch of AMP-CTLS

Similar to traditional MP methods [2–7], AMP-CTLS first greedily finds the support set. Then AMP-CTLS adaptively optimizes the grid points indexed by the support set. In this article, we imitate the greedy approach of OMP, in which only one atom is added to the support set in each iteration. If the number of atoms is known, the iterations terminate when the cardinality of the support set reaches the pre-specified number. If it is not known, we can apply other proven stopping criteria, e.g., the norm of the residual falling below a threshold [22]. A sketch of AMP-CTLS is presented in Algorithm 2.
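A high-level sketch of this greedy flow follows. As in the Section 2.3 sketch, the CTLS/IJE refinement is stood in for by a simple linearized least-squares update, the known-sparsity stopping rule is used, and all names are ours rather than the paper's:

```python
import numpy as np

def atoms(g, M):
    """Sinusoid atoms of Section 2.1 at frequencies g."""
    m = np.arange(M)[:, None]
    return np.exp(1j * 2 * np.pi * m * np.atleast_1d(g)[None, :])

def refine(y, g_sel, M, loops=20):
    """Refit amplitudes and nudge the selected grid points (IJE stand-in)."""
    g = g_sel.copy()
    for _ in range(loops):
        Phi = atoms(g, M)
        x = np.linalg.lstsq(Phi, y, rcond=None)[0]
        r = y - Phi @ x
        m = np.arange(M)[:, None]
        J = (1j * 2 * np.pi * m) * Phi * x[None, :]
        J = J - Phi @ np.linalg.lstsq(Phi, J, rcond=None)[0]
        g = g + np.linalg.lstsq(J, r, rcond=None)[0].real
    return np.linalg.lstsq(atoms(g, M), y, rcond=None)[0], g

def amp_ctls(y, grid, M, n_atoms):
    g_sel = np.empty(0)
    r = y
    for _ in range(n_atoms):                             # known-sparsity rule
        corr = np.abs(atoms(grid, M).conj().T @ r)       # greedy correlation
        g_sel = np.append(g_sel, grid[np.argmax(corr)])  # add one atom (OMP)
        x_hat, g_sel = refine(y, g_sel, M)               # adapt grid points
        r = y - atoms(g_sel, M) @ x_hat
    return x_hat, g_sel

M, N = 64, 64
grid = np.arange(N) / N
g_true = np.array([0.1032, 0.3071])                      # both off-grid
y = atoms(g_true, M) @ np.array([1.0, 0.8j])
x_hat, g_hat = amp_ctls(y, grid, M, n_atoms=2)
print(np.sort(g_hat))                                    # near the true frequencies
```

Unlike plain OMP, the recovered grid points are not restricted to the initial grid, so both off-grid frequencies can be located to sub-grid accuracy.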

### 2.6 Convergence of the IJE algorithm

Here, we analyze the convergence of the IJE algorithm. Assume that the mapping **g**_{Λ} → **Φ**(**g**_{Λ}) is linear, which means

\mathbf{\Phi}\left({\mathbf{g}}_{\Lambda}+\Delta{\mathbf{g}}_{\Lambda}\right)=\mathbf{\Phi}\left({\mathbf{g}}_{\Lambda}\right)+\mathbf{G}\left(\Delta{\mathbf{g}}_{\Lambda}\otimes {\mathbf{I}}_{\left|\Lambda\right|}\right),

(21)

and **G** should be a constant matrix.

**Proposition**. If the measurement **y** is perturbed by WGN and (21) is obeyed, IJE monotonically reduces the value of the penalty function in (3). The estimates of **x**_{Λ} and **g**_{Λ} satisfy:

{\left\Vert \mathbf{y}-\mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l\right)\right){\widehat{\mathbf{x}}}_{\Lambda}\left(l\right)\right\Vert}_{2}^{2}\ge {\left\Vert \mathbf{y}-\mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l+1\right)\right){\widehat{\mathbf{x}}}_{\Lambda}\left(l+1\right)\right\Vert}_{2}^{2}.

(22)

**Proof**. Define a penalty function as follows:

{f}_{p}\left(\Delta{\mathbf{g}}_{\Lambda},\,{\mathbf{x}}_{\Lambda}\right)={\sigma}_{\mathbf{w}}^{2}{\left\Vert \mathbf{u}\right\Vert}_{2}^{2}={\sigma}_{\mathbf{w}}^{2}{\left(\Delta{\mathbf{g}}_{\Lambda}\right)}^{\mathsf{H}}{\mathbf{C}}_{\mathbf{g}}^{-1}\Delta{\mathbf{g}}_{\Lambda}+{\left\Vert \mathbf{y}-\mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l\right)\right){\mathbf{x}}_{\Lambda}-\mathbf{G}\left(\Delta{\mathbf{g}}_{\Lambda}\otimes {\mathbf{I}}_{\left|\Lambda\right|}\right){\mathbf{x}}_{\Lambda}\right\Vert}_{2}^{2};

(23)

thus, {\hat{\text{\Delta}\mathbf{g}}}_{\text{\Lambda}}\left(l\right) and **x**_{CTLS} are obtained by solving

{\widehat{\Delta\mathbf{g}}}_{\Lambda}\left(l\right),{\mathbf{x}}_{\mathsf{CTLS}}=\underset{\Delta{\mathbf{g}}_{\Lambda},{\mathbf{x}}_{\Lambda}}{\operatorname{arg\,min}}\ {f}_{p}\left(\Delta{\mathbf{g}}_{\Lambda},\,{\mathbf{x}}_{\Lambda}\right),

(24)

which is the same as (4), since {\sigma}_{\mathbf{w}}^{2} is a constant. Thus, it holds that

{f}_{p}\left({\widehat{\Delta\mathbf{g}}}_{\Lambda}\left(l\right),{\mathbf{x}}_{\mathsf{CTLS}}\right)\le {f}_{p}\left(0,{\widehat{\mathbf{x}}}_{\Lambda}\left(l\right)\right)={\left\Vert \mathbf{y}-\mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l\right)\right){\widehat{\mathbf{x}}}_{\Lambda}\left(l\right)\right\Vert}_{2}^{2}.

(25)

Substituting (5) and (21) into {f}_{p}\left({\widehat{\Delta\mathbf{g}}}_{\Lambda}\left(l\right),{\mathbf{x}}_{\mathsf{CTLS}}\right), and noting that {\mathbf{C}}_{\mathbf{g}}^{-1} is a positive definite matrix, we obtain

\begin{aligned}{f}_{p}\left({\widehat{\Delta\mathbf{g}}}_{\Lambda}\left(l\right),\,{\mathbf{x}}_{\mathsf{CTLS}}\right)&={\left\Vert \mathbf{y}-\mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l+1\right)\right){\mathbf{x}}_{\mathsf{CTLS}}\right\Vert}_{2}^{2}+{\sigma}_{\mathbf{w}}^{2}{\left({\widehat{\Delta\mathbf{g}}}_{\Lambda}\left(l\right)\right)}^{\mathsf{H}}{\mathbf{C}}_{\mathbf{g}}^{-1}{\widehat{\Delta\mathbf{g}}}_{\Lambda}\left(l\right)\\ &\ge {\left\Vert \mathbf{y}-\mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l+1\right)\right){\mathbf{x}}_{\mathsf{CTLS}}\right\Vert}_{2}^{2}\\ &\ge {\left\Vert \mathbf{y}-\mathbf{\Phi}\left({\widehat{\mathbf{g}}}_{\Lambda}\left(l+1\right)\right){\widehat{\mathbf{x}}}_{\Lambda}\left(l+1\right)\right\Vert}_{2}^{2},\end{aligned}

(26)

where the last inequality follows from (6). The inequalities in (25) and (26) become equalities if and only if {\widehat{\Delta\mathbf{g}}}_{\Lambda}\left(l\right)=0. □
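The substitution step in the proof can be checked numerically: under the linear mapping (21), the data-fit term of (23) evaluated at Δ**g** equals the residual at the updated grid. A small sketch (ours, with random matrices and `k` standing for |Λ|):

```python
import numpy as np

# Under (21), Phi(g + dg) = Phi(g) + G (dg ⊗ I), so
# || y - Phi(g) x - G (dg ⊗ I) x || = || y - Phi(g + dg) x ||,
# which is the identity used when substituting (5), (21) into (23).

rng = np.random.default_rng(2)
M, k = 8, 3
Phi0 = rng.standard_normal((M, k)) + 1j * rng.standard_normal((M, k))
G = rng.standard_normal((M, k * k)) + 1j * rng.standard_normal((M, k * k))
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
x = rng.standard_normal(k) + 1j * rng.standard_normal(k)
dg = 0.1 * rng.standard_normal(k)

def Phi(dg):                                   # linear mapping (21)
    return Phi0 + G @ np.kron(dg[:, None], np.eye(k))

lhs = np.linalg.norm(y - Phi0 @ x - G @ np.kron(dg[:, None], np.eye(k)) @ x)
rhs = np.linalg.norm(y - Phi(dg) @ x)
print(np.isclose(lhs, rhs))                    # True
```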

For simplicity, we have assumed that the mapping **g**_{Λ} → **Φ**(**g**_{Λ}) is linear. In some practical applications, such as harmonic retrieval, linearity is not strictly guaranteed. However, when the atom mismatch Δ**g** is sufficiently small, the higher-order errors of the Taylor expansion (8) are negligible, and (21) is approximately satisfied. Numerical examples in Section 5 demonstrate the convergence of the proposed algorithm in the case of harmonic retrieval.