Open Access

Analysis of the maximum likelihood channel estimator for OFDM systems in the presence of unknown interference

EURASIP Journal on Advances in Signal Processing (2017) 2017:69

Received: 8 February 2017

Accepted: 19 September 2017

Published: 29 September 2017


This paper presents a theoretical analysis of the maximum likelihood (ML) channel estimator for orthogonal frequency-division multiplexing (OFDM) systems in the presence of unknown interference. The following theoretical results are presented. Firstly, the uniqueness of the ML solution for practical applications, i.e., when thermal noise is present, is analytically demonstrated when the number of transmitted OFDM symbols is strictly greater than one. The ML solution is then computed with the iterative conditional ML (CML) algorithm. Secondly, it is shown that the channel estimate can be described as an algebraic function whose inputs are the initial value and the means and variances of the received samples. Thirdly, it is theoretically demonstrated that the channel estimator is unbiased. The second and third results are obtained by employing oblique projection theory. All of these results are confirmed by numerical simulations.

1 Introduction

Narrowband interference (NBI) arises in orthogonal frequency-division multiplexing (OFDM) systems in a number of transmission scenarios, such as Wi-Fi communications [1, 2] or cognitive radio, where different types of wireless services can use the same frequency band. The NBI can affect several subcarriers, and it is well known that it greatly degrades the performance of the receiver if left untreated [3, 4]. When the transmission is NBI-free, the noise consists only of thermal noise, yielding a uniform noise variance for all subcarriers, so that only a single scalar parameter has to be estimated. In the presence of NBI, however, the noise originates from both thermal noise and interference. Due to the nature of NBI, neither the number of affected subcarriers nor their location in the spectrum is known. This brings about the need to estimate the noise variance for each subcarrier, yielding a vector estimate, denoted \(\boldsymbol{\sigma}^{\mathbf{2}}\), rather than a scalar. The objective in the presence of NBI is therefore to estimate the set of parameters \(\{\mathbf{h},\boldsymbol{\sigma}^{\mathbf{2}}\}\), where h is the vector containing the taps of the channel impulse response.

Several methods have been proposed to estimate the channel in the presence of interference. In [5], channel estimation is investigated for OFDM systems in the presence of synchronous interference. However, in practical situations, the interferer’s signals are in general asynchronous. Zhou et al.’s work [6] deals with the estimation of noise plus interference power together with successive soft data estimation at each subcarrier. Here, we focus on channel estimation based on pilots.

In [7], the authors proposed an estimator employing a specific pilot structure consisting of two types of pilot symbols with different pilot densities. More recently, in [8], a channel estimator is proposed based on a robust least-squares approach. However, the proposed method requires that the number of pilots be greater than twice the channel order, defined as the number of taps.

In [9] and [10], the NBI is assumed to be Gaussian distributed in the frequency domain, with zero mean and unknown power.

Under the same assumptions, channel estimation with NBI is investigated in the seminal paper [11]. The authors also consider the case where any possible correlation between the interference over adjacent subcarriers is neglected. This can be considered the worst case, since correlation is additional information that could be used to improve the estimation. The authors perform pilot-aided channel estimation under the classical assumption that the number of pilots is greater than the channel order. Note that this assumption is also made in the present paper. After formulating the maximum likelihood (ML) algorithm for the joint estimation of \(\{\mathbf{h},\boldsymbol{\sigma}^{\mathbf{2}}\}\), it is shown in [11] that the solution is non-unique when the channel order (denoted by L) is greater than the number of transmitted OFDM symbols (denoted by K), leading to ambiguous channel estimates. This is a severe limitation since K<L is a common scenario in practice. For example, in Wi-Fi scenarios, short frames like control frames are sent frequently. For this reason, the authors suggest resorting to another algorithm, the expectation maximization (EM) algorithm, with the complete data set \(\{\mathbf{X},\boldsymbol{\sigma}^{\mathbf{2}}\}\), where X contains the received signal. This amounts to treating the noise variances as a nuisance random vector. The drawback of this approach is that it imposes the selection of a distribution for the random vector, namely the inverse gamma, whose parameter must then be fixed off-line through exhaustive grid-search simulations. This can be a limitation for practical use.

In this paper, we first demonstrate that the ambiguities actually appear only when K=1, unless the signal-to-noise ratio (SNR) value is extremely high, in which case ambiguities indeed appear if K<L. Thus, for typical SNR values corresponding to practical applications and K>1, the joint ML technique can be used instead of the EM technique. This makes it possible to avoid a grid search. Even the case K=1 can be handled with a specific approach briefly outlined in this paper. Thus, these results open a wider field of application for the joint ML technique.

For the case K>1, the likelihood equations can be solved with the conditional ML (CML) algorithm. Note that the CML algorithm has been investigated for channel estimation in code division multiple access (CDMA) systems [12]. In this paper, we first present the CML equations for the considered OFDM system. Numerical simulations indicate that the CML is well defined in the SNR range corresponding to practical applications.

Then, we use a new formulation of the CML based on oblique projections to investigate the first moment. Oblique projections are well known for their applications in signal processing, especially in channel estimation [13–16]. With this formulation, it is proved that the channel estimator is unbiased. This result is important, in particular, for deriving the Cramer-Rao bound of this channel estimation problem. Moreover, the channel estimate is proved to be an algebraic function whose inputs are the initial value and the means and variances of the received samples.

This paper is organized as follows. Section 2 describes the system model. In Section 3, we discuss the joint ML estimation of {h,σ 2 } and the question of the uniqueness of the solution. Then, Section 4 introduces the CML algorithm to find the solution. A theoretical study of the CML is provided in Section 5. The Cramer-Rao bound is derived in Section 6, and simulation results are presented in Section 7.

Notations: The field of complex numbers is denoted \(\mathcal {C}\). Matrices [vectors] are denoted with upper [lower] case boldface letters (e.g., A or a). The complex number \(a_{i,j}\) indicates the (i,j)th entry of the matrix A; \(a_{i}\) indicates the ith entry of the vector a. The vector \(\mathbf{A}_{i}\) is the ith row vector of matrix A. The N×N identity matrix is denoted by \(\mathbf{I}_{N}\), and \(\mathbf{0}_{M,N}\) is the M×N matrix of zeros. The matrix D(x) is a diagonal matrix with vector x on its main diagonal. The superscripts \((\cdot)^{T}\), \((\cdot)^{H}\), \((\cdot)^{*}\), \((\cdot)^{R}\), and \((\cdot)^{I}\) stand, respectively, for the operations of taking the transpose, Hermitian conjugate, complex conjugate, real part, and imaginary part. The mathematical expectation is denoted \(\mathbb {E}[\cdot ]\). The multivariate complex normal distribution of a P-dimensional random vector is denoted by \(\mathcal {C}\mathcal {N}(\boldsymbol {\mu }, \boldsymbol {\Sigma })\), where μ is the P-dimensional mean vector and Σ is the P×P covariance matrix. The chi-square distribution with k degrees of freedom is denoted by \(\chi ^{2}_{k}\). The notations Range(A) and Null(A) indicate, respectively, the range space and the null space of A.

Note on the bold notations \(\mathbf{a}^{\mathbf{2}}\) and \(\boldsymbol{|}\mathbf{a}\boldsymbol{|}^{\mathbf{2}}\): let \(\mathbf{a}=[a_{1},\ldots,a_{N}]^{T}\) be a vector of size N×1. We use the bold notation \(\mathbf{a}^{\mathbf{2}}\) to denote the vector formed by taking the square of the entries of a, i.e., \(\mathbf {a}^{\mathbf {2}}=[a_{1}^{2}, \cdots, a_{N}^{2}]^{T}\). Similarly, \(\boldsymbol{|}\mathbf{a}\boldsymbol{|}^{\mathbf{2}}=[|a_{1}|^{2},\ldots,|a_{N}|^{2}]^{T}\).

2 System model

Let us consider an OFDM system with N subcarriers and a cyclic prefix length \(N_{g}\). We assume that the channel between the transmitter and the receiver is modelled as a frequency-selective fading channel with a channel impulse response (CIR) vector h of order L, \(\mathbf{h}=[h_{1},\ldots,h_{L}]^{T}\). The CIR h is assumed to be static over the transmission of K OFDM symbols. To estimate the channel, P pilot symbols with constant energy are inserted into the N subcarriers at the positions \(\mathcal {P}=\{n_{p}, p=1, \ldots, P\}\). In this paper, we do not consider a particular pilot scheme \(\mathcal {P}\), and all our derivations can be applied to any \(\mathcal {P}\). The only constraint is that L<P. The received frequency-domain pilot sample of the kth OFDM symbol at the \(n_{p}\)th subcarrier is
$$ x_{p, k}=c_{p, k}H_{p}+w_{p,k}, $$
where \(c_{p,k}\) is the pilot symbol with normalized power, i.e., \(|c_{p,k}|^{2}=1\), transmitted on the \(n_{p}\)th subcarrier, and \(w_{p,k}\) is a disturbance term that takes into account the background noise plus any possible interference. The random complex number \(w_{p,k}\) is assumed to be Gaussian distributed with zero mean and unknown variance \(\sigma _{p}^{2} = \sigma ^{2}_{\text {AWGN}} + \sigma ^{2}_{\text {NBI},p}\), where \(\sigma ^{2}_{\text {AWGN}}\) is the additive white Gaussian noise (AWGN) contribution and \( \sigma ^{2}_{\text {NBI},p}\) is the average NBI power, which is assumed constant over the transmission period. The channel frequency response \(H_{p}\) at the \(n_{p}\)th subcarrier is given by
$$ H_{p}=\sum\limits_{l=1}^{L} h_{l}\exp\left(-j\frac{2\pi\,n_{p}(l-1)}{N}\right), \ p=1, \hdots, P. $$
This yields the model for the P pilot subcarriers of the kth received OFDM block:
$$\begin{array}{@{}rcl@{}} \mathbf{x}_k=\mathbf{C}_{k}\mathbf{F}\mathbf{h} + \mathbf{w}_{k}, \quad k=1, \ldots, K, \end{array} $$

with \(\mathbf{x}_{k}=[x_{1,k},\ldots,x_{P,k}]^{T}\), \(\mathbf{w}_{k}=[w_{1,k},\ldots,w_{P,k}]^{T}\), and \(\mathbf{C}_{k}=\mathbf{D}([c_{1,k},\ldots,c_{P,k}])\), where D(u) is the diagonal matrix with the entries of vector u on its diagonal, and F is the P×L matrix with the (p,l)th entry defined as \(\exp \left (-j\frac {2\pi \,n_{p}(l-1)}{N}\right), \ p=1, \ldots, P, \ l=1, \hdots, L\).

The ML estimation of the set of unknown parameters \(\{\mathbf{h},\boldsymbol{\sigma}^{\mathbf{2}}\}\), where \(\boldsymbol {\sigma }^{\mathbf {2}} = \left [\sigma ^{2}_{1}, \hdots, \sigma ^{2}_{P}\right ]^{T}\), is desired, based on the set of received samples \(\{\mathbf {y}_{k}=\mathbf {C}_{k}^{-1}\mathbf {x}_{k}, k=1, \hdots, K\}\).
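To make the model concrete, the following sketch simulates the pilot model above. All numerical values (number of subcarriers, pilot pattern, channel length, NBI location and power) are illustrative assumptions for this example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions for this sketch, not from the paper)
N, P, L, K = 64, 16, 4, 8
n_p = np.arange(0, N, N // P)                  # one possible pilot pattern P

# P x L matrix F with (p,l)th entry exp(-j 2*pi*n_p*(l-1)/N)
F = np.exp(-2j * np.pi * np.outer(n_p, np.arange(L)) / N)

# Static channel impulse response h of order L
h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2 * L)

# Per-subcarrier noise variance: AWGN floor plus NBI on a few subcarriers
sigma2 = 0.01 * np.ones(P)
sigma2[3:6] += 1.0                             # unknown NBI location and power

# Unit-modulus QPSK pilots c_{p,k}; received samples x_k = C_k F h + w_k
C = np.exp(1j * (np.pi / 2) * rng.integers(0, 4, size=(K, P)))
W = (rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))) * np.sqrt(sigma2 / 2)
X = C * (F @ h) + W
Y = X / C                                      # y_k = C_k^{-1} x_k (rows of Y)
```

Since \(|c_{p,k}|=1\), dividing by the pilots leaves the noise statistics unchanged, so each row of Y is a draw from the normal regression model used in Section 3.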

Let us now define the sample means and the sample variances of the received samples, which will be used in the rest of the paper. The sample mean vector is denoted by \(\bar {\mathbf {y}}=[\bar {y}_{1}, \ldots, \bar {y}_{P}]^{T}\), where for p=1,…,P,
$$ \bar{y}_{p} = \frac{1}{K}\sum\limits_{k=1}^{K} y_{p, k}. $$
The sample variance vector is denoted by \(\mathbf {s}^{\mathbf {2}} = \left [s^{2}_{1}, \hdots, s^{2}_{P}\right ]^{T}\), where for p=1,…,P,
$$ s^{2}_{p} = \frac{1}{K}\sum\limits_{k=1}^{K}| y_{p,k}-\bar{y}_{p}|^{2}. $$
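These two statistics can be computed in a vectorized way. The sketch below assumes the received samples are stored in a K×P array Y whose (k,p)th entry is \(y_{p,k}\) (a layout assumption made for this example).

```python
import numpy as np

def sample_stats(Y):
    """Sample mean and sample variance per pilot subcarrier.

    Y is a K x P array whose (k, p)th entry is y_{p,k}.
    """
    y_bar = Y.mean(axis=0)                        # \bar{y}_p, p = 1..P
    s2 = np.mean(np.abs(Y - y_bar) ** 2, axis=0)  # s_p^2, with the 1/K factor
    return y_bar, s2
```

Note the 1/K (not 1/(K−1)) normalization, matching the definition above.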

3 Maximum likelihood estimation

In this section, the ML estimate of \(\{\mathbf{h},\boldsymbol{\sigma}^{\mathbf{2}}\}\) is investigated by following the approach presented in [11]. However, it will be shown that the ambiguities mentioned in [11] appear only when K=1, unless the SNR value is extremely high. Hence, when K≥2 and practical SNR values are considered, it is possible to obtain the ML solution without ambiguities. The estimates for K≥2 will be derived through the conditional ML, and their properties will be studied in the next section.

Recall that the K independent observations \(\mathbf{y}_{1},\ldots,\mathbf{y}_{K}\) are drawn from the following P-variate normal regression model:
$$\begin{array}{@{}rcl@{}} \mathcal{C}\mathcal{N}\left(\mathbf{F}\mathbf{h}, \mathbf{D}\left(\boldsymbol{\sigma}^{\mathbf{2}}\right)\right). \end{array} $$
Then, the negative log-likelihood function \(\ell (\tilde {\mathbf {h}},\tilde {\boldsymbol {\sigma }}^{\mathbf {2}})\) is given by [11]:
$$\begin{array}{@{}rcl@{}} \ell\left(\tilde{\mathbf{h}},\tilde{\boldsymbol{\sigma}}^{\mathbf{2}}\right) =K\sum\limits_{p=1}^{P}\ln\left(\pi\tilde{\sigma}_{p}^{2}\right)+\sum\limits_{k=1}^{K}\sum\limits_{p=1}^P \frac{|y_{p,k}-\mathbf{F}_{p}\tilde{\mathbf{h}}|^{2}}{\tilde{\sigma}_{p}^{2}}. \end{array} $$

Here, F p denotes the pth row of the matrix F.

The approach to derive the ML solution is summed up below. The variances which minimize (7) for a given \(\tilde {\mathbf {h}}\) are first calculated:
$$ \hat{\sigma}_{p}^{2}(\tilde{\mathbf{h}})=\frac{1}{K}\sum_{k=1}^{K}|y_{p,k}-\mathbf{F}_{p}\tilde{\mathbf{h}}|^{2}, \ p=1, \hdots, P. $$
Then, substituting \(\hat {\sigma }^{2}_{p}(\tilde {\mathbf {h}})\) for \(\tilde {\sigma }_{p}^{2}\) in (7) yields
$$\begin{array}{@{}rcl@{}} \Lambda(\tilde{\mathbf{h}}) := \ell(\tilde{\mathbf{h}},\hat{\boldsymbol{\sigma}}^{\mathbf{2}}(\tilde{\mathbf{h}})) =K\sum\limits_{p=1}^{P}\ln\left(\pi\hat{\sigma}_{p}^{2}(\tilde{\mathbf{h}})\right)+KP. \end{array} $$
Finally, the ML estimate of the CIR vector h is the one that minimizes \(\Lambda (\tilde {\mathbf {h}})\). Special treatment is required due to the presence of the logarithm function in (9). Indeed, the values of \(\tilde {\mathbf {h}}\) for which \(\hat {\sigma }_{p}^{2}(\tilde {\mathbf {h}})=0\) make \(\Lambda (\tilde {\mathbf {h}})\) tend to \(-\infty\). The consequences for the uniqueness of the solution are explained in more detail in [11], where it is shown that the minimization leads to ambiguous channel estimates if K<L. At this point, it is important to clarify the context leading to this result. It was assumed that no prior knowledge about the noise variances was available, which implied that they could theoretically be zero. However, in practice, the noise variance is never zero due to the presence of thermal noise. We will now revisit this ambiguity issue by taking into account the fact that the noise variance is not zero. Let us first observe that setting \(\hat {\sigma }_{p}^{2}(\tilde {\mathbf {h}})\) in (8) to zero for a given p yields a linear system of K equations with L unknowns:
$$ {\mathbf{A}} {\tilde{\mathbf{h}}} = \left[y_{p,1}, \hdots, y_{p,K}\right]^{T}, $$

where the K×L matrix A is built by stacking the row vectors \(\mathbf{F}_{p}\). If K=1, then the system (10) is underdetermined (one equation and L unknowns), yielding an infinite number of solutions. However, for K>1, the specific structure of A has to be taken into account when solving the system. On the one hand, the rows of A are all identical. On the other hand, in the presence of AWGN, the samples \(y_{p,1},\ldots,y_{p,K}\) are all different since they are Gaussian and independent. Consequently, (10) has no solution. It follows that for K>1, \(\hat {\sigma }_{p}^{2}(\tilde {\mathbf {h}}) >0\) for all p and for all \(\tilde {\mathbf {h}}\). Therefore, the ML estimate of the CIR vector h is well defined for K>1. This is true in the presence of AWGN, but at extremely high values of SNR, the differences between the samples tend to vanish and the problem of ambiguities reappears. However, simulation results show that the system is well conditioned as long as the SNR is less than 150 dB, which covers all practical applications.

This ambiguity issue appears more clearly with the formulation of \(\hat {\sigma }_{p}^{2}(\tilde {\mathbf {h}})\) based on the sample means and variances:
$$ \hat{\sigma}_{p}^{2}(\tilde{\mathbf{h}})=s^{2}_{p}+|\bar{y}_{p}-\mathbf{F}_{p}\tilde{\mathbf{h}}|^{2}, $$

when noticing that if K=1, the sample variances \(s^{2}_{p}=0\) for all p. However, if K>1, \(s^{2}_{p} \neq 0\) for all p, making it impossible to set \(\hat {\sigma }_{p}^{2}(\tilde {\mathbf {h}})\) to zero.
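The algebraic identity behind (11), namely that the time average of the squared residuals in (8) equals the sample variance plus the squared distance of the sample mean, can be checked numerically with random data (no model assumptions are needed, since it is a pure algebraic identity):

```python
import numpy as np

rng = np.random.default_rng(1)
K, P, L = 5, 8, 3
Y = rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))
F = np.exp(-2j * np.pi * np.outer(np.arange(P), np.arange(L)) / 16)
h_tilde = rng.standard_normal(L) + 1j * rng.standard_normal(L)

# Form (8): time average of the squared residuals
direct = np.mean(np.abs(Y - F @ h_tilde) ** 2, axis=0)

# Form (11): sample variance plus squared distance of the sample mean
y_bar = Y.mean(axis=0)
s2 = np.mean(np.abs(Y - y_bar) ** 2, axis=0)
decomposed = s2 + np.abs(y_bar - F @ h_tilde) ** 2

assert np.allclose(direct, decomposed)
```

The cross term vanishes because the residuals \(y_{p,k}-\bar{y}_{p}\) sum to zero over k, which is exactly why \(s_p^2>0\) for K>1 forces \(\hat{\sigma}_p^2(\tilde{\mathbf{h}})>0\).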

This article is concerned with the case K>1. However, it is worth mentioning that the case K=1 can still be handled with the following approach. It has been shown that it is meaningless to search for the ML of \(\boldsymbol{\sigma}^{\mathbf{2}}\) in the domain \((0,+\infty)^{P}\). A possible solution is to restrict the parameter space by imposing a priori lower bounds of the form [17]
$$\begin{array}{@{}rcl@{}} 0<\delta_{p}^2 \leq \hat{\sigma}_{p}^2 \end{array} $$

on the variances \(\hat {\sigma }_{p}^{2}\). Let us define the vector \(\boldsymbol {\delta }^{\mathbf {2}} = \left [\delta _{1}^{2}, \cdots, \delta _{P}^{2}\right ]^{T}\). The variances \(\hat {\boldsymbol {\sigma }}^{\mathbf {2}}\in \prod _{p=1}^{P}[\delta _{p}^{2}, +\infty)\) which minimize (7) for given \(\tilde {\mathbf {h}}\) are given by

$$\begin{array}{@{}rcl@{}} \hat{\sigma}_{p}^{2}\left(\tilde{\mathbf{h}}, \delta_{p}^{2}\right)&=&|y_{p}-\mathbf{F}_{p}\tilde{\mathbf{h}}|^{2},\quad \text{if}\quad |y_{p}-\mathbf{F}_{p}\tilde{\mathbf{h}}|^{2} \geq \delta_{p}^{2}, \end{array} $$
$$\begin{array}{@{}rcl@{}} \hat{\sigma}_{p}^{2}\left(\tilde{\mathbf{h}}, \delta_{p}^{2}\right)&=&\delta_{p}^{2},\quad \text{if}\quad |y_{p}-\mathbf{F}_{p}\tilde{\mathbf{h}}|^{2} < \delta_{p}^2. \end{array} $$
The vector \(\hat {\boldsymbol {\sigma }}^{\mathbf {2}}(\tilde {\mathbf {h}}, \boldsymbol {\delta }^{\mathbf {2}})\) is substituted for \(\tilde {\boldsymbol {\sigma }}^{\mathbf {2}}\) in \(\ell (\tilde {\mathbf {h}},\tilde {\boldsymbol {\sigma }}^{\mathbf {2}})\) to obtain
$$\begin{array}{@{}rcl@{}} \Lambda\left(\tilde{\mathbf{h}}, \boldsymbol{\delta}^{\mathbf{2}}\right)=\sum_{p=1}^{P}\left(\ln\left(\pi\hat{\sigma}_{p}^{2}(\tilde{\mathbf{h}}, \delta_{p}^2)\right)+\frac{|y_p-\mathbf{F}_{p}\tilde{\mathbf{h}}|^{2}}{\hat{\sigma}_{p}^{2}(\tilde{\mathbf{h}}, \delta_{p}^2)}\right), \end{array} $$

and then the CIR estimate \(\hat {\mathbf {h}}\) is the one that minimizes \(\Lambda (\tilde {\mathbf {h}}, \boldsymbol {\delta }^{\mathbf {2}})\) with respect to \(\tilde {\mathbf {h}}\).

As previously stated, this article will focus on K>1. As there is no closed-form solution for the minimization of \(\Lambda (\tilde {\mathbf {h}})\), we suggest using the conditional ML in the next section to find an iterative solution and study the properties of this solution.

4 Conditional ML

The CML algorithm is an iterative algorithm for solving the ML problem. The CML is the result of two nested minimizations. First, (7) is minimized over the variances for a given channel \(\tilde {\mathbf {h}}\), yielding the estimate of \(\boldsymbol{\sigma}^{\mathbf{2}}\) given by (8) or (11):
$$ \hat{\boldsymbol{\sigma}}^{\mathbf{2}}(\tilde{\mathbf{h}})=\mathbf{s}^{\mathbf{2}} +\boldsymbol{|}\bar{\mathbf{y}}-\mathbf{F}\tilde{\mathbf{h}}\boldsymbol{|}^{\mathbf{2}}, $$
where \(\boldsymbol {|}\bar {\mathbf {y}}-\mathbf {F}\tilde {\mathbf {h}}\boldsymbol {|}^{\mathbf {2}} = \left [|\bar {y}_{1}-\mathbf {F}_{1}\tilde {\mathbf {h}}|^{2}, \hdots, |\bar {y}_{P}-\mathbf {F}_{P}\tilde {\mathbf {h}}|^{2}\right ]^{T}\) (see the note on the bold notation \(\boldsymbol{|}\cdot\boldsymbol{|}^{\mathbf{2}}\) in the Notations section). Conversely, (7) is minimized over the channel for a given \(\tilde {\boldsymbol {\sigma }}^{\mathbf {2}}\), yielding the following estimate for h:
$$\begin{array}{@{}rcl@{}} \hat{\mathbf{h}}(\tilde{\boldsymbol{\sigma}}^{\mathbf{2}})=\left(\mathbf{F}^{H}\mathbf{D}^{-1} \left(\tilde{\boldsymbol{\sigma}}^{\mathbf{2}}\right)\mathbf{F}\right)^{-1}\mathbf{F}^{H}\mathbf{D}^{-1} \left(\tilde{\boldsymbol{\sigma}}^{\mathbf{2}}\right)\bar{\mathbf{y}}. \end{array} $$

We obtain the following iterative algorithm, with I the number of iterations:

We recall that the bold notations defined in the Notations section are used here. Note that with this initialization, \(\hat {\mathbf {h}}^{(1)}\) is the ordinary least-squares estimate of h. Note also that the expectation-maximization algorithm of [11, equations (28), (29)] corresponds to the CML algorithm for λ=0.
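The two update steps above (the weighted least-squares channel step and the variance step) can be sketched as follows. This is illustrative code, not the authors' implementation; the unit-variance initialization reproduces the ordinary least-squares estimate at the first iteration, as noted above.

```python
import numpy as np

def cml_estimate(Y, F, n_iter=20):
    """Sketch of the CML iteration for the model y_k = F h + w_k.

    Y : K x P array of received samples y_{p,k};  F : P x L matrix.
    Returns the channel estimate and the per-subcarrier variance estimate.
    """
    y_bar = Y.mean(axis=0)
    s2 = np.mean(np.abs(Y - y_bar) ** 2, axis=0)
    # Unit-variance initialization: the first h-update is ordinary least squares
    sig2 = np.ones(Y.shape[1])
    for _ in range(n_iter):
        # Channel update: (F^H D^{-1}(sig2) F)^{-1} F^H D^{-1}(sig2) y_bar
        FwH = F.conj().T / sig2
        h_hat = np.linalg.solve(FwH @ F, FwH @ y_bar)
        # Variance update: sigma_p^2 = s_p^2 + |y_bar_p - F_p h|^2
        sig2 = s2 + np.abs(y_bar - F @ h_hat) ** 2
    return h_hat, sig2
```

By construction, the returned variance estimates satisfy \(\hat{\sigma}_p^2 \geq s_p^2\), which is the non-degeneracy property established in Section 3 for K>1.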

This algorithm is known as the scoring method [18, 19] or the conditional maximum likelihood (CML) algorithm [20].

The CML’s important properties include:

1. Given \({\hat {\boldsymbol {\sigma }}}^{\mathbf {2}(i)}\), the vector \(\hat {\mathbf {h}}^{(i+1)}\) maximizes the likelihood. Given \(\hat {\mathbf {h}}^{(i)}\), the vector \({\hat {\boldsymbol {\sigma }}}^{\mathbf {2}(i+1)}\) maximizes the likelihood.

2. The logarithmic means \(\frac {1}{P}\sum _{p=1}^{P}\ln ({\hat {\sigma }_{p}}^{2(i)})\), for i=0,1,…, are non-increasing. In other words, for all i,
$$ \sum_{p=1}^{P}\ln\left({\hat{\sigma}_{p}}^{2(i+1)}\right)\leq \sum_{p=1}^{P}\ln\left({\hat{\sigma}_{p}}^{2(i)}\right). $$

In addition, \(\sum _{p=1}^{P}\ln \left ({\hat {\sigma }_{p}}^{2(i)}\right)\) converges to some constant \(\ln (c^{*}) \geq \sum _{p=1}^{P}\ln \left (s^{2}_{p}\right)\).
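Property 2 is easy to observe numerically: on simulated data (random, illustrative parameters), the sum of log-variances decreases at every iteration and remains above \(\sum_{p}\ln(s_p^2)\).

```python
import numpy as np

rng = np.random.default_rng(3)
P, L, K = 8, 3, 6
F = np.exp(-2j * np.pi * np.outer(np.arange(P), np.arange(L)) / 16)
Y = rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))
y_bar = Y.mean(axis=0)
s2 = np.mean(np.abs(Y - y_bar) ** 2, axis=0)

# CML iterations, tracking sum_p ln(sigma_p^2) at each step
h_hat = np.linalg.lstsq(F, y_bar, rcond=None)[0]   # OLS initialization
log_sums = []
for _ in range(15):
    sig2 = s2 + np.abs(y_bar - F @ h_hat) ** 2      # variance update
    log_sums.append(np.sum(np.log(sig2)))
    FwH = F.conj().T / sig2                          # weighted LS channel update
    h_hat = np.linalg.solve(FwH @ F, FwH @ y_bar)

# Non-increasing sequence, bounded below by sum_p ln(s_p^2)
assert all(b <= a + 1e-9 for a, b in zip(log_sums, log_sums[1:]))
assert log_sums[-1] >= np.sum(np.log(s2)) - 1e-9
```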

In this section, the CML algorithm has been presented together with some of its well-known properties. However, to our knowledge, no work has been carried out on the moments of the CML solution in this particular context. The next section addresses this topic.

5 Theoretical analysis of the CML algorithm

In this section, some properties of the CML algorithm are studied. In particular, the first moment of the estimators is investigated. To do so, a formulation of the CML based on oblique projections is established. It will be shown that the CML algorithm can be viewed as successive oblique projections.

5.1 The CML algorithm and oblique projections

First, we briefly present a few preliminaries about projections. For any invertible matrix Σ, observe that the matrix \(\Pi(\boldsymbol{\Sigma}):=\mathbf{F}\left(\mathbf{F}^{H}\boldsymbol{\Sigma}\mathbf{F}\right)^{-1}\mathbf{F}^{H}\boldsymbol{\Sigma}\) splits the space \(\mathcal {C}^{P}\) into two subspaces: the range space \(\text {Range}(\Pi (\boldsymbol{\Sigma}))=\Pi (\boldsymbol{\Sigma})\left (\mathcal {C}^{P}\right)\) of Π(Σ) and its null space \(\text {Null}(\Pi (\boldsymbol{\Sigma}))=[\mathbf {I}-\Pi (\boldsymbol{\Sigma})]\left (\mathcal {C}^{P}\right)\). Note that the range of Π(Σ) is the range of F. The linear operator defined by Π(Σ) is known as an oblique projection onto Range(F). If Σ=I, then \(\Pi(\mathbf{I})=\mathbf{F}\left(\mathbf{F}^{H}\mathbf{F}\right)^{-1}\mathbf{F}^{H}\) is the orthogonal projection onto Range(F). For simplicity of notation, Π(I) is henceforth denoted by Π.
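These defining properties of \(\Pi(\boldsymbol{\Sigma})\) (idempotency, range equal to Range(F), and Hermitian symmetry in the orthogonal case Σ=I) can be verified directly; the sketch below uses random matrices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
P, L = 8, 3
F = rng.standard_normal((P, L)) + 1j * rng.standard_normal((P, L))
Sigma = np.diag(rng.uniform(0.5, 2.0, P)).astype(complex)

def proj(F, Sigma):
    # Pi(Sigma) = F (F^H Sigma F)^{-1} F^H Sigma
    return F @ np.linalg.solve(F.conj().T @ Sigma @ F, F.conj().T @ Sigma)

Pi_ob = proj(F, Sigma)
assert np.allclose(Pi_ob @ Pi_ob, Pi_ob)    # idempotent: a projection
assert np.allclose(Pi_ob @ F, F)            # Range(Pi(Sigma)) = Range(F)

Pi = proj(F, np.eye(P, dtype=complex))      # Sigma = I
assert np.allclose(Pi, Pi.conj().T)         # Hermitian: orthogonal projection
```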

Now, we go back to the CML algorithm defined in the previous Section.

At the first iteration, the orthogonal projection Π splits the sample mean into the following two components (see Fig. 1 for a geometrical interpretation):
$$\begin{array}{@{}rcl@{}} \mathbf{F}\hat{\mathbf{h}}^{(1)}&=&\Pi\bar{\mathbf{y}}, \\ \mathbf{b}^{(1)}&:=&(\mathbf{I}-\Pi)\bar{\mathbf{y}}. \end{array} $$
Fig. 1

First iteration of the CML algorithm: orthogonal projection

The vector b (1) is the orthogonal projection of \(\bar {\mathbf {y}}\) onto Null(Π).

Given \(\hat {\mathbf {h}}^{(1)}\), the ML of the variance vector is given by
$$\begin{array}{@{}rcl@{}} {\hat{\boldsymbol{\sigma}}}^{\mathbf{2}(1)}=\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(1)}\boldsymbol{|}^{\mathbf{2}}, \end{array} $$

where the column vector \(\boldsymbol {|}\mathbf {b}^{(1)}\boldsymbol {|}^{\mathbf {2}} = \left [|b_{1}^{(1)}|^{2}, \hdots, |b_{P}^{(1)}|^{2}\right ]^{T}\).

At the (i+1)th iteration, we have
$$\begin{array}{@{}rcl@{}} \hat{\mathbf{h}}^{(i+1)}=\left(\mathbf{F}^{H}\mathbf{D}^{-1}\left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(i)}\boldsymbol{|}^{\mathbf{2}}\right)\mathbf{F}\right)^{-1}\\ \mathbf{F}^{H}\mathbf{D}^{-1}\left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(i)}\boldsymbol{|}^{\mathbf{2}}\right)\bar{\mathbf{y}}. \end{array} $$
The oblique projection Π(D −1(s 2 +| b (i) | 2 )) splits the sample mean into the following two components (see Fig. 2):
$$\begin{array}{@{}rcl@{}} \mathbf{F}\hat{\mathbf{h}}^{(i+1)}&=&\Pi\left(\mathbf{D}^{-1}\left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(i)}\boldsymbol{|}^{\mathbf{2}}\right)\right) \bar{\mathbf{y}},\\ \mathbf{b}^{(i+1)}&=&\bar{\mathbf{y}}-\mathbf{F}\hat{\mathbf{h}}^{(i+1)}\\ &=&\left(\mathbf{I}-\Pi\left(\mathbf{D}^{-1}\left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(i)}\boldsymbol{|}^{\mathbf{2}}\right)\right)\right)\bar{\mathbf{y}}. \end{array} $$
Fig. 2

Iteration i≥1 of the CML algorithm: oblique projection

Given \(\hat {\mathbf {h}}^{(i+1)}\), the ML of the variance vector is given by
$$\begin{array}{@{}rcl@{}} {\hat{\boldsymbol{\sigma}}}^{\mathbf{2}(i+1)}=\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(i+1)}\boldsymbol{|}^{\mathbf{2}}. \end{array} $$
From this discussion, it can be concluded that the solution provided by the CML algorithm can be sought either in \(\mathbf {F}\hat {\mathbf {h}}^{(i)}\) or in the variable \(\mathbf {b}^{(i)}=\bar {\mathbf {y}}-\mathbf {F}\hat {\mathbf {h}}^{(i)}\). With a view to investigating the CML properties, it will be seen that it is more convenient to consider \(\mathbf{b}^{(i)}\), which yields an equivalent algorithm for solving the CML:
$$\begin{array}{@{}rcl@{}} \mathbf{b}^{(i+1)}=\left(\mathbf{I}-\Pi\left(\mathbf{D}^{-1}\left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(i)}\boldsymbol{|}^{\mathbf{2}}\right)\right)\right)\bar{\mathbf{y}}. \end{array} $$
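The equivalence between the h-form and the b-form of the iteration follows from \(\mathbf{F}\hat{\mathbf{h}}^{(i+1)}=\Pi(\mathbf{D}^{-1}(\cdot))\bar{\mathbf{y}}\). The sketch below runs both forms in lockstep on illustrative random data and checks that \(\mathbf{b}^{(i)}=\bar{\mathbf{y}}-\mathbf{F}\hat{\mathbf{h}}^{(i)}\) at every iteration.

```python
import numpy as np

rng = np.random.default_rng(5)
P, L, K = 8, 3, 5
F = np.exp(-2j * np.pi * np.outer(np.arange(P), np.arange(L)) / 16)
Y = rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))
y_bar = Y.mean(axis=0)
s2 = np.mean(np.abs(Y - y_bar) ** 2, axis=0)

def oblique(F, d):
    # Pi(D^{-1}(d)): oblique projector with weights 1/d on the diagonal
    FwH = F.conj().T / d
    return F @ np.linalg.solve(FwH @ F, FwH)

# Initialization: orthogonal projection (first iteration of the CML)
Pi = oblique(F, np.ones(P))
b = (np.eye(P) - Pi) @ y_bar
h_hat = np.linalg.solve(F.conj().T @ F, F.conj().T @ y_bar)

for _ in range(10):
    d = s2 + np.abs(b) ** 2
    Pi_d = oblique(F, d)
    FwH = F.conj().T / d
    h_hat = np.linalg.solve(FwH @ F, FwH @ y_bar)     # h-space update
    b = (np.eye(P) - Pi_d) @ y_bar                    # b-space update (20)
    assert np.allclose(F @ h_hat, Pi_d @ y_bar)
    assert np.allclose(b, y_bar - F @ h_hat)
```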
Now, let us study the properties of Eq. (20). To do so, let us first write b (i+1) as a function of the three following variables \(\mathbf {b}^{(i+1)} = \varphi \left (\bar {\mathbf {y}},\mathbf {s}^{\mathbf {2}},\mathbf {b}^{(i)}\right)\) where
$$ \varphi\left(\bar{\mathbf{y}},\mathbf{s}^{\mathbf{2}},\mathbf{b}^{(i)}\right):=\left(\mathbf{I}-\Pi\left(\mathbf{D}^{-1}\left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(i)}\boldsymbol{|}^{\mathbf{2}}\right)\right)\right)\bar{\mathbf{y}}, $$

is a function of size P×1. From now on, φ p denotes the pth entry of φ. Now, the following properties are stated in Proposition 1. They will be used to derive Proposition 2.

Proposition 1

1) For i≥1,
$$ \varphi\left(\bar{\mathbf{y}},\mathbf{s}^{\mathbf{2}},\mathbf{b}^{(i)}\right) = \varphi\left(\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}},\mathbf{b}^{(i)}\right), $$
$$ \begin{aligned} \varphi\left(\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}},\mathbf{b}^{(i)}\right)&=\left(\mathbf{I}-\Pi\mathbf{D} \left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(i)}\boldsymbol{|}^{\mathbf{2}}\right)\right.\\&\quad\times\left.\Pi\mathbf{D}^{-1} \left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(i)}\boldsymbol{|}^{\mathbf{2}}\right)\right)\mathbf{b}^{(1)}. \end{aligned} $$
2) The maps
$$ \begin{aligned} \varphi_{p}\left(\mathbf{b}^{(1)}, \mathbf{s}^{\mathbf{2}},\mathbf{b}\right)= \left(\mathbf{I} -\Pi\mathbf{D}\left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}\boldsymbol{|}^{\mathbf{2}}\right)\Pi\mathbf{D}^{-1} \left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}\boldsymbol{|}^{\mathbf{2}}\right)\right)_{p}\mathbf{b}^{(1)} \end{aligned} $$
for p=1,…,P are linear with respect to \(\mathbf{b}^{(1)}\) and rational with respect to the variables \(s_{1}^{2}, \hdots, s_{P}^{2}, |b_{1}|^{2},\ldots,|b_{P}|^{2}\). Recall that the subscript p applied to a matrix means taking the pth row of the matrix. More precisely,
$$ \begin{aligned} \varphi_{p}\left(\mathbf{b}^{(1)}, \mathbf{s}^{\mathbf{2}},\mathbf{b}\right)&= b_{p}^{(1)}-\sum\limits_{k_{1}=1}^{P}\sum\limits_{k_{2}=1}^{P}\frac{s_{k_{1}}^{2}+|b_{k_{1}}|^{2}} {s_{k_{2}}^{2}+|b_{k_{2}}|^{2}}\\ &\quad\times\left[\Pi\mathbf{D}^{(k_{1})}\Pi\mathbf{D}^{(k_{2})}\right]_{p}\mathbf{b}^{(1)}, \end{aligned} $$

where \(\mathbf{D}^{(k)}=\mathbf{D}([0,\ldots,0,1,0,\ldots,0])\), with the 1 in the kth entry.

3) We have for p=1,…,P,
$$\begin{array}{@{}rcl@{}} b_{p}^{(i+1)}&=&\varphi_{p}\left(\mathbf{b}^{(1)}, \mathbf{s}^{\mathbf{2}},\mathbf{b}^{(i)}\right)\\ &=&\varphi_{p}\left(\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}},\varphi\left(\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}},\mathbf{b}^{(i-1)}\right)\right)\\ &\vdots&\\ &=&\varphi_{p}^{(i)}\left(\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}}\right) \end{array} $$

where the map \(\varphi _{p}^{(i)}(\mathbf {b}^{(1)},\mathbf {s}^{\mathbf {2}})\) is odd with respect to the variable \(b_{k}^{(1)}\) for all k=1,…,P.

4) Any limit, say \(\boldsymbol{\omega}(\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}})\), of the sequence \(\mathbf{b}^{(i+1)}\) is a root of the rational map
$$\begin{array}{@{}rcl@{}} \mathbf{b}-\varphi\left(\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}},\mathbf{b}\right). \end{array} $$

Thus, the map \((\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}})\mapsto\boldsymbol{\omega}(\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}})\) is an algebraic function, i.e., for p=1,…,P, there exist polynomial functions \(Q_{p}\) such that \(Q_{p}\left(\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}},\boldsymbol{\omega}(\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}})\right)=0\).

The proofs are given in Appendix 1. Note that (22) is straightforward with the geometrical interpretation of Fig. 2, where it can be observed that the projections of \(\mathbf{b}^{(1)}\) and \(\bar {\mathbf {y}}\) onto \(\text{Null}\left(\Pi\left(\mathbf{D}^{-1}\left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(i)}\boldsymbol{|}^{\mathbf{2}}\right)\right)\right)\) are the same.

5.2 The mean of the CML estimator

The distributions of the sample mean and the sample variance are derived in Appendix 2. Upon the convergence of the algorithm (20), we obtain \(\hat {\mathbf {h}} \), \(\hat {\mathbf {b}}\), and \(\hat {\boldsymbol {\sigma }}^{\mathbf {2}}\). From \(\mathbf {F}\hat {\mathbf {h}} = \bar {\mathbf {y}} - \hat {\mathbf {b}}\) and (38) in Appendix 2, we obtain
$$ \mathbb{E}\left[\hat{\mathbf{h}}\right]=\mathbf{h}-(\mathbf{F}^{H}\mathbf{F})^{-1}\mathbf{F}^{H}\mathbb{E}[\hat{\mathbf{b}}]. $$
From \(\hat {\boldsymbol {\sigma }}^{\mathbf {2}}=\mathbf {s}^{\mathbf {2}}+\boldsymbol {|}\hat {\mathbf {b}}\boldsymbol {|}^{\mathbf {2}}\) and (39) in Appendix 2, we obtain
$$ \mathbb{E}\left[\hat{\boldsymbol{\sigma}}^{\mathbf{2}}\right]=\frac{K-1}{K}\boldsymbol{\sigma}^{\mathbf{2}}+\mathbb{E} \left[\boldsymbol{|}\hat{\mathbf{b}}\boldsymbol{|}^{\mathbf{2}}\right], $$

using the property that the mean of a chi-square random variable with n degrees of freedom is n. Now, from Proposition 1, the following can be shown (see Appendix 3 for the proof).

Proposition 2

The vector \( \hat {\mathbf {b}}\) is zero mean, i.e.,
$$ \mathbb{E}\left[\hat{b}_{p}\right]=0, \ p=1, \hdots, P. $$

Therefore, from (26), \(\hat {\mathbf {h}}\) is an unbiased estimator.
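Proposition 2 can be illustrated by Monte Carlo simulation: averaging the CML channel estimate over many independent noise realizations recovers h to within statistical error. All parameters below are assumptions for the example, and the tolerance is statistical, not exact.

```python
import numpy as np

rng = np.random.default_rng(6)
N, P, L, K, trials = 64, 16, 4, 10, 200
F = np.exp(-2j * np.pi * np.outer(np.arange(0, N, 4), np.arange(L)) / N)
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
sigma2 = 0.1 * np.ones(P)
sigma2[2:5] = 1.0                              # NBI on a few subcarriers

def cml(Y, F, n_iter=15):
    # Same alternating updates as in Section 4 (sketch)
    y_bar = Y.mean(axis=0)
    s2 = np.mean(np.abs(Y - y_bar) ** 2, axis=0)
    sig2 = np.ones(Y.shape[1])
    for _ in range(n_iter):
        FwH = F.conj().T / sig2
        h_hat = np.linalg.solve(FwH @ F, FwH @ y_bar)
        sig2 = s2 + np.abs(y_bar - F @ h_hat) ** 2
    return h_hat

acc = np.zeros(L, dtype=complex)
for _ in range(trials):
    W = (rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))) * np.sqrt(sigma2 / 2)
    acc += cml(F @ h + W, F)
bias = acc / trials - h                        # empirical bias, close to zero
```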

5.3 Complexity of the CML algorithm

In this Section, the complexity of the CML algorithm (20) is investigated. First, it is noteworthy that it is more efficient to use the formulation (23) instead of (20). Indeed, in (20), the matrix inversion \(\left(\mathbf{F}^{H}\mathbf{D}^{-1}\left(\mathbf{s}^{\mathbf{2}}+\boldsymbol{|}\mathbf{b}^{(i)}\boldsymbol{|}^{\mathbf{2}}\right)\mathbf{F}\right)^{-1}\) is required at each iteration, whereas in (23), the matrix Π can be computed off-line and stored in memory. In this way, the operations for computing (23) consist only of matrix products, which is less demanding. Moreover, the calculation of \(\mathbf{s}^{\mathbf{2}}\) is carried out just once, at the beginning of the algorithm. Overall, the algorithm requires \(O(P^{3})\) floating point operations per iteration.
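A sketch of this cheaper formulation: Π is computed once from F, and each iteration of (23) applies only diagonal scalings and products with Π, with no per-iteration inversion (illustrative code; the check at the end compares against the explicit matrix form of (23)).

```python
import numpy as np

P, L = 8, 3
F = np.exp(-2j * np.pi * np.outer(np.arange(P), np.arange(L)) / 16)

# Computed once, off-line: the orthogonal projector onto Range(F)
Pi = F @ np.linalg.solve(F.conj().T @ F, F.conj().T)

def step(b1, s2, b, Pi):
    """One iteration of (23): (I - Pi D(d) Pi D^{-1}(d)) b^(1), d = s^2 + |b|^2.

    Only matrix-vector products and diagonal scalings are needed.
    """
    d = s2 + np.abs(b) ** 2
    return b1 - Pi @ (d * (Pi @ (b1 / d)))

# Check against the explicit matrix form of (23)
rng = np.random.default_rng(7)
b1 = rng.standard_normal(P) + 1j * rng.standard_normal(P)
s2 = rng.uniform(0.5, 1.5, P)
b = rng.standard_normal(P) + 1j * rng.standard_normal(P)
d = s2 + np.abs(b) ** 2
M = np.eye(P) - Pi @ np.diag(d) @ Pi @ np.diag(1 / d)
assert np.allclose(step(b1, s2, b, Pi), M @ b1)
```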

6 The Cramer-Rao bound

Let us define the \((2L+P)\times 1\) vector of the real parameters to be estimated:
$$ \boldsymbol{\theta}= \left[h_{1}^{R}, \hdots, h_{L}^{R}, h_{1}^{I}, \hdots, h_{L}^{I}, \sigma^{2}_{1}, \hdots, \sigma^{2}_{P}\right]^{T}, $$

where \(h_{l}^{R}, h_{l}^{I}\) are, respectively, the real and imaginary parts of h l .

The Cramer-Rao bound (CRB) for this estimation problem states that the covariance matrix of \(\hat {\boldsymbol {\theta }}\) satisfies
$$\begin{array}{@{}rcl@{}} \text{cov}(\hat{\boldsymbol{\theta}})\geq \frac{\partial \psi(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} \mathbf{J}^{-1}\frac{\partial \psi(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}^{T}, \end{array} $$
$$ \psi(\boldsymbol{\theta}) = \mathbb{E}\left[\hat{\boldsymbol{\theta}}\right] $$

and \(\frac {\partial \psi (\boldsymbol {\theta })}{\partial \boldsymbol {\theta }}\) is the Jacobian matrix, whose (n,m)th element is given by \(\frac {\partial \psi _{n}(\boldsymbol {\theta })}{\partial \theta _{m}}\). Note that in [11], the CRB is calculated by assuming perfect knowledge of the variance. In this Section, both the channel and the variance parameters are considered. The matrix J is the \((2L+P)\times(2L+P)\) Fisher information matrix. Its (m,n)th entry is defined as \(\mathbb {E}\left [\frac {\partial ^{2}}{\partial \theta _{n} \partial \theta _{m}} \ell \left (\mathbf {h}, \boldsymbol {\sigma }^{\mathbf {2}}\right)\right ]\), where \(\ell(\mathbf {h}, \boldsymbol {\sigma }^{\mathbf {2}})\) is the negative log-likelihood defined in (7).

Therefore, it can be seen from (31) that knowing the moments of the estimator is required in order to calculate the CRB. The results of Section 5.2 will be exploited to do so.

The results of the calculation of J are given below, and the details are in Appendix 4. The matrix J can be written as
$$ \mathbf{J} = \left[ \begin{array}{cc} \mathbf{J}_{h} & {0}_{2L, P} \\ {0}_{P, 2L} & \mathbf{J}_{\sigma^{2}} \end{array} \right] $$
where \(\mathbf {J}_{\sigma ^{2}} = K\mathbf {D}\left (\frac {1}{\sigma ^{4}_{1}}, \hdots,\frac {1}{\sigma ^{4}_{P}}\right)\) and where the entries of J h are defined by (41), (42), and (43) in Appendix 4. To compute the CRB, the calculation of the inverse of J is required. Since the inverse of a block diagonal matrix is the block diagonal matrix of the inverses of the blocks, as long as the submatrices are invertible, we have
$$ \mathbf{J}^{-1} = \left[ \begin{array}{cc} \mathbf{J}_{h}^{-1} & {0}_{2L, P} \\ {0}_{P, 2L} & \mathbf{J}_{\sigma^{2}}^{-1} \end{array} \right]. $$
From (26), (27), and Proposition 2, we can express ψ(θ) as a function of h and σ 2 and calculate the derivative:
$$ \frac{\partial\psi(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} = \left[ \begin{array}{cc} \mathbf{I}_{2L} & {0}_{2L \times P} \\ {0}_{P \times 2L} & \mathbf{M} \end{array} \right] $$
where the (p 1,p 2)th entry of M is defined as \(\frac {K-1}{K}\delta _{p_{1}}^{p_{2}} + \mathbb {E}\left [\frac {\partial |\hat {b}_{p_{1}}|^{2}}{\partial \sigma ^{2}_{p_{2}}}\right ]\). Therefore, we have
$$ \frac{\partial\psi(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} \mathbf{J}^{-1}\frac{\partial \psi(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}}^{T} = \left[ \begin{array}{cc} \mathbf{J}_{h}^{-1} & {0}_{2L \times P}\\ {0}_{P \times 2L} & \mathbf{M} \mathbf{J}_{\sigma^{2}}^{-1} \mathbf{M}^{T} \end{array} \right]. $$
Note that the calculation of \(\mathbb {E}\left [\frac {\partial |\hat {b}_{p_{1}}|^{2}}{\partial \sigma ^{2}_{p_{2}}}\right ]\) is not feasible since there is no analytical expression for \(\hat {\mathbf {b}}\). Therefore, the bound for the variance estimation cannot be found. However, the bound for the channel estimation is given from (29) and (35) by
$$ \text{cov}\left(\left[\hat{h}_{1}^{R}, \hdots, \hat{h}_{L}^{R}, \hat{h}_{1}^{I}, \hdots, \hat{h}_{L}^{I}\right]^{T}\right) \geq \mathbf{J}_{h}^{-1}. $$

Recall that this bound has been derived by using the result of Proposition 2 showing that \(\hat {\mathbf {h}}\) was unbiased.
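From the entries (41), (42), and (43) in Appendix 4, the channel block can be assembled compactly as \(\mathbf{J}_{h} = 2K\,\mathbf{G}^{T}\mathbf{D}\left(1/\sigma_{1}^{2},\hdots,1/\sigma_{P}^{2},1/\sigma_{1}^{2},\hdots,1/\sigma_{P}^{2}\right)\mathbf{G}\), with G the real-valued representation of F from Appendix 4. A minimal numerical sketch, assuming a DFT pilot matrix and an illustrative noise-variance profile:

```python
import numpy as np

P, L, K = 8, 6, 4

# Hypothetical pilot matrix F: first L columns of a P-point unitary DFT
n = np.arange(P)[:, None] * np.arange(L)[None, :]
F = np.exp(-2j * np.pi * n / P) / np.sqrt(P)

sigma2 = np.full(P, 0.01)
sigma2[:2] += 1.0                     # NBI raises the variance on two pilots

# Real-valued representation G of F (Appendix 4), size 2P x 2L
G = np.block([[F.real, -F.imag], [F.imag, F.real]])

# Entries (41)-(43) assemble into J_h = 2K G^T D(1/sigma^2, 1/sigma^2) G
w = np.tile(1.0 / sigma2, 2)
J_h = 2 * K * G.T @ (w[:, None] * G)

# The channel bound at the end of Section 6: cov of the real channel estimate
# is bounded below by J_h^{-1}; its trace bounds the total channel MSE
crb_trace = np.trace(np.linalg.inv(J_h))
```

Since G here has orthonormal columns and the weights are positive, \(\mathbf{J}_{h}\) is symmetric positive definite and the inverse exists, as required for (30).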

7 Simulation results

In order to validate the results, computer simulations were carried out in accordance with the IEEE 802.11g standard, with a carrier frequency of 2.4 GHz. The system parameters used for the simulation are as follows: N=64 subcarriers, a bandwidth of 20 MHz, and a cyclic prefix of length 16. The discrete-time channel is assumed to have L=6 taps, modelled as a Rayleigh channel with an exponentially decaying power profile such that \(\mathbb{E}[\lvert h_{l}\rvert ^{2}]=\sigma ^{2}_{h}\cdot \exp (-l)\) for l=1,2,…,L. The constant \(\sigma ^{2}_{h}\) is chosen to normalize the channel power to one. For the simulation, the pilots are inserted evenly, every eight subcarriers, yielding P=8. A frame of K=4 OFDM symbols is assumed. Note that K<L for these parameter values.

It is also assumed that two contiguous pilot subcarriers are affected by NBI, modelled by adding a Gaussian disturbance of variance \(\sigma _{\text {NBI}}^{2}\) to both subcarriers. The signal-to-noise ratio (SNR) is defined as \(10\log_{10} \frac {1}{\sigma _{\text {AWGN}}^{2}}\), where the power of the signal is normalized to one, and the signal-to-interference ratio (SIR) is defined as \(10\log_{10} \frac {1}{\sigma _{\text {NBI}}^{2}}\). The accuracy of the channel estimates is measured in terms of the mean square error (MSE), defined as \(\frac {1}{L}\mathbb {E} \left [(\hat {\mathbf {h}} - \mathbf {h})^{H} (\hat {\mathbf {h}} - \mathbf {h}) \right ]\), where the expectation is estimated via Monte Carlo simulations.
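The simulation setup above can be reproduced in a few lines. This is a sketch of the channel and noise generation only (the seed and variable names are arbitrary choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)
L = 6

# Exponential power-delay profile E|h_l|^2 = sigma_h^2 * exp(-l), l = 1..L,
# with sigma_h^2 chosen so that the total channel power is one
profile = np.exp(-np.arange(1, L + 1, dtype=float))
profile /= profile.sum()

# Rayleigh taps: circularly symmetric complex Gaussian with these variances
h = np.sqrt(profile / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))

# SNR/SIR definitions used in the paper (signal power normalized to one)
snr_db, sir_db = 10.0, 0.0
sigma2_awgn = 10 ** (-snr_db / 10)
sigma2_nbi = 10 ** (-sir_db / 10)
```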

Figure 3 shows the MSE as a function of the SNR when the SIR is fixed to −5 dB. Four iterations are employed. Recall that the first iteration is the ordinary least-squares estimate (OLSE). For reference, the CRB calculated in Section 6 is added. It is seen that the algorithm converges after four iterations for SNRs ranging from 0 to 30 dB and nearly attains the CRB, whereas the MSE of the OLSE exhibits a floor at \(2.5\times 10^{-2}\). This shows that the ML algorithm is well conditioned for the considered SNR range [0, 50] dB.
Fig. 3

Performance of the CML estimator as a function of the SNR for SIR=− 5 dB, K=4

Next, the SIR is fixed to 0 dB in Fig. 4 and to 5 dB in Fig. 5. In both cases, the algorithm converges after only three iterations. Moreover, it can be observed that the performance of the OLSE approaches that of the second iteration as the SIR increases. This makes sense, as the OLSE is designed to perform well in the absence of NBI.
Fig. 4

Performance of the CML estimator as a function of the SNR for SIR=0 dB, K=4

Fig. 5

Performance of the CML estimator as a function of the SNR for SIR=5 dB, K=4

To investigate the impact of the number of OFDM symbols on the performance, K is now fixed to 8, the other parameters being unchanged. The results are shown in Fig. 6 for SIR=−5 dB. Compared to K=4, the convergence is faster, since only three iterations are needed. This is expected, since a larger K yields a more favorable scenario.
Fig. 6

Performance of the CML estimator as a function of the SNR for SIR=− 5 dB, K=8

Finally, the bias of \(\hat {\mathbf {h}}\) is studied. The histograms of the real and imaginary parts of \(\hat {b}_{p}\), p=1,…,P, are plotted in Figs. 7 and 8, respectively. The SNR is fixed to 10 dB and the SIR to 0 dB. It can be observed that the mean is zero for all p, which implies that \(\hat {\mathbf {h}}\) is unbiased. This confirms Proposition 2.
Fig. 7

Histogram of \(\hat {b}^{R}_{p}\), p=1,…,P, SNR = 10 dB, SIR = 0 dB

Fig. 8

Histogram of \(\hat {b}^{I}_{p}\), p=1,…,P, SNR = 10 dB, SIR = 0 dB

8 Conclusions

This article has addressed the problem of maximum likelihood channel estimation for OFDM systems in the presence of unknown interference. First, it was proved that the solution is without ambiguities as long as the number of transmitted OFDM symbols is strictly greater than one. For this case, we proposed using the conditional maximum likelihood (CML) algorithm to obtain the estimates. New theoretical developments of the CML algorithm in this context have been presented. It was proved that the solution provided by the CML is an algebraic function of the data. Furthermore, it was also proved that the channel estimator is unbiased.

9 Appendix 1: Proof of Proposition 1

The proof of part 1) of Proposition 1 is a consequence of the following general result. We have for any invertible matrix Σ that
$$\begin{array}{@{}rcl@{}} (\mathbf{I}-\Pi(\Sigma))=(\mathbf{I}-\Pi(\Sigma))(\mathbf{I}-\Pi). \end{array} $$
The latter equality is equivalent to
$$\begin{array}{@{}rcl@{}} \Pi=\Pi(\Sigma)\Pi. \end{array} $$
Now, it can be easily shown that
$$\begin{array}{@{}rcl@{}} \Pi(\Sigma)\Pi=\mathbf{F}\left(\mathbf{F}^{H}\Sigma\mathbf{F}\right)^{-1}\mathbf{F}^{H}\Sigma\mathbf{F}\left(\mathbf{F}^{H}\mathbf{F}\right)^{-1}\mathbf{F}^H =\Pi. \end{array} $$
The proof of 2) is a consequence of the following general result:
$$\begin{array}{@{}rcl@{}} \left(\mathbf{F}^{H}\Sigma^{-1}\mathbf{F}\right)^{-1}=\mathbf{F}^+\Sigma\left(\mathbf{F}^{H}\right)^+ \end{array} $$
where F +=(F H F)−1 F H and (F H )+=F(F H F)−1. From this, we have
$$ \Pi\left(\Sigma^{-1}\right)=\mathbf{F}\mathbf{F}^{+}\Sigma\left(\mathbf{F}^{H}\right)^{+}\mathbf{F}^{H}\Sigma^{-1} =\Pi\Sigma\Pi\Sigma^{-1}. $$

The proof of the remaining assertions is straightforward.
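The projector identities above are easy to check numerically. The sketch below verifies \(\Pi(\Sigma)\Pi=\Pi\) and the idempotence of the oblique projector \(\Pi(\Sigma)=\mathbf{F}(\mathbf{F}^{H}\Sigma\mathbf{F})^{-1}\mathbf{F}^{H}\Sigma\), using an arbitrary DFT-based F and a random diagonal Σ (both illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
P, L = 8, 6

# Hypothetical tall matrix F: first L columns of a P-point unitary DFT
n = np.arange(P)[:, None] * np.arange(L)[None, :]
F = np.exp(-2j * np.pi * n / P) / np.sqrt(P)

# Orthogonal projector Pi = F (F^H F)^{-1} F^H
Pi = F @ np.linalg.solve(F.conj().T @ F, F.conj().T)

Sigma = np.diag(rng.uniform(0.1, 2.0, P))    # any invertible diagonal Sigma

def Pi_of(S):
    # Oblique projector Pi(Sigma) = F (F^H Sigma F)^{-1} F^H Sigma
    return F @ np.linalg.solve(F.conj().T @ S @ F, F.conj().T @ S)

prod = Pi_of(Sigma) @ Pi          # should equal Pi
idem = Pi_of(Sigma) @ Pi_of(Sigma)  # should equal Pi_of(Sigma)
```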

10 Appendix 2: Probability distribution function of the sample mean and the sample variance

Cochran’s theorem [21] states that the sample mean and variance are two independent random variables. Moreover, it is also stated that the sample variance of K independent normally distributed real random variables with mean 0 and standard deviation 1 has a chi-square distribution with K−1 degrees of freedom. Therefore, we can write down the distribution for \(s_{p}^{2}\). The distribution for \(\bar {y}_{p}\) is straightforward:
$$\begin{array}{@{}rcl@{}} \bar{y}_{p} &\sim& \mathcal{C} \mathcal{N} \left(\mathbf{F}_{p} \mathbf{h}, \frac{\sigma^{2}_{p}}{K}\right), \end{array} $$
$$\begin{array}{@{}rcl@{}} s_{p}^{2} &\sim& \frac{\sigma^{2}_{p}}{2K}\chi^{2}_{2(K-1)}. \end{array} $$

Here we find 2(K−1) degrees of freedom since the considered random variables are complex.
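A quick Monte Carlo check of the distribution of the sample variance (sample sizes and tolerances below are arbitrary): for K i.i.d. complex Gaussian samples of variance \(\sigma_{p}^{2}\), the scaled statistic \(2Ks_{p}^{2}/\sigma_{p}^{2}\) should have mean 2(K−1) and variance 4(K−1), as for a \(\chi^{2}_{2(K-1)}\) variable.

```python
import numpy as np

rng = np.random.default_rng(4)
K, sigma2, trials = 4, 2.0, 200_000

# K i.i.d. complex Gaussian samples per trial, variance sigma2 per sample
y = (rng.standard_normal((trials, K)) + 1j * rng.standard_normal((trials, K))) \
    * np.sqrt(sigma2 / 2)
s2 = np.mean(np.abs(y - y.mean(axis=1, keepdims=True)) ** 2, axis=1)

# 2K s^2 / sigma^2 ~ chi-square with 2(K-1) degrees of freedom:
# mean 2(K-1) = 6 and variance 4(K-1) = 12 for K = 4
z = 2 * K * s2 / sigma2
print(z.mean(), z.var())   # ~ 6.0, ~ 12.0
```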

11 Appendix 3: Proof of Proposition 2

The Gaussian vector b (1), defined in (19), is zero mean with the covariance matrix \(K^{-1}(\mathbf{I}-\Pi)\mathbf{D}(\boldsymbol{\sigma}^{\mathbf{2}})\), i.e., \((\mathbf {I}-\Pi)\bar {\mathbf {y}} \sim (\mathbf {I}-\Pi)K^{-1/2}\mathbf {D}(\boldsymbol {\sigma })\mathcal {N}(0,\mathbf {I})\).

The components of the vector 2K s 2 are independent with the distribution \(\left (\sigma _{1}^{2}\chi _{2(K-1)}^{2}(1), \ldots, \sigma _{P}^{2}\chi _{2(K-1)}^{2}(P)\right)\). Here \(\chi _{2(K-1)}^{2}(1), \ldots,\chi _{2(K-1)}^{2}(P)\) are i.i.d. with the common distribution \(\chi _{2(K-1)}^{2}\). From Cochran’s theorem, s 2 and \(\bar {\mathbf {y}}\) are independent. From Proposition 1, the random vector b (i+1)=φ (i)(b (1),s 2 ) is a rational function having a positive denominator. Therefore,
$$\begin{array}{@{}rcl@{}} &&\mathbb{E}\left(\mathbf{b}^{(i+1)}|\mathbf{s}^{\mathbf{2}}\right)=\mathbb{E}\left(\varphi^{(i)}\left(\mathbf{b}^{(1)},\mathbf{s}^{\mathbf{2}}\right)|\mathbf{s}^{\mathbf{2}}\right)\\ &&=\int \varphi^{(i)}\left(\mathbf{b}, \mathbf{s}^{\mathbf{2}}\right) f_{\mathbf{b}^{(1)}}(\mathbf{b})d\mathbf{b}=0 \end{array} $$

because b (1) is zero mean which implies that its probability density function \(\mathbf {b}\to f_{\mathbf {b}^{(1)}}(\mathbf {b})\) is even and bφ (i)(b,s 2 ) is odd. Then, using the basic property \(\mathbb {E}(\mathbb {E}(X|Y))=\mathbb {E}(X)\), where X and Y are random variables, we obtain (28).

12 Appendix 4: The Fisher information matrix

To facilitate the calculations, the negative log-likelihood is rewritten using real numbers:
$$ \begin{aligned} \ell\left(\mathbf{h}, \boldsymbol{\sigma}^{\mathbf{2}}\right)&= K\sum_{p=1}^{P}\ln\left(\sigma^{2}_{p}\right)+ \sum_{k=1}^{K}\sum_{p=1}^{P}\\&\quad\times\frac{\left(y_{p,k}^{R}-H_{p}^{R}\right)^{2}+\left(y_{p,k}^{I}-H_{p}^{I}\right)^{2}}{\sigma_{p}^{2}}. \end{aligned} $$
First, we define the 2P×2L matrix G:
$$ \mathbf{G} = \left[ \begin{array}{cc} \mathbf{F}^{R} & -\mathbf{F}^{I}\\ \mathbf{F}^{I} & \mathbf{F}^{R} \end{array}\right], $$
and we write \(H_{p}^{R}\) and \(H_{p}^{I}\) as functions of \(\left [h_{1}^{R}, \hdots, h_{L}^{R}, h_{1}^{I}, \hdots, h_{L}^{I}\right ]\):
$$\begin{array}{@{}rcl@{}} H_{p}^{R}&=&\mathbf{G}_{p} \cdot \left[ h_{1}^{R}, \hdots, h_{L}^{R}, h_{1}^{I}, \hdots, h_{L}^{I}\right]^{T}, \\ H_{p}^{I}&=&\mathbf{G}_{P+p}\cdot \left[h_{1}^{R}, \hdots, h_{L}^{R}, h_{1}^{I}, \hdots, h_{L}^{I}\right]^{T}. \end{array} $$
where \(\mathbf{G}_{p}\) denotes the pth row of G. Let \(g_{p,q}\) be the (p,q)th entry of G. The derivatives of \(\ell\left(\mathbf{h}, \boldsymbol{\sigma}^{\mathbf{2}}\right)\) with respect to \(\mathbf{h}^{R}\), \(\mathbf{h}^{I}\), and \(\boldsymbol{\sigma}^{\mathbf{2}}\) are
$$\begin{array}{@{}rcl@{}} \partial_{h_{l}^{R}}\ell\left(\mathbf{h}, \boldsymbol{\sigma}^{2}\right) &=& -2K\sum_{p=1}^{P} \frac{g_{p,l}b_{p}^{R}+ g_{P+p,l}b_{p}^{I}}{\sigma_{p}^{2}},\\ \partial_{h_{l}^{I}}\ell\left(\mathbf{h}, \boldsymbol{\sigma}^{2}\right) &=&-2K\sum_{p=1}^{P} \frac{g_{p,L+l}b_{p}^{R}+ g_{P+p,L+l}b_{p}^{I}}{\sigma_{p}^{2}},\\ \partial_{\sigma_{p}^{2}}\ell\left(\mathbf{h}, \boldsymbol{\sigma}^{2}\right) &=&\frac{K}{\sigma_{p}^{2}}-K \frac{s^{2}_{p}+|b_{p}^{R}|^{2}+|b_{p}^{I}|^{2}}{\sigma^{4}_{p}}, \end{array} $$
which yields
$$ \partial_{h_{l_{1}}^{R}h_{l_{2}}^{R}}^{2}\ell\left(\mathbf{h}, \boldsymbol{\sigma}^{2}\right) = 2K\sum_{p=1}^{P} \frac{g_{p,l_{1}}g_{p,l_{2}}+g_{P+p,l_{1}}g_{P+p,l_{2}}}{\sigma_{p}^{2}} $$
$$ \partial_{h_{l_{1}}^{R}h_{l_{2}}^{I}}^{2}\ell\left(\mathbf{h}, \boldsymbol{\sigma}^{2}\right) = 2K\sum_{p=1}^{P} \frac{g_{p,l_{1}}g_{p,L+l_{2}}+g_{P+p,l_{1}}g_{P+p,L+l_{2}}}{\sigma_{p}^{2}} $$
$$ \begin{aligned} \partial_{h_{l_{1}}^{I}h_{l_{2}}^{I}}^{2}\ell\left(\mathbf{h}, \boldsymbol{\sigma}^{2}\right) &= 2K\sum_{p=1}^{P}\\ &\quad\times\frac{g_{p,L+l_{1}}g_{p,L+l_{2}}+g_{P+p,L+l_{1}}g_{P+p,L+l_{2}}}{\sigma_{p}^{2}} \end{aligned} $$
$$ \partial_{h_{l}^{R}\sigma_{p}^{2}}^{2}\ell\left(\mathbf{h}, \boldsymbol{\sigma}^{2}\right) =2K \frac{g_{p,l}b_{p}^{R}+g_{P+p,l}b_{p}^{I}}{\sigma_{p}^{4}} $$
$$ {\begin{aligned} \partial_{\sigma_{p_{1}}^{2}\sigma_{p_{2}}^{2}}^{2}\ell\left(\mathbf{h}, \boldsymbol{\sigma}^{2}\right) = \left(-\frac{K}{\sigma_{p_{1}}^{4}}+ 2K\frac{s^{2}_{p_{1}}+|b_{p_{1}}^{R}|^{2}+|b_{p_{1}}^{I}|^{2}}{\sigma^{6}_{p_{1}}}\right)\delta_{p_{1}}^{p_{2}}. \end{aligned}} $$

Now, the expectation needs to be taken to find J. The expectations of (41), (42), and (43) are unchanged. From the definition of \(b_{p}=\bar {y}_{p} - H_{p}\) and (38), the expectation of (44) is 0, and from (39), the expectation of (45) is \(\frac {K}{\sigma ^{4}_{p_{1}}}\delta _{p_{1}}^{p_{2}}\).
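The real-valued representation used throughout this appendix can be verified directly: G maps the stacked real parameters \(\left[\mathbf{h}^{R};\mathbf{h}^{I}\right]\) to \(\left[\mathbf{H}^{R};\mathbf{H}^{I}\right]\) with H=Fh. A sketch with an illustrative DFT-based F:

```python
import numpy as np

rng = np.random.default_rng(5)
P, L = 8, 6

# Hypothetical pilot matrix F: first L columns of a P-point unitary DFT
n = np.arange(P)[:, None] * np.arange(L)[None, :]
F = np.exp(-2j * np.pi * n / P) / np.sqrt(P)
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)

# Real-valued representation G of F, as defined in this appendix
G = np.block([[F.real, -F.imag], [F.imag, F.real]])   # 2P x 2L

theta_h = np.concatenate([h.real, h.imag])            # [h^R; h^I]
H = F @ h

lhs = G @ theta_h                                     # [H^R; H^I] via G
rhs = np.concatenate([H.real, H.imag])                # direct computation
```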



Acknowledgements

This work has been carried out in the framework of the ELSAT2020 project, which is co-financed by the European Union with the European Regional Development Fund, the French state, and the Hauts de France Region Council. This work has been supported in part by IRCICA USR 3380 CNRS-Univ (project: connected objects), F-59000 Lille, France.


We confirm that we do not have a funding source.

Authors’ contributions

AD and EPS both worked on the calculation of the theoretical results. EPS made the matlab programs. Both authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

Laboratoire Paul Painlevé, USTL-UMR-CNRS 8524. UFR de Mathématiques, Bât. M2
IEMN lab, TELICE group, University of Lille


  1. K Ohno, T Ikegami, Interference mitigation technique for coexistence of pulse-based UWB and OFDM. EURASIP J. Wirel. Commun. Netw. 2008(1), 285683 (2008)
  2. J-W van Bloem, R Schiphorst, T Kluwer, C Slump, Interference measurements in IEEE 802.11 communication links due to different types of interference sources, in Wireless Communications, Networking and Mobile Computing (WiCOM), 2012 8th International Conference on (Shanghai, 2012), pp. 1–6
  3. A Coulson, Narrowband interference in pilot symbol assisted OFDM systems. IEEE Trans. Wireless Commun. 3(6), 2277–2287 (2004)
  4. AJ Coulson, Bit error rate performance of OFDM in narrowband interference with excision filtering. IEEE Trans. Wireless Commun. 5(9), 2484–2492 (2006)
  5. A Jeremic, T Thomas, A Nehorai, OFDM channel estimation in the presence of interference. IEEE Trans. Signal Process. 52(12), 3429–3439 (2004)
  6. J Zhou, J Qin, Y-C Wu, Variational inference-based joint interference mitigation and OFDM equalization under high mobility. IEEE Signal Process. Lett. 22(11), 1970–1974 (2015)
  7. S Lee, K Kwak, J Kim, D Hong, Channel estimation approach with variable pilot density to mitigate interference over time-selective cellular OFDM systems. IEEE Trans. Wireless Commun. 7(7), 2694–2704 (2008)
  8. Y Zhang, X Zhang, D Yang, A robust least square channel estimation algorithm for OFDM systems under interferences, in Wireless Communications and Networking Conference (WCNC), 2013 IEEE (Shanghai, 2013), pp. 3122–3127
  9. T Li, W-H Mow, V Lau, M Siu, R Cheng, R Murch, Robust joint interference detection and decoding for OFDM-based cognitive radio systems with unknown interference. IEEE J. Sel. Areas Commun. 25(3), 566–575 (2007)
  10. M Morelli, M Moretti, Robust frequency synchronization for OFDM-based cognitive radio systems. IEEE Trans. Wireless Commun. 7(12), 5346–5355 (2008)
  11. M Morelli, M Moretti, Channel estimation in OFDM systems with unknown interference. IEEE Trans. Wireless Commun. 8(10), 5338–5347 (2009)
  12. X Mestre, JR Fonollosa, ML approaches to channel estimation for pilot-aided multirate DS/CDMA systems. IEEE Trans. Signal Process. 50(3), 696–709 (2002)
  13. RT Behrens, LL Scharf, Signal processing applications of oblique projection operators. IEEE Trans. Signal Process. 42(6), 1413–1424 (1994)
  14. RT Behrens, LL Scharf, Corrections to "Signal processing applications of oblique projection operators". IEEE Trans. Signal Process. 44(5), 1300 (1996)
  15. B Cao, Q-Y Zhang, L Jin, N-T Zhang, Oblique projection polarization filtering-based interference suppressions for radar sensor networks. EURASIP J. Wirel. Commun. Netw. 2010(1), 605103 (2010)
  16. W Qiu, SK Saleem, E Skafidas, Identification of MIMO systems with sparse transfer function coefficients. EURASIP J. Adv. Signal Process. 2012(1), 104 (2012)
  17. H Hartley, K Jayatillake, Estimation for linear models with unequal variances. J. Am. Stat. Assoc. 68(341), 189–192 (1973)
  18. EB Andersen, Asymptotic properties of conditional maximum-likelihood estimators. J. R. Stat. Soc. Ser. B Methodol. 283–301 (1970)
  19. TH Szatrowski, Necessary and sufficient condition for explicit solutions in the multivariate normal estimation problem for patterned means and covariances. Ann. Stat. 8, 802–810 (1980)
  20. AP Dempster, NM Laird, DB Rubin, Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B Methodol. 1–38 (1977)
  21. WG Cochran, The distribution of quadratic forms in a normal system, with applications to the analysis of covariance, in Mathematical Proceedings of the Cambridge Philosophical Society, vol. 30, no. 02 (Cambridge Univ. Press, 1934), pp. 178–191


© The Author(s) 2017