EURASIP Journal on Applied Signal Processing 2005:12, 1886–1901
© 2005 Hindawi Publishing Corporation

Instantaneous Frequency Estimation Using Stochastic Calculus and Bootstrapping

Stochastic calculus methods are used to estimate the instantaneous frequency of a signal. The frequency is modeled as a polynomial in time, and it is assumed that the phase has a Brownian-motion component. Using stochastic calculus, one is able to develop a stochastic differential equation that relates the observations to the instantaneous frequency. Pseudo-maximum likelihood estimates are obtained through Girsanov theory and the Radon-Nikodym derivative. Bootstrapping is used to find the bias and the confidence interval of the estimates of the instantaneous frequency. An approximate expression for the Cramér-Rao lower bound is derived. An example is given, and a comparison to existing methods is provided.


INTRODUCTION
Over the past several years there has been a marked increase in the application of coherent signal and image processing. Coherent processing requires an accurate estimate of the phase (Blackledge [1]). Examples where coherent processing is required include synthetic aperture radar (SAR), synthetic aperture sonar, adaptive beamforming, acoustic imaging, projection and diffraction tomography, adaptive optics, magnetic resonance imaging (MRI), and inverse synthetic aperture radar (ISAR) (see Goldstein et al. [2], Song et al. [3], Ghiglia and Pritt [4]). Phase estimation also has many applications in radar (Wehner [5]), communications (Li [6]), and other areas. The instantaneous frequency (IF) is defined as the derivative of the phase.
There are mainly two approaches to the estimation of the IF of time-varying frequency signals: parametric and nonparametric (Chen and Ling [7]). In the nonparametric approach, if the signal has frequency contents that change rapidly over time, one of the most popular tools is the short-time Fourier transform (STFT). In this approach, the signal is divided into overlapping segments, and the fast Fourier transform (FFT) is applied to each segment. By observing the evolution of the frequency components over time, one should be able to estimate both the frequency components and their magnitudes (Wehner [5]). This approach suffers from the poor resolution of the FFT and from the large amount of data needed to get reasonable results. Nevertheless, the STFT gives us some idea about the complexity of the data and suggests probable models for the IF.
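The segment-and-FFT procedure just described can be sketched in a few lines. This is a minimal illustration on a hypothetical chirp signal, with a Hann window and segment sizes chosen for the example only (none of these choices come from the paper):

```python
import numpy as np

def stft_peak_frequencies(z, fs, seg_len=256, overlap=128):
    """Estimate the dominant frequency of each overlapping segment of a
    real signal z (sampled at fs Hz) via the FFT -- a minimal STFT."""
    step = seg_len - overlap
    window = np.hanning(seg_len)
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    peaks = []
    for start in range(0, len(z) - seg_len + 1, step):
        spectrum = np.abs(np.fft.rfft(z[start:start + seg_len] * window))
        peaks.append(freqs[np.argmax(spectrum)])
    return np.array(peaks)

# A chirp whose instantaneous frequency rises from 5 Hz to 25 Hz.
fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
z = np.sin(2 * np.pi * (5.0 * t + 10.0 * t**2))
f_hat = stft_peak_frequencies(z, fs)
```

The rising trend of `f_hat` tracks the chirp, but only at the coarse resolution fs/seg_len, which is the FFT resolution limitation discussed above.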
The Wigner-Ville distribution has proven to be the most suitable nonparametric approach to instantaneous frequency estimation (IEEE [8]). Several modifications have been suggested that concentrate on the choice of the window (see Katkovnik and Stankovic [9], Barkat [10]). Still, a major requirement of the Wigner-Ville distribution is a large data size, which is not available in some applications.
In the parametric approach, the phase is modeled as an autoregressive (AR) process, a moving average (MA) process, or a polynomial (Benidir and Ouldali [11]). The coefficients of the model are estimated adaptively using the least mean square (LMS) algorithm, the recursive least squares (RLS) algorithm, a phase-locked loop (PLL), or other tools such as hidden Markov models (see Goto et al. [12], IEEE [8]). For large data sizes, however, the accuracy of the parametric methods was shown to be less than that of the nonparametric methods. Obviously, if the parametric model is accurate, then it should outperform any nonparametric method. The parametric methods, on the other hand, work better for small data sizes (Boashash [13]).
In many applications in communications, the signal can be modeled as a higher-order polynomial phase signal with constant amplitude. Conventional power spectrum methods are ineffective in handling these kinds of signals. For example, the Wigner-Ville distribution (WVD) can handle polynomial phases of order at most two. The polynomial phase transform was introduced to tackle this problem (see Peleg and Porat [14], Peleg and Friedlander [15], Porat and Friedlander [16]). Another suboptimal algorithm was introduced to estimate the polynomial phase (Golden and Friedlander [17]). The major problems with these two algorithms are the large amount of data required and the relatively high SNR (30 dB and above) needed to obtain good estimates.
In this paper, we introduce a new parametric approach that is based on stochastic (Ito) calculus. Concepts from stochastic or Ito calculus (Kloeden and Platen [18]) are used to set up the phase estimation problem. Assuming that the phase can be modeled as a polynomial in time with a scaled additive Brownian motion, which is a common model for transmission through random media, one can develop a stochastic differential equation (SDE) for the observations. Other models for the phase or the amplitude could be tackled with the same approach. For this model of the phase, the SDE is linear in the unknown coefficients of the polynomial phase but nonlinear in the other parameters. In this way a nonlinear estimation problem has been transformed into a partially linear estimation problem. Using the Girsanov theory, and assuming complete knowledge of the other parameters of the signal, one is able to find an exact expression for the likelihood function of the observations; this is actually a pseudo-likelihood function. Maximizing this pseudo-likelihood function with respect to the unknown coefficients of the phase, one obtains a closed form solution for these coefficients. Simulation-based methods (McFadden [19]) are used to estimate the other parameters of the signal.
Moreover, approximate statistical properties of the pseudo-maximum likelihood estimates of the coefficients of the phase were obtained in a straightforward fashion. Approximate expressions for the Cramér-Rao lower bounds on the variances of the estimates were easily obtained; this is another advantage of using Ito calculus. While many estimation problems could be cast in a discrete-time setting, there is no discrete-time equivalent of Ito's lemma, a fundamental result in stochastic calculus. Thus, one has to use continuous-time models in order to take advantage of the properties of Ito calculus. "It should be added that in some cases it is easier, at first, to study the continuous analog of problems formulated for discrete time, and use the results obtained in the solution of the latter" (Lipster and Shiryayev [20, Volume I]). In some cases, however, using stochastic calculus would be cumbersome and the discrete methods of time series analysis would be much easier to use.
In the given examples, the required SNR to get good estimates of the phase is more than 10 dB. Using the bootstrapping method, one is able to obtain good results at lower SNR. The bootstrap is a method that handles small samples and yields good estimates of the bias and the confidence intervals of the estimates (see Efron and Tibshirani [21], Zaman [22], Politis [23], Zoubir and Boashash [24]). It was not meant to be a method for improving the accuracy of the estimates. Using the peak of the histogram generated by the bootstrapping method for each estimated parameter as the desired estimate of the unknown parameter, one is able to get good estimates at lower SNR. This was observed in this report and in some other applications (see Souza and Neto [25], Abutaleb [26]). Through the bootstrapping method, one is able to generate a large number of samples from the measured small-sample data. Bootstrapping amounts to treating the observed samples as if they exactly represent the whole population. In nonparametric bootstrapping, drawing from this sample space, at random with replacement, will generate as many samples as desired. In parametric bootstrapping, a model is used to generate as many samples as desired. The bootstrapping method has found many applications in the area of statistical signal processing (Zoubir and Boashash [24]).

This paper is organized as follows: Section 2 describes the mathematical formulation of the phase estimation problem and an approximate maximum likelihood method; we also describe in this section the method of stochastic annealing. In Section 3, the proposed stochastic-calculus-based approach is introduced, the statistical properties of the estimates are given, and the bootstrapping method is discussed. In Section 4, we present simulation results and a comparison to the approximate maximum likelihood method of Section 2. In Section 5, a summary and conclusions are given.

PROBLEM FORMULATION
The real signal, z(t), with one phase component could be modeled as follows:

z(t) = A(t) sin[φ(t)] + ε(t),   φ(t) = 2π Σ_{i=1}^{I} f_i t^i + γB(t),

where I is the order of the polynomial, A(t) is the amplitude, φ(t) is the phase, B(t) is a Brownian motion, γ is an unknown constant, and ε(t) is an additive white noise process of unknown variance σ². The one phase component could represent the radar tracking of just one maneuvering target. It could also represent the tracking of one target in a sequence of images. For a constant phase component and polynomial frequency, the phase takes the polynomial form given above. In this paper we are concerned with the situation where A(t) = 1 and the error/noise term ε(t) is embedded in the random part of the phase, that is, we are interested in the model

z(t) = sin[2π Σ_{i=1}^{I} f_i t^i + γB(t)].

The case where the amplitude is not equal to 1 could be treated in the same context: we basically estimate the amplitude first by taking, for example, the average of the max and the min of the signal. Other models for the phase, such as the Ornstein-Uhlenbeck process, and other models for the amplitude could also be tackled with the proposed stochastic-calculus-based approach. This will be the subject of other reports (Abutaleb [27]). The case where we have more than one stochastic IF is currently under investigation using what is called the Malliavin calculus (see Abutaleb and Papaioannou [28], Nualart [29]).
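The model above is easy to simulate directly; the sketch below builds the Brownian motion from cumulative Gaussian increments (the parameter values are the ones used later in Example 1 of Section 4):

```python
import numpy as np

def simulate_signal(f, gamma, n=128, dt=0.01, seed=0):
    """Simulate z(k) = sin(2*pi*sum_i f[i-1]*(k*dt)**i + gamma*B(k*dt)):
    the unit-amplitude model with a Brownian-motion phase component."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) * dt
    # Brownian motion: B(0) = 0, then cumulative N(0, dt) increments.
    B = np.cumsum(np.concatenate(([0.0], rng.normal(0.0, np.sqrt(dt), n - 1))))
    poly_phase = 2 * np.pi * sum(fi * t**(i + 1) for i, fi in enumerate(f))
    return t, np.sin(poly_phase + gamma * B), B

# Parameter values of Example 1 in Section 4 (the SNR = 20 dB case).
t, z, B = simulate_signal([2.0, -1.0, -0.5], gamma=0.1)
```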

Approximate maximum likelihood method
To estimate the unknown coefficients, f_i, one could ignore the Brownian-motion term, γB(t), and use nonlinear least squares methods (Goto et al. [12]), genetic algorithms (Abutaleb [30]), or stochastic annealing (Kloeden and Platen [18]). Specifically, in the least squares approach, the instantaneous phase φ_LS(t) is assumed to be a pure polynomial,

φ_LS(t) = 2π Σ_{i=1}^{I} f_i t^i.

One could then find the unknowns by minimizing the sum of squared errors, Σ {z(t) − sin[φ_LS(t)]}². While this approach is approximate, it gives an indication of the estimates of the unknown parameters that could be used later on to guide the search area of the numerical methods. The above approach is a simplification of the true problem, since the correct expression is

z(t) = sin[φ_LS(t) + γB(t)],

and when γ → 0, we get the approximate expression

z(t) ≈ sin[φ_LS(t)] + ε(t),

where ε(t) ≈ cos[φ_LS(t)][γB(t)], that is, the error is a function of the unknown parameters.
In discrete time, one could write (8) as

z(j) = sin[φ_LS(j∆) + γB(j∆)],   B(j∆) = B((j − 1)∆) + √∆ w(j),

where ∆ is the sampling interval and w(j) is Gaussian with zero mean and variance 1.
Rearranging (10), writing the corresponding expression at time (k + 1), and subtracting the expressions at k and at (k + 1), we obtain an expression of the form z(k + 1) = sin[φ_LS(k + 1)] plus a noise term driven by w(k + 1). The conditional probability density function (pdf) f[z(k + 1)|z(k)] follows from this expression. The likelihood function, which is the joint pdf of the observations, is the product of these conditional pdfs, and the log likelihood function λ, ignoring the initial conditions, is obtained by substituting the expression in (15) into (17). Maximizing the log likelihood function of (18) with respect to the unknown coefficients, f_i and γ, yields the desired estimates.
Due to the nonlinearities in the log likelihood function, one would expect the existence of several local maxima, and we should therefore use global optimization methods. Thus, it is proposed that the frequencies and γ be estimated through the stochastic annealing method, which is known to have good numerical properties. Other methods, such as genetic algorithms, could also be used.

Parameter estimation through stochastic annealing
In the stochastic annealing method, we develop a stochastic equation for each unknown parameter, the solution of which is the maximum likelihood estimate of that parameter. This is explained in what follows.
The stochastic differential equation (SDE) of the unknown parameters is given by

df(t) = ∇λ[f(t), γ(t)] dt + σ(t) dB_f(t),

which has the discrete form

f(k + 1) = f(k) + ∇λ(f(k), γ(k)) ∆ + σ(k) ∆B_f(k),

where B_f(t) is a vector Brownian motion and ∆B_f(k) is a vector white Gaussian noise; each element has zero mean and variance ∆, where ∆ is the sampling interval. ∇λ(f(k), γ(k)) is the gradient of λ(f, γ) with respect to f. A similar equation is used for γ, where B_γ(t) is a Brownian motion and ∇λ[f(t), γ(t)] is the gradient of λ(f, γ) with respect to γ. The choice of the variance σ(t) given in Kloeden and Platen [18] is shown to give quick convergence; other choices for σ(t) are also possible.
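One such annealing iteration can be sketched on a toy scalar problem. The decaying noise schedule σ(k) = c/√(1 + k) below is an assumption for illustration, not the paper's exact choice:

```python
import numpy as np

def anneal_step(theta, grad, k, dt=0.01, c=1.0, rng=None):
    """One stochastic-annealing update: a gradient-ascent step on the log
    likelihood plus exploration noise whose level sigma(k) decays with the
    iteration count (assumed cooling schedule, for illustration only)."""
    rng = rng or np.random.default_rng()
    sigma = c / np.sqrt(1.0 + k)
    return theta + grad * dt + sigma * rng.normal(0.0, np.sqrt(dt))

# Toy concave log likelihood lambda(x) = -(x - 3)^2, gradient -2(x - 3):
# the iterates should settle near the maximizer x = 3.
rng = np.random.default_rng(1)
x = 0.0
for k in range(2000):
    x = anneal_step(x, -2.0 * (x - 3.0), k, rng=rng)
```

Early iterations explore widely because σ(k) is large; as σ(k) cools, the updates reduce to plain gradient ascent around the global maximizer.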

The Wigner-Ville distribution
The most common nonparametric method to estimate the instantaneous frequency is the Wigner-Ville distribution (WVD) and its modifications. The WVD(t, ω) at instant t and frequency ω of a signal z(t) with complex conjugate z*(t) is defined as (Chen and Ling [7])

WVD(t, ω) = ∫_{−∞}^{∞} z(t + τ/2) z*(t − τ/2) e^{−jωτ} dτ,

which has a discrete form (Boashash [13]) with M = (N − 1)/2, where N is the number of data points.
For a sampling interval ∆, a discrete WVD is also given by Boashash and Black [31], again with M = (N − 1)/2 and N the number of data points. Both formulae (25) and (26) are used in the literature.
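As a sketch, the following computes one common discrete form of the WVD directly from the definition and checks that, for a pure complex exponential, the distribution peaks at the signal frequency. Normalization and lag-window conventions differ across references, as noted above:

```python
import numpy as np

def discrete_wvd(z, n, delta):
    """One common discrete form of the WVD of an analytic signal z at
    sample n: WVD(n,k) = 2*delta*sum_m z[n+m]*conj(z[n-m])*exp(-j4*pi*m*k/N).
    Normalization and lag-window details vary across the literature."""
    N = len(z)
    M = (N - 1) // 2
    lags = [m for m in range(-M, M + 1) if 0 <= n + m < N and 0 <= n - m < N]
    out = np.zeros(N)
    for k in range(N):
        out[k] = (2 * delta * sum(z[n + m] * np.conj(z[n - m])
                                  * np.exp(-4j * np.pi * m * k / N)
                                  for m in lags)).real
    return out

# For a pure 10 Hz complex exponential, the WVD should peak near 10 Hz.
fs = 100.0
t = np.arange(101) / fs
z = np.exp(2j * np.pi * 10.0 * t)
wvd = discrete_wvd(z, n=50, delta=1.0 / fs)
k_peak = int(np.argmax(wvd))
f_peak = k_peak * fs / len(z)   # with this kernel, bin k maps to k*fs/N
```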
The instantaneous frequency f(n) is derived via the first moment of the discrete WVD. In Section 4, the instantaneous frequency of (27) is compared to the true values. It is shown that they are far apart. Thus, the WVD approach was not used for further comparisons.

The Hilbert transform and the analytic function
A given real signal z(t) can be used to construct a complex waveform, z_a(t), as

z_a(t) = z(t) + j ẑ(t),

where ẑ(t) is the Hilbert transform of z(t), given by (Cohen [32])

ẑ(t) = (1/π) p.v. ∫_{−∞}^{∞} z(s)/(t − s) ds.

The complex waveform, z_a(t), is the analytic representation of z(t) and can be calculated either in the time domain, using a finite impulse response (FIR) filter, or in the frequency domain (Boashash [33]).
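The frequency-domain construction can be sketched as follows. This mirrors the standard analytic-signal recipe (zero the negative frequencies, double the positive ones), not any formula specific to the paper:

```python
import numpy as np

def analytic_signal(z):
    """Frequency-domain analytic signal: zero the negative frequencies
    and double the positive ones (the standard recipe, as implemented
    by e.g. scipy.signal.hilbert)."""
    N = len(z)
    h = np.zeros(N)
    h[0] = 1.0
    if N % 2 == 0:
        h[N // 2] = 1.0
        h[1:N // 2] = 2.0
    else:
        h[1:(N + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(z) * h)

fs = 1000.0
t = np.arange(0, 1.0, 1.0 / fs)
z = np.cos(2 * np.pi * 50.0 * t)
za = analytic_signal(z)                       # real part is z itself
phase = np.unwrap(np.angle(za))               # phi_Analytic(t)
f_inst = np.diff(phase) * fs / (2 * np.pi)    # IF = phase derivative
```

For this clean 50 Hz tone the instantaneous frequency is recovered almost exactly away from the edges; the poor performance reported in Section 4 appears once the Brownian phase noise is added.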
The phase angle is then calculated and fitted to the assumed model of the phase. The phase angle, φ_Analytic, is defined as

φ_Analytic(t) = arg[z_a(t)].

In Section 4, the analytic phase, φ_Analytic of (31), is estimated and compared to the true value. It is shown that they are far apart. Thus, the analytic signal approach was not used.

ITO CALCULUS FOR PHASE ESTIMATION: THE PROPOSED APPROACH
In this section, we introduce the stochastic-calculus-based approach for phase estimation. It is assumed that the randomness in the signal is due to the Brownian-motion component in the phase. Other models of the signal could also be treated in the same way.

Derivation of the SDE for the observation
As in Section 2, the received signal is modeled as

z(t) = sin[2π Σ_{i=1}^{I} f_i t^i + γx(t)],

where it is assumed that the additive error is embedded in the random component (Wiener process or Brownian motion),

dx(t) = dB(t),

where dx(t) is the stochastic differential of x(t), and dB(t) is what is known as white Gaussian noise with zero mean and variance dt. We need to find a stochastic differential equation (SDE) for dz(t). In order to do that, we first introduce the general form of an SDE and of a nonlinear transformation. The general form of a stochastic differential equation is

dx(t) = a(t, x(t)) dt + b(t, x(t)) dB(t).

In our case, (33), a(t, x(t)) = 0 and b(t, x(t)) = 1. Let U_1(t, x(t)) be a nonlinear transformation of x(t), that is, z(t) = U_1(t, x(t)). Using the Ito lemma (Kloeden and Platen [18]), we obtain

dz(t) = [∂U_1/∂t + a ∂U_1/∂x + (1/2) b² ∂²U_1/∂x²] dt + b (∂U_1/∂x) dB(t).

In our case, with φ(t) = 2π Σ_{i=1}^{I} f_i t^i + γx(t),

∂U_1/∂t = [a^T(t) f] cos[φ(t)],   ∂U_1/∂x = γ cos[φ(t)],   ∂²U_1/∂x² = −γ² sin[φ(t)].

Substituting the expressions of (37) into (35), we get

dz(t) = {[a^T(t) f] √(1 − z²(t)) − (γ²/2) z(t)} dt + γ √(1 − z²(t)) dB(t),

where we chose cos x = √(1 − sin² x). Nevertheless, one has to choose the right sign of cos(x) depending on the quadrant in which x lies. Let

c[t, z(t); θ] = [a^T(t) f] √(1 − z²(t)) − (γ²/2) z(t),

and let

e[t, z(t); γ] = γ √(1 − z²(t)).

Then (38) can be written compactly: the stochastic process z(t) has the SDE form

dz(t) = c[t, z(t); θ] dt + e[t, z(t); γ] dB(t),

where f = [f_1 f_2 · · · f_I]^T and a(t) = 2π[1 2t · · · It^{I−1}]^T. Notice that the SDE of z(t) turned out to be linear in the unknown frequency components, f. This is one advantage of using Ito calculus; other forms of the phase may not yield this nice linear property, however. One could stop here and derive an approximate expression for the likelihood function of the observations as we did in Section 2. Instead, we go through the rigorous derivation of the exact likelihood function.

The pseudo-maximum likelihood estimates of the unknown parameters
Let the unknown parameters be defined as the vector θ = [f^T γ]^T; the SDE of the observations is then

dz(t) = c[t, z(t); θ] dt + e[t, z(t); γ] dB(t).

According to the Girsanov theory, and for a known γ, the Radon-Nikodym derivative, which is the likelihood function L, is given by (see Lipster and Shiryayev [20], Oksendal [34], Prakasa Rao [35])

L = exp{ ∫_0^T (c[s, z(s); θ]/e²[s, z(s); γ]) dz(s) − (1/2) ∫_0^T (c²[s, z(s); θ]/e²[s, z(s); γ]) ds },

and the log likelihood function is λ = ln L. Substituting the expressions for c[t, z(t); θ] and e[t, z(t); γ] into (46), we get the log likelihood function of (47). Equation (45) is a true likelihood function as long as γ is known; since γ will be replaced by its estimate γ̂, (45) represents a pseudo-likelihood function. Maximizing the log of the pseudo-likelihood function λ = ln L with respect to the unknown vector f, and assuming the estimate γ̂, one gets the pseudo-maximum likelihood estimate of f in the closed form

f̂(t) = A(t)^{−1} ∫_0^t a(s) [dz(s) + (γ̂²/2) z(s) ds] / √(1 − z²(s)),

where A(t) = ∫_0^t a(s) a^T(s) ds. Substituting for dz(s) in (50), we get (51). To estimate the unknown parameter γ, one has to go through another route; this is explained in Section 3.2.3.

Another derivation for the pseudo-maximum likelihood estimates
In the previous subsection, the pseudo-maximum likelihood estimates were obtained using the observed data sequence, z(t). If one is able to find an initial guess for γ, using, for example, the approximate method of Section 2, one could apply a transformation to the data and derive an easier method to find the pseudo-maximum likelihood estimate of the vector f. This approach, however, is tricky; it is always better to work with the original data. The simulations showed the accuracy of the approach presented in Section 2.2 compared to the one presented here. We present the derivation anyway because we will need some of its results when calculating the Cramér-Rao lower bound. To find the pseudo-maximum likelihood estimates of the unknown parameters, it will be easier if the coefficient of the Brownian motion is unity. Thus, we need to find another transformation y(t) = U_2(t, z(t)) such that the coefficient of dB(t) in the SDE of y(t) is 1. We pick U_2(t, z(t)) such that

∂U_2/∂z = 1/e[t, z(t); γ] = 1/[γ √(1 − z²(t))],

and thus, integrating ∂U_2/∂z with respect to z(t),

y(t) = U_2(t, z(t)) = (1/γ) arcsin[z(t)],

where we have chosen to set the constant of integration to zero. Applying the Ito lemma to U_2 and substituting (52) and (53), the drift g(t, y(t)) reduces to

g(t, y(t)) = (1/γ) a^T(t) f,

and the SDE for dy(t) becomes

dy(t) = (1/γ) a^T(t) f dt + dB(t).

This form of dy(t) has a unit coefficient on dB(t), which makes the derivation of the Cramér-Rao lower bound easier and tractable (Oksendal [34]). Remember that f = [f_1 f_2 · · · f_I]^T and a(t) = 2π[1 2t · · · It^{I−1}]^T. According to the Girsanov theory, and for a given γ, the Radon-Nikodym derivative, which is the pseudo-likelihood function L, is given by (Oksendal [34])

L = exp{ ∫_0^T g(s, y(s)) dy(s) − (1/2) ∫_0^T g²(s, y(s)) ds },

and the log of the pseudo-likelihood function is λ = log L. Maximizing λ with respect to the unknowns f, one gets the pseudo-maximum likelihood estimate of the unknowns. Remember, however, that this expression uses the sequence of transformed data y(t), not the observed data z(t).
Thus, an initial guess for γ must be available.

Simulation-based estimation of γ
Equation (45) is a true likelihood function as long as the diffusion term, e(t, z(t); γ), is completely known, that is, γ is known. In practice, we do not know γ; as a matter of fact, it is one of the unknowns that we need to estimate. There are many methods to estimate γ. A popular, yet not accurate, method (Yoshida [36]) is based on the observation that the quadratic variation of z(t) estimates the integrated squared diffusion coefficient; one could use this relation or its modifications to find an estimate of γ. Instead, we use the more accurate simulation-based method of McFadden [19]. The following steps describe the algorithm.
(1) For a given estimate f̂, simulate a sequence of data points through (4) and calculate the sum of squared differences between the simulated and the observed data; this is the error criterion.
(2) Change γ, using the stochastic annealing method of Section 2.2, until the sum of squared errors is minimized.
(3) Use the maximum likelihood method of Section 3.1 with the estimated γ̂ to find a new estimate of f.
(4) Go to step (1).
(5) Stop when there is no further reduction in the sum of squared errors.
Specifically, for an estimated f̂ and a given γ, simulate the phase

φ̂(t) = 2π Σ_{i=1}^{I} f̂_i t^i + γB(t),

and thus the simulated signal ẑ(t) = sin[φ̂(t)]. The error criterion is the sum of the squared differences Σ [z(t) − ẑ(t)]².
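A rough sketch of the simulation-based search for γ, with a plain grid search standing in for the stochastic-annealing inner loop and the true f assumed known. Averaging the error criterion over several Brownian paths is our addition, to tame the Monte Carlo noise; it is not a detail given in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_z(f, gamma, n, dt, rng):
    t = np.arange(n) * dt
    B = np.cumsum(np.concatenate(([0.0], rng.normal(0.0, np.sqrt(dt), n - 1))))
    return np.sin(2 * np.pi * sum(fi * t**(i + 1) for i, fi in enumerate(f))
                  + gamma * B)

def error_criterion(gamma, f, z_obs, dt, rng, n_rep=20):
    """Sum of squared differences between observed and simulated data,
    averaged over several fresh Brownian paths (our addition)."""
    n = len(z_obs)
    return np.mean([np.sum((z_obs - simulate_z(f, gamma, n, dt, rng))**2)
                    for _ in range(n_rep)])

f_true, n, dt = [2.0, -1.0, -0.5], 128, 0.01
z_obs = simulate_z(f_true, 1.0, n, dt, rng)
grid = np.linspace(0.1, 3.0, 30)     # grid search stands in for annealing
sse = [error_criterion(g, f_true, z_obs, dt, rng) for g in grid]
gamma_hat = grid[int(np.argmin(sse))]
```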

Pseudo-maximum likelihood estimate for f
The pseudo-maximum likelihood estimates of the coefficients of the polynomial phase are obtained by setting the gradient of λ with respect to f to zero. Rearranging, we get

f̂(t) = A(t)^{−1} ∫_0^t a(s) [dz(s) + (γ̂²/2) z(s) ds] / √(1 − z²(s)),

which is a closed form solution for the estimates of the coefficients of the polynomial phase. Remember that A(t) = ∫_0^t a(s) a^T(s) ds and a(t) = 2π[1 2t · · · It^{I−1}]^T.
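The closed form estimate can be evaluated numerically under two simplifying assumptions: γ is known, and the total phase stays inside the principal branch of the arcsine, so that y(t) = arcsin[z(t)]/γ is the valid transformed data of Section 3.2.2 and f̂ = γ A(t)^{−1} ∫ a(s) dy(s). The parameter values below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(5)
n, dt, gamma = 2000, 0.0005, 0.05
f_true = np.array([0.1, 0.05])    # I = 2, chosen small so that the total
t = np.arange(n) * dt             # phase stays inside (-pi/2, pi/2)

# Simulate z(t) = sin(2*pi*(f1*t + f2*t^2) + gamma*B(t)).
B = np.cumsum(np.concatenate(([0.0], rng.normal(0.0, np.sqrt(dt), n - 1))))
z = np.sin(2 * np.pi * (f_true[0] * t + f_true[1] * t**2) + gamma * B)

a = 2 * np.pi * np.vstack([np.ones(n), 2 * t])   # a(t) = 2*pi*[1, 2t]^T
y = np.arcsin(z) / gamma          # transformed data; valid only because the
                                  # phase never leaves the principal branch
A = (a[:, :-1] @ a[:, :-1].T) * dt               # A(T) ~ int_0^T a a^T ds
f_hat = gamma * np.linalg.solve(A, a[:, :-1] @ np.diff(y))
```

The recovered coefficients land within a few hundredths of f_true, consistent with the variance γ²A(T)^{−1} derived in the next subsection.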

Statistical properties of the frequency estimates and Cramér-Rao lower bound
To find the expected value, the bias, and the Cramér-Rao lower bound on the variance of the estimates f , we assume that γ is known and deterministic. This way a closed form expression could be found. Otherwise the expressions would be complicated and no insight would be gained.

The mean of f
The following is a derivation of an approximate expression for the mean of f̂. Substituting the expression for dy(s), of (59), into (68) yields

f̂(t) = A(t)^{−1} ∫_0^t a(s) a^T(s) f ds + γ A(t)^{−1} ∫_0^t a(s) dB(s) = f + γ A(t)^{−1} ∫_0^t a(s) dB(s),

where A(t) = ∫_0^t a(s) a^T(s) ds and, as before, a(t) = 2π[1 2t · · · It^{I−1}]^T. Notice that we got the same expression for f̂(t) as in (50b).
Taking the expected value of both sides of (70), and keeping in mind that under mild conditions E{∫_0^t a(s) dB(s)} = 0, we get

E{f̂(t)} = f,

that is, the estimate is unbiased. This is true given that the value of γ is known and deterministic. In reality, one has to estimate γ; in this case, the asymptotic properties of f̂ are complex to derive and one might end up with a biased estimate.

The variance of f
The following is a derivation of an approximate expression for the variance of f̂. Since

f̂(t) − f = γ A(t)^{−1} ∫_0^t a(s) dB(s),

the variance of f̂ is given by

Var[f̂(t)] = E{[f̂(t) − f][f̂(t) − f]^T} = γ² A(t)^{−1} [∫_0^t ∫_0^t a(s) a^T(u) dB(s) dB(u)] A(t)^{−1},

and since dB(s) dB(u) = ds δ(s − u), then

Var[f̂(t)] = γ² A(t)^{−1} A(t) A(t)^{−1} = γ² A(t)^{−1}.

This is true given that the value of γ is known and deterministic. In reality, one has to estimate γ; in this case, the variance of f̂ is complex to derive.

An approximate Cramér-Rao lower bound on the variance of the frequency estimates
It is always useful to compare the variance of the estimates to the best possible minimum variance. This is the Cramér-Rao lower bound which is derived in the next part. Again, one is able to get a closed form expression given that γ is known and deterministic. Otherwise the expressions would be much more complicated and no closed form solutions could be obtained.
Since, as before,

dy(t) = g(t, y(t)) dt + dB(t),   g(t, y(t)) = (1/γ) a^T(t) f,

then, according to Lipster and Shiryayev [20, Chapter 7], the Cramér-Rao lower bound for an unbiased estimate of f given γ is the inverse of the Fisher information

I_F(t) = (1/γ²) ∫_0^t a(s) a^T(s) ds = (1/γ²) A(t),

and since, as before, A(t) = ∫_0^t a(s) a^T(s) ds and a(t) = 2π[1 2t · · · It^{I−1}]^T, the approximate Cramér-Rao lower bound is equal to

γ² A(t)^{−1}.

Comparing (79) and (76), one concludes that, given γ, the variance of the frequency estimate exactly attains the Cramér-Rao lower bound; a nice result indeed. This result is expected, since the frequency estimate is the maximum likelihood estimate.

Sensitivity of the estimate with respect to γ
We can find the derivative of f̂ with respect to γ̂, ∂f̂/∂γ̂, by assuming that z(t) is a completely known deterministic quantity. The other approach is to find the derivative of the expected value of f̂ with respect to γ̂, ∂E{f̂}/∂γ̂, and replace z(t) with its stochastic solution, that is, z(t) = z(t, W(t)). Fortunately, for a given γ, f̂ is unbiased (see (50b) and (71)); thus, E[f̂] is independent of the estimate of γ. The variance, however, does depend on the estimated value of γ, as shown in (79). Thus, an accurate estimate of γ is needed.

Numerical solution of the maximization problem
The log likelihood function, λ of (47), is a true log likelihood as long as γ is known. λ has to be maximized with respect to the unknowns, f. Since λ contains several integral expressions, one has to evaluate them numerically. The pseudo-log-likelihood function is discretized using, for example, the Euler approximation and the following formulae:

∫_0^T h(s, z(s)) ds ≈ Σ_i h(i, z(i)) ∆_i,
∫_0^T h(s, z(s)) dz(s) ≈ Σ_i h(i, z(i)) ∆z(i),

where z(i) is the value of z(t) at instant i, h(i, z(i)) is the value of h(s, z(s)) at instant i, ∆_i is the sampling interval between the ith sample and the (i + 1)th sample (∆_i is usually constant), and ∆z(i) = z(i + 1) − z(i). Other, more accurate but computationally intensive, discretization methods are also available (Kloeden and Platen [18]).
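The two Euler approximations can be checked on smooth test functions, where both integrals have known values (a sanity check only; in the likelihood, h would be the drift-to-squared-diffusion ratio):

```python
import numpy as np

def euler_integrals(h_vals, z, dt):
    """Euler approximations used to evaluate the log likelihood:
    int_0^T h(s, z(s)) ds    ~ sum_i h(i, z(i)) * dt
    int_0^T h(s, z(s)) dz(s) ~ sum_i h(i, z(i)) * (z(i+1) - z(i))."""
    dz = np.diff(z)
    return np.sum(h_vals[:-1]) * dt, np.sum(h_vals[:-1] * dz)

# Sanity check on smooth functions: int_0^1 s ds = 1/2 and, with
# z(s) = s^2, int_0^1 s dz(s) = int_0^1 2*s^2 ds = 2/3.
t = np.linspace(0.0, 1.0, 10001)
riemann, ito = euler_integrals(t, t**2, t[1] - t[0])
```

Note that the left-endpoint evaluation h(i, z(i)) is not just a convenience: it is what makes the second sum the Ito (non-anticipating) discretization of a stochastic integral.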

Proposed Algorithm 1
The steps described in the previous section are enough to yield the pseudo-maximum likelihood estimates of the unknown parameters. We now put these steps in the form of an algorithm, as shown in Algorithm 1. Algorithm 1 yields accurate estimates of the parameters as long as the signal-to-noise ratio is more than 10 dB. It was observed that, using the method of bootstrapping and taking the peak of the histogram generated by this method, one is able to get reasonable results for SNR values as low as 0 dB. While the method of bootstrapping is used to find an estimate of the bias and of the confidence interval around each estimated parameter, our observation is that it could also be used to improve the accuracy of the estimated parameters. This is explained in the bootstrapping subsection.

Convergence of the proposed Algorithm 1
One is always interested in using an algorithm that yields an estimate converging to the true unknown parameter. Algorithm 1 has this property. In essence, under mild regularity conditions, and assuming that the observed process, z(t), is ergodic and that we have good initial guesses of γ and f, the proposed steps yield estimates that are consistent as the number of data points increases to infinity. For more mathematical details and proofs for a similar problem, and for more on the necessary conditions for consistency, see [36].

Bootstrapping and parameter estimation
The bootstrapping methods depend on the notion of a bootstrap sample. The samples are drawn or generated, at random, from the data. Each set of samples has the same size as the original data size. For each set, the model parameters are estimated. From these estimates, a histogram is obtained. If the samples are drawn at random, we have nonparametric bootstrapping. If the samples are generated at random, from a model for the data, we have what is called parametric bootstrapping (see Zaman [22], Efron and Tibshirani [21]). We will be more concerned with the parametric bootstrap.

Parametric bootstrap
In this approach, by maximizing the pseudo-log-likelihood function and using the simulation-based method, the estimates of the parameters, f̂_1, f̂_2, . . . , f̂_I, and γ̂, are obtained from the original data. These estimates are called plug-in estimates. The estimate of the Brownian-motion component, B(t), is obtained as follows: since cos x = ±√(1 − sin² x), the equation for z(t) can be rewritten as an equation in the unknown B(t),

z(t) = sin[2π Σ_{i=1}^{I} f̂_i t^i + γ̂B(t)].

Solving (via the two branches of the arcsine), we get two values for B̂(t); either one can be used for studying the statistical properties of B(t). We should expect B̂(t) to be a Brownian motion if the signal model is correct and if the pseudo-maximum likelihood estimates are close to the true values; otherwise, one has to reconsider the signal model. If it is a Brownian motion as expected, we use a random number generator to simulate the jth Brownian motion B^(j)(t). We then generate the random samples for bootstrapping using the plug-in estimates and according to the equation

z^(j)(t) = sin[2π Σ_{i=1}^{I} f̂_i t^i + γ̂B^(j)(t)],

where z^(j)(t) is the jth set of observations, generated by the jth simulated Brownian motion B^(j)(t).
The set of new observations, z^(j)(t), represents the data set that is used in the pseudo-maximum likelihood function (47) to obtain the jth estimate of the parameters, as explained in Section 3.2. This results in an estimate of each of the unknown parameters. We then use a random number generator to simulate the (j + 1)th Brownian motion B^(j+1)(t); another set of observations, z^(j+1)(t), is generated according to (84), and another bootstrap estimate is obtained, and so forth, for as many times as we want. After we stop, a histogram is calculated for each parameter from the bootstrap estimates. From this histogram, the final parameter estimate is taken as the value at the peak of the histogram. One could instead use the average of the bootstrap estimates as the final estimate of the parameters; in our case, however, the peak of the histogram proved to be a more reliable estimate than the plug-in estimate.
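The parametric bootstrap loop and the histogram-peak rule can be sketched generically. The Gaussian-mean toy problem below merely stands in for the pseudo-ML fit and the signal model (84) of the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

def histogram_peak(samples, bins=20):
    """Final bootstrap estimate: the midpoint of the tallest histogram bin."""
    counts, edges = np.histogram(samples, bins=bins)
    j = int(np.argmax(counts))
    return 0.5 * (edges[j] + edges[j + 1])

def parametric_bootstrap(estimator, simulate, theta_plugin, n_boot=100):
    """Simulate data from the plug-in estimate, re-estimate, repeat.
    `simulate` and `estimator` are placeholders for the signal model
    and the pseudo-ML fit; a Gaussian-mean toy stands in below."""
    return np.array([estimator(simulate(theta_plugin)) for _ in range(n_boot)])

theta_plugin = 2.0
reps = parametric_bootstrap(lambda x: float(np.mean(x)),
                            lambda th: rng.normal(th, 1.0, 64),
                            theta_plugin)
theta_final = histogram_peak(reps)              # peak-of-histogram estimate
bias_hat = float(np.mean(reps)) - theta_plugin  # bootstrap bias estimate
```

The spread of `reps` also yields the confidence intervals mentioned above, for example by taking empirical percentiles of the bootstrap replicates.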

Model selection and computational complexity
In some applications, one is not able to determine the order of the polynomial that represents the phase. In more complicated scenarios, one is faced with several models for the signal and has to determine which model is more representative of the data. For a given model, order selection could be based on, for example, the Akaike information criterion (AIC) or another criterion (Zaman [22]). Model selection should be based on a physical understanding of the source of the signal; there are, however, several tests to differentiate between one model and another (Zaman [22]). It is clear that the proposed approach is computationally intensive. The numerical integration and the simulation-based method are time consuming, and the bootstrapping method is known to require many iterations. The CPU time needed for the proposed approach is more than 100 times that of the approximate maximum likelihood method, and it is mainly consumed by the bootstrapping method. Parallel processing would drastically reduce the required time.

SIMULATIONS
The proposed stochastic-calculus-based approach was tested against the approximate maximum likelihood method of Section 2. In the example, a third-order polynomial phase was used, with the SNR gradually decreased from SNR = 20 dB to SNR = 0 dB. The SNR is defined as

SNR = 10 log_10(SP/NP),

where SP is the signal power and NP is the noise power, defined in (86), with the noise defined in (87). The stochastic-calculus-based approach had good performance for SNR as low as 10 dB. Adding the bootstrap approach, one is able to obtain good results even at SNR values close to 0 dB. The quality of the results is measured by comparing the variances of the estimates using stochastic calculus to those of the approximate maximum likelihood estimates and to their corresponding approximate Cramér-Rao lower bounds.
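To make the SNR definition concrete, the sketch below measures it empirically for the parameters of Example 1, taking the "noise" to be the difference between the noisy and the noise-free signal. This is our reading of the definitions (86) and (87), which are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(4)
n, dt = 128, 0.01
t = np.arange(n) * dt
poly_phase = 2 * np.pi * (2.0 * t - 1.0 * t**2 - 0.5 * t**3)

def empirical_snr_db(gamma, rng):
    """SNR = 10*log10(SP/NP), with the noise taken as the difference
    between the noisy and the noise-free signal (our reading of the
    paper's definitions, which are not reproduced here)."""
    B = np.cumsum(np.concatenate(([0.0], rng.normal(0.0, np.sqrt(dt), n - 1))))
    clean = np.sin(poly_phase)
    noisy = np.sin(poly_phase + gamma * B)
    return 10.0 * np.log10(np.mean(clean**2) / np.mean((noisy - clean)**2))

# gamma = 0.1 should land in the vicinity of the paper's 20 dB figure;
# averaging over paths mirrors the paper's reporting of an average SNR.
snr_db = float(np.mean([empirical_snr_db(0.1, rng) for _ in range(200)]))
```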
Example 1 (third-order polynomial). In this example, 128 points of the noisy measured signal, z(0), z(1), . . . , z(127), were simulated according to the model

z(k) = sin[2π Σ_{i=1}^{3} f_i (k∆)^i + γB(k∆)].

The coefficients were taken as f_1 = 2, f_2 = −1, f_3 = −0.5, that is, I = 3, and the sampling interval ∆ = 0.01. The variance was set to give SNR values of 20 dB (γ = 0.1), 10 dB (γ = 0.3), and 0 dB (γ = 1.8); the value of the parameter γ was changed until we got the desired SNR, and the reported SNR is actually the average value over many simulations. A typical set of data is shown in Figure 1a, where we plot the phase without Brownian motion, 2π Σ_{i=1}^{3} f_i t^i, and the phase with Brownian motion, 2π Σ_{i=1}^{3} f_i t^i + γB(t), with γ ≈ 1.8. This yields SNR = 0 dB, where the SNR is as defined in (85), (86), and (87).
In Figure 1b, we show the instantaneous frequency (IF) as estimated using the WVD, as described in Section 2, for a sample of a simulated signal with SNR = 0 dB. The simulated signal is as given above. We also show the true noise-free IF and the IF estimated through the stochastic-calculus-based method and bootstrapping. Notice the poor performance of the WVD. Obviously, this performance could be improved with a proper window or other modifications to the WVD; thus, we will not pursue the comparison to the WVD in this paper. Instead, the comparison will be limited to the approximate maximum likelihood method and the stochastic-calculus-based method.
Similarly, in Figure 1c we show the phase angle as estimated using the Hilbert transform, as described in Section 2, for a sample of a simulated signal with SNR = 0 dB. We also show the true noise-free phase. Notice the poor performance of the Hilbert-transform-based method. This performance could, of course, be improved with a proper window or other modifications to the Hilbert transform; thus, we will not pursue the comparison to the Hilbert-transform-based method in this paper. Instead, the comparison will be limited to the approximate maximum likelihood method and the stochastic-calculus-based method.
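The Hilbert-transform phase estimate referred to above can be sketched as follows; the noise-free test record is illustrative, not the paper's data.

```python
import numpy as np
from scipy.signal import hilbert

# Illustrative real-valued record with the polynomial phase of Example 1
# (no Brownian motion here, to keep the sketch simple).
delta = 0.01
t = np.arange(128) * delta
true_phase = 2.0 * np.pi * (2.0 * t - t**2 - 0.5 * t**3)
x = np.cos(true_phase)

# Analytic signal via the Hilbert transform; its angle is the wrapped phase.
analytic = hilbert(x)
est_phase = np.unwrap(np.angle(analytic))

# IF estimate in Hz: numerical derivative of the unwrapped phase.
inst_freq = np.diff(est_phase) / (2.0 * np.pi * delta)
```

On short, noisy records the analytic-signal approximation degrades at the edges, which is consistent with the poor performance reported at SNR = 0 dB.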

Proposed Algorithm 2
In Section 3.4 we gave an outline of how to find the pseudo-maximum likelihood estimates of the parameters. That outline did not include the bootstrapping step; the proposed Algorithm 2 fills this gap. One could use the average of the bootstrap estimates as the final estimate of the unknown parameters. The peak of the bootstrap histogram, however, seems to yield a more accurate estimate. As the number of simulations increases, we get more accurate results; in our case, 100 simulations seem to be enough to get good results.
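The bootstrap step and the peak-of-histogram rule can be sketched as follows. The `estimator` callable is a placeholder for the paper's pseudo-maximum-likelihood machinery, and the residual-resampling scheme is an assumption for illustration.

```python
import numpy as np

def histogram_peak(samples, bins=20):
    """Return the center of the most populated histogram bin."""
    counts, edges = np.histogram(samples, bins=bins)
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1])

def bootstrap_estimates(residuals, fit, estimator, M=100, rng=None):
    """Residual bootstrap: resample residuals, rebuild the record, re-estimate.

    `fit` is the fitted signal and `estimator` maps a data record to a
    parameter estimate; both stand in for the paper's estimation procedure.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    ests = []
    for _ in range(M):
        resampled = rng.choice(residuals, size=len(residuals), replace=True)
        ests.append(estimator(fit + resampled))
    return np.array(ests)
```

The final parameter estimate would then be `histogram_peak(bootstrap_estimates(...))` rather than the bootstrap mean.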
The histograms of the bootstrap estimates of the three parameters, using the stochastic-calculus-based method, are shown in Figure 2. The approximate Cramér-Rao lower bound for the variances, together with the variances of the estimates of the coefficients of the polynomial phase for the stochastic-calculus-based method and the approximate maximum likelihood method, are shown in Figure 3.
The mean square error (MSE) between the estimated parameters and the true parameters is taken as the measure of performance. The MSE for the ith coefficient is defined as $\mathrm{MSE}_i = (1/M)\sum_{m=1}^{M} (a_i^m - f_i)^2$, where M is the total number of bootstrap simulations, equal to 100 here, and $a_i^m$ is the mth estimate of the ith coefficient, obtained as the peak of the histogram of this coefficient.
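The MSE definition amounts to the following one-liner, where `estimates` holds the M bootstrap estimates of one coefficient and `true_value` its true value:

```python
import numpy as np

def bootstrap_mse(estimates, true_value):
    """MSE_i = (1/M) * sum over m of (a_i^m - f_i)^2."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - true_value) ** 2))
```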
In Table 1, we present the log of the approximate Cramér-Rao lower bound (CRB) and the log of the MSE values for the stochastic-calculus-based method with bootstrapping for SNR values of 20 dB, 10 dB, and 0 dB. Notice how close they are.

Effect of bootstrapping
In Table 2, the approximate maximum likelihood method with bootstrap and the proposed stochastic-calculus-based method with bootstrap are compared. Notice that, as expected, bootstrapping lowers the MSE of the estimates.
Notice also that the accuracy of the proposed bootstrapping and stochastic-calculus-based method is much higher than that of the approximate maximum likelihood method, where accuracy is measured by the mean square error of the estimates.

Could we use bootstrapping to improve the performance?
Bootstrap is a method that handles small samples and yields good estimates of the bias and the confidence intervals for the plug-in estimates (see Efron and Tibshirani [21], Zaman [22], Politis [23], Zoubir and Boashash [24]). It is not meant to be a method for improving the accuracy of the estimates. By using the peak of the histogram, generated by the bootstrapping method for each estimated parameter, as the desired estimate of the unknown parameter, however, one is able to get good estimates at lower SNR. This was observed in this report and in some other applications (see Souza and Neto [25], Abutaleb [26]). It can be explained by the fact that the bootstrap samples are almost the same as Monte Carlo simulations: instead of having just one data set for the estimation, one has as many samples as needed. More importantly, bootstrapping provides a reasonably good approximation to the density of the parameter estimator, as is apparent from the histogram of each parameter in Figure 2. If the distribution of each parameter were Gaussian, then one should take the mean of the histogram as the best estimate of the unknown parameter. But, as we notice from (71) and Section 3.3, the distribution of the estimates, f, is not Gaussian, since the estimate of γ enters this equation. Thus, one might expect the density of f to be skewed, which is exactly what we observe in the histograms of Figure 2. Choosing the peak of the histogram as the estimate of the unknown parameter is the reason for the improved accuracy and the lower MSE. More discussion of this subject can be found in [22, Chapters 12 and 14]. Again, this is not a mathematical explanation for the observation; more theoretical analysis is needed to explain why the peak yields better estimates, and this is currently under investigation.
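The effect described here, preferring the histogram peak to the mean when the estimator density is skewed, can be illustrated with a toy example; the exponential noise below is purely illustrative and is not the paper's estimator density.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy skewed "bootstrap histogram": the true parameter plus right-skewed
# noise whose mode is 0 but whose mean is 0.5.
true_value = 1.0
samples = true_value + rng.exponential(scale=0.5, size=10000)

mean_est = float(samples.mean())                 # pulled up by the long tail
counts, edges = np.histogram(samples, bins=60)
k = int(np.argmax(counts))
peak_est = 0.5 * (edges[k] + edges[k + 1])       # close to the mode
```

Here the peak lands near the true value while the mean is biased by the tail, mirroring the behavior observed in the histograms of Figure 2.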
Table 3 lists, for SNR = 0 dB and for a typical group of bootstrap estimates, the true value of the unknown parameter, the plug-in estimate, the bias, the average, the variance, and the peak of the histogram for each parameter. An estimate of the bias is obtained from the bootstrap estimates; it is defined as follows (Efron and Tibshirani [21, Chapter 10]): the bias is the average of the bootstrap estimates of the unknown parameter minus the plug-in estimate of the unknown parameter.
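The bias estimate of Efron and Tibshirani reduces to a single subtraction:

```python
import numpy as np

def bootstrap_bias(boot_estimates, plug_in_estimate):
    """Bias estimate: the average of the bootstrap estimates minus the
    plug-in estimate (Efron and Tibshirani, Chapter 10)."""
    return float(np.mean(boot_estimates) - plug_in_estimate)
```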

SUMMARY
In this paper, the formulation of the instantaneous frequency estimation problem using concepts from stochastic calculus was introduced and applied to the estimation of a single time-varying phase represented by a polynomial with additive Brownian motion. A stochastic differential equation was developed for the observed signal. This resulted in a linear equation in some of the unknown parameters of the phase, an advantage of using Ito calculus. A pseudo-likelihood function was obtained using the Girsanov theory. It was shown that the pseudo-likelihood function is quadratic in some of the unknown coefficients of the phase. This made it possible to find an exact expression for the estimates of the parameters of the polynomial phase, another advantage of the Ito calculus. The statistical properties of the estimates were also derived in a straightforward manner, and an approximate expression for the Cramér-Rao lower bound was obtained. A simulation-based method was used to numerically obtain estimates of the rest of the unknown parameters. The bootstrapping method was introduced and used with stochastic annealing. In this way, good estimates were obtained; the mean square error of each estimate is close to the approximate Cramér-Rao lower bound. Simulations demonstrated the superior performance of the proposed approach, and a comparison was given with the Wigner-Ville distribution and the Hilbert-transform-based results.