Efficient phase estimation for the classification of digitally phase modulated signals using the cross-WVD: a performance evaluation and comparison with the S-transform

This article presents a novel algorithm based on the cross-Wigner-Ville Distribution (XWVD) for optimum phase estimation within the class of phase shift keying signals. The proposed method is a special case of the general class of cross time-frequency distributions, which can represent the phase information for digitally phase modulated signals, unlike the quadratic time-frequency distributions. An adaptive window kernel is proposed where the window is adjusted using the localized lag autocorrelation function to remove most of the undesirable duplicated terms. The method is compared with the S-transform, a hybrid between the short-time Fourier transform and wavelet transform that has the property of preserving the phase of the signals as well as other key signal characteristics. The peak of the time-frequency representation is used as an estimator of the instantaneous information bearing phase. It is shown that the adaptive windowed XWVD (AW-XWVD) is an optimum phase estimator as it meets the Cramer-Rao Lower Bound (CRLB) at signal-to-noise ratio (SNR) of 5 dB for both binary phase shift keying and quadrature phase shift keying. The 8 phase shift keying signal requires a higher threshold of about 7 dB. In contrast, the S-transform never meets the CRLB for all range of SNR and its performance depends greatly on the signal's frequency. On the average, the difference in the phase estimate error between the S-transform estimate and the CRLB is approximately 20 dB. In terms of symbol error rate, the AW-XWVD outperforms the S-transform and it has a performance comparable to the conventional detector. Thus, the AW-XWVD is the preferred phase estimator as it clearly outperforms the S-transform.


Phase shift keying signals and the problem of phase estimation
Phase shift keying (PSK) is commonly used [1] due to better noise immunity and bandwidth efficiency compared to amplitude shift keying (ASK) and frequency shift keying (FSK) modulations [2]. This is reflected in current wireless communication technologies such as 3G, CDMA, WiMax, WiFi, and the 4G technologies that employ PSK modulation [3]. In addition, digital phase modulation is also used in HF data communication such as in PACTOR II/III, CLOVER 2000, STANAG 4285, and MIL STD 188-110A/B format [4]. The instantaneous information bearing phase (IIB-phase) in the class of PSK signal represents the transmitted symbol, the signal symbol duration, and class of PSK modulation scheme used. This information is useful to classify and demodulate signals.

Phase estimation and signal demodulation
Several phase estimation methods are proposed for PSK signal demodulation, interference cancellation, coherent communication over time-varying channels, and direction of arrival estimation [5][6][7][8][9][10][11][12]. Such phase estimation methods can be classified as coherent and non-coherent detections [13]. The coherent detector is often referred to as a maximum likelihood detector [13]. The term non-coherent refers to a detection scheme where the reference signal is not necessary to be in phase with the received signal. One of the earliest contributions for the phase estimation of binary phase shift keying (BPSK) signal is an optimum phase estimator which derives a reference signal from the received data itself using Costas loop [5]. In [6], an open loop phase estimation method for burst transmission is proposed. The phaselocked loop (PLL) method used in conventional timedivision multiple access system is inefficient due to the very long acquisition time. This problem is resolved using the new method proposed in this article which yields an identical performance with the PLL method. However, the frequency uncertainty problem degrades the performance of the estimator. In order to overcome this degradation, an improved algorithm which includes the frequency and phase offset is proposed in [7,8]. By estimating the frequency and phase offset, the performance degradation caused by the frequency offset in [6] is eliminated. The work reported in [9][10][11] proposed a carrier phase estimator for orthogonal frequency division multiple access systems based on the expectationmaximization algorithm to overcome the computational burden of the likelihood function. This method is actually equivalent to the maximum likelihood phase estimation using an iterative method without any prior knowledge of the phase. Two practical M-PSK phase detector structures for carrier synchronization PLLs were reported in [12]. These two new non-data-aided phase detector structures are known as the self-normalizing modification of the Mth-order nonlinearity detector and the adaptive gain detector [12]. Both detectors show improvement in phase error variance due to automatic gain control circuit imperfections.

Phase estimation and signal classification
All the above-mentioned methods aimed to develop an optimal phase estimator solely for signal demodulation without estimation of instantaneous parameters of the signals. The Costas loop and PLL are crucial for carrier recovery and synchronization in the demodulation of the class of PSK signals [5][6][7][8]. However, our applications focused on the analysis and classification of signals for spectrum monitoring. The main objective of such a system [14] is to determine the signal parameters such as the carrier frequency, signal power, modulation type, modulation parameters, symbol rate, and data format which are then used as input to a classifier network. This system is used by the military for intelligence gathering [15] and by the regulatory bodies [16] for verifying conformance to spectrum allocation. Recently, similar requirements were identified for spectrum sensing in cognitive radio [17] to determine channel occupancy and dynamically allocate channels to the various users. Spectrum monitoring systems also use data demodulation [14], but with modems tailored for the specific modulation type and data format.
Since PSK signals are time-varying in phase, timefrequency analysis [ [1]8, p. 9] can be used to estimate the signal's instantaneous parameters. The development of signal dependent kernels for time-frequency distribution (TFD) applicable to the class of ASK and FSK signal was proposed in [19]. Further enhancement in [20] improved the time-frequency representation (TFR) by estimating the kernel parameters using the localized lag autocorrelation (LLAC) function. Recent study has proven that the quadratic TFD [21,22] is capable to analyze and classify the class of ASK and FSK signals at very low signal-to-noise ratio (SNR) conditions (-2 dB). However, the loss of the phase information in the bilinear product computation makes it impractical to completely represent the PSK class of signals. Since PSK signals are characterized by the phase, cross time-frequency distributions (XTFD) based method is proposed as it is capable of representing the signal phase information [23]. Just like the quadratic TFD which suffers from the effect of cross terms, there are unwanted terms known as "duplicated terms" a which are present in the XTFD. Preliminary work on the XTFD shows that a fixed window is insufficient to generate an accurate IIB-phase estimation [23], thus justifying the need for an adaptive window.
This article presents a time-frequency analysis solution to the optimum phase estimation of PSK class of signals and then evaluates its performance. Signals tested are BPSK, QPSK, and 8PSK signals. The first method is based on the localized adaptive windowed cross Wigner-Ville distribution (AW-XWVD). In this method, the adaptation of the window width is based on the LLAC function of the signals of interest. For comparison, a second method is selected that is based on the S-transform [24]. It is an invertible time-frequency spectral localization technique that combines elements of the Wavelet transform (WT) and the short-time Fourier transform (STFT). This S-transform is selected for comparison as it has the property of preserving the phase of a signal as well as retaining other key characteristics such as energy localization and instantaneous frequency [24].
This correspondence is organized as follows. Section 2 first describes the signal models used in this article and introduces the general representations of the quadratic TFDs, XTFD, and S-transform. Section 3 presents the general equations for cross bilinear product in time-lag domain for both auto-terms and duplicated terms together with the LLAC algorithm for estimating the adaptive window for the PSK class signals. Next, we present the method for IIB-phase using the peak of the AW-XWVD and S-transform. The Cramer-Rao lower bound (CRLB) which is used for bench marking purposes is discussed in the following subsection. Section 4 presents the discrete time implementation of both method and the performance comparison of the AW-XWVD with the S-transform in the presence of noise. The criteria of comparison are based on the TFR, constellation diagram, main-lobe width (MLW) and the phase estimate variance. Then, a comparison in terms of the computational complexity and symbol error rate is given. Conclusions are given in the following section. Throughout this article, we use the following terminology: TFDs represent the mathematical formulations for distributing the signal energy in both time and frequency; the actual representations obtained are called TFRs.

PSK Signals Model and TFRs
This section first introduces the model and the parameters for the PSK signals. It then describes the timefrequency analysis techniques used to represent and analyze the signals.

Signal model
Communication signals are time-varying and are mainly characterized by instantaneous parameters such as the instantaneous amplitude for ASK signals, instantaneous frequency (IF) for FSK signals, and the IIB-phase for PSK signals. This section extends the concepts of IF to IIB-phase for digitally phase modulated signals and describes the signal parameters used for analysis. A comprehensive review of IF estimation from the peak of the TFD is given in [25,26]. A time-varying signal corresponding instantaneous phase is represented as where f is the frequency of the signal and θ is the constant initial phase of the signal. The IF is obtained by taking the first derivative of the instantaneous phase.
The instantaneous phase given in [25] has a time-varying frequency and the phase is constant for all time. In contrast, for a digitally phase modulated signal the phase term is also time-varying. If we extend Equation (1) to represent a phase modulated signal, the instantaneous phase then becomes where (t) is the IIB-phase which is very crucial in defining digitally phase modulated signals as it contains information of the transmitted data. This article evaluates the comparative performance of the AW-XWVD as an estimator of the IIB-phase for BPSK, QPSK, and 8PSK signals and then compares the results with the Stransform as both methods claimed to provide accurate phase representation. Note that this study does not include the class of quadrature amplitude modulation signals (QAM). Even though this signal has IIB-phase, its time-varying amplitude characteristic is not suitable for the adaptation algorithm described in this article (see Section 3.1.2). The algorithm is developed based on the assumption of constant amplitude signal such as the class of PSK signals.
In this article, the analytical form of the signal is used to minimize the effect of cross terms in the TFR [27]. Even though signals are real in practice, the analytical form of the signal can be generated using an FIR Hilbert filter [28]. Thus, an arbitrary digital phase modulated signal may be expressed as where k represents the order of the binary sequence transmitted, A represents the signal amplitude, f k is the subcarrier frequency, k represents the information bearing phase, and T b is the symbol duration of the signals. The variables A and f k are constant as the signals considered are PSK signals. For simplification of notation, in this article, the box function ∏(t) is defined as Figure 1 shows the time representations of the BPSK, QPSK, and 8PSK signals defined in Equation (4). The signal parameters are given in Table 1 and the sampling frequency is assumed to be 1 Hz. The analysis methods proposed in this article is applicable to communication applications in all frequency bands as long as they meet the Nyquist sampling theorem. Due to the frequency dependency of the S-transform, the analysis signals consist of both high-and low-frequency components so as to compare the performance of the AW-XWVD and Stransform for phase estimation.
The received noisy signal can be modeled as where z(t) is the noiseless PSK signal and v(t) is the complex-valued additive white Gaussian noise. The noise has independent and identically distributed real and imaginary parts with total variance σ 2 v and zero mean [ [18], p. 437].

TFDs, cross TFDs, and S-transform
The quadratic TFD is a useful technique to analyze time-varying signals, but the resulting TFR does not represent phase directly. Due to the need to estimate IIB-phase in PSK signals, the XTFD and the S-transform are introduced for this purpose as both can represent phase in the time-frequency domain.

Quadratic TFDs and cross TFDs
The quadratic TFD [ [18], p. 66] uses the bilinear product of the signal of interest to generate a TFR. To represent the phase information in the time-frequency domain, the cross bilinear product in the XTFD is calculated using TFDs from both signal of interest and reference signal. The resulting formulation for the XTFD can be expressed as follows where G(t, τ) is the time-lag kernel function that can also be represented in the Doppler-lag domain as described in [18,29]. The cross bilinear product K zr (t, τ) is given as where z(t) is the analytical signal of interest and r(t) is the reference signal. The cross bilinear product is the instantaneous cross correlation function (ICF) between the signal of interest and the reference signal. Similar to the signal of interest, the reference signal can be defined as But it does not contain IIB-phase. A box function is used in the representation of the reference signal to keep track of the location of interaction between the signals of interest with the reference signal in the time-lag representation. Similar study presented in [30,31] on the use of XWVD for IF estimate of linear FM signals requires a reference signal identical to the signal of interest. However, this is not necessary for this application since the reference signal required is a pure sinusoid with the same frequency as the signal of interest. Hence, any power spectrum estimation method [ [32], p. 214] can be used to determine the frequency of the received signal. From there, a pure sinusoid reference signal of the same frequency is generated. This article assumes that the signal of interest is in perfect synchronization with the reference signal. In practical applications, the presence of phase synchronization error introduces an offset in the IIB-phase. This phase offset could be compensated using a PLL or Costas loop [33] to generate the reference signal. Furthermore, the computation of the XTFD is done based on a segment of received signal. Combining the features of the PLL and Costas loop is only possible if the XTFD is computed iteratively one sample at a time.
In the general formulation of the quadratic TFD [ [18], p. 68], the various TFD such as the Wigner-Ville distribution (WVD), Choi-Williams distribution, spectrogram, Bdistribution, and other distributions can be defined by their respective time-lag kernels. The choice of this kernel function can help minimize cross terms in the TFR. A separable kernel allows the flexibility to separately control the smoothing in the time and frequency domain [18, Section 5.7]. The kernel function for the windowed WVD (WWVD) is an example of a separable kernel. It performs smoothing only in the frequency direction to reduce the effect of the cross terms. Similar to the WWVD, the kernel function for the windowed XWVD (WXWVD) is defined as Since the time component is a delta function, this kernel is independent of the Doppler variable and only a function of lag. The kernel is known as Doppler-independent kernel [ [18], p. 71], a special case of separable kernel. It is shown that such kernel applies one-dimensional filtering and is adapted to only a particular kind   (10) causes smoothing only in the frequency direction. By substituting Equation (10) into Equation (7), the WXWVD can be represented as The lag window function w(τ) can be one of the window functions typically used in filter design or spectrum analysis.

The S-transform
The S-transform is a spectral localization technique which is very much similar to the WT and STFT [24]. It can be considered as a special case of the STFT by replacing the window function with a frequency-dependent Gaussian window [24]. It is also related to the WT as it can be derived from the WT with a specific mother wavelet multiplied by the phase factor. The Gaussian window of the S-transform is scaled so that the window width is inversely proportional to the frequency, and its height is scaled linearly to the frequency. Due to the behavior of the window scaling, it possesses good time resolution for high-frequency components and good frequency resolution for low-frequency components. This transform has successfully been applied for resolving problems in the field of geophysics [34], power quality analysis [35], and medicine [36]. The original formulation for the S-transform of a signal, z(t), is given as [24] The frequency-dependent Gaussian window g(t, f) is given as [24] where f is the signal frequency and τ in Equation (12) denotes the position of the midpoint of the window. The window spread or standard deviation depends on f. Based on the characteristics of the Gaussian distribution, a window width of 6/f ensures that 99.72% of the signal values are enclosed within the window function [37]. Therefore, the window width is given as 6/f and height is given by the term f / √ 2π . The term f / √ 2π is also a normalizing factor which ensures that S (t, f) converges to Z(f) when averaged over time [36], as shown below.
Thus, the S-transform is invertible and the original signal can be recovered by taking the inverse Fourier transform of the above equation, resulting in the following expression of z(t).

Phase estimation methodology
This section describes the characteristics of the cross bilinear product in the time-lag representation and outlines the derivation of the AW-XWVD. The adaptation method used to set up the localized lag adaptive window is then discussed. Next, the method used for phase estimation from the peak of the TFR is presented.

AW-XWVD
The S-transform can be applied directly to the class of PSK signals to obtain the TFR without any modification in the algorithm. However, this is not the case with the XTFD where interference due to duplicated terms is introduced in the TFR [23]. Previous study defined methods to determine optimum windows for TFDs [38,39] that can reduce cross terms. A window matching algorithm [39] is used to determine the optimum window for a TFD at all time instant. The algorithm iteratively evaluates the localized energy distribution to minimize the error between successive window estimates. The concept of time-frequency coherence is introduced in [38] where the XWVD and WVD for each signal components are used in its computation. The required window function is estimated based on the autoregressive moving average modeling and Karhunen Loeve expansion. In this PSK communication application, the cross bilinear product has a certain pattern that can be utilized in computing the optimum window. Therefore, the adaptive window is designed based on the characteristics of the cross bilinear product. The resulting distribution, the AW-XWVD, can generate an accurate TFR and the subsequent IIB-phase estimate.

The cross bilinear product
The cross bilinear product consists of auto-terms and duplicated terms. The duplicated terms carry the same information as the auto-terms but shifted in time and lag domain; therefore, it can cause interference to the auto-terms [23]. In order to obtain an accurate XTFR, the auto-terms must be preserved and the duplicated terms must be removed or attenuated. The auto-terms and duplicated terms for any PSK class signal can be expressed as where K ∏ (t, τ) is the instantaneous autocorrelation function of the box function given as The proofs for Equations (16) and (17) are given in Appendix 1.
In practical digital communication applications, the amplitude of the signal might not be ideally constant due to channel impairments such as multipath fading, attenuation by the propagation channel and any kind of amplification performed by the circuits at both the transmitter and receiver [13]. Therefore, the variable A is retained throughout the derivation of the cross bilinear product. Other than that, signals that combine amplitude and phase modulations such as QAM can also be used provided a suitable adaptation algorithm for the XTFD is designed. The variation in the amplitude, A, caused by the transmitted binary data will correspond to the variation in the energy represented in the XTFR. Figure 2 shows the graphical representation of the above cross bilinear product. All the auto-terms lie along the τ = 0 axis, whereas the duplicated terms are shifted in both time and lag. Therefore, the terms labeled as K zr1,2 , K zr1,3 , and K zr1,4 are the duplicated terms for the autoterm, K zr1,1 , which are centered at τ = 0 axis. The same label applies to the rest of the auto-terms and duplicated terms. The following examples illustrate the problem caused by the duplicated terms to the estimation of IIBphase. There is no interference observed in the IIB-phase estimate if there are only auto-terms present. For instance, at time t = T b /2 the cross bilinear product evaluated is given as Only the auto-terms with the IIB-phase of 1 exist. When both the auto-terms and duplicated terms are present, there will be more than one phase term. This is observed at t = 3T b /2 where the cross bilinear product is represented as The interaction of the auto-terms and duplicated terms can be visualized as the addition of multiple vector components which result in a new vector component with different magnitude and phase. Instead of IIBphase of 2 which is caused by the auto-terms, the resulting IIB-phase consists of the interaction between all the phase terms 1 and 3 caused by the duplicated terms.
Since all auto-terms lie along the τ = 0 axis, a fixed width lag window was used in [23] to preserve the autoterms and partially remove the duplicated terms that cause distortion in the IIB-phase represented on the XTFR. However, success is limited because the duplicated terms are not completely removed resulting in a distorted IIB-phase estimate. To resolve this problem, the fixed lag window w (τ) in Equation (11) is replaced with a time-dependent window function w (t, τ) and the resulting new TFD, known as the AW-XWVD, is given as The adjustment of this time-dependent window width is based on the computation of the LLAC function at every time instant to separate the auto terms and duplicated terms. This is equivalent to use a separable kernel to reduce all cross terms as shown in [21,22]. An analysis window centered at τ = 0 is used as a reference to perform the similarity test using the LLAC function. This similarity test detects the variation of the signal in the lag direction at every time instant and estimates the window width. The time-dependent window function can be implemented using one of the common windows used in digital filter design and spectrum estimation. In this application, a rectangular window is used and it can be defined as where τ g (t) is the time-dependent window width defined within 0 ≤ t ≤ T, and T is the signal duration. Since the cross bilinear product is asymmetric, the window width in the positive lag and negative lag must be estimated accordingly. The desired τ g (t) in the positive lag (or in the negative lag direction) is selected if where ς is the time instant in lag and |R KK (t, ς)| is the amplitude of the LLAC which will be discussed in the following section. Note that the rectangular window was used for simplicity as we observed that the proposed methodology performance is not affected significantly by the choice of the window shape.

Adaptation algorithm
The LLAC [20] of the kernel, K, is a function of time and lag and it can be defined as where w a (τ) is the analysis window, τ is the lag instant, and ς is the lag running variable. The possible range for the normalized LLAC amplitude is A higher value of the amplitude of the LLAC function implies that the similarity is high and vice versa. The miscorrelation in the signal is indicated by a drastic drop in the amplitude of the LLAC function. The LLAC function will give a value approaching unity at lag instant, ς = 0.
The analysis window is a parameter of the LLAC. Its selection is important to ensure that the LLAC can detect the variation along the lag axis based on the condition specified in Equation (23) as to estimate the time-dependent window width. The analysis window is defined as where τ a is the analysis window width. In this article, the analysis window width is chosen experimentally as τ a = 10 s based on the sampling frequency of 1 Hz. The LLAC is applied to the signal x(t) and evaluated for the normalized frequencies of 1/32, 1/16, 1/8, and1/4 Hz. In this evaluation, the signal is defined as follows (with similar characteristic to the cross bilinear product in lag) Table 2 shows the minimum values of the LLAC evaluated in time. An analysis window of τ a = 10 s is sufficient to determine the time-dependent window width for frequencies ranged from 1/32 to 1/4 Hz. Thus, the analysis width is valid for the test signals as specified in Section 2. In the case of PSK signals, two consecutives symbols may be different from each other depending on the transmitted data. By applying the LLAC on the cross bilinear product, the resulting adaptive window resembles the shape of a parallelogram as shown in Figure 3.

IIB-phase estimation from the peak of TFDs
By extending the approach used for IF estimation from the peak of WVD presented in [26], the IIB-phase is estimated frrm the peak of the AW-XWVD and S-transform as outlined in the following sections.

IIB-phase estimation using the AW-XWVD
The IF can be estimated from the peak of the TFD for all time instants as shown below [ [18], p. 429] wheref (t) is the estimated frequency. The peak of the TFD is detected and the location is used as the frequency estimate. In this application, the peak of the XTFD for the AW-XWVD is detected for all time instants and it is used to estimate the phase. Since the peak value is complex, the IIB-phase may be expressed as the inverse tangent of the imaginary and real component, that iŝ The detailed derivation of the above equation is given in Appendix 3.

IIB-phase estimation using the S-transform
For the S-transform, the IIB-phase estimation from the peak of the TFD, however, is not as straightforward as the AW-XWVD. The phase term in the frequency representation introduced by the time shift window has to be compensated in the actual IIB-phase estimate. The relationship between the time delay and phase shift is presented in [40], where the authors utilized this property to generate the analytical signal as an alternative to the Hilbert transform. By applying this concept, the estimated IIB-phase using S-transform can be represented aŝ  The detailed derivation for IIB-phase estimation using S-transform is given in Appendix 4.

Comparison to CRLB
This section compares the performance of both AW-XWVD and S-transform as a phase estimator with the CRLB which is often used as a benchmark [41], as it gives the theoretical lower limit to the variance of any unbiased parameter estimator [42]. The CRLB derived in [43,44] uses a likelihood function on a known signal in the presence of additive white noise for the digitally phase modulated signal.
In terms of SNR, the CRLB for BPSK and QPSK signals can be represented, respectively, as [43] where N is the average window width, g is the SNR, F B and F Q are, respectively, the ratio of the CRLB for random BPSK and QPSK signals to the CRLB for an unmodulated carrier of the same power. At high SNR, the value of F B and F Q is equivalent to one; so, the same bound applies for both the BPSK and QPSK signals [43]. The value of F B and F Q differs at low SNR and is obtained from the results presented in [43]. In [44], the authors extended the study presented in [43] and derived the CRLB for 8PSK signal with random phase. It is shown that the CRLB for MPSK signal for moderate to low SNR is given as [44] CRB MPSK (φ) = 1 2Nγ (33) The variance of the actual IIB-phase estimator for both AW-XWVD and S-transform method can be represented as where N is the total number of samples,φ n is the estimated phase at every time sample n, andφ is the actual IIB-phase.

PSK signal detection algorithm
In addition to the estimation of modulation parameters, the IIB-phase estimate derived from the XTFR can also be used as a demodulator for the class of PSK signals.
The detection is performed by first estimating the IIBphase, (t) from the peak of the TFD. The average IIBphase within a symbol duration can be estimated as followsφ For a BPSK signal, the symbols are detected based on a set of decision rule [45] that are defined as where s BPSK is the estimated binary data. The decision boundary is defined based on the signal parameters shown in Table 1. Similarly, the same approach described for BPSK is extended to QPSK and 8PSK. The decision rule for QPSK and 8PSK signals are defined, respectively, as

Implementation, results, and discussions
This section discusses the implementation and realization of the TFDs as well as the performance comparison between the AW-XWVD and S-transform from several measures. First, comparison is made in terms of the TFR plot, the slice of the TFR, the IIB-phase, the instantaneous energy, and the constellation diagram. Then, comparison in terms of the MLW is discussed. Next, the performance of the AW-XWVD and S-transform as a phase estimator is benchmarked to the CRLB. This is followed by the evaluation of the symbol error rate performance of the AW-XWVD, S-transform, and conventional detector. Finally, a comparison is made in terms of the computational complexity between the AW-XWVD, S-transform, and conventional detector.

Discrete-time formulation and implementation
The discrete time formulation of the TFDs is needed for implementation on digital systems; and this applies for both the discrete forms of the AW-XWVD and S-transform. In [ [18], p. 235], the windowed discrete WVD (DWVD) of a continuous-time signal z(t) is expressed as where M is a positive integers representing the window length in samples, n is the discrete time samples, m is the discrete lag samples, and k is the discrete frequency. Thus, by using Equation (21), the discrete AW-XWVD can be expressed as ρzr,AWXWVD, n, k = 2 (40) Using the same notation as above, the discrete S-transform [24] can be represented as The discrete time representation of the S-transform is similar to the spectrogram. However, there is a tradeoff between the time and frequency resolution for the Stransform as the window width is frequency dependent. 5 show the TFR, TFR slice, IIB-phase, instantaneous energy, and constellation plots for QPSK2 signals at SNR of 10 dB using the AW-XWVD and Stransform, respectively. The two TFRs show at which frequency the signal exists, but for the S-transform there are distortions in the TFR at every symbol transition. The high contrast region in the TFR of the S-transform indicates that there are low-density components other than the signal component. However, this is not present in the TFR of the AW-XWVD. The TFR slice is normalized to the peak value of the TFR and observed in frequency for time n = 100 samples. From the TFR slice, it is shown that the AW-XWVD gives better frequency concentration compared to the S-Transform. This is because the MLW of the TFR slice for the S-transform appears to be much wider than AW-XWVD. As shown in Equation (12), the S-transform's window width is frequency dependent where the window is wider for low-frequency signal and narrower for high-frequency signal. This implies that the S-transform at higher frequency gives worse frequency resolution and wider MLW. Results confirming this statement are presented in Table 3. Besides the MLW of the TFR slice, there is also a difference in the average side lobe level. The side lobe level is higher for the S-transform at about 0.18, while this level is lower at 0.05 for the AW-XWVD. This explains the appearance of the high contrast region on the TFR of the S-transform.

Figures 4 and
The IIB-phase plot shows that the AW-XWVD gives better accuracy for the IIB-phase estimate. For the S-transform, distortion is observed in the IIB-phase estimate at the phase transition regions which is absent in the AW-XWVD. The sliding window in the S-transform causes distortion in the IIB-phase at the symbol transition region due to the interaction between adjacent symbols. Since digitally phase modulated signals have constant amplitude, their instantaneous energy should also be constant at all times. However, due to noise, the amplitude of the signal appears to vary. This is reflected as variation in the magnitude of the instantaneous energy for AW-XWVD and S-transform. A significant drop is observed in the instantaneous energy for the Stransform at every symbol transition. Similar to the phase, this drop is caused by the interactions between the adjacent symbols within the sliding window. Since the AW-XWVD produces accurate instantaneous energy and IIB-phase estimates, the constellation diagram generated shows almost no variation from the original points and is better compared to the S-transform. Table  3 shows the MLW estimated at SNR of 6and 10 dB using both methods. In general, the SNR has no significant effect in the MLW obtained for both methods. However, the effect of signal frequency is more significant for the S-transform compared to the AW-XWVD. For instance, the MLW for both BPSK1 and BPSK2 with the AW-XWVD based estimate is the same at 0.012 Hz. However, for the S-transform the MLW is larger for BPSK2 than BPSK1 with a difference of 0.028 Hz. The scaling of the Gaussian window results in a broader MLW for higher-frequency signal. The signal modulation level has no significant effect on the MLW. This is shown by the MLW measured for BPSK, QPSK, and 8PSK signals where there are only minor differences. These results imply that the AW-XWVD gives better IIB-phase estimation results compared to the S-transform as the performance of an estimator is associated with the MLW [ [46], p. 50].

Variance comparison with the CRLB
In order to evaluate the performance of the AW-XWVD method, we compare it with the S-transform. Both methods are benchmarked with the CRLB for phase estimate. It is assumed that there is perfect synchronization between the received and reference signals. We consider the signal is corrupted by a zero mean additive white Gaussian noise channel with variance s 2 . In simulations, the received signal is generated by adding the noiseless signal, z(t), and the additive white Gaussian noise, v(t), as given in Equation (6) where the SNR is varied at 1 dB step from 0 to 10 dB. Monte Carlo simulations based on 1,000 realizations of the predefined signals are conducted for each value of SNR. The estimated IIB-phase is obtained from the peak of the XTFD as described in Section 3.2. Assuming the actual IIB-phase of the signal is known, the MSE is computed using Equation (34) . Figures 6, 7, and 8 show the results of the estimated IIBphase variance for BPSK, QPSK, and 8PSK signals, respectively. In general, for both BPSK and QPSK signals, the variance of IIB-phase estimate using the AW-XWVD lies close to the CRLB. The AW-XWVD estimate meets the CRLB at SNR ≥ 5 dB for both BPSK1 and BPSK2 signals. Figure 7 shows that the cutoff point for both QPSK1 and QPSK2 signals are the same as the BPSK signals, at SNR of 5 dB. However, the variances in the IIBphase estimate for QPSK signals are higher compared to the BPSK signals at the same SNR. A higher cutoff point is recorded for the 8PSK estimates where both 8PSK1 and 8PSK2 signals meet the CRLB at SNR ≥ 7 dB. In general, the AW-XWVD outperformed the S-transform as a phase estimator and the S-transform estimates never meet the CRLB for all signals even at SNR of 10 dB. As expected, the performance of the S-transform deteriorates greatly for higher frequency signals due to broader MLW. This is in contrast with the AW-XWVD where it maintains reasonably insignificant changes in the variance of IIB-phase estimates for all range of frequency. Thus, the AW-XWVD is more robust to noise for phase estimation compared to the S-transform. The result obtained in this article is in line with [30] where the XWVD is shown to be more robust to noise compared to the WVD.

Symbol error rate performance
In this section, the symbol error rate performance of the AW-XWVD and S-transform is compared with the conventional detector defined in [ [13], p. 188] for the class of PSK signals. It is based on the matched filter structure, where the reference signal must have the same parameters as and be in phase with the signal of interest. In general, an increment in the number of bits per symbol will increase the throughput at the expense of downgrading the symbol error rate. The formulation for where P E (M) is the symbol error rate, the function Q(x) is the complementary error function, E s is the energy per symbol, N 0 is the noise power, and M is the size of symbol set. Detection of the PSK signals is done based on the IIB-phase estimate and the sets of rules defined in Section 3.4. All the defined test signals of 10,000 symbols are evaluated under AWGN channel. The results presented in Figures 9, 10, and 11 show the symbol error rate performance as a function of SNR for BPSK, QPSK, and 8PSK signals. BPSK signal required a SNR of about 5 dB to achieve a symbol error rate of 10 -3 for the AW-XWVD method. Using the same method, a higher SNR is observed for QPSK signal, approximately 8 dB. To achieve the same performance, the conventional detector needs an SNR of 7 and 8 dB for BPSK and QPSK signals, respectively. For 8PSK signal, it is shown that to achieve a symbol error rate of 10 -3 , SNR of 10 dB is required for AW-XWVD. As for the conventional detector, it gives the same performance at SNR of 11 dB. From the symbol error rate plot, for all the test signals, it is observed that the advantage of the AW-XWVD over the conventional detector is at the low SNR range where it gives lower error rate. In general, the symbol error rate for the AW-XWVD is much lower compared to the S-transform. The S-transform could not provide symbol error rate ≤ 10 -3 even at SNR of 12 dB for all signals. Thus, it is impractical to use it as a detector for the class of PSK signals.

Computation complexity
The number of computations for implementing the AW-XWVD, the S-transform, and conventional detectors is discussed in this section to compare the practicality of the proposed method for signal analysis and classification applications as used in spectrum monitoring and cognitive radio [14][15][16][17]. The computation complexity of the AW-XWVD is similar to the smooth WWVD due to the similarity in the computation of the bilinear product. To implement the AW-XWVD, the number of computation required in terms of the number of multiplication is given below [22]. For the sake of clarity in terminology, in the paragraph below, N is the signal length, N τ is the lag window length, N A is the length of the analysis window, and N w is the average length of adaptive lag window.   1. Computation of the cross bilinear product to obtain its time-lag representation requires N τ N multiplications. Ideally, the number of computation for the cross bilinear product is N 2 where the lag and time durations are equal to N samples. To maintain equal frequency resolution for N > 512 samples, the duration in lag is maintained at N τ = 512 samples. By limiting the duration in lag, excessive computation of the cross bilinear product is avoided. 2. The LLAC uses an analysis window of N A which slides along the lag axis at every lag sample for a total of N τ samples. Since there are N time instances, the total number of multiplications for the computation of localized lag autocorrelation function is N A N τ N. 3. The LLAC will determine the separation interval between the auto-terms and duplicated terms based on the average lag window width N w . For N time samples, the total number of multiplications to setup the adaptive lag window based on the average lag window width N w is N w N. 4. To get the XTFR, the Fourier transform of the windowed cross bilinear product is calculated in the lag direction with (N τ log 2 N τ ) multiplications. For signal length N, the total number of multiplications 0.5N(N τ log 2 N τ ). Therefore, the total of multiplication required to compute the AW-XWVD is N(N τ + N A N τ + N w + 0.5N τ log 2 N τ ).
The S-transform is very much similar to the spectrogram, except that the window for the S-transform is frequency dependent. Hence, the number of computation required in terms of number of multiplication results from [47]: 1. The product of the frequency-dependent Gaussian window function and the signal of interest to obtain its localized spectrum which requires N multiplications. 2. The Fourier transform of the time-lag representation to obtain the TFR requires 0.5N (N τ log 2 N τ ) multiplications.
Thus, the total number of multiplication required to implement the S-transform is N(1 + 0.5N τ log 2 N τ ).
The number of computation in terms of the number of multiplication required to implement the conventional detector is [13]: 1. Mixing of the incoming signal with two sinusoid signals with 90°phase difference requires 2N multiplications.    Therefore, the total number of multiplications required to implement the conventional detector is 4N.
In this application, the length of the signal evaluated is 640 samples points and the lag window length N τ is set to 512 sample points. The analysis window length N A used in the AW-XWVD is 10 samples points and the average length of adaptive lag window N w is 80 samples points. The number of multiplications required for the AW-XWVD, S-transform, and conventional detector per symbol is summarized in Table 4.
In terms of the number of multiplications, the AW-XWVD requires approximately 4 times more computations compared to the S-transform and 2,000 times more for the conventional detector. Although there is a significant additional number of a computation for the AW-XWVD, recent advances in digital electronics as well as decimation procedures can take care of them; in addition, the performance in terms of the IIB-phase estimates enables more efficient signal parameters estimation in the proposed area of applications. These parameters can be used to classify a signal from a set of reference parameters. If necessary, we can use the IIBphase estimate to detect PSK signals at low SNR conditions where the conventional detector failed. For higher SNR conditions, it is not necessary to use a technique which is computationally intensive when the symbol error rate is low. Thus, we can setup the conventional detector using the parameters estimated from the IIBphase to detect PSK signal. So, we conclude that, with the enhancement of current computer processing combined with appropriate decimation procedures, the realtime implementation of AW-XWVD is feasible with the use of multiple processors or parallel processing [48] and the design proposed in [49].

Conclusions
A performance comparison between the AW-XWVD and S-transform estimators of IIB-phase shows that the AW-XWVD is superior to the S-transform for classifying PSK signals. Results show that the mean square error of the phase estimate using AW-XWVD is on the average lower by 20 dB. The S-transform has a frequency-dependent window width which performs poorly as a phase estimator for high-frequency signal components. Since peak detection is used for the estimation of the IIB-phase for both methods, the frequency resolution and MLW contribute to the estimation accuracy. The AW-XWVD maintains the frequency resolution through the window adaptation and yields better accuracy for the IIB-phase estimate. It also meets the CRLB at moderate SNR for all the defined signals unlike the S-transform that never meets the bound even at high SNR. For symbol error rate performance, the AW-XWVD is also better compared to the S-transform and it is comparable to the conventional detector at the cost of higher number of computations. Thus, this article has proven that the AW-XWVD is an effective phase estimator for digitally phase modulated signals and can be used for similar applications involving time-varying signals. This study suggests new research directions to pursue in the future, such as replacing the S-transform by a modified S-transform that incorporates an adaptive mechanism; replacing the S-transform by the cross Stransform; using separable kernels in defining a XTFD; and investigate the effect of window shape in Equation (40) on the performance of the phase estimator. These advances can be used in a wide range of signal processing applications from Telecommunications to Biomedicine including EEG and Fetal Movement signals analysis and processing, where time-frequency peak detectors can provide additional features for classification improvement.
Endnote a The terminology "duplicated terms" is used in this article instead of the cross terms which is typically used in TFA. This is because, in the proposed method, these terms carry the same information as the auto-terms but are shifted in both time and lag. The duplicated terms are caused by the cross bilinear product between the kth and lth symbol of the signal of interest and the reference signal, where k ≠ l.

Appendix 1. Derivation of the auto-terms
Derivation of the auto-terms will be discussed in this section. Cross bilinear product can be seen as the results of the cross correlation function between the signal of interest and a reference signal. For discussion purposes, a PSK signal of N symbols length with normalized amplitude is evaluated. The signal has the same frequency and the difference between each symbol is the phase that is determined by the binary information present. The N symbol length PSK signal can be represented as Table 4 Comparison of computational complexity between AW-XWVD, S-transform and the conventional detector The reference signal with respect to the signal of interest is given as Substituting Equations (A.1) and (A.2) into Equation (8) then the cross bilinear product is given as