 Review
Statistically inferred time warping: extending the cyclostationarity paradigm from regular to irregular statistical cyclicity in scientific data
EURASIP Journal on Advances in Signal Processing volume 2018, Article number: 59 (2018)
Abstract
Statistically inferred time-warping functions are proposed for transforming data exhibiting irregular statistical cyclicity (ISC) into data exhibiting regular statistical cyclicity (RSC). This type of transformation enables the application of the theory of cyclostationarity (CS) and polyCS to be extended from data with RSC to data with ISC. The nonextended theory, introduced only a few decades ago, has led to the development of numerous data processing techniques/algorithms for statistical inference that outperform predecessors that are based on the theory of stationarity. So, the proposed extension to ISC data is expected to greatly broaden the already diverse applications of this theory and methodology to measurements/observations of RSC data throughout many fields of engineering and science. This extends the CS paradigm to data with inherent ISC, due to biological and other natural origins of irregular cyclicity. It also extends this paradigm to data with inherent regular cyclicity that has been rendered irregular by time warping due, for example, to sensor motion or other dynamics affecting the data.
One-sentence summary
Well-known data analysis benefits of cyclostationary signal-processing methodology are extended from regular to irregular statistical cyclicity in scientific data by using statistically inferred time-warping functions.
The cyclostationarity paradigm in science
Cyclicity is ubiquitous in scientific data
Many dynamical processes encountered in nature arise from periodic or cyclic phenomena. Such processes, although themselves not periodic functions of time, can produce random or erratic or otherwise unpredictable data whose statistical characteristics do vary periodically with time and are called cyclostationary (CS) processes [1,2,3]. For example, in telecommunications, telemetry, radar, and sonar systems, statistical periodicity or regular cyclicity in data is due to modulation, sampling, scanning, framing, multiplexing, and coding operations. In these information-transmission systems, relative motion between transmitter or reflector and receiver essentially warps the time scale of the received data. Also, if the clock that controls the periodic operation on the data is irregular, the cyclicity of the data is irregular. In mechanical vibration monitoring and diagnosis, cyclicity is due, for example, to various rotating, revolving, or reciprocating parts of rotating machinery; and if the angular speed of motion varies with time, the cyclicity is irregular. However, as explained herein, irregular statistical cyclicity (ISC) due to time-varying RPM or clock timing is not equivalent to time-warped regular statistical cyclicity (RSC). In astrophysics, irregular cyclicity arises from electromagnetically induced revolution and/or rotation of planets, stars, and galaxies and from pulsation and other cyclic phenomena, such as magnetic reversals of planets and stars, and especially Birkeland currents (concentric shells of counter-rotating currents). In econometrics, cyclicity resulting from business cycles has various causes including seasonality and other less regular sources of cyclicity. In atmospheric science, cyclicity is due to rotation and revolution of Earth and other cyclic phenomena affecting Earth, such as solar cycles.
In the life sciences, such as biology, cyclicity is exhibited through various biorhythms, such as circadian, tidal, lunar, and gene oscillation rhythms. The study of how solar- and lunar-related rhythms are governed by living pacemakers within organisms constitutes the scientific discipline of chronobiology, which includes comparative anatomy, physiology, genetics, and molecular biology, as well as development, reproduction, ecology, and evolution. Cyclicity also arises in various other fields of study within the physical sciences, such as meteorology, climatology, oceanology, and hydrology. As a matter of fact, the cyclicity in all data is irregular because there are no perfectly regular clocks or pacemakers. But, when the degree of irregularity throughout time-integration intervals required for extracting statistics from data is sufficiently low, the data’s cyclicity can be treated as regular.
The relevance of the theory of cyclostationarity to many fields of time-series analysis was proposed in the mid-1980s in the seminal theoretical work and associated development of data processing methodology reported in [1,2,3], which established cyclostationarity as a new paradigm in data modeling and analysis, especially (at that time) in engineering fields and particularly in telecommunications signal processing where the signals typically exhibit RSC. More generally, the majority of the development of such data processing techniques that ensued up to the turn of the century was focused on statistical processing of data with RSC for engineering applications, such as telecommunications/telemetry/radar/sonar and, subsequently, mechanical vibrations of rotating machinery. But today, more than 30 years later, the literature reveals not only expanded engineering applications but also many diverse applications to measurements/observations of RSC data throughout the natural sciences (see Appendix), and it is to be expected that many more applications will be found in the natural sciences for which benefit will be derived from transforming ISC into RSC and applying the now classical theory and methodology.
Wide-sense cyclostationary stochastic processes have autocorrelation functions that vary periodically with time. This function of time, under mild regularity conditions on its mathematical model, can be expanded in a Fourier series whose coefficients, referred to as cyclic autocorrelation functions, depend on the lag parameter; the Fourier frequencies, called cycle frequencies, are multiples of the reciprocal of the period of cyclostationarity [1,2,3].
More generally, if the frequencies of the (generalized) Fourier series expansion of the autocorrelation function are not commensurate, that is, if the autocorrelation function is an almost-periodic (in the mathematical sense) function of time, then the process is said to be almost-cyclostationary [4]. This large class includes as subclasses the polycyclostationary (polyCS) processes, which exhibit only a finite number of incommensurate periods, and the cyclostationary processes, which exhibit only one period. The (almost) periodicity property of the autocorrelation function is manifested in the frequency domain of the data as statistical dependence (e.g., correlation) between the spectral components of the data process that are separated in frequency by amounts equal to the cycle frequencies of the process and are shifted to any common spectral band for correlation measurement. In contrast to this, stationary (in the wide sense) processes have joint moments (autocorrelation functions) that are independent of time, depending on only the lag parameter, and all spectral components at distinct frequencies are statistically independent of (uncorrelated with) each other.
This subject has been further broadened by the generalization of the theory to generalized almost-cyclostationary processes, which exhibit cycle frequencies of the autocorrelation function that are dependent on the value of the lag variable, in [5].
As a simple means of assessing the current prevalence of the cyclostationarity paradigm in scientific data processing—that is, the concept of cyclostationarity and the associated body of data processing theory and method—in various fields of science and engineering, a web search using https://scholar.google.com/ was performed in April 2018, as a refinement and update of the search performed during the writing of this paper in 2015. This latter search was based on just under 50 nearly distinct applications areas in science and engineering, and the search terms were chosen to yield only results involving cyclostationarity. By “nearly distinct”, it is meant that the search terms were also selected to minimize redundancy (multiple search application areas producing the same “hits”). The results are shown in Table 1 in Section 17, Appendix. The total number of hits was about 136,000. The hits grow from a trickle of 1 to 2 figures per decade in the 1960s/1970s to a total of 6 figures over the ensuing half century.
The same is true, with 5 figures, for a search performed on the single general search term “cyclostationary OR cyclostationarity”. Also, as shown in Table 2, another search was performed using just over 20 search terms that represent partially redundant general subjects in science and engineering. The total number of hits was about 238,000. These hits also grew from a trickle of 1 to 2 figures per decade in the 1960s/1970s to a total of 6 figures over the ensuing half century.
Some analysis of Google Scholar’s search results obtained using the terms shown in these tables suggests that this search engine’s proprietary search algorithm is corrupting the logical “OR” operation and possibly the “AND” operation. Further attempts will be made to minimize the impact of this hypothesized corruption, and results obtained will be posted in [6]. Yet, there is good reason to believe that this body of theory and method and their applications would be even more pervasive if its utility could be extended from data with regular statistical cyclicity (RSC) to data with irregular statistical cyclicity (ISC).
The purpose of this paper is to enable an extension of the cyclostationarity paradigm from data exhibiting RSC to data exhibiting ISC. The approach taken, when performed in discrete time (as required when implemented digitally), can be classified as adaptive nonuniform resampling of data, and the adaptation proposed is performed blindly (requires no training data) using a property-restoral technique specifically designed to exploit cyclostationarity.
For what follows, readers would benefit from some basic knowledge of the concept of cyclostationarity: the periodic time variation of probabilistic (mathematical) or statistical (empirical) parameters of time-series data, which are sometimes called signals. These parameters are most notably the joint probability density function for the signal’s amplitude at multiple points in time, or moments of these density functions. A polyperiodic function of time (typically not the signal itself) is defined by its characteristic of being able to be expressed as a finite sum of periodic functions with multiple incommensurate periods (these are periods whose ratios are all irrational numbers). Polyperiodic time variation of probabilistic/statistical parameters characterizes polycyclostationary signals. The frequencies of the individual harmonics associated with each period of a cyclostationary/polycyclostationary signal are called the cycle frequencies. Polycyclostationary signals were originally [4], and are still [1,2,3, 7], most frequently called almost-cyclostationary (particularly by mathematically oriented authors) because polyperiodic functions are examples of almost periodic (in the mathematical sense) functions; such functions need not have only a finite number of incommensurate periods.
Tutorial treatments of cyclostationarity theory and method are available in the books and journal articles [1,2,3,4,5, 7, 8] and references therein; for the highest fidelity treatments in the literature, all of which share a common terminology and a self-consistent foundational theory (over the last three decades), readers are referred to those authored by the first author of [1,2,3,4] and [7,8,9,10,11], the originator of the cyclostationarity paradigm in signal processing, whose publications on this topic date back to the early 1970s [9]. Also recommended are the more recent publications by the originator of several extensions and generalizations of cyclostationarity, the author of [5], who uses terminology and develops theory that are (for the most part) consistent with that in the foundational literature.
There exists a duality between two alternative theories of polycyclostationarity: (1) the traditional theory, introduced in 1978 [4], which is more abstract and is based on the stochastic-process model (introduced in the 1940s by Kolmogorov) and the associated probabilistic expectation operation, and (2) the empiricist’s alternative, introduced during 1985–1991 [1, 2, 8, 10], which is recommended for scientists working with empirical data and is based on fraction-of-time probability and the sine-wave-extraction operation (introduced during that same period [1, 2, 8, 10]). For tutorial treatments of the concepts underlying this duality, these four originating publications plus [3, 11] are recommended, particularly for analytically inclined practitioners but also for any reader seeking a treatment that starts from basics and proceeds step by step to build advanced concepts and theory. For deeper mathematical treatments of fundamentals, the primary publications to date are [5, 12]. (Some of the references cited herein, such as out-of-print books, authored by the author of this paper are accessible to all for free as downloadable PDF documents at the webpage [9]; in the future, the primary source on cyclostationarity is expected to be the website (presently under construction) with domain name cyclostationarity followed by any of the domain extensions .com, .org, .info, .net, and .us [6].) The presently inadequate Wikipedia article entitled Cyclostationary_process (and several inadequate/redundant articles at other Wikipedia sites) requires major upgrading.
One simple example of a CS signal is described here to illustrate that what is here called regular statistical cyclicity for time-series data can represent extremely erratic behavior relative to a periodic time series. Consider a long train of pulses or bursts of arbitrary complexity and identical functional form and assume that, for each individual pulse, the shape parameters, such as amplitude, width (or time expansion/contraction factor), time-position, and center frequency, are all random variables: their values change unpredictably from one pulse in the train to the next. If these multiple sequences of random parameters associated with the sequence of pulses in the train are jointly stationary random sequences, then the signal is CS and therefore exhibits regular statistical cyclicity, regardless of the fact that the pulse train can be far from anything resembling a periodic signal. As another example, any broadband noise process with center frequency and/or amplitude and/or time scale that is varied periodically is CS. Thus, exactly regular statistical cyclicity can be quite subtle and even unrecognizable to the casual observer. This is reflected in the frequent usage in recent times of CS models for time-series data from natural phenomena of many distinct origins (see Appendix). Yet, there are many ways in which a time series of even exactly periodic data can be affected by some dynamical process of cycle-time expansion and/or contraction in a manner that renders its statistical cyclicity irregular: not CS or polyCS. The particular type of dynamic process of interest in this contribution is time warping.
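The second example above (broadband noise with periodically varied amplitude) can be illustrated numerically. The following is a minimal sketch, not a method from the cited literature: the helper name `cyclic_autocorr`, the envelope shape, the period of 50 samples, and the record length are all assumptions chosen for illustration. It estimates the cyclic autocorrelation at zero lag as a sinusoidally weighted time average of the lag product, once at a true cycle frequency and once at a frequency that is not a cycle frequency.

```python
import numpy as np

def cyclic_autocorr(x, alpha, lag=0):
    """Finite-time estimate of the cyclic autocorrelation at cycle
    frequency alpha (cycles/sample) and integer lag (samples), using the
    asymmetric lag product x[n + lag] * conj(x[n])."""
    n = np.arange(len(x) - lag)
    lag_product = x[n + lag] * np.conj(x[n])
    return np.mean(lag_product * np.exp(-2j * np.pi * alpha * n))

rng = np.random.default_rng(0)
period = 50                      # samples per cycle (assumed value)
num_samples = 400 * period       # 400 cycles of data
n = np.arange(num_samples)

# White noise whose amplitude is varied periodically: erratic, yet CS.
envelope = 1.0 + 0.8 * np.cos(2 * np.pi * n / period)
x = envelope * rng.standard_normal(num_samples)

on_cycle = abs(cyclic_autocorr(x, 1.0 / period))    # a true cycle frequency
off_cycle = abs(cyclic_autocorr(x, 1.37 / period))  # not a cycle frequency
print(f"|R| at alpha = 1/period:    {on_cycle:.3f}")
print(f"|R| at alpha = 1.37/period: {off_cycle:.3f}")
```

For this envelope, the squared envelope contains the term 1.6 cos(2πn/period), so the ideal zero-lag cyclic autocorrelation magnitude at α = 1/period is 0.8; the estimate should be close to that value, while the off-cycle estimate should be near zero, even though a plot of the samples looks like unstructured noise.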
Time warping
Time warping is a dynamical process of cycle-time expansion and/or contraction
Let x(t) be a wide-sense CS, or wide-sense polyCS signal or time series of data with at least one cycle frequency α, and let y(t) be a time-warped version
\[ y(t)=x\left(\psi (t)\right) \tag{1} \]
in which the time-warping function ψ(t) represents a causal data transformation, meaning warped time never pauses or reverses direction: if t_{2} > t_{1}, then ψ(t_{2}) > ψ(t_{1}). In this case, the warping function is an invertible function. The notation \( t={\psi}^{-1}(s) \) is used herein to denote the inverse of \( s=\psi (t) \).
In general, if the time warping is not an affine transformation, \( \psi (t)= at+b \), or some periodic or polyperiodic generalization thereof, such as \( \psi (t)= at+b(t) \), in which b(t) is a periodic or polyperiodic function, then any cyclicity in x(t) is absent in y(t): the signal y(t) is not CS or polyCS. Nevertheless, by dewarping time in y(t), x(t) is recovered and therefore cyclicity is restored.
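The affine case can be checked with a short worked derivation. This is a sketch under stated assumptions: ψ is deterministic, a > 0, expected values are used, and the symbol \( R_{yy^{(\ast)}}(t,\tau) \) for the time-varying autocorrelation of y(t) is introduced here only for illustration. Using the Fourier-series expansion of the autocorrelation of x(t) into cyclic autocorrelations [1,2,3], a general warping gives

\[ R_{yy^{(\ast)}}(t,\tau)=E\left\{x\left(\psi(t+\tau)\right)x^{(\ast)}\left(\psi(t)\right)\right\}=\sum_{\alpha} R_{xx^{(\ast)}}^{\alpha}\left(\psi(t+\tau)-\psi(t)\right)e^{j2\pi\alpha\psi(t)} . \]

For \( \psi(t)=at+b \), this reduces to

\[ R_{yy^{(\ast)}}(t,\tau)=\sum_{\alpha}\left[R_{xx^{(\ast)}}^{\alpha}(a\tau)\,e^{j2\pi\alpha b}\right]e^{j2\pi (a\alpha) t} , \]

which is again a Fourier series in t, now with cycle frequencies aα. For a non-affine ψ, the phase \( 2\pi\alpha\psi(t) \) is not linear in t and the lag argument depends on t, so the terms are generally no longer sinusoids in t.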
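The periodically (or polyperiodically) time-varying autocorrelation function for x(t) is given by
\[ R_{xx^{(\ast)}}(t,\tau)=\sum_{\alpha}R_{xx^{(\ast)}}^{\alpha}(\tau)\,e^{j2\pi\alpha t} \tag{2} \]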
where the cyclic autocorrelations \( {R}_{xx^{\left(\ast \right)}}^{\alpha}\left(\tau \right) \) are defined in the usual manner [1,2,3] in terms of either sinusoidally weighted time averages of lag products or same for probabilistic expected values of lag products. For CS x(t) with period T_{1}, we have \( \left\{\alpha \right\}=\left\{h/{T}_1;h=\mathrm{some}\ \mathrm{integers}\right\} \); and, for polyCS x(t), we have
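For example, the expected value of the lag product with lag τ is given by
\[ R_{xx^{(\ast)}}(t,\tau)=E\left\{x(t+\tau)\,x^{(\ast)}(t)\right\} \tag{3} \]
where the superscript (∗) denotes optional conjugation [1,2,3] of data (such as the baseband complex-valued representations of real-valued bandpass signals) that is represented in terms of complex data values, and the cyclic autocorrelation is given (ideally) by the limit of the sinusoidally weighted time average of the lag product, as the averaging time T (ideally, for polyCS x(t)) approaches infinity:
\[ R_{xx^{(\ast)}}^{\alpha}(\tau)=\lim_{T\to\infty}\frac{1}{T}\int_{-T/2}^{T/2}E\left\{x(t+\tau)\,x^{(\ast)}(t)\right\}e^{-j2\pi\alpha t}\,dt \tag{4} \]
The usual estimate of this statistical function, obtained from the data x(t) over an averaging interval starting at some time t_{o}, is given by
\[ \widehat{R}_{xx^{(\ast)}}^{\alpha}(\tau)=\frac{1}{T}\int_{t_o}^{t_o+T}x(t+\tau)\,x^{(\ast)}(t)\,e^{-j2\pi\alpha t}\,dt \tag{5} \]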
where T is the finite length of the time-averaging interval used. Readers are warned that the presentation here follows the convention in [5] for which the asymmetric lag product x(t + τ)x^{(∗)}(t) is adopted, whereas the earlier treatments in all references herein by the Author follow the alternative convention based on the symmetric lag product x(t + τ/2)x^{(∗)}(t − τ/2). Because of this difference in convention, the asymmetric lag cyclic autocorrelations in [5] and herein differ from the symmetric lag autocorrelations in the original work [1,2,3] in that the former equals the latter multiplied by the lag-dependent phase-shifting factor exp(jπατ). The choice of convention used herein was dictated by the benefit gained by avoiding the need for translations back and forth between this paper and the important complementary source [5] and the directly related upcoming publications by the author of [5], and also because τ/2 is generally undefined for discrete time.
In the one special case for which x(t) is CS (not polyCS), and the probabilistic expectation operation is used as in Eqs. 3 and 4, the theory reveals that a finite averaging time equal to the period of CS, \( T={T}_1 \) (or any nonzero integer multiple thereof), suffices in Eq. 4. There is no need for an infinite amount of time averaging to obtain the idealized result. However, if the probabilistic expectation is not used, then (ideally) infinite averaging time is required to obtain the mathematically idealized cyclic autocorrelations.
In contrast to x(t), the cyclic autocorrelations of the time-warped data y(t), whether defined with or without (cf. [10]) the probabilistic expectation operation, are generally zero, \( {R}_{yy^{\left(\ast \right)}}^{\alpha}\left(\tau \right)=0 \), for all the values of α in Eq. 2, except possibly \( \alpha =0 \), and for all other nonzero values of α.
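The phase-shifting factor relating the two lag conventions can be verified in one line. In this sketch, the labels \( R_{\mathrm{asym}}^{\alpha} \) and \( R_{\mathrm{sym}}^{\alpha} \) are introduced only to distinguish the two conventions, and \( \langle \cdot \rangle \) denotes the infinite-time average, which is unaffected by a fixed shift of the time variable. Substituting \( t=u-\tau /2 \) in the asymmetric lag product gives

\[ R_{\mathrm{asym}}^{\alpha}(\tau)=\left\langle x(t+\tau)\,x^{(\ast)}(t)\,e^{-j2\pi\alpha t}\right\rangle=\left\langle x(u+\tau/2)\,x^{(\ast)}(u-\tau/2)\,e^{-j2\pi\alpha (u-\tau/2)}\right\rangle=e^{j\pi\alpha\tau}\,R_{\mathrm{sym}}^{\alpha}(\tau) . \]

The two conventions therefore agree in magnitude for every α and τ, and agree exactly at \( \tau =0 \), so objective functions built on squared magnitudes are unaffected by the choice.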
Before proceeding, it is clarified here that, although the notation used does not reveal any dependence of the time-warping function ψ(⋅) on the phenomenon characterized by x(⋅), there is nothing in the theory or method presented here that prohibits such dependence, with one exception that is explained in Section 6. The only limitation on the nature of such dependence is that, to be physically viable, the dependence must (according to generally, but not unanimously, accepted principles of cosmology) be causal, meaning that the dependence \( \psi (t)=\psi \left(t;\left\{x(v):v<t\right\}\right) \) is possible, but v in this expression cannot be allowed to exceed t. This clarification can be summarized as follows:
The manner in which present cyclicity of a phenomenon departs from being regular can depend on past behavior of the phenomenon.
(The mathematical question of whether or not this mathematical model should be modified to allow v to equal t is not addressed here.)
As a final introductory remark on time warping and cyclicity, let us take into account the fact that data obtained from measurement/observation of physical phenomena cannot, in reality, exhibit exact RSC. This property is a mathematical idealization of physical reality. The extent to which data departs from exact RSC sets an upper limit on how long sinusoidally weighted time averages (with sinusoid frequencies approximately equal to the data’s cycle repetition frequency and its harmonics) of the data, and/or time-invariant nonlinear transformations of the data, can be integrated (without dividing the integral by the integration time to produce an average value) before the magnitude of the result stops growing with a linear trend with increasing integration time. This upper limit is here referred to as the cycle coherence time (CCT): the maximum length of time over which the cycle frequency is stable. (This is distinct from the cyclic coherence time, which is defined to be the width of the cyclic autocorrelation function of the data, that is, the maximum length of time separation (lag) for which time samples are cyclically correlated.)
The objective of the data processing methods presented herein is to increase the cycle coherence time (CCT) of data, that is, to render the data’s statistical cyclicity more regular or less irregular (increase RSC or decrease ISC), enough to be able to achieve coherent cyclic processing gain sufficient for the information-extraction task at hand.
Generally speaking, when unintentional time warping of data exhibiting some level of inherent regularity of statistical cyclicity decreases the data’s CCT (renders it less RSC or more ISC), the purpose of the dewarping described herein is to recover the longer CCT. But the same dewarping methods (but possibly better interpreted in this case as warping methods) can produce a substantial increase in CCT when the cyclicity is inherently irregular even though the unprocessed data has not been subjected to any time warping; in fact, useful levels of CCT can be obtained in some cases even when the original data exhibits such highly irregular statistical cyclicity that its CCT is negligible to start with.
Dewarping to restore cyclostationarity
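If y(t) is dewarped using the inverse of the warping function that has transformed RSC in the data into ISC, the regular statistical cyclicity present in x(t) is recovered:
\[ y\left({\psi}^{-1}(s)\right)=x\left(\psi \left({\psi}^{-1}(s)\right)\right)=x(s) \]
or, changing the variable’s label from s to t, we obtain
\[ x(t)=y\left({\psi}^{-1}(t)\right) \]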
More generally, assuming that ψ^{−1}(t) is completely or, at least, partially unknown, it is in principle possible to estimate it from the observed data y(t) by searching for the particular function φ(t) (an estimate of ψ^{−1}(t)) that maximizes the strength of a measurement of some cyclic feature, such as the cyclic autocorrelation function, for the candidate dewarped data
\[ {x}_{\varphi }(t)\triangleq y\left(\varphi (t)\right) \tag{7} \]
at one or more values of lag τ and cycle frequency α, where \( {x}_{\varphi }(t)=y\left(\varphi (t)\right)=x\left(\psi \left(\varphi (t)\right)\right)=x(t) \) for \( \varphi (t)={\psi}^{-1}(t) \). In some applications, doing this jointly for appropriate multiple values of τ and α can improve the quality of the estimate. In such cases, the most appropriate values for τ may be different from one value of α to another. In practice, such values may be determinable only by experimentation with trial values. Nevertheless, it is stated here, on the basis of decades of experience, that more often than not the magnitude of \( {R}_{xx^{\left(\ast \right)}}^{\alpha}\left(\tau \right) \) peaks at \( \tau =0 \) for physically realistic models of x(t).
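For any valid cycle frequency α and lag value τ for which \( {R}_{xx^{\left(\ast \right)}}^{\alpha}\left(\tau \right) \) is not zero and not negligibly small, the property-restoral optimization proposed here is
\[ \underset{\varphi }{\max}\ {\left|{\widehat{R}}_{x_{\varphi}}^{\alpha}\left(\tau \right)\right|}^2 \tag{8} \]
where \( {\widehat{R}}_{x_{\varphi}}^{\alpha}\left(\tau \right) \) is a shorthand notation (the double subscript used in Section 3 is replaced with a single subscript from this point forward) for a measurement (estimate) of the cyclic autocorrelation of x_{φ}(t) obtained from a finite time-averaging interval (and, of course, no expectation operation):
\[ {\widehat{R}}_{x_{\varphi}}^{\alpha}\left(\tau \right)\triangleq \frac{1}{T}\int_{t_o}^{t_o+T}{x}_{\varphi}\left(t+\tau \right){x}_{\varphi}^{\left(\ast \right)}(t)\,{e}^{-j2\pi \alpha t}\,dt \tag{9} \]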
If no valid cycle frequency α for x(t) is known, then this parameter also must be searched over in the optimization Eq. 8. One possibility for initializing the estimate of α is described in Section 11, where there also is described a possibility for initializing φ.
The values of cycle frequency α for which Eq. 8 is a valid objective function for dewarping include any/all cycle frequencies for which the cyclic correlation coefficient for x(t) is nonnegligible (not much less than unity in magnitude). An alternative to the single-cycle objective function Eq. 8 is a multi-cycle objective function, which can be either a sum over cycle frequencies of squared magnitudes of cyclic autocorrelation functions or a sum of the complex values of cyclic autocorrelations. The latter may perform best, but it may be impractical in many cases because of the need for equalizing the phases of the signal component in each term in order to obtain coherent addition (cf. literature on maximum-likelihood multi-cycle detectors). Another alternative is to sum squared magnitudes of cyclic autocorrelations over multiple lag values τ (for either one or multiple values of α). As illustrated in the example presented in Section 14, multiple harmonically related values of α could be useful, as could a range of lag values τ centered at \( \tau =0 \). However, in that example, the strongest cyclic autocorrelation value occurs at the first harmonic and at a lag of zero.
Substituting Eq. 7 into Eq. 9 yields the measured statistic whose squared magnitude is the performance functional to be maximized w.r.t. the candidate dewarping function φ:
\[ {\widehat{R}}_{x_{\varphi}}^{\alpha}\left(\tau \right)=\frac{1}{T}\int_{t_o}^{t_o+T}y\left(\varphi \left(t+\tau \right)\right){y}^{\left(\ast \right)}\left(\varphi (t)\right){e}^{-j2\pi \alpha t}\,dt \tag{10} \]
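The property-restoral search can be sketched numerically for a toy case. Everything specific in the following example is an assumption made for illustration and is not taken from the paper: the one-parameter warp family `psi(t, c)`, the amplitude-modulated lowpass-noise signal, the use of linear interpolation (`np.interp`) for nonuniform resampling, the known cycle frequency, and the coarse grid search that stands in for a practical iterative algorithm. The objective is the squared magnitude of the zero-lag cyclic autocorrelation of the candidate dewarped data.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, duration, f0 = 200.0, 50.0, 5.0   # sample rate (Hz), record length (s), carrier (Hz)
alpha = 2.0 * f0                      # a cycle frequency of x(t) = w(t) cos(2 pi f0 t)
c_true = 0.05                         # warp parameter to be recovered

def psi(t, c):
    """Hypothetical one-parameter warp family; strictly increasing for c >= 0."""
    return t + c * t**2 / duration

# Underlying CS signal x on a grid long enough to cover psi(t) for all t.
t_x = np.arange(0.0, 1.2 * duration, 1.0 / fs)
w = np.convolve(rng.standard_normal(t_x.size),
                np.ones(20) / np.sqrt(20.0), mode="same")  # lowpass noise
x = w * np.cos(2.0 * np.pi * f0 * t_x)

# Observed, time-warped data y(t) = x(psi(t)).
t = np.arange(0.0, duration, 1.0 / fs)
y = np.interp(psi(t, c_true), t_x, x)

def objective(c):
    """Squared magnitude of the lag-0 cyclic autocorrelation estimate of
    the candidate dewarped data y(phi_c(s)), with phi_c = inverse of psi(., c)."""
    warped = psi(t, c)
    s = np.arange(0.0, warped[-1], 1.0 / fs)   # uniform grid in dewarped time
    phi = np.interp(s, warped, t)              # numerical inverse of psi(., c)
    x_phi = np.interp(phi, t, y)               # candidate dewarped data
    r_hat = np.mean(x_phi**2 * np.exp(-2j * np.pi * alpha * s))
    return abs(r_hat) ** 2

candidates = np.linspace(0.0, 0.1, 11)
scores = np.array([objective(c) for c in candidates])
c_best = candidates[np.argmax(scores)]
print(f"true c = {c_true}, estimated c = {c_best:.2f}")
```

The candidate c = 0 corresponds to leaving the observed data untouched, so `scores[0]` shows how weak the cyclic feature of the warped data is compared with the feature restored at the best candidate. A realistic warping function would need a much richer parameterization and a search that guards against local maxima.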
It is important to recognize that, although the above concept and method is presented as if the observed data arose from the time warping of other data that exhibited regular cyclicity prior to time warping, there is in fact no need for this conceptual model: there need not be an underlying physical mechanism exhibiting regular statistical cyclicity that is then transformed into irregular statistical cyclicity by some actual time-warping process. Direct sources of ISC (e.g., an EKG from a beating heart or many other biological functions that naturally produce ISC, or some long-term climate, geological, or celestial data, etc.) can, in principle, be dewarped in many cases even if they were not warped to start with. However, in this case, we should say the data can be “warped”, not “dewarped”, since there is no original warping to be removed. To summarize:
The objective addressed by the theory and method presented here is twofold:

(i) To convert naturally occurring ISC in data into RSC (or at least increase the data’s CCT) by time warping, thereby rendering the converted data (more) amenable to CS and/or polyCS data processing techniques, algorithms, and theory;
(ii) To dewarp time in data that exhibited RSC prior to having been subjected to time warping, thereby increasing the data’s CCT and rendering it more amenable to CS and/or polyCS data processing techniques, algorithms, and theory.
The optimization method based on Eq. 8 is an example of a property-restoral method for blind adaptation (learning without training data). Introductions to cyclostationarity restoral for blind adaptive spatial filtering and frequency-shift spectral filtering for suppression of additive noise and interfering signals, and to joint cyclostationarity restoral for time-difference-of-arrival estimation in the presence of additive noise and interfering signals, are presented in [1,2,3, 13, 14].
Warping compensation instead of dewarping
It is shown here that the search for an optimum dewarping function \( \varphi \equiv {\widehat{\psi}}^{-1}\cong {\psi}^{-1} \), by the method described in Section 4, can be transformed into an equivalent search for an optimum warping compensation function \( {\varphi}^{-1}\equiv \widehat{\psi}\cong \psi \). Such a function, once found, can then be inverted if it is desired to dewarp the data. By using the definition
the measured statistic Eq. 10 to be used for optimization of φ can be reexpressed as follows (using the change of variables \( u=\varphi (t) \)):
where T denotes the length of the averaging interval T ≜ [t_{o}, t_{o} + T] (with some abuse of notation) and, similarly, for the dewarped averaging interval:
and where the approximation in the last line of Eq. 12 is
This approximation is accurate when φ(t) is accurately approximated as linear over intervals no longer than the width of the function \( {R}_x^{\alpha}\left(\cdot \right)\equiv {R}_{xx^{\left(\ast \right)}}^{\alpha}\left(\cdot \right) \), the cyclic coherence time.
The inverse φ^{−1}(t) of the candidate dewarping function is a candidate warping-compensation function. Equation 12 indicates that an estimate of the warping function \( {\varphi}^{-1}\equiv \widehat{\psi}\cong \psi \), from which its derivative \( {\dot{\varphi}}^{-1}(u)\cong \dot{\psi}(u) \) can be obtained, can be used to compensate for warping in the data by time warping the sinusoids and scaling, in a time-varying manner, their amplitudes and the lags used in the data to compute the cyclic autocorrelations. One can use Eq. 12 in Eq. 8 and search over φ^{−1} ≅ ψ instead of φ ≅ ψ^{−1}, to directly find the warping-compensation function φ^{−1} ≅ ψ; or, one can use Eq. 10 in Eq. 8 to search directly for the data-dewarping function φ ≅ ψ^{−1}. The relative advantages and disadvantages of these two theoretically approximately equivalent approaches are expected to involve somewhat complicated tradeoffs between algorithmic efficiency and estimation accuracy. For an iterative search algorithm of the sort described in Section 10, the efficiency depends on computational complexity and storage requirements per iteration, and the number of iterations required for convergence. There are tradeoffs among these three efficiency parameters for a specified level of estimation accuracy. And there are also tradeoffs between estimation accuracy and algorithmic efficiency. Especially important is the need for schemes, such as extensive diverse initializations of the iterative algorithm, which avoid mistaking substantially suboptimum local maxima (of which there can be many) for the desired global maximum. These important topics on search algorithm research are outside the scope of this paper.
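For the special case of zero lag, the change of variables \( u=\varphi (t) \) applied to Eq. 10 is exact and no lag scaling is needed: the statistic becomes an average of \( {\left|y(u)\right|}^2 \) weighted by a time-warped sinusoid whose amplitude is scaled by the derivative of the candidate warping function. The sketch below illustrates this warping-compensation form under assumptions made only for illustration (the warp family `psi`, its derivative `psi_dot`, the test signal, a known cycle frequency, a Riemann-sum approximation of the integral, and a grid search). Unlike a data-dewarping search, the observed data are never resampled for each candidate.

```python
import numpy as np

rng = np.random.default_rng(1)
fs, duration, f0 = 200.0, 50.0, 5.0   # sample rate (Hz), record length (s), carrier (Hz)
alpha = 2.0 * f0                      # a cycle frequency of x(t) = w(t) cos(2 pi f0 t)
c_true = 0.05                         # warp parameter to be recovered

def psi(t, c):
    """Hypothetical one-parameter warp family (candidate for phi^{-1})."""
    return t + c * t**2 / duration

def psi_dot(t, c):
    """Time derivative of psi(t, c)."""
    return 1.0 + 2.0 * c * t / duration

# Underlying CS signal and observed warped data y(u) = x(psi(u)).
t_x = np.arange(0.0, 1.2 * duration, 1.0 / fs)
w = np.convolve(rng.standard_normal(t_x.size),
                np.ones(20) / np.sqrt(20.0), mode="same")
x = w * np.cos(2.0 * np.pi * f0 * t_x)
u = np.arange(0.0, duration, 1.0 / fs)
y = np.interp(psi(u, c_true), t_x, x)

def compensated_statistic(c):
    """Lag-0 cyclic autocorrelation estimate computed directly from y(u):
    the sinusoid is warped by psi(., c) and its amplitude scaled by psi_dot."""
    weights = psi_dot(u, c) * np.exp(-2j * np.pi * alpha * psi(u, c))
    dewarped_length = psi(u[-1], c) - psi(u[0], c)   # averaging time T
    return np.sum(np.abs(y) ** 2 * weights) / fs / dewarped_length

candidates = np.linspace(0.0, 0.1, 11)
scores = np.array([abs(compensated_statistic(c)) ** 2 for c in candidates])
c_best = candidates[np.argmax(scores)]
print(f"true c = {c_true}, estimated c = {c_best:.2f}")
```

For nonzero lags, the lag itself would also have to be scaled in a time-varying manner, which is where the linear-approximation condition on φ over the cyclic coherence time enters.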
Error analysis
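Substituting Eq. 1 into Eq. 10 and using an estimate \( \widehat{\alpha} \) of a cycle frequency yields
\[ {\widehat{R}}_{x_{\varphi}}^{\widehat{\alpha}}\left(\tau \right)=\frac{1}{T}\int_{t_o}^{t_o+T}x\left(\psi \left(\varphi \left(t+\tau \right)\right)\right){x}^{\left(\ast \right)}\left(\psi \left(\varphi (t)\right)\right){e}^{-j2\pi \widehat{\alpha}t}\,dt \tag{13} \]
If φ is not exactly equal to the inverse of ψ, then there is some dewarping-function error, which is denoted by \( {e}_{\varphi}\triangleq \varphi -{\psi}^{-1}\equiv {\widehat{\psi}}^{-1}-{\psi}^{-1} \). This error may be due to error in estimating ψ(t) and/or error in inverting the estimated ψ(t) or error in estimating ψ^{−1}(t) directly.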
In terms of this error, we have
in which e(t) denotes the time dewarping error created by the dewarping-function error:
Using this error definition, Eq. 13 can be reexpressed as
Assuming that \( {\widehat{\psi}}^{-1}(t)=\varphi (t) \) and, therefore, e(t) is not statistically dependent on x(t) (this is, at best, an approximation when φ(t) is determined from x(t) in some data-adaptive manner, because then there is a deterministic relationship between e(t) and x(t)), the probabilistic expected value (w.r.t. the probability density function for x(t)) of Eq. 16 is given by
where
This expression Eq. 17a also holds if e(t) is statistically dependent on x(t) provided that the expectation is conditional on the x(t)‐dependent e(t) being any particular function of time. For an exact estimate of a cycle frequency, \( \widehat{\alpha}={\alpha}_o \), Eq. 17a reduces to
For a signal model with a specified set of cyclic autocorrelation functions \( \left\{{R}_x^{\alpha}\left(\tau \right)\right\} \), indexed by cycle frequency, Eqs. 17a and 17b can be used to study the sensitivity of the expected value of the objective function in Eq. 8 to the dewarping error and/or cycle frequency error.
The second term in Eq. 17b is called cycle leakage [2] (the term cyclic leakage used by some authors is conceptually misleading: the leakage is not cyclic; it represents the amount of the cyclic feature for each and every cycle frequency α and strength and phase \( {R}_x^{\alpha}\left(\tau \right) \) that leaks into the measurement of the feature with cycle frequency \( \widehat{\alpha}={\alpha}_o \) and strength and phase \( {R}_x^{\alpha_o}\left(\tau \right) \)). If the cyclicity, with cycle frequency α − α_{o}, in the product of the two factors in the sum in Eq. 17b that depends on t through the quantities e(t) and \( {\left\langle \dot{e}(t)\right\rangle}_{\tau } \) is negligible, this leakage term approaches zero as the averaging time T grows without bound. The first term in Eq. 17b is a time average of the actual cyclic autocorrelation prior to time warping in x(t) and subjected to time-variant lag shift and complex-amplitude scaling.
If \( \widehat{\alpha}=\alpha \) is invalid for every cycle frequency α exhibited by x(t), then the first term in the right member of Eq. 17b vanishes, and the value of the left member is due entirely to cycle leakage, that is, the second term in the right member with α_{o} replaced by \( \widehat{\alpha} \). Also, it is noted that for sufficiently slowly varying e(t), defined by \( \left|{\left\langle \dot{e}(t)\right\rangle}_{\tau}\right|<<1 \), Eq. 17b is closely approximated by
in which the lag smoothing is negligible and the weighting function in the first term of the right member is a time-independent scalar β that can be re-expressed as \( \beta ={\left\langle \cos \left(2\pi {\alpha}_oe(t)\right)\right\rangle}_T+j{\left\langle \sin \left(2\pi {\alpha}_oe(t)\right)\right\rangle}_T \). It can be seen that, if e(t) has an approximately even fraction-of-time amplitude density function, then the first term in β dominates the second term; the same result holds without this evenness assumption if the error is small enough, say |e(t)| < 1/(8α_{o}), and in this case the dominant term is close to 1. Therefore, the lower the cycle frequency α_{o} is, the larger the dewarping error that can be tolerated without significant attenuation of the actual cyclic autocorrelation, provided that |α_{o}e(t)| << 1. In the event that \( \left|{\left\langle \dot{e}(t)\right\rangle}_{\tau}\right|<<1 \) is not satisfied, the first term in Eq. 18a is still a close approximation if \( {R}_x^{\alpha_o}\left(\tau \right) \) varies very little over the range of \( {\left\langle \dot{e}(t)\right\rangle}_{\tau}\tau \) for fixed τ; this is satisfied, regardless of the size of the error e(t), if \( \tau =0 \) is selected for use.
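As a rough numerical illustration of the weighting factor β described above, the following sketch evaluates \( \beta ={\left\langle \exp \left(j2\pi {\alpha}_oe(t)\right)\right\rangle}_T \) for a synthetic error sequence. The cycle frequency, the uniform error model, and the function name `attenuation_factor` are assumptions made only for this example; they are not taken from the paper.

```python
import numpy as np

def attenuation_factor(e, alpha_o):
    """Time average of exp(j*2*pi*alpha_o*e(t)) over the samples of e (the scalar beta)."""
    return np.mean(np.exp(1j * 2.0 * np.pi * alpha_o * e))

alpha_o = 5.0                          # hypothetical cycle frequency
rng = np.random.default_rng(0)
# Hypothetical dewarping error: uniform, even amplitude density, bounded by 1/(8*alpha_o)
e = rng.uniform(-1.0, 1.0, 10_000) / (8.0 * alpha_o)
beta = attenuation_factor(e, alpha_o)
# Real part: the cosine average (dominant term); imaginary part: the sine average
print(beta.real, beta.imag)
```

Because every sample satisfies \( \left|2\pi {\alpha}_oe(t)\right|\le \pi /4 \), the cosine average cannot fall below cos(π/4), while the sine average is small for an even error density.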
For \( \tau =0 \), Eq. 17a reduces to
Generally speaking, it is to be expected that the more ψ(t) or ψ^{−1}(t) deviates from t (the stronger the warping or required dewarping), the larger e_{φ}(t) and therefore e(t) is likely to be and, as a consequence revealed by Eq. 17a, the weaker the cyclic autocorrelation of the dewarped data is likely to be. Interestingly, the scale for quantifying the size of the timing error e(t) can be seen from the first term in Eq. 17b to be determined by the cycle frequency estimate \( \widehat{\alpha} \)=α_{o}.
For exact dewarping, e(t) ≡ 0, Eq. 17b reduces exactly to
and the expected value of the estimated cyclic autocorrelation is equal to the actual cyclic autocorrelation prior to time warping in x(t) plus the cycle leakage term, which is inversely proportional to π(α − α_{o})T. In the remaining sections of this paper, T will be used in place of T to denote the length of the integration interval.
Basis-function expansion of dewarping function
To reduce the infinite-dimensional optimization problem in Eqs. 8 and 10 (searching over all functions φ(t) defined on the time interval [t_{o}, t_{o} + T] for any start time t_{o}), we can use the finite-dimensional approximation
where \( {\left({c}_k(t);t\in \left[{t}_o,{t}_o+T\right]\right)}_{k=1}^K \) comprise a linearly independent set of functions chosen according to any available information about the time-warping function ψ(t) or its inverse ψ^{−1}(t), and where a^{T} denotes the row vector obtained by matrix transposition of the column vector a. For example, knowing nothing more than the spectral bandwidth of ψ^{−1}(t), it is known that an optimum (minimum-dimension) set of basis functions that spans the space of all functions of duration no more than T and positive-frequency bandwidth of no more than \( {B}_{\psi^{-1}} \) Hz consists of \( K=4{B}_{\psi^{-1}}T \) prolate spheroidal wave functions [15]. For all other sets of functions (that are not equivalent to this set in the sense of not being just K linearly independent linear combinations of the members of this set), larger values of K are required to span this same space.
If the functions used in Eq. 19a are chosen to be orthogonal, then the mean squared value of the error \( {e}_{\varphi }(t)\triangleq {\widehat{\psi}}^{-1}(t)-{\psi}^{-1}(t) \) is minimized by K independent minimizations of this same mean squared error w.r.t. the K unknowns {a_{k}} executed in any order. But this does not imply that the mean squared value of the error e(t) defined by Eq. 14 behaves similarly. In general, a full joint minimization of this mean squared error w.r.t. all K unknowns {a_{k}} must be executed. Yet, a perturbation analysis suggests that, for sufficiently small e_{φ}(t), K independent minimizations of the mean squared value of the error e(t) using orthogonal basis functions yield approximately the same result as a single joint minimization.
With no knowledge at all about ψ^{−1}(t), except that it varies smoothly, harmonically related sinusoids or polynomials may be reasonable choices for {c_{k}(t)}.
However, since ψ(t) will essentially always contain the additive term γt, where \( \gamma =1 \) unless there exists a constant-velocity Doppler effect in the data y(t), and since ψ(t) will also contain the term ηt^{2} if the data is affected by constant acceleration, then \( {\widehat{\psi}}^{-1}(t) \) may contain related terms like μt or νt^{1/2}. Consequently, even when sinusoids are used as basis functions, a more efficient approximate representation of ψ^{−1}(t) may be obtained by adding a few such terms. For example, Eq. 19a can be replaced with
where the vector a has dimension K + 2 with the first two elements being \( {a}_{-1}=\nu \) and \( {a}_0=\mu \). If \( \nu =0 \), then the dimension can be reduced to K + 1 and, if \( \mu =1 \), then \( {a}_1=1 \) is fixed. In the case of Eq. 19b, \( {B}_{\psi^{-1}} \) in the requirement \( K\ge 4{B}_{\psi^{-1}}T \) might be taken to be the bandwidth of the component \( {\widehat{\psi}}^{-1}(t)-\mu t-\nu {t}^{1/2} \) of \( {\widehat{\psi}}^{-1}(t) \), which could be better defined than the bandwidth of \( {\widehat{\psi}}^{-1}(t) \).
Using Eqs. 19a or 19b, the set of equations Eq. 8 through Eq. 10 reduces to
where
Once the optimum vector of coefficients \( \mathbf{a}={\mathbf{a}}_o \) is found from Eq. 8, the data transformation from Eq. 7,
approximately dewarps y(t) to produce an approximation to the CS (or polyCS) data x(t). Stated another way, the regularity of cyclicity of \( y\left({\mathbf{a}}_o^T\mathbf{c}(t)\right) \) is higher than that of y(t).
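For time-sampled data, the transformation \( y\left({\mathbf{a}}^T\mathbf{c}(t)\right) \) can be sketched as an interpolation of the samples of y at the warped time instants. The example below is only a minimal illustration: the two-term polynomial basis, the use of linear interpolation via `numpy.interp`, and the synthetic sinusoidal test signal are all assumptions chosen for simplicity, not choices made in the paper.

```python
import numpy as np

def poly_basis(t):
    """Hypothetical basis c(t) = [t, t^2], one row per sample of t."""
    return np.column_stack([t, t**2])

def dewarp(y, t_y, t_out, a, basis=poly_basis):
    """Evaluate y(a^T c(t)) at times t_out by linear interpolation of the samples (t_y, y)."""
    phi = basis(t_out) @ a           # candidate dewarping function phi(t) = a^T c(t)
    return np.interp(phi, t_y, y)    # np.interp holds the end values outside [t_y[0], t_y[-1]]

# Synthetic check: y(t) = x(psi(t)), where psi is the inverse of phi(t) = t + c*t^2
c = 0.1
x = lambda t: np.cos(2.0 * np.pi * 3.0 * t)
psi = lambda t: (-1.0 + np.sqrt(1.0 + 4.0 * c * t)) / (2.0 * c)
t_y = np.linspace(0.0, 1.2, 12001)   # sampling grid of the warped data
t_out = np.linspace(0.0, 1.0, 1001)  # grid on which dewarped data is wanted
y = x(psi(t_y))
x_hat = dewarp(y, t_y, t_out, np.array([1.0, c]))
max_err = np.max(np.abs(x_hat - x(t_out)))
```

With the correct coefficient vector the residual `max_err` reflects only interpolation error; in practice a higher-order interpolator may be preferable when the data is sampled near the minimum rate.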
If the largest value of lag τ, at which the statistical dependence of x(t) and x*(t − τ) is not negligibly small, is denoted by τ_{max}, then all pairs of lag products that are separated in time by at least τ_{max} + |τ| will be statistically independent. In this case, a generally applicable condition on the integration time T for obtaining a statistically reliable estimate of \( {R}_x^{\alpha}\left(\tau \right) \) for all \( \left|\tau \right|\le {\tau}_{\mathrm{max}} \) is \( \sqrt{T}>>\sqrt{2{\tau}_{\mathrm{max}}} \). Here τ_{max} upper bounds the coherence time of the data, but \( {\tau}_{\mathrm{max}}+\left|\tau \right|\le 2{\tau}_{\mathrm{max}} \) upper bounds the coherence time of the lag product of the data. In fact, the coefficient of variation of the estimate (the ratio of its standard deviation to the magnitude of its mean) is, under relatively broad conditions, roughly equal to \( \sqrt{\left({\tau}_{\mathrm{max}}+\left|\tau \right|\right)/T}\le \sqrt{2{\tau}_{\mathrm{max}}/T} \). When possible, a value of \( \sqrt{2{\tau}_{\mathrm{max}}/T} \) as small as 10% (\( T=200{\tau}_{\mathrm{max}} \)) or even smaller is generally desirable; however, if the data with warped cyclicity (call it the signal) is corrupted by additive noise, with a signal-to-noise ratio (SNR) of average powers or mean squared values that is not sufficiently high, then T may need to be considerably larger.
The approximation B_{x} ≅ 1/τ_{max} is generally useful for the positive-frequency bandwidth of the power spectral density function of x(t). (The exact relationship depends on the exact functional shape of the PSD and cyclic autocorrelation, and the particular definitions of width B_{x}, τ_{max}.) Using this in the above reliability condition, together with the accuracy condition \( K\ge 4{B}_{\psi^{-1}}T \) (where, for Eq. 19b, \( {B}_{\psi^{-1}} \) is the bandwidth of the component ψ^{−1}(t) − μt − νt^{1/2}), yields the alternative expression
where, for high SNR, the symbol >> as used here means at least 100 times greater, as discussed above. The larger the ratio of bandwidths \( {B}_{\psi^{-1}}/{B}_x \), the larger the number K of basis functions required to dewarp the data, unless the warping function is known except for the values of a “few” parameters, as illustrated below with two examples. To quantify >> in relation Eq. 23 for SNR that is not high, one needs to know the value of SNR. This is addressed in Section 13.
For applications involving low-SNR data (e.g., SNR as low as 0 dB down to, say, −20 dB), which is one of the reasons CS processing is of interest [2, 3, 7, 8], one may need a time-bandwidth product B_{x}T as large as, say, 10,000 to 1,000,000, instead of only 200 as in the case of high SNR.
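The sizing rules quoted above can be collected in a small helper. This is only a sketch: the function names and the numerical values of τ_{max} and warping bandwidth are hypothetical, and the formulas used are the rough rules of thumb stated in the text (coefficient of variation \( \sqrt{\left({\tau}_{\mathrm{max}}+\left|\tau \right|\right)/T} \) and \( K\ge 4{B}_{\psi^{-1}}T \)).

```python
import math

def coefficient_of_variation(tau_max, tau, T):
    """Rough coefficient of variation of the cyclic autocorrelation estimate at lag tau."""
    return math.sqrt((tau_max + abs(tau)) / T)

def min_basis_count(B_warp, T):
    """Smallest integer K satisfying K >= 4 * B_warp * T."""
    return math.ceil(4.0 * B_warp * T)

tau_max = 0.5                    # hypothetical coherence-time bound (s)
T = 200.0 * tau_max              # high-SNR rule of thumb T = 200 * tau_max
cv_worst = coefficient_of_variation(tau_max, tau_max, T)   # worst case |tau| = tau_max
K_min = min_basis_count(0.25, T)                           # hypothetical warping bandwidth 0.25 Hz
print(cv_worst, K_min)
```

For low SNR the same helper can be used with a larger T, which shows directly how the required number of basis functions grows with the integration time.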
As explained in Section 10, the iterative search algorithm proposed there is most practical when an analytical expression for the gradient vector of the objective function in Eq. 20 is available. Using standard differentiation methods for complex functions of a real variable, the following gradient expression can be derived from Eq. 21:
For the case in which φ(t) is given by Eqs. 19a or 19b, we have
Equations 24 and 25 are valid as written as long as the vector a is real valued, provided that one simply interprets the gradient symbol to mean the sum of the gradients of the real and imaginary parts of the function: \( \nabla \left(\mathrm{re}+j\mathrm{im}\right)=\mathrm{\nabla re}+j\mathrm{\nabla im} \). However, when a is complex valued (as it will be when c(t) is complex valued, e.g., complex sinusoids), the theory of complex gradient operators is needed to obtain the correct modification of these equations, cf. [16].
Inversion of warping functions
To obtain the best results, the most efficient choice for the basis functions \( {\left[{c}_k(t)\right]}_{k=1}^K \) should be sought. For example, if the functional form of a time warp is known, it is sometimes possible to deduce the functional form of its inverse. If the time warp is a time-varying delay and/or advance, \( \psi (t)=t+\delta (t) \), as it is when the warping is due to the Doppler effect resulting from possibly time-varying velocities and/or accelerations of data sensors and/or sources, then by using the definition s ≜ ψ(t), we obtain the equation \( {\psi}^{-1}(s)=s-\delta \left({\psi}^{-1}(s)\right)=t+\delta (t)-\delta (t)=t \) for the inverse. In some cases of practical interest, the equation \( {\psi}^{-1}(s)=s-\delta \left({\psi}^{-1}(s)\right) \) can be solved for ψ^{−1}(s). Some examples follow. But first, it is mentioned that the approximate inverse ψ^{−1}(s) ≅ s − δ(s) can be quite accurate if the constraint |δ(t)/t| << 1 is satisfied sufficiently strongly for the range of time values of interest.
For constant velocities of a source and sensor moving along a single straight line, \( \delta (t)=a\left(t+b\right) \) and the solution to the above equation is \( {\psi}^{-1}(s)=\left(s- ab\right)/\left(1+a\right)={a}^{\prime}\left(s-{b}^{\prime}\right) \). But, more generally, constant velocities lead to a quadratic equation for δ(t) having 3 coefficients that are quadratic (or lower-order) functions of elapsed time, and quadratic (or lower-order) functionals of the velocity vectors and initial-position vectors. For constant accelerations, δ(t) is the solution to a quartic equation, whose four coefficients are quartic (or lower-order) functions of elapsed time, and quadratic (or lower-order) functionals of the acceleration vectors and initial velocity and position vectors.
For the sake of simplicity in demonstrating exact inversion, a time advance that grows quadratically with time, \( \delta (t)={at}^2 \) is considered here; then, we have
This quadratic equation has the solution
which can be verified by substituting \( s=t+{at}^2 \). If we knew the time warp had this functional form, we could simply use this form and maximize the objective function w.r.t. the single unknown parameter a. For example, the search algorithm presented in Section 10 could be used by calculating the gradient of the objective function w.r.t. a in \( {\widehat{\psi}}^{-1} \). Otherwise, we could choose a set of polynomials of order K or any other basis functions as in Eqs. 19–21.
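The substitution check mentioned above can be carried out numerically. The closed form used in this sketch is obtained here by solving \( {at}^2+t-s=0 \) for the root that tends to t = s as a → 0; it assumes a > 0 and s ≥ 0 and is offered as an illustration rather than as a transcription of the paper's equation. The first-order approximation ψ^{−1}(s) ≅ s − δ(s) is included for comparison; the function names and the value of a are hypothetical.

```python
import numpy as np

def psi(t, a):
    """Time warp with quadratically growing advance: psi(t) = t + a*t^2."""
    return t + a * t**2

def psi_inverse(s, a):
    """Root of a*t^2 + t - s = 0 that tends to s as a -> 0 (assumes a > 0, s >= 0)."""
    return (-1.0 + np.sqrt(1.0 + 4.0 * a * s)) / (2.0 * a)

def psi_inverse_approx(s, a):
    """Approximate inverse s - delta(s), with delta(s) = a*s^2."""
    return s - a * s**2

a = 0.01                                   # hypothetical warp parameter
t = np.linspace(0.0, 1.0, 101)
s = psi(t, a)
exact_err = np.max(np.abs(psi_inverse(s, a) - t))          # round-trip error of the closed form
approx_err = np.max(np.abs(psi_inverse_approx(s, a) - t))  # error of the approximation
print(exact_err, approx_err)
```

The approximation error grows roughly as \( 2{a}^2{t}^3 \), which is consistent with the requirement that |δ(t)/t| be small.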
As another example, if the time warping is believed to be a timevarying compression and/or expansion of time, \( \psi (t)= t\varepsilon (t) \), say a linearly growing compression \( \varepsilon (t)= at \), then we have \( \psi (t)={at}^2=s \) and \( {\psi}^{1}(s)=\sqrt{s/a} \). Again, if this form were known, we could use it and maximize the objective function w.r.t. the unknown parameter a. Otherwise, we could choose a set of polynomials of order K or any other basis functions as in Eqs. 19–21.
Basis-function expansion of warping-compensation function
By analogy with the approach described in Section 7, we can use a finite-dimensional approximation to reduce the infinite-dimensional optimization problem in Eqs. 8 and 12: searching over all functions φ^{−1}(t) defined on the time interval [φ^{−1}(t_{o}), φ^{−1}(t_{o} + T)] for any start time t_{o}. The analog of Eq. 19b is
Although the set of basis functions used in Eq. 19 for representing φ(t) may very well be the same as those used in Eq. 26 for φ^{−1}(t), they also can be different. If they are the same, the vectors of coefficients a and b will certainly differ from each other. It is expected that, when using the concepts in this paper, one would typically choose to use either Eq. 19 or Eq. 26, but not both except possibly for the objective of comparing these two approaches and selecting the one that seems to perform best according to any criteria the user may select.
Substituting Eq. 26 and its consequence \( {\dot{\varphi}}^{-1}(t)={\mathbf{b}}^T\dot{\mathbf{c}}(t) \) into Eq. 12 yields
Under the condition that the interval of integration in Eq. 27a is not substantially different from [t_{o}, t_{o} + T] on a percentage of overlap basis, Eq. 27a is usefully approximated by
which has the advantage of not requiring inversion of the function \( {\widehat{\varphi}}^{-1}(t)={\mathbf{b}}^T\mathbf{c}(t) \) to obtain \( \widehat{\varphi}(t) \) at every search point b. In fact, no function inversions are required by Eq. 27b. An example for which this condition leading to Eq. 27b might not be met is a substantial expansion or contraction of time over the entire interval [t_{o}, t_{o} + T]. But, even then, a substantial change in the length of the interval of integration need not have a significant impact on the gradient.
The gradient w.r.t. b of the squared magnitude of the statistic Eq. 27 is given by
where
In Eq. 29, the integrand is given by
within which we can use the equation
to obtain the reexpression
For both approaches, described in this Section 9 and in Section 7, the continuous-time data represented by the mathematics here must be time-warped or dewarped to achieve the indicated time-warping compensation in Eq. 31a or dewarping in Eq. 25 in order to evaluate the gradient at every iteration of the iterative gradient-ascent search method described in Section 10. Assuming that time-sampled data is used in the search algorithm for optimization described in Section 10, the rewarping is implemented by re-interpolating the data. The computation and storage costs here depend significantly on the amount of data being processed. Using the minimum time-sampling rate 4B_{x}, which avoids overlap of both aliased cycles and aliased spectral components, and a time-bandwidth product of \( {B}_xT=200 \), we generally need about 800 time samples. But for low SNR (e.g., 0 dB or less), the number needed can be considerably larger (see Eq. 42).
Because K ≥ 4B_{ψ}T, where B_{ψ} is the bandwidth of the component ψ(t) − γt − ηt^{2} of ψ(t), the stronger the inequality T >> 2/B_{x} that is required due to low SNR, the larger K, the number of basis functions, must be: K ≥ 4B_{ψ}T >> 8B_{ψ}/B_{x}. This is the counterpart, for warping compensation, to the requirement Eq. 23 for dewarping.
If the computational cost of the data interpolation is excessive, the following approach to circumventing data interpolation altogether should be considered. The term \( u+\tau /{\mathbf{b}}^T\dot{\mathbf{c}}(u) \) in Eqs. 27–31a represents the only timeinterpolation of the data that is required when using the warpingcompensation method. When the cyclic autocorrelation of x(t) is nonnegligible at zero lag \( \tau =0 \) (which is where the maximum commonly occurs), one can choose to use only this value of lag, in which case \( \tau /{\mathbf{b}}^T\dot{\mathbf{c}}(u)=0 \) and the need for data interpolation vanishes. As one example, applications where \( \tau =0 \) actually produces a maximum cyclic autocorrelation magnitude include those for which the signal consists of a periodic pulse train with random amplitudes and (i) flat pulses having width ≤ T_{o}/2 (e.g., Eq. 43 with all \( {h}_n=1 \) and p(t) equal to a rectangle of width W ≤ T_{o}/2), or (ii) arbitrary pulses having nonnegative Fourier transforms. In contrast, for flat pulses having width W > T_{o}/2 (case (iii)), the value of the cyclic autocorrelation of x(t) at \( \tau =0 \) approaches zero as W → T_{o}.
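The contrast between cases (i) and (iii) can be illustrated with a simple discrete-time estimate of the zero-lag cyclic autocorrelation, \( {\left\langle {\left|x(t)\right|}^2\exp \left(-j2\pi \alpha t\right)\right\rangle}_T \). The sketch below assumes a simplified pulse-train model (rectangular pulses with Gaussian random amplitudes, an integer number P of samples per period, T_{o} = 1); the function names and parameter values are hypothetical and are not taken from Eq. 43.

```python
import numpy as np

def cyclic_autocorr_zero_lag(x, alpha, fs):
    """Estimate of the cyclic autocorrelation at lag 0: <|x(t)|^2 exp(-j*2*pi*alpha*t)>."""
    t = np.arange(x.size) / fs
    return np.mean(np.abs(x) ** 2 * np.exp(-2j * np.pi * alpha * t))

def flat_pulse_train(amps, P, W):
    """Periodic train of flat pulses: P samples per period, the first W samples are nonzero."""
    pulse = np.zeros(P)
    pulse[:W] = 1.0
    return np.kron(amps, pulse)     # each period is the pulse scaled by its random amplitude

rng = np.random.default_rng(0)
P, N = 64, 200                      # samples per period, number of periods (T_o = 1)
amps = rng.standard_normal(N)
R_half = cyclic_autocorr_zero_lag(flat_pulse_train(amps, P, P // 2), alpha=1.0, fs=float(P))
R_full = cyclic_autocorr_zero_lag(flat_pulse_train(amps, P, P), alpha=1.0, fs=float(P))
print(abs(R_half), abs(R_full))
```

In this model the width-T_{o}/2 pulses give a zero-lag feature of magnitude close to the mean squared amplitude divided by π, whereas for pulses of width T_{o} the complex exponential sums to zero over each period and the feature vanishes.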
For \( \tau =0 \), Eq. 31a reduces to
In applications where the cyclic autocorrelation at zero lag is too weak to be used for warping compensation (e.g., case (iii) in the example above, with W close to T_{o}), but is sufficiently strong at one or more nonzero values of lag (e.g., \( \tau ={T}_o/2 \) in the same example with \( W={T}_o \)) for which the time-varying lag \( \tau /{\mathbf{b}}^T\dot{\mathbf{c}}(u) \) has dynamic range over u ∈ [t_{o}, t_{o} + T] that is small compared with some representative value, say τ/g, one can consider replacing \( \tau /{\mathbf{b}}^T\dot{\mathbf{c}}(u) \) with the constant τ/g, rounded off to the nearest integer multiple of the data-sampling time increment, which eliminates the need for data interpolation. For example, if \( {\mathbf{b}}^T\mathbf{c}(u)= gu+\varepsilon (u) \) and \( \left|\dot{\varepsilon}(t)\right|<<g \), then \( {\mathbf{b}}^T\dot{\mathbf{c}}(u)\cong g \). In this case, Eq. 31a reduces to the close approximation:
As another simplifying approximation, if the time-warping function varies slowly enough, relative to the cycle frequency, \( \left|\dot{\mathbf{c}}(u)\right|<<2\pi \left|\alpha g\mathbf{c}(u)\right| \), then the \( \dot{\mathbf{c}}(u) \) term in Eq. 31b can possibly be deleted without having much impact on the gradient.
Iterative gradient-ascent search algorithm
Let J(a) be an objective function (such as that in Eqs. 8 and 10, or Eqs. 20 and 21, or Eqs. 20 and 27) that is maximized when some vector of parameters such as a in Eq. 21 or b in Eq. 27, generically denoted by a in this section, takes on optimum values. There are many varied options for optimization algorithms. Solely for purposes of illustration, the algorithm selected here to optimize a is Cauchy’s iterative gradient-ascent search algorithm, modified by use of an alternative step-size sequence: the candidate solution \( {\widehat{\mathbf{a}}}^{(i)} \) at iteration i is updated to obtain the next candidate solution \( {\widehat{\mathbf{a}}}^{\left(i+1\right)} \) at iteration i + 1 by moving from the ith candidate location in the direction of steepest ascent on the hypersurface J(a) (when the step-size parameter is positive), which is the direction of the gradient vector evaluated at the ith location:
where ∇J(a) = [∂J(a)/∂a_{1}, ∂J(a)/∂a_{2}, …, ∂J(a)/∂a_{K}]^{T} is the gradient vector and μ(i) is the step size at iteration i. The following formula for the stepsize sequence is one of many that have been proposed for iterative gradientascent algorithms:
It has been reported to produce rapid convergence in a computationally efficient manner [17]. This step size can be reexpressed as
from which it can be seen that the current step size is a scaled version of the previous step size, and the scalar consists of two factors: (i) the reciprocal of the size of the change in the gradient vector from previous to current step and (ii) the inner product between the previous gradient vector and the unit vector with direction equal to that of the difference between the previous and current gradient vectors. Consequently, if there is a large change in the gradient vector, then the first factor scales down the current step size from the previous one. The second factor can both scale the step size up or down and change its sign, which happens when the current gradient is larger and oppositely directed relative to the previous gradient. This possibility of occasional gradient descent, instead of consistent gradient ascent, means that Eq. 32, with stepsize sequence Eq. 33, is generally a gradientascent algorithm but can occasionally produce a gradient descent to find a better point from which to resume ascending.
An alternative to the singleformula approach in Eq. 33 to specifying a stepsize sequence, which has been said to better accommodate troublesome surfaces, is to alternate between two or more formulas for the stepsize sequence [18].
The stepsize algorithm Eq. 33 was used in the simulations reported in Section 14 simply as an example. The search algorithm Eqs. 32–33, like many search algorithms, may be challenged by either multimodal objective functions containing local peaks above hyperplanes of any dimension M from 1 to N and/or otherwise troublesome surfaces, such as those containing long narrow ridges of any dimension M from 2 to N. In some such cases, another search technique may be used in concert with Eqs. 32 and 33. There are many varied options among which are those that simply select multiple starting locations \( \left\{{\widehat{\mathbf{a}}}_n^{(0)}:n=1,2,\dots, N\right\} \) on a possibly uniform grid of points covering what is considered to be the smallest admissible region of the search space and run the iteration Eqs. 32 and 33 for each and every starting location, and then simply compare all the local maxima found and select the largest one. This is called “bruteforce” or “exhaustive” initialization.
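The iteration and the multiple-initialization strategy described above can be sketched as follows. The step-size rule coded here is an assumed form inferred from the verbal description (the previous step size scaled by the inner product of the previous gradient with the gradient difference, divided by the squared norm of that difference); it is not a transcription of Eq. 33. The function names, default parameters, and the concave quadratic test objective are hypothetical.

```python
import numpy as np

def gradient_ascent(grad, a0, mu0=0.1, n_iter=200, tol=1e-20):
    """Gradient ascent a <- a + mu*grad(a) with an adaptive step size (assumed form)."""
    a = np.asarray(a0, dtype=float)
    g = grad(a)
    mu = mu0
    for _ in range(n_iter):
        a_new = a + mu * g
        g_new = grad(a_new)
        dg = g - g_new                  # previous gradient minus current gradient
        denom = dg @ dg
        if denom < tol:                 # gradient has stopped changing: accept and stop
            return a_new
        mu = mu * (g @ dg) / denom      # rescale the previous step size; may change sign
        a, g = a_new, g_new
    return a

def multistart(J, grad, starts, **kwargs):
    """Run the iteration from every starting point and keep the largest maximum found."""
    candidates = [gradient_ascent(grad, s, **kwargs) for s in starts]
    return max(candidates, key=J)

# Hypothetical concave quadratic objective with its maximum at a_star
A = np.diag([1.0, 4.0])
a_star = np.array([1.0, -2.0])
J = lambda a: -0.5 * (a - a_star) @ A @ (a - a_star)
grad_J = lambda a: -A @ (a - a_star)
starts = [np.array([u, v]) for u in (-3.0, 0.0, 3.0) for v in (-3.0, 0.0, 3.0)]
a_best = multistart(J, grad_J, starts)
```

On a single-peaked surface such as this one, all starting points lead to the same maximum; the grid of starting locations only matters for multimodal objective functions, for which convergence of this simple sketch is not guaranteed.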
Synchronization-sequence methods
Two methods are described here, one for initializing the iterative gradient-ascent algorithm described in Section 10 and another for replacing that algorithm with an elegant two-step method.
Another approach to initializing the iterative gradientascent method of Section 10, which can be used instead of the bruteforce method, is as follows. A synchronization sequence is defined to be a point process for which the time locations of the points are the times at which some particular repeating feature in the data exhibiting irregular cyclicity is detected. The feature must be one that occurs, for the most part, once every cycle of the data exhibiting irregular cyclicity, and the phase at which it occurs in the cycle (the fraction of the cycle period that has elapsed when the feature occurs) must be, for the most part, approximately constant for best results. The assumed identifiability of such features limits applications of this method, but its simplicity merits its mention here.
Examples of such features include peaks, doublets (a positive (or negative) peak followed by a negative (or positive) peak), an end point of a “quiet subinterval” or “dead zone,” and the start time of a decaying oscillatory burst. For example, such a feature might be detectable in an EKG from one pair of sensors with a particular placement on the body while the data from another pair of sensors at some other particular placement is that which is to be analyzed. Or, the synchronization sequence could simply be the time of sunrise and/or sunset or other observable cyclic event like the time of full eclipse of Sun by Earth's Moon as observed from some specified location on Earth; or the time of onset of some specific easily identifiable phase of a cyclic chemical process or any other process that is cyclic. As discussed below, the more features per cycle that can be identified, the better.
One can synthesize a smooth dewarping function \( {\widehat{\psi}}^{-1}(t) \) from this point process. One approach, for the case in which the cycle frequency is unknown and there is one feature per cycle being detected, is to first solve for the best-fitting set of equally spaced time points \( {\left\{\delta +{nT}_o\right\}}_{n=1}^N \) to the measured set of unequally spaced points \( {\left\{{t}_n\right\}}_{n=1}^N \). If the sum of squared differences between the values of these two sets of points is minimized w.r.t. the fixed time-separation value (period) T_{o} and the timing offset δ,
then the optimum values are given by
The implicit form of Eq. 34b is interesting because the solution is fully described in terms of temporal averages and correlations of the timeseries {t_{n}} and {n}, and the mean square value of {n}; but, because these two simultaneous equations are linear, they can easily be explicitly solved for each of \( \widehat{\delta} \) and \( {\widehat{T}}_o \).
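Because the fit of equally spaced points δ + nT_{o} to the measured times t_{n} is an ordinary linear regression of t_{n} on n, an explicit solution can be sketched in a few lines. The closed form below is the standard least-squares solution written in terms of averages of {t_{n}}, {n}, {n t_{n}}, and {n^{2}}; the function name and the synthetic jittered feature times are assumptions made for this example.

```python
import numpy as np

def fit_period_and_offset(t_n):
    """Least-squares fit of delta + n*T_o (n = 1..N) to the measured feature times t_n."""
    t_n = np.asarray(t_n, dtype=float)
    n = np.arange(1, t_n.size + 1)
    n_mean, t_mean = n.mean(), t_n.mean()
    # Slope: covariance of n and t_n divided by the variance of n
    T_o = (np.mean(n * t_n) - n_mean * t_mean) / (np.mean(n**2) - n_mean**2)
    delta = t_mean - T_o * n_mean
    return delta, T_o

# Hypothetical feature times: period 2.0, offset 0.3, small random timing jitter
rng = np.random.default_rng(1)
n = np.arange(1, 51)
t_n = 0.3 + 2.0 * n + 0.05 * rng.standard_normal(n.size)
delta_hat, T_o_hat = fit_period_and_offset(t_n)
alpha_hat = 1.0 / T_o_hat            # possible initial cycle-frequency estimate
```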
As an aside, this is a possible method for initializing the unknown cycle frequency, \( \widehat{\alpha}=1/{\widehat{T}}_o \) or, possibly \( \widehat{\alpha}=2/{\widehat{T}}_o \) even if some other approach to initializing the search over the coefficient vector a is to be used.
As a second step, one can find a smooth dewarping function \( {\widehat{\psi}}^{-1} \) that at least approximately satisfies \( {\widehat{\psi}}^{-1}\left(\widehat{\delta}+n{\widehat{T}}_o\right)={t}_n,n=1,2,\dots, N \). This could be done by simply doing a least squares fit of the linear combination of the basis functions in Eq. 19 to the point process at the N measured time points:
Being a linear least squares problem, the solution is obtained by simply solving a set of K linear equations in K unknowns, a, obtained by equating to zero the gradient of the quadratic objective function in Eq. 34c, with \( {\widehat{\psi}}^{-1}(t)={\mathbf{a}}^T\mathbf{c}(t) \) substituted in. This approach cannot be expected to perform well if N is not large enough compared with K. However, the required value of K typically increases with increasing length T of data record, so the ratio N/K cannot be increased simply by increasing T. Instead the average number of features identified per interval of length \( {\widehat{T}}_o \) must be increased, if possible.
If there are M identifiable features per cycle, then we can simply replace the single objective function in Eq. 34a with a sum of M such functions, each with its own timing offset δ^{(m)} and feature-occurrence times \( {t}_n^{(m)} \) and all sharing the same average period T_{o}:
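The least squares fit of \( {\mathbf{a}}^T\mathbf{c}(t) \) to the synchronization points can be sketched with a standard linear solver. In this illustration the fitted grid points \( \widehat{\delta}+n{\widehat{T}}_o \) are passed in as `u_n`, the two-term polynomial basis is a hypothetical choice, and `numpy.linalg.lstsq` is used in place of explicitly forming and solving the normal equations.

```python
import numpy as np

def fit_dewarping_coefficients(u_n, t_n, basis):
    """Least-squares coefficients a such that a^T c(u_n) approximates t_n."""
    C = basis(np.asarray(u_n, dtype=float))     # N x K matrix whose rows are c(u_n)^T
    a, *_ = np.linalg.lstsq(C, np.asarray(t_n, dtype=float), rcond=None)
    return a

def poly_basis(u):
    """Hypothetical basis c(u) = [u, u^2]."""
    return np.column_stack([u, u**2])

# Synthetic check: feature times generated by a known dewarping function
u_n = 0.3 + 2.0 * np.arange(1, 21)              # equally spaced grid delta + n*T_o
t_n = u_n + 0.02 * u_n**2                       # measured (warped) feature times
a_hat = fit_dewarping_coefficients(u_n, t_n, poly_basis)
```

Here N = 20 points are used to fit K = 2 coefficients, so the ratio N/K is comfortably large; with noisy feature times the accuracy would degrade as this ratio shrinks.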
In this case, the solution Eq. 34b is replaced with
Similarly, Eq. 34c becomes
The performance can be expected to improve as the inequality NM/K >> 1 is strengthened. An example of \( M=3 \) is the wellknown QRS complex in an EKG signal, which is repeated with every heartbeat. As in the case of the implicit solution Eq. 34b, Eq. 35b is a set of M + 1 simultaneous linear equations that are fully described in terms of temporal averages and correlations of \( \left\{{t}_n^{(m)}\right\} \) and {n}, and the mean square value of {n}, and that are easily solved explicitly.
Besides using synchronization sequences \( {\left\{{t}_n^{(m)}\right\}}_{n,m=1}^{N,M} \) for initialization of the iterative algorithm of Section 10, the initial time dewarping candidate \( {\widehat{\psi}}^{-1}(t) \) specified by Eq. 34c or Eq. 35c also can possibly be used to assess which of the conditions on ψ(t), specified in this paper for validating approximations that are to be used in designing the search algorithm and predicting performance, are satisfied, although the solution to Eq. 36 below might be better suited.
The second method based upon synchronization sequences is revealed by the suggestion that the function \( {\widehat{\psi}}^{-1}(t) \) found in this elegant manner could conceivably be accurate enough to serve as the final dewarping function. Similarly, a warping function can be found by replacing Eq. 34c with Eq. 36
or its generalization for M > 1. When this approach is used, performance might be improved by identifying and discarding outliers in the data set \( \left\{{t}_n^{(m)}\right\} \).
Unfortunately, there are many applications in which repeating features are hidden in erratic background fluctuations, preventing identification of such features and ruling out the methods of this section.
Pace irregularity
As inferred in preceding sections, dewarping or warping compensation may not be possible for severe cases of time warping. In addition, there also are types of irregular cyclicity that are not due to time warping or cannot be modeled as timewarped regular cyclicity. In some such cases, it is possible that none of the methods described in this and preceding sections would be able to adequately reduce the irregularity of cyclicity. To illustrate, the following baseline signal model is considered:
where the pulses {q_{n}(t)} are random, independent, and identically distributed, and the synchronization times {t_{n}} for some repeating feature in y(t) or in the probability density function for the random process y(t), due to the behavior of q_{n}(t) at say t_{∗}, satisfy \( \psi \left({t}_n\right)={t}_{\ast }+{nT}_o \). This is expected to be a useful model for electrocardiograms with timevarying heart rate and corresponding timevarying width of the pulse complex within each beat. In general, signals of this type can be dewarped using the methods discussed in preceding sections, since
which is cyclostationary. However, if the baseline signal model is changed to the following paceirregular model
then attempting to dewarp (using the symbol θ, instead of s as in Section 8, for ψ(t)) yields
which is not cyclostationary as a function of θ unless the timewarping function is linear, \( \psi (t)=\omega t \) for some constant ω, in which case the original data y(t) also is cyclostationary.
The paceirregular model Eq. 37a, and associated rotationangle model (with unwrapped angle θ), as Eq. 37b shall be called, is useful for random vibrations y(t) from rotating machinery with a fault point in some rotating component. In this model, the pulses or bursts {q_{n}(t)} represent the machine structure’s vibration response (typically damped ringing), which is modeled as independent of the times of occurrence of the causative impulsive shocks from the rotating fault. The advancing rotation angle can be expressed in terms of instantaneous frequency as follows:
The shapes of the bursts {q_{n}(t)} are not affected in the model Eq. 37a by the warping, which is determined by ω(t). Only their occurrence times t_{n} are affected. In fact, these occurrence times can be interpreted as warped versions of the equally spaced (unwrapped) angles {2πn + θ_{∗}} at which the fault excites the system: \( {t}_n={\psi}^{-1}\left({\theta}_n\right)\triangleq {\psi}^{-1}\left(2\pi n+{\theta}_{\ast}\right) \) where ψ^{−1} (not ψ) is the warping function, and \( \psi (t)\equiv \theta (t) \) is the dewarping function.
Given ω(t) and using \( \psi (t)=\theta (t) \), the warping function \( {\psi}^{-1}\left(\theta \right)=t \) can, in principle, be solved for. For example, taking \( {t}_o=0 \), for \( \omega (t)= at \), the equation \( \psi (t)=\theta (t)=\left(1/2\right){at}^2+\theta \left({t}_o\right) \) can be solved to obtain \( {\psi}^{-1}\left(\theta \right)=\sqrt{\left(2/a\right)\left[\theta -\theta \left({t}_o\right)\right]} \). Or, for \( \omega (t)=\exp (at) \), \( \theta (t)=\left(1/a\right)\left[\exp (at)-1\right]+\theta \left({t}_o\right) \) and \( {\psi}^{-1}\left(\theta \right)=\left(1/a\right)\ln \left(1+a\left[\theta -\theta \left({t}_o\right)\right]\right) \).
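As a numerical check of such closed-form inverses, the following minimal sketch integrates ω(t) from t_o = 0 with the trapezoidal rule and then applies the corresponding inverse. With t_o = 0, the integral of exp(at) is (1/a)[exp(at) − 1], so the inverse used here carries a 1 inside the logarithm. The parameter values and the function names (`theta_numeric`, `inv_linear`, `inv_exp`) are hypothetical and chosen only for illustration.

```python
import math

def theta_numeric(omega, t, t0=0.0, theta0=0.0, steps=20000):
    """Trapezoidal approximation of theta(t) = theta0 + integral_{t0}^{t} omega(u) du."""
    h = (t - t0) / steps
    total = 0.5 * (omega(t0) + omega(t))
    for k in range(1, steps):
        total += omega(t0 + k * h)
    return theta0 + h * total

a = 0.8        # hypothetical rate parameter
theta0 = 0.3   # hypothetical initial angle theta(t_o), with t_o = 0

def inv_linear(theta):
    # omega(t) = a*t  ->  theta(t) = (1/2) a t^2 + theta0
    return math.sqrt((2.0 / a) * (theta - theta0))

def inv_exp(theta):
    # omega(t) = exp(a*t)  ->  theta(t) = (1/a)(exp(a*t) - 1) + theta0
    return (1.0 / a) * math.log(1.0 + a * (theta - theta0))

t = 1.7
err_linear = abs(inv_linear(theta_numeric(lambda u: a * u, t, theta0=theta0)) - t)
err_exp = abs(inv_exp(theta_numeric(lambda u: math.exp(a * u), t, theta0=theta0)) - t)
print(err_linear, err_exp)
```

Both round-trip errors should be limited only by the accuracy of the numerical integration.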
When the speed of rotation \( \omega = d\theta (t)/ dt \) is constant (pace is regular), both y(t) and x(θ) are cyclostationary. But when the speed changes with time, neither of these signals is cyclostationary! Moreover, there is no dewarping function that will render either y(t) or x(θ) cyclostationary! Nevertheless, depending on the shapes of the bursts {q_{n}(t)}, if they do not overlap each other too much, it may be possible to measure {t_{n}}, fit an estimated period \( {\widehat{T}}_o \) to these measurements \( \left\{{\widehat{t}}_n\right\} \) (see Section 11), and then time-shift the individual bursts from \( \left\{{\widehat{t}}_n\right\} \) to \( \left\{n{\widehat{T}}_o\right\} \). If there were no errors in \( \left\{{\widehat{t}}_n\right\} \), then the burst-shifted signal would be cyclostationary. Also, if the errors in \( \left\{{\widehat{t}}_n\right\} \) are independent and identically distributed random variables, the burst-shifted signal would still be cyclostationary, but with a lower degree of cyclostationarity (see Section 13). If the errors in \( \left\{{\widehat{t}}_n\right\} \) are small, this procedure can substantially increase the CCT (decrease the ISC) of the data, whether or not the burst-shifted signal is exactly cyclostationary.
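The burst-shifting idea can be sketched on sampled data as follows. This is only an illustration under stated assumptions: the bursts do not overlap, their onset samples are already measured, the period is fitted by ordinary least squares (the fitting method of Section 11 is not reproduced here), and a fitted offset is retained in addition to the period. The helper names `fit_period` and `shift_bursts` and all numerical values are hypothetical.

```python
def fit_period(times):
    """Least-squares fit of times[n] ~ t0 + n*T; returns (T_hat, t0_hat)."""
    n = len(times)
    idx_mean = (n - 1) / 2.0
    t_mean = sum(times) / n
    num = sum((k - idx_mean) * (tk - t_mean) for k, tk in enumerate(times))
    den = sum((k - idx_mean) ** 2 for k in range(n))
    T_hat = num / den
    return T_hat, t_mean - T_hat * idx_mean

def shift_bursts(y, starts, burst_len):
    """Move each burst (burst_len samples beginning at starts[n]) onto a regular grid."""
    T_hat, t0_hat = fit_period(starts)
    out = [0.0] * len(y)
    new_starts = []
    for k, s in enumerate(starts):
        target = int(round(t0_hat + k * T_hat))   # nearest sample to the regular grid point
        new_starts.append(target)
        for i in range(burst_len):
            if 0 <= target + i < len(out) and s + i < len(y):
                out[target + i] += y[s + i]
    return out, new_starts, T_hat

# Hypothetical irregular onsets (in samples) and a damped burst shape.
starts = [5, 27, 44, 68, 85, 108]
burst = [0.5 ** i for i in range(8)]
y = [0.0] * 120
for s in starts:
    for i, v in enumerate(burst):
        y[s + i] += v

shifted, new_starts, T_hat = shift_bursts(y, starts, len(burst))
print(T_hat, new_starts)
```

After shifting, successive onsets differ from the fitted period by less than one sample, the residual being due only to rounding to the sampling grid.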
More generally, for an irregular-paced signal model such as Eq. 37a, time warping as in Eq. 37b affects the pacemaker’s rate and, as desired, can convert the irregular pulse times of occurrence to regular pulse angles of occurrence (in the case of rotating machinery) \( \left\{{\theta}_n\right\}=\left\{2\pi n+{\theta}_{\ast}\right\} \); but Eq. 37b also reveals that this time warping affects the time scale of the individual paced pulses or bursts \( \left\{{q}_n(t)\right\}=\left\{{q}_n\left({\psi}^{-1}\left(\theta \right)\right)\right\}\triangleq \left\{{\tilde{q}}_n\left(\theta \right)\right\} \), and the warped pulses \( \left\{{\tilde{q}}_n\left(\theta \right)\right\} \) are no longer identically distributed. Consequently, neither the irregular-paced signal nor the time-warped regular-paced signal is cyclostationary. This is particularly important to the study of rotating machinery vibrations when the RPM (revolutions or rotations per minute) varies with time too fast to be treated in the data analysis as locally constant (meaning all vibration transients, e.g., from machine faults, have died away before the RPM changes substantially, in which case quasi-static approximations can yield accurate results).
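The loss of identical distribution after warping can be seen in a small deterministic sketch. It assumes a linearly increasing speed ω(t) = at with θ(0) = 0 and bursts of a fixed duration D in time; all numerical values and names are hypothetical. The burst onsets fall at equally spaced angles, but the angular extent occupied by each burst grows as the machine speeds up.

```python
import math

a = 2.0            # hypothetical angular acceleration (rad/s^2), omega(t) = a*t
D = 0.05           # hypothetical burst duration in seconds (the same for every burst)
theta_star = 1.0   # hypothetical fault angle (rad)

def theta(t):
    """Dewarping function psi(t) = theta(t), taking theta(0) = 0."""
    return 0.5 * a * t * t

def theta_inv(th):
    """Warping function psi^{-1}(theta)."""
    return math.sqrt(2.0 * th / a)

angles = [2.0 * math.pi * n + theta_star for n in range(6)]   # regular angles theta_n
onsets = [theta_inv(th) for th in angles]                     # irregular times t_n
gaps = [t2 - t1 for t1, t2 in zip(onsets, onsets[1:])]        # time between onsets
widths = [theta(t + D) - theta(t) for t in onsets]            # angular extent of each burst

print(gaps)
print(widths)
```

The onset gaps shrink while the angular widths grow, so the bursts are regularly paced in angle but are no longer copies of one another on the angle axis.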
It follows that the recently proposed angle-time (two-variable) model for vibration signals from rotating machinery [19] cannot be cyclostationary in angle or time unless the RPM is constant or only slowly varying. But, if it is constant, then nothing is gained from the angle-time model because angle is proportional to time and, consequently, the proposed order is just the scaled classical cycle frequency and the proposed order-frequency spectral correlation function [19] is just the classical spectral correlation function, with a scaled cycle frequency.
Furthermore, it is well known that time averages cannot accurately approximate expected values for nonstationary signals that are not slowly nonstationary, CS, polyCS, or almost-CS. Therefore, the proposed angle-time model, which is a generally nonstationary stochastic process (generally not stationary or CS or polyCS or even time-warped CS) for rotating machinery with non-slow variation in RPM, does not provide a basis for a probabilistic theory of empirical signal processing, because the probabilistic parameters cannot be estimated using empirical time averages; consequently, it cannot provide a viable avenue for extending the theory of CS to vibration signals from rotating machinery with rapidly changing RPM. Similarly, the related concept of a cyclo-non-stationary signal model [19], besides being burdened with an unfortunate name, cannot do any better because, again, the idealized-model characteristics obtained from expected values cannot be accurately approximated with empirical time averages. Any attempt to develop a sound probabilistic theory for the time-average processing of empirical data for such nonstationary processes is destined to fail. This is not a new result [20], but it is a result that should be more broadly known and understood by signal processing researchers wishing to add a novel theoretical flavor to their work:
The parameters in generally nonstationary stochastic process models cannot be accurately estimated using time averages [20].
But there is a way to obtain a vibration signal with rapidly time-varying RPM that is polyCS: if the variation of RPM with time is periodic, then the nonstationarity can be cyclic; that is, the vibration signal can be polyCS with a period of CS that is varied periodically, as in a frequency-modulated sine wave with a periodically time-varying frequency.
This provides motivation for machine testing with RPM intentionally varied periodically; and it is hereby suggested that such periodic-RPM test protocols be investigated for purposes of exciting machine faults in a manner that enhances their detectability.
Design guidelines
Two key parameters in the basis-function approach to the optimization problem described in Sections 4–10 are the integration time T and the model order (number of basis functions) K. The parameter T must be large enough to suppress additive noise that corrupts the signal and to adequately average the erratic (or random) fluctuations in the signal itself in order to reveal the cyclic autocorrelation present. At the same time, this parameter must be no larger than necessary, in order to keep the number of basis functions K required in the model of the time-warping function down to a minimum for those applications where the warping function is not known to within the unknown values of a small fixed set of parameters. As explained in Sections 7 and 8, Eq. 23 ideally needs to be satisfied for the dewarping method described in Section 7, and the counterpart of Eq. 23, in which ψ^{−1} is replaced with ψ, ideally needs to be satisfied for the warping-compensation method described in Section 9. But just how large does the ratio of the LHS to the far-RHS of Eq. 23 need to be in order to satisfy the “much greater than” requirement? One approach to answering this question is presented here.
Qualitatively, the lower the degree of cyclostationarity (DCS) of the measured data after ideal dewarping is, the longer the integration time must be. The following results can be straightforwardly obtained from the fundamentals of the theory of CS developed in [2]. The DCS of the ideally dewarped measured noisy data \( \tilde{x}(t)\triangleq y\left({\psi}^{-1}(t)\right) \), where \( y(t)=s\left(\psi (t)\right)+n(t) \), from which \( \tilde{x}(t)=s\left(\psi \left[{\psi}^{-1}(t)\right]\right)+n\left({\psi}^{-1}(t)\right)=s(t)+\tilde{n}(t) \), is defined to be the complex-valued correlation coefficient (also called the cyclic correlation coefficient; this is one of several definitions of DCS, and the utility of each definition is strongly application-dependent [21])
\[ {\rho}_{\tilde{x}}^{\alpha}\left(\tau \right)\triangleq \frac{R_{\tilde{x}}^{\alpha}\left(\tau \right)}{R_{\tilde{x}{\tilde{x}}^{\ast }}(0)} \]
(usage of the shorthand \( {R}_x^{\alpha}\left(\cdot \right)\equiv {R}_{xx^{\left(\ast \right)}}^{\alpha}\left(\cdot \right) \) from above continues here and below). It can be shown that
\[ {\rho}_{\tilde{x}}^{\alpha}\left(\tau \right)=\frac{\rho_s^{\alpha}\left(\tau \right)}{1+1/\mathrm{SNR}} \tag{39} \]
where \( \mathrm{SNR}\triangleq {R}_s(0)/{R}_{\tilde{n}}(0) \) is the ratio of mean squared signal to mean squared noise, and \( {\rho}_s^{\alpha}\left(\tau \right)\triangleq {R}_{ss^{\left(\ast \right)}}^{\alpha}\left(\tau \right)/{R}_{ss^{\ast }}(0) \) is the cyclic correlation coefficient of the signal (assuming the signal has zero mean value or, equivalently, contains no finite-strength additive sine-wave components), which has magnitude less than or equal to unity.
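The way in which SNR enters the DCS of the dewarped data can be traced in three short steps. The following is a sketch under two assumptions that are not stated explicitly above: the dewarped noise ñ(t) is uncorrelated with s(t), and ñ(t) exhibits no cyclic autocorrelation at the nonzero cycle frequency α of interest.

```latex
\begin{aligned}
R_{\tilde{x}}^{\alpha}(\tau) &= R_{s}^{\alpha}(\tau) + R_{\tilde{n}}^{\alpha}(\tau) = R_{s}^{\alpha}(\tau), \\
R_{\tilde{x}}(0) &= R_{s}(0) + R_{\tilde{n}}(0) = R_{s}(0)\left(1 + \frac{1}{\mathrm{SNR}}\right), \\
\rho_{\tilde{x}}^{\alpha}(\tau) &= \frac{R_{\tilde{x}}^{\alpha}(\tau)}{R_{\tilde{x}}(0)}
  = \frac{\rho_{s}^{\alpha}(\tau)}{1 + 1/\mathrm{SNR}} .
\end{aligned}
```

Under these assumptions, the noise reduces the DCS only through the normalizing power: for SNR = 10 the factor is 1/1.1 ≈ 0.91, whereas for SNR = 1/10 it is 1/11 ≈ 0.09.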
A key objective to be met for the optimization to be effective is to ensure that the coefficient of variation (CV) for the estimated cyclic autocorrelation for the dewarped data is small compared with unity, say 1/10 as an example target:
\[ \mathrm{CV}\le 1/10 \tag{40} \]
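The CV can be shown [2, 5] to be approximated by
\[ \mathrm{CV}\cong \frac{1}{\sqrt{B_sT}\left|{\rho}_{\tilde{x}}^{\alpha}\left(\tau \right)\right|} \tag{41} \]
Substituting Eq. 39 into Eq. 41 transforms the requirement in Eq. 40 into
\[ \frac{\sqrt{B_sT}\left|{\rho}_s^{\alpha}\left(\tau \right)\right|}{1+1/\mathrm{SNR}}\ge 10 \]
which can be re-expressed as
\[ {B}_sT\ge \frac{100{\left(1+1/\mathrm{SNR}\right)}^2}{{\left|{\rho}_s^{\alpha}\left(\tau \right)\right|}^2} \tag{42} \]
Because the order of the time-warping model, which is the number of basis functions K, required to obtain a good fit to the time-dewarping function component ψ^{−1}(t) − μt − νt^{1/2} can be no smaller than \( 4{B}_{\psi^{-1}}T \) (assuming the basis spans the set of signals having duration T and bandwidth \( {B}_{\psi^{-1}} \)), K is, from Eq. 42, lower bounded by
\[ K\ge 4{B}_{\psi^{-1}}T\ge 400\,\frac{B_{\psi^{-1}}}{B_s}\,\frac{{\left(1+1/\mathrm{SNR}\right)}^2}{{\left|{\rho}_s^{\alpha}\left(\tau \right)\right|}^2} \tag{43} \]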
Equality in Eq. 43 might well suffice when the optimum (most efficient) set of basis functions, which is the smallest set that spans the space of all possible time-warping or warp-compensating functions, is used. For example, for the space defined by a maximum possible time-bandwidth product (e.g., not exceeding \( {B}_{\psi^{-1}}T \)), the optimum set of basis functions is the set of prolate spheroidal wave functions [15]. For any other basis, K will generally need to be larger than \( 4{B}_{\psi^{-1}}T \) for this particular space of possible warping functions. It follows from Eq. 43 that the smallest the model order can be is the larger of 1 and \( 400{B}_{\psi^{-1}}/{B}_s \). Thus, we have the following guideline:
The smaller the ratio of the bandwidth of the dewarping function to the bandwidth of the signal is, the less computationally costly the optimization can be.
One would hope for a bandwidth ratio no larger than about 5% to 10%, in which case K need not exceed 20 to 40 (for the above selected example target value of CV).
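The figures quoted in this guideline can be reproduced with a short calculation. The sketch below assumes that the CV is approximated by 1/(√(B_sT)|ρ|), with |ρ| equal to |ρ_s| reduced by the factor 1/(1 + 1/SNR); this form is inferred here from the constants appearing in the text (the 1/10 target and the factor 400) and should be treated as an assumption. The function names are hypothetical.

```python
import math

def required_time_bandwidth(rho_s, snr, cv_target=0.1):
    """Smallest B_s*T giving CV <= cv_target under the assumed approximation
    CV ~ 1 / (sqrt(B_s*T) * |rho_x|), with |rho_x| = |rho_s| / (1 + 1/snr)."""
    rho_x = abs(rho_s) / (1.0 + 1.0 / snr)
    return 1.0 / (cv_target * rho_x) ** 2

def min_model_order(bw_ratio, rho_s=1.0, snr=math.inf, cv_target=0.1):
    """Lower bound on K from K >= 4*B_psi*T, with T set by required_time_bandwidth.
    bw_ratio is B_psi/B_s (or the inverse-function bandwidth ratio for dewarping)."""
    bound = 4.0 * bw_ratio * required_time_bandwidth(rho_s, snr, cv_target)
    # The small guard avoids rounding 20.000000000000004 up to 21.
    return max(1, math.ceil(bound - 1e-9))

for ratio in (0.001, 0.05, 0.10):
    print(ratio, min_model_order(ratio))
```

With |ρ_s| = 1 and negligible noise, bandwidth ratios of 5% and 10% give lower bounds of 20 and 40, and very small ratios give the floor K = 1; weaker cyclostationarity or lower SNR raises the bound through the factor (1 + 1/SNR)²/|ρ_s|².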
The averaging-time requirement in Eq. 42 applies generally to both the cyclicity-restoral and cyclicity-irregularity-compensation approaches introduced in Sections 4–9. However, the model-order requirement in Eq. 43 applies only to the former (dewarping) method of Section 7. For the latter (warping-compensation) method of Section 9, the condition \( K\ge 4{B}_{\psi^{-1}}T \) used to derive Eq. 43 must be replaced with K ≥ 4B_{ψ}T, in which case, we obtain
\[ K\ge 4{B}_{\psi }T\ge 400\,\frac{B_{\psi }}{B_s}\,\frac{{\left(1+1/\mathrm{SNR}\right)}^2}{{\left|{\rho}_s^{\alpha}\left(\tau \right)\right|}^2} \tag{44} \]
Generally speaking, there does not appear to be any specific relationship between the bandwidths of a function ψ(t) and its inverse ψ^{−1}(t). For this reason and others, there is no general guideline presented here for determining which of the methods of Sections 7 and 8 is more efficient (requires smaller model order K) or is otherwise superior for a particular application. Nevertheless, one can, in principle, apply both methods for multiple values of K and/or different sets of basis functions and compare results. Two of various optional criteria for selecting the “best” method are (i) that which produces the smaller CV for a specified model order and (ii) that which produces the smaller model order for a specified CV. (It is not generally true that if the specified CV is smaller than but close enough to the smallest [over these two methods] CV for a specified K, then the smallest K [over these two methods] for this specified CV will be achieved with the same method.)
It is emphasized that the condition on averaging time given in Eq. 42 is necessary and sufficient for obtaining a low value for the CV, and the conditions on model order given in Eqs. 43 or 44 are generally necessary for obtaining a close approximation to ψ^{−1} or ψ in cases for which there is no knowledge of the functional form of these functions. As explained in Section 6, at least partly, the more ψ(t) or ψ^{−1}(t) deviates from t or more generally from any trend such as ct + dt^{2} or gt + ht^{1/2} (i.e., the stronger the warping or required dewarping), the weaker the cyclic autocorrelation of the data dewarped with a candidate estimate of ψ^{−1}(t) is likely to be. And the weaker it is, the poorer the quality of the optimization of the parameter vector a based on maximizing it is likely to be.
The extent of deviation of ψ(t) or ψ^{−1}(t) from a trend is for the most part captured by the bandwidths of these deviations, denoted by B_{ψ} and \( {B}_{\psi^{-1}} \). Consequently, Eqs. 43 and 44 capture for the most part the impact of these deviations on the required values for the parameters T and K. For example, scaling the derivative of the deviation Δ_{ψ}(t) ≜ ψ(t) − trend (without changing the range of Δ_{ψ}(t)), which requires time compression or expansion, scales its bandwidth B_{ψ} by the same amount: \( d{\Delta}_{\psi}\left(\beta t\right)/ dt=\beta \left(d{\Delta}_{\psi }(t)/ dt\right) \) and \( {B}_{\psi \left(\beta t\right)}=\beta {B}_{\psi (t)} \).
Due in part to the fact that there is generally no amount of data (integration time T) and/or model order K that will produce an arbitrarily small specified CV, we have the following guideline:
Some amount of experimentation with parameter values, T and K, and basis functions will typically be required to obtain the best attainable results.
The performance of any particular method applied to any particular data can generally be expected to exhibit a minimum CV, for any specified set of basis functions and any available computational precision, at some particular pair of values for T and K (unless the warping function is of known form with a small fixed number of unknown parameters, regardless of the value of T—in which case, the larger T is, the smaller the CV will generally be); values larger than these optimum values for T and K can be expected to degrade performance.
Summarizing, the integration time T required is dictated largely by the condition given by Eq. 42, which is independent of the order K of the model \( \widehat{\psi}(t) \) for ψ(t). On the other hand, the model order required, for a good fit of \( \widehat{\psi}(t) \) to ψ(t), with no knowledge about ψ(t) other than its bandwidth, is dictated by the requirement K ≥ 4B_{ψ}T. Given a value of T large enough to produce a sufficiently small CV when the exact ψ^{−1}(t) or ψ(t) is used for dewarping or warping compensation (call this a target value of CV), the choice for a value of K affects primarily how well the model \( {\widehat{\psi}}^{-1}(t) \) or \( \widehat{\psi}(t) \) can fit ψ^{−1}(t) or ψ(t) and, therefore, how close to the target value of CV the actual value is. The value of K can, in principle, be chosen as large as needed, which is generally dictated by T, to obtain any desired precision of fit. The impact of large values of K is the effect it has on the convergence of the iterative search algorithm described in Section 10 or the accuracy of the model fit using any method for optimizing the vector of parameters a or b. This is a numerical issue, unlike the statistical issue of reliability, characterized here in terms of the CV. While the performance characteristics described in this section are promising in terms of the suggested apparent breadth of applicability of the methods introduced here, there are almost certainly limitations on applicability depending on the bandwidths B_{s} and B_{ψ} or \( {B}_{\psi^{-1}} \), which further research should seek to characterize.
Numerical example
Because the viability of cyclostationarity exploitation in data processing in many fields of science and engineering has been amply demonstrated in the literature in recent decades using real-world data, the only purpose of this section is to demonstrate that the theory of cyclicity restoration and/or irregular cyclicity compensation by time dewarping or time-warp compensation, presented herein, is itself viable.
Experimental setup
The signal model to be used for the example is the following pulse-amplitude/pulse-width modulated pulse train with uniform spacing between pulse starting times:
where the two sequences of random variables \( \left\{{g}_n=\pm 1,\mathrm{iid}\ \mathrm{uniform}\ \right\} \) and \( \left\{{h}_n=1,2,3,\mathrm{iid}\ \mathrm{uniform}\right\} \) are statistically independent of each other, and the nominal pulse shape p(t) in this pulse stream is as shown in Fig. 1.
The time-sampling increment is T_{s}, the pulse repetition period is \( {T}_o=160{T}_s \), and the signal bandwidth is approximated by the reciprocal of the width of the autocorrelation function [2]: B_{s} ≅ 12/T_{o}. The averaging time (data-record length) is \( T=32,768{T}_s\cong 205{T}_o \), which spans about 205 cycles. It can be shown that the cyclic correlation coefficient is \( \left|{\rho}_s^{\alpha}\left(\tau \right)\right|\cong 1 \).
The measured data y(t) contains the signal, time-warped by ψ(t), in additive white Gaussian noise n(t): y(t) = s(ψ(t)) + n(t) with \( \mathrm{SNR}=+10 \) dB (ratio of mean squared values of s and n equals 10) in case A (strong signal), and with \( \mathrm{SNR}=-10 \) dB in case B (weak signal). From Eq. 1, we have \( x(t)=s(t)+n\left({\psi}^{-1}(t)\right) \).
The time-warping function is given by the sum of unwarped time t and a nonperiodic function
where \( {\omega}_o=1/\left(10{T}_o\right) \), and the bandwidth of the component ψ(t) − t is approximately B_{ψ} ≅ ω_{o}/2π, which yields B_{ψ}/B_{s} ≅ 0.002. The warping function estimate is given by \( \widehat{\psi}(t)={\mathbf{b}}^T\mathbf{c}(t) \) from Eq. 26 with \( \gamma =1 \) and \( \eta =0 \). From Eq. 44, we require \( K>4{B}_{\psi }T=13>0.48/{\left|{\rho}_s^{\alpha}\left(\tau \right)\right|}^2\cong 0.5 \) provided that the basis functions span the space of functions with duration T and bandwidth B_{ψ} ≅ ω_{o}/2π. By using partial knowledge of ψ(t) below, K < 13 becomes feasible.
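The arithmetic behind these numbers can be checked directly from the stated definitions, taking ω_o = 1/(10T_o) in radians per unit time and T_s as the unit of time. The variable names below are hypothetical.

```python
import math

Ts = 1.0                    # time-sampling increment (unit of time)
To = 160 * Ts               # pulse repetition period
T = 32768 * Ts              # averaging time (data-record length)
omega_o = 1.0 / (10 * To)   # fundamental frequency of the warping deviation
B_psi = omega_o / (2 * math.pi)
B_s = 12 / To

cycles = T / To             # number of pulse periods in the record
k_bound = 4 * B_psi * T     # model-order requirement 4*B_psi*T
ratio = B_psi / B_s         # bandwidth ratio

print(round(cycles, 1), round(k_bound, 2), round(ratio, 4))
```

The record spans 204.8 periods and 4B_ψT evaluates to about 13.0, in agreement with the values quoted; the bandwidth ratio evaluates to about 0.0013 with these definitions, the same order of magnitude as the rounded figure given above.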
Taking advantage of approximate knowledge of ψ(t), we select the \( K=4 \) basis functions, \( {c}_1=\cos \left({\omega}_{\ast }t\right) \), \( {c}_2=\sin \left({\omega}_{\ast }t\right) \), \( {c}_3=\cos \left(3{\omega}_{\ast }t\right) \), \( {c}_4=\sin \left(3{\omega}_{\ast }t\right) \), with fundamental frequency ω_{∗} = 1.001ω_{o} (whose frequencies do not exactly match the true frequencies in ψ(t)) for case A, and with ω_{∗} and 3ω_{∗} replaced with the exact values ω_{o} and (3 + π/1000)ω_{o} for case B.
Discussion of results
The results of executing the iterative gradient-ascent optimization algorithm Eqs. 32 and 33 to estimate the vector of basis-function coefficients b in Eq. 26 with \( \gamma =1 \), \( \eta =0 \), and \( K=4 \), using the gradient expression Eq. 31a with the conjugation choice \( \left(\ast \right)=\ast \), are shown in the eight figures in Section 14.2. Because the objective function Eq. 8 is highly multimodal, a substantial computational effort is needed to find the best initialization of the iteration. This costly task was circumvented by selecting a starting vector for b known to be in the vicinity of the optimum vector. Also, because the warping function is of the form \( \psi (t)=t+\varepsilon (t) \), with ε(t) << t moderately well satisfied, the approximate formula \( {\widehat{\psi}}^{-1}(t)\cong t-\widehat{\varepsilon}(t) \) for the inverse of the estimate \( \widehat{\psi}(t) \) was used as an expedient, with the estimate \( \widehat{\varepsilon}(t)=\widehat{\psi}(t)-t \). Quantitatively, \( \left|\varepsilon (t)\right|<1/\left(5{\omega}_o\right)=320{T}_s \); so a sufficient requirement for ε(t) << t is t/T_{s} >> 320, and the range of t/T_{s} used is [0, 32768]. Therefore, the requirement is met for only the latter 90% of the data, suggesting that this may not be a highly accurate approximation and that better results than those shown may be obtained by using a more accurate approximation to the inverse of \( \widehat{\psi}(t) \), or by restricting t to be greater than 320T_{s}.
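The accuracy of the expedient inverse can be gauged numerically. The sketch below uses a hypothetical single-sinusoid deviation ε(t) = A sin(ω_o t) with A = 160T_s, which respects the 320T_s bound but is not the warping function of the example. It compares the first-order inverse t − ε(t), and one further fixed-point step t − ε(t − ε(t)), against an inverse obtained by bisection.

```python
import math

omega_o = 1.0 / 1600.0   # rad per sample: 1/(10*T_o) with T_o = 160 samples
A = 160.0                # hypothetical deviation amplitude in samples (below the 320 bound)

def eps(t):
    """Hypothetical deviation epsilon(t) = psi(t) - t."""
    return A * math.sin(omega_o * t)

def psi(t):
    return t + eps(t)

def psi_inv_exact(t, iters=60):
    """Bisection on the monotone function psi; the bracket follows from |eps| <= A."""
    lo, hi = t - A - 1.0, t + A + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def psi_inv_first(t):
    """Expedient inverse: t - eps(t)."""
    return t - eps(t)

def psi_inv_second(t):
    """One further fixed-point step: t - eps(t - eps(t))."""
    return t - eps(t - eps(t))

grid = [64.0 * k for k in range(513)]   # t/T_s from 0 to 32768
err_first = max(abs(psi_inv_first(t) - psi_inv_exact(t)) for t in grid)
err_second = max(abs(psi_inv_second(t) - psi_inv_exact(t)) for t in grid)
print(err_first, err_second)
```

For this hypothetical deviation, the first-order inverse is in error by several samples at worst (roughly A²ω_o/2 = 8 samples, or 5% of T_o), and the extra fixed-point step reduces the worst-case error by about an order of magnitude, which suggests one inexpensive route to the more accurate inverse mentioned above.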
It can be seen that time warping with ψ(t) (Fig. 2) significantly suppresses the cyclicity of the signal s(t) (Figs. 3, 4, 5, 6), but that ψ(t) can be estimated quite accurately from noisy data (Fig. 2), and inverted and used to dewarp the noisy warped signal well enough to substantially restore its cyclicity (Figs. 7, 8, 9). These results reveal the substantial noise tolerance exhibited by cyclostationary signals. Using the peak value of the cyclic correlogram (which is the quantity the optimization algorithm seeks to maximize) as a metric, it can be seen that the dewarping method for a signal with power level of only 1/10 that of the noise (Fig. 9) performs almost as well as it does for a signal with power level of 10 times that of the noise (Fig. 8).
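The suppression and restoration of cyclicity can be reproduced qualitatively in a much reduced setting. The sketch below is not the experiment reported here: it assumes noise-free data, a half-duty rectangular pulse, a warping deviation consisting of a single sinusoid of known frequency, one unknown coefficient b found by grid search rather than gradient ascent, the first-order inverse t − b sin(ωt), and a synthetic signal that can be evaluated at arbitrary times (real sampled data would require interpolation). All names and parameter values are hypothetical.

```python
import cmath
import math
import random

T_o = 16                       # samples per period (smaller than the 160 used in the text)
N = 4096                       # number of samples (256 periods)
A_true = 16.0                  # hypothetical warping amplitude in samples
omega = 2.0 * math.pi / 1024   # hypothetical warping frequency in rad/sample

random.seed(0)
g = [random.choice((-1.0, 1.0)) for _ in range(N // T_o + 8)]   # iid +/-1 amplitudes

def s(u):
    """Regular pulse train: amplitude g_n during the first half of period n, else 0."""
    n = int(math.floor(u / T_o))
    if n < 0 or n >= len(g):
        return 0.0
    return g[n] if (u - n * T_o) < 0.5 * T_o else 0.0

def y(t):
    """Warped signal y(t) = s(psi(t)) with psi(t) = t + A_true*sin(omega*t)."""
    return s(t + A_true * math.sin(omega * t))

def cyclic_autocorr_mag(x):
    """|R^alpha(0)| at alpha = 1/T_o, using samples at the half-integer times k + 0.5."""
    acc = sum(abs(x(k + 0.5)) ** 2 * cmath.exp(-2j * math.pi * (k + 0.5) / T_o)
              for k in range(N))
    return abs(acc) / N

def dewarped(b):
    """Candidate dewarping using the first-order inverse t - b*sin(omega*t)."""
    return lambda t: y(t - b * math.sin(omega * t))

candidates = [4.0 * k for k in range(9)]   # b = 0, 4, ..., 32
scores = {b: cyclic_autocorr_mag(dewarped(b)) for b in candidates}
best_b = max(scores, key=scores.get)
print(cyclic_autocorr_mag(s), scores[0.0], best_b, scores[best_b])
```

In this setting the warped data (b = 0) retain well under half of the cyclic autocorrelation magnitude of the regular pulse train, and the candidate that maximizes the magnitude coincides with the true warping amplitude, restoring most of the original value.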
Remaining challenges
Compared with the above example, there are, no doubt, substantially more challenging examples that arise in many fields of science involving data exhibiting irregular statistical cyclicity. Although the approach presented here is quite tolerant of noisy data, it is expected to be sensitive to the extent of the irregularity of the cyclicity. Time-warping functions with too large a bandwidth can result in a required model order that may be too high (according to some as yet unidentified criterion). In addition, some important types of irregular cyclicity, such as pace irregularity, cannot be modeled in terms of time warping, as discussed in Section 12. Additional numerical examples are provided in [22, 23].
Conclusions
It is shown in this work that statistical inference from time-series data based on enhancement or restoral of the property of cyclostationarity can be performed to achieve two data processing preparatory tasks; this achievement enables further processing of the time-series data based on exploitation of cyclostationarity once it has been enhanced or restored and thereby identified by determining one of its cycle frequencies. The preparatory tasks are (1) determination of a time-dewarping function or its equivalent time-warping compensation function and (2) use of that determined function to increase the degree of cyclostationarity of the data, thereby rendering the data more amenable to cyclostationarity exploitation techniques that are well known to be effective for various types of statistical inference and decision objectives, particularly in situations where the signal of interest present in the data is masked by noise and/or interfering signals also present in the data.
Since its inception 30 to 40 years ago (cf. [1–5, 7–11]), cyclostationarity exploitation has proven to be an unusually versatile tool for extracting information that is, in some sense, hidden or buried in the available data. The achievement of the work reported here is the extension of this important and now widely used paradigm for signal processing from data exhibiting regular cyclicity to more challenging data exhibiting only irregular cyclicity. Strictly speaking, exactly regular cyclicity exists only in mathematics. Measurement or observation data obtained from the physical world can exhibit only irregular cyclicity. Depending on the phenomenon giving rise to the data, the degree of regularity in the cyclicity can be very high, moderate, very low, or absent altogether. Although cyclicity is ubiquitous in our world, as a consequence of rotation of Earth about its axis, its revolution about the Sun, the Moon’s revolution about Earth, other planets’ revolutions about the Sun, and, indeed, the astrodynamics of stars, galaxies, etc. (which, it has only recently been convincingly argued, is a result of the central role of electromagnetism in the behavior of the Universe), the degree of irregularity in this naturally occurring cyclicity is more often than not too high to be ignored. This means the efficacy of cyclostationarity exploitation can often be substantially limited if the degree of irregularity is not decreased through time dewarping. And this typically requires statistical inference of an appropriate dewarping function.
As a consequence of the theoretical framework developed in this paper, the cyclostationarity paradigm can be expected to be extended, through statistically inferred time dewarping or time-warp compensation, from its present broad and diverse array of applications to a considerably more ubiquitous field of applications.
Highlights of converting irregular cyclicity to regular cyclicity

Conversion of irregular cyclicity in timeseries data to regular cyclicity is demonstrated.

Data with regular cyclicity can be modeled as cyclostationary.

The cyclostationarity paradigm for statistical inference has proven to be a rich resource.

Cyclostationarity exploitation offers noise- and interference-tolerant signal processing.

Cyclostationarity exploitation can now be extended to many more fields of science/engineering.
Change history
25 October 2018
Following publication of the original article [1], the author noticed that the equation on page 9, right column, 14th line from the bottom was incorrect. The correct equation is mentioned below.
Abbreviations
CCT: Cycle coherence time
CS: Cyclostationary/Cyclostationarity
CV: Coefficient of variation
DCS: Degree of cyclostationarity
ISC: Irregular statistical cyclicity
polyCS: Polycyclostationarity
RSC: Regular statistical cyclicity
References
1. WA Gardner, in Introduction to Random Processes with Applications to Signals and Systems. Cyclostationary processes (Macmillan, New York, Ed. 1, 1985; McGraw-Hill, New York, Ed. 2, 1990)
2. WA Gardner, in Statistical Spectral Analysis: A Nonprobabilistic Theory. Part II: periodic phenomena (Prentice-Hall, Englewood Cliffs, 1987)
3. WA Gardner, Cyclostationarity in Communications and Signal Processing (IEEE Press, Piscataway, 1994)
4. WA Gardner, Stationarizable random processes. IEEE Trans. Inf. Theory IT-24, 8–22 (1978)
5. A Napolitano, Generalizations of Cyclostationary Signal Processing: Spectral Analysis and Applications (Wiley/IEEE Press, West Sussex, 2012)
6. WA Gardner, A comprehensive tutorial website under construction (2017). https://www.cyclostationarity.com. Accessed 1 Aug 2018
7. WA Gardner, A Napolitano, L Paura, Cyclostationarity: half a century of research. Signal Process. 86, 639–697 (2006)
8. WA Gardner, Exploitation of spectral redundancy in cyclostationary signals. IEEE Signal Process. Mag. 8, 14–36 (1991)
9. WA Gardner, University of California, Davis, webpage (2000). http://faculty.engineering.ucdavis.edu/gardner/publications/. Accessed 1 Apr 2018
10. WA Gardner, Two alternative philosophies for estimation of the parameters of time-series. IEEE Trans. Inf. Theory 37, 216–218 (1991)
11. WA Gardner, WA Brown, Fraction-of-time probability for time-series that exhibit cyclostationarity. Signal Process. 23, 273–292 (1991)
12. J Leskow, A Napolitano, Foundations of the functional approach for signal analysis. Signal Process. 86, 3796–3825 (2006)
13. BG Agee, SV Schell, WA Gardner, Spectral self-coherence restoral: a new approach to blind adaptive signal extraction using antenna arrays. Proc. IEEE 78, 753–767 (1990)
14. WA Gardner, CK Chen, Signal-selective time-difference-of-arrival estimation for passive location of man-made signal sources in highly corruptive environments. IEEE Trans. Signal Process. 40, 1168–1184 (1992) [Some of the quantitative performance evaluation results from simulations in Sec IV of Part II of this reference are misleading because the TDOA to be estimated is an integer multiple of the time-sampling interval, which inadvertently reduces TDOA estimation error variance by quantizing to zero all individual errors less than half the sampling interval in magnitude. Particularly suspect are the mean squared error curves that cross the CRB in Figs. 6(c), (d), 7, and 8.]
15. LE Franks, Signal Theory (Prentice-Hall, Englewood Cliffs, 1969)
16. K Kreutz-Delgado, The Complex Gradient Operator and the CR-Calculus (UCSD, San Diego, 2009). https://arxiv.org/pdf/0906.4835.pdf
17. J Barzilai, JM Borwein, Two-point step size gradient methods. IMA J. Numer. Anal. 8, 141–148 (1988)
18. H Zhang, WW Hager, A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 14, 1043–1056 (2004)
19. D Jerome Antoni, SB Abboud, in Cyclostationarity: Theory and Methods, Lecture Notes in Mechanical Engineering, ed. by F Chaari, J Leskow, A Napolitano, A Sanchez-Ramirez. Time-angle periodically correlated processes (Springer International Publishing, 2014), pp. 3–14
20. WA Gardner, Correlation estimation and time-series modeling for nonstationary processes. Signal Process. 15, 31–41 (1988)
21. GD Zivanovic, WA Gardner, Degrees of cyclostationarity and their application to signal estimation and detection. Signal Process. 22, 287–297 (1991)
22. A Napolitano, WA Gardner, Algorithms for Analysis of Signals with Time-Warped Cyclostationarity, Proceedings of 50^{th} Asilomar Conference on Signals, Systems, and Computers (2016), pp. 539–543
23. A Napolitano, Time-warped almost-cyclostationary signals: characterization and statistical function measurements. IEEE Trans. Signal Process. 64, 5526–5541 (2017)
Acknowledgements
The author expresses his gratitude to Professor Antonio Napolitano for performing the simulations reported in Section 14 and for his helpful technical comments on the manuscript.
Author’s contributions
The author read and approved the final manuscript.
Ethics declarations
Competing interests
The author declares that he has no competing interest.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional information
The original version of this article was revised: The equation on page 9, right column, 14^{th} line from below was incorrect. This equation has now been corrected.
Appendix
The use of cyclostationary data models in science
As a simple means of assessing the utility of the concept of cyclostationarity in various fields of science and engineering, a web search using https://scholar.google.com/, performed in April 2018, was based on just under 50 nearly distinct applications areas in science and engineering, and the search terms were chosen with the intent of being minimally redundant: minimum number of hits, each of which results from more than one application area. The results are shown in Table 1, where it can be seen that the total number of hits was about 136,000. Analysis showed that the hits grew from a trickle of 1 to 2 figures per decade in the 1960s/1970s to a total of 6 figures over the ensuing half century. The same is true, with 5 figures, for the search performed on the single general search term “cyclostationary OR cyclostationarity”. Also, as shown in Table 2, a search was performed using just over 20 search terms that represent partiallyredundant general subjects in science and engineering: that is, there were substantial numbers of hits, each of which result from more than one subjected. The total number of hits was about 258,000. These hits also grew from a trickle per decade in the 1960s/1970s to a total of 6 figures over the ensuing half century. The number of hits for just the compound term comprised of the adjective and corresponding noun “cyclostationary OR cyclostationarity” was over 25,000 and has grown by a factor of approximately 4 every decade since the 1960s. To facilitate use of this data, it has been ordered alphabetically by search application (specific and general) in Table 1 or search subject (specific and general) in Table 2 and numerically by number of hits in Tables 3 and 4. 
Despite the concerted effort to use the search-term operators AND and OR judiciously and to select the search applications in a manner that minimizes the likelihood of more than one term producing the same hit (here called “search-term redundancy”), the search results obtained are suspect. Given that there are only about 25,000 hits for the search subject “cyclostationary OR cyclostationarity”, the total of all the hits obtained by “ANDing” this term with each of the approximately 50 approximately nonredundant applications should not exceed 25,000 by very much, yet it is about 5 times larger than this! So large a total requires that either 1) the result of 25,000 is artificially limited by some search algorithm used by the Google Scholar search engine (not taken into account here for lack of knowledge of it), or 2) the AND and OR operators are not functioning correctly, or 3) the search application areas are much more redundant than expected, or possibly all three of these potential causes are in effect. The results of future analysis and possible refinement of this search study are planned to be made available in [6].
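The size of the discrepancy can be made concrete with a short calculation. The following Python sketch is illustrative only: it assumes that the total of about 136,000 hits reported for Table 1 is the sum over the "ANDed" searches and that every such hit is also one of the roughly 25,000 hits for the general term. The function name and constants are introduced here for illustration and are not part of the original study. Under those assumptions, the ratio of the two totals is a lower bound on the average number of application areas that each hit would have to match.

```python
def min_average_overlap(total_anded_hits: int, general_term_hits: int) -> float:
    """Lower bound on the average number of application areas matched per hit.

    Assumes every ANDed hit is also a hit for the general term, so the
    number of distinct hits cannot exceed general_term_hits.
    """
    if general_term_hits <= 0:
        raise ValueError("general_term_hits must be positive")
    return total_anded_hits / general_term_hits


# Approximate totals quoted in the text (assumed to be directly comparable).
TOTAL_ANDED_HITS = 136_000   # approximate total reported for Table 1
GENERAL_TERM_HITS = 25_000   # "cyclostationary OR cyclostationarity"

ratio = min_average_overlap(TOTAL_ANDED_HITS, GENERAL_TERM_HITS)
print(f"Each hit would need to match at least {ratio:.2f} application areas on average")
```

A ratio well above 1 is what conflicts with the intent of nearly nonredundant search terms: if the terms were truly nonredundant, the ratio could not exceed 1 by much.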
Table 5 highlights a particular scientific field in which cyclicity is central, and Table 6 highlights a few applications in which data measurement/analysis is enhanced by artificially instilling cyclicity into the data. These include the use of optical spectral cloning for ultrafast/wideband photonic implementations of signal-processing algorithms (e.g., for radio frequency data) that were previously limited to slower and more narrowband electronic implementation. They also include the use of spectral redundancy insertion, such as spectral cloning, for introducing some level of immunity to noise and/or interference when used in conjunction with frequency-shift filtering.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Gardner, W.A. Statistically inferred time warping: extending the cyclostationarity paradigm from regular to irregular statistical cyclicity in scientific data. EURASIP J. Adv. Signal Process. 2018, 59 (2018). https://doi.org/10.1186/s13634-018-0564-6
Received:
Accepted:
Published:
Keywords
 Cyclostationarity
 Irregular cyclicity
 Rhythmicity
 Signals
 Time series
 Time warping