### 2.1 Primary estimates

Let *S*_{p}(*f*) be the power spectral density of an observed, time dependent process *x*(*t*). The process is sampled at *N* ascending time instances *t*_{i}, with *i*=1…*N*. The entire duration of the signal is *T*, where 0≤*t*_{1}<*t*_{N}≤*T*. The sampling is either at random time instances *t*_{i} with real values from a continuous domain or at quantized time instances with the fundamental time step *Δ**t*, but with missing samples. Note that *T* is not determined finally by a data set with irregular sampling instances. Only *T*≥*t*_{N} is fix. The exact duration of the signal can be defined later, according to post-processing requirements like the spectral resolution desired for the discrete numerical spectral estimation.

Let *S*_{s}(*f*) be a bias-free estimate of the power spectral density of the observed signal. For the random sampling in continuous time, the observed signal is understood as the series of measurements *x*_{i}=*x*(*t*_{i}). For nominally regular sampling with missing samples, the observed signal is understood as the series of valid values only. Neglecting any possible errors due to a periodic continuation of the signal, the particular estimate of *S*_{s}(*f*) can be obtained, e.g., as the Fourier spectrum

$$ S_{s}(f)=\frac{T}{N^{2}}\left|{\sum\limits_{i=1}^{N} x_{i}\mathrm{e}^{-2\pi\mathbf{i} ft_{i}}}\right|^{2} $$

(1)

with the imaginary unit **i** or with other methods like Lomb-Scargle’s method [8, 9] or generalized Lomb-Scargle’s method [45] for data having a non-zero mean value. Note, that *S*_{s}(*f*) deviates from *S*_{p}(*f*) because irregular sampling influences the spectral content of the observation as outlined below.

Dissolving the square in Eq. (1) leads to

$$\begin{array}{*{20}l} {}S_{s}(f)&=\frac{T}{N^{2}}\left\{\left[\sum\limits_{i=1}^{N} x_{i} \cos(2\pi f t_{i})\right]^{2}\right.\\ &\left.\quad+ \left[-\sum\limits_{i=1}^{N} x_{i} \sin(2\pi f t_{i})\right]^{2}\right\} \end{array} $$

(2)

$$\begin{array}{*{20}l} &=\frac{T}{N^{2}}\left\{\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N} x_{i} x_{j} \left[\cos(2\pi f t_{i}) \cos(2\pi f t_{j})\right.\right.\\ &\left.\left.\quad+ \sin(2\pi f t_{i}) \sin(2\pi f t_{j})\right]{\vphantom{\sum\limits_{j=1}^{N}}}\right\} \end{array} $$

(3)

$$\begin{array}{*{20}l} &=\frac{T}{N^{2}}\left\{\sum\limits_{i=1}^{N}\sum\limits_{j=1}^{N} x_{i} x_{j} \cos\left[2\pi f (t_{i} - t_{j})\right]\right\}. \end{array} $$

(4)

The last line with the double sum can be rewritten as

$$ =\frac{T}{N^{2}}\left\{\sum\limits_{i=1}^{N} x_{i}^{2} + \sum\limits_{i=1}^{N} \sum\limits_{\substack{j=1\\j\neq i}}^{N} x_{i} x_{j} \cos\left[2\pi f (t_{i} - t_{j})\right]\right\} $$

(5)

with separate summations over self-products \(x_{i}^{2}\) on one side and cross-products *x*_{i}*x*_{j} with *i*≠*j* on the other side.

For uninterrupted equidistant sampling, the probability *P*(*t*_{i}) of getting a valid sample at time *t*_{i} is unity at any sampling instance *t*_{i}=*i**Δ**t* with the sampling interval *Δ**t* and zero between the samples. Since the self-products \(x_{i}^{2}\) in Eq. (5) can be built directly from the respective samples *x*_{i}, the respective probability of such self-products occurring at time *t*_{i} is identical to the probability for the occurrence of the sample itself, namely *P*(*t*_{i}). Therefore, the probability of self-products also is unity at any sampling instance *t*_{i}=*i**Δ**t* and zero between the samples.

Cross-products *x*_{i}*x*_{j} with *i*≠*j* instead require two samples at two different times *t*_{i} and *t*_{j}. The respective joint probability of getting such cross-products *P*(*t*_{i},*t*_{j}) then is the product of the probability *P*(*t*_{i}) of having the first sample at *t*_{i} and the conditional probability *P*(*t*_{j}|*t*_{i}) of having another sample at *t*_{j} under the condition of having had the first sample at *t*_{i}.

$$ P(t_{i},t_{j})=P(t_{i})P(t_{j}|t_{i}) $$

(6)

However, for uninterrupted equidistant sampling, the probability of having another sample with a delay of *t*_{j}−*t*_{i} that equals an integer multiple of the sampling interval *Δ**t* is also unity. Hence, with *t*_{i}=*i**Δ**t* and *t*_{j}=*j**Δ**t*, finally cross-products between different samples and self-products of single samples have the same probability, provided the sampling is equidistant and uninterrupted.

With any kind of irregular sampling, the probability *P*(*t*_{i}) of having a valid sample at a sampling time *t*_{i}=*i**Δ**t* is either less than one for nominally equidistant sampling with missing samples or it changes its physical dimension to a probability density for irregular sampling in a continuous time domain. However, the probability of self-products at a certain time *t*_{i} is still identical to that probability of the sample itself *P*(*t*_{i}) and also Eq. (6) still holds for cross-products. But with irregular sampling unfortunately, the respective conditional probabilities or conditional probability densities *P*(*t*_{j}|*t*_{i}) of getting another sample at time *t*_{j} after the sample at time *t*_{i} also differ from unity and may additionally vary for different delays *t*_{j}−*t*_{i}. Under these conditions, Eq. (5) averages over self- and cross-products of varying delays, which all have different probabilities or probability densities and contribute differently to the estimate of the spectrum if applied to irregularly sampled data. This finally leads to the observed bias.

Let further be *R*_{s}(*τ*) the correlation function of the observed signal. This one corresponds to *S*_{s}(*f*) via the Fourier transform incorporating Wiener-Khinchin’s theorem [46]. For a numerical realization via the discrete Fourier transform (DFT) resp. the inverse discrete Fourier transform (IDFT), both *R*_{s} and *S*_{s} must be sampled regularly. For that, a fundamental time increment *Δ**τ* and a frequency increment *Δ**f* are defined. While *Δ**τ* can be chosen arbitrarily for a randomly sampled signal in continuous time, *Δ**τ* equals the primary time step *Δ**t* of the signal for equidistant sampling. For the purpose of better clarity, in the following, the time step will be denoted as *Δ**τ* commonly, dropping the discrimination between time series with steps of *Δ**t* and correlation functions with steps of *Δ**τ*. The frequency increment in all cases is \(\Delta f=\frac {1}{T}\), defined by the assumed duration of the signal *T*, where *T* can be chosen either slightly larger than *t*_{N} or significantly larger corresponding to an optional zero padding of the signal. In the present study, *T*=*M**Δ**τ* is chosen with an integer *M*, where *T* is close to *t*_{N}, e.g., the difference is within one time step *Δ**τ*. That leads to the correspondences

$$ S_{s}(f_{j})=\Delta\tau \cdot \text{DFT}\left\{R_{s}(\tau_{k})\right\}=\Delta\tau \cdot \sum\limits_{k=-\left\lfloor \frac{M}{2}\right\rfloor}^{\left\lfloor \frac{M-1}{2}\right\rfloor} R_{s}(\tau_{k}) \mathrm{e}^{-2\pi \mathbf{i} f_{j}\tau_{k}} $$

(7)

$$ R_{s}(\tau_{k})=\frac{1}{\Delta\tau} \cdot \text{IDFT}\left\{S_{s}(f_{j})\right\}=\frac{1}{T} \cdot \sum\limits_{j=-\left\lfloor \frac{M}{2}\right\rfloor}^{\left\lfloor \frac{M-1}{2}\right\rfloor} S_{s}(f_{j}) \mathrm{e}^{2\pi \mathbf{i} f_{j}\tau_{k}} $$

(8)

with \(f_{j}=j\Delta f, j=-\left \lfloor \frac {M}{2}\right \rfloor \ldots \left \lfloor \frac {M-1}{2}\right \rfloor \) and \(\tau _{k}=k\Delta \tau, k=-\left \lfloor \frac {M}{2}\right \rfloor \ldots \left \lfloor \frac {M-1}{2}\right \rfloor \), where ⌊*x*⌋ is the largest integer smaller than or equal *x*.

### 2.2 Bias correction

The sampling schemes investigated here yield either a continuous distribution of sampling intervals (exponential distribution for purely random sampling times or with deviations from the exponential distribution for correlated random sampling) or a discrete distribution (unity at the sampling interval of *Δ**τ* and zero otherwise for uninterrupted equidistant sampling or with further integer multiples of *Δ**τ* occurring with samples missing). All investigated sampling schemes have in common, that the sampling function, which defines the sampling times *t*_{i}, has a certain randomness, namely the sampling instances occur at random times. For continuously distributed sampling times, the sampling instances themselves are randomly distributed. Regarding equidistant data sets with missing or invalid samples, the randomness applies to the availability or validity of the individual samples. In the following, continuous and discrete distributions of the respective sampling intervals are distinguished to ensure a unique discrimination between these two cases of either sampling in a continuous time domain or nominally equidistant sampling with missing instances. This is apart from possible correlations between the sampling instances resp. between the sampling intervals, which additionally cause deviations from a purely random sampling in each of these two cases.

Let the mean sampling rate be *α* and let the mean number of samples per time unit *Δ**τ* be *α*^{′}=*α**Δ**τ*. For a discrete distribution of sampling intervals, *α*^{′} is also the probability of a sample *x*_{i} being valid, which complies with 0≤*α*^{′}≤1. Self-products \(x_{i}^{2}\) of samples then also occur with the probability of *α*^{′} as explained above. Cross-products *x*_{i}*x*_{j} can be made only from two different samples occurring at different time units *t*_{i}=*i**Δ**τ* and *t*_{j}=*j**Δ**τ* with *i*≠*j*. Assuming independence between sampling times, the probability of having sampling time *t*_{j} after sampling time *t*_{i} becomes *P*(*t*_{j}|*t*_{i})=*P*(*t*_{j}), namely *α*^{′}. The probability of getting cross-products *P*(*t*_{j},*t*_{i}) in Eq. (6) then finally becomes *α*^{′}^{2}. For a continuous distribution of independent sampling intervals with a mean sampling rate of *α*, the number of samples per time unit *Δ**τ* is Poisson distributed with the mean value of *α*^{′}=*α**Δ**τ*, where *α*^{′} can also be larger than one. In this case, the mean number of self-products \(x_{i}^{2}\) per time unit again is *α*^{′}, and the mean number of cross-products *x*_{i}*x*_{j} within two *different* time units again is *α*^{′}^{2} following Eq. (6), provided all sampling instances within the two time units are independent.

Both of these values, the probability of self-products *α*^{′} and the probability of cross-products from two different time units *α*^{′}^{2}, are identical for the two cases, that of discrete distribution of sampling intervals and that of continuous distribution of sampling intervals for purely random occurrence of samples without correlations between the sampling instances. In contrast to the case of discrete distribution of sampling intervals, for a continuous distribution of sampling intervals, the occurrence of more than one sample within one time unit is possible. Therefore, there also exists a probability of cross-products occurring from different samples within the *same* time unit, which receives further attention at a later point.

Let *β*_{k} be the mean number of self- and cross-products per time unit counted in *R*_{s}(*τ*_{k}). If only cross-products from different time units were counted and no correlation between the sampling intervals is assumed, from each pair of time units, one would obtain *α*^{′}^{2} pairs of samples on average. Within the measurement time *T*=*M**Δ**τ* one then has *M*−|*k*| such pairs of time units of lag time *τ*_{k}=*k**Δ**τ*. For long enough data sets, small enough absolute lag times |*k*|≪*M* or periodic continuation of the signal, one can assume a mean number of *M* such pairs of time units, each yielding *α*^{′}^{2} pairs of samples on average. Finally, the expected number *M**α*^{′}^{2} of such cross-products counted in *R*_{s}(*τ*_{k}) divided by the length of the data set, which is *M* time units long, yields the expected mean number of cross-products per time unit *β*_{k}=*α*^{′}^{2}.

Let further *β**k*′ be *β*_{k} normalized by *α*^{′}^{2}. Then for independent sampling and if only cross-products from different time units are counted *β**k*′ becomes unity. Any other influence, self-products, correlation between sampling instances, or cross-products within one time unit causes *β**k*′ deviating from unity. In any case, *β**k*′ is the ratio of the mean number of pairs of samples expected in *R*_{s}(*τ*_{k}) including all effects causing *β**k*′ deviating from unity and the expected number of cross-products only from different time units and from independent sampling instances. A prediction of all coefficients *β**k*′ for a given sampling scheme then can be used to balance the different probabilities of self- and cross-products for each lag time *τ*_{k} of the primary estimate of the correlation function *R*_{s}(*τ*_{k}) by normalization with *β**k*′ yielding an improved estimate \(\hat {R}_{s}(\tau _{k})\) applying

$$ \hat{R}_{s}(\tau_{k})=\frac{1}{\beta'_{k}}R_{s}(\tau_{k}). $$

(9)

The general procedure for bias correction then requires to transform the primary estimate of the spectrum into the appropriate correlation function. The correlation function then can be corrected for its bias based on appropriate correction factors *β**k*′, followed by another Fourier transform of the improved correlation function back into a spectrum.

The determination of the values *β**k*′ depends on the particular sampling scheme. An analytical derivation requires a priori knowledge about the specific rules of the particular sampling process. Purely random sampling without any correlations between the sampling instances is a dedicated case, where only *β*0′ deviates from unity. This allows to perform the intended bias correction directly on the spectrum, as shown in the following subsection. With correlations between the sampling instances, the correction of the spectrum is more complicated and all values *β**k*′ are required. Analytical derivations for the following test cases with correlations of the sampling instances are given later, namely below the introduction of the particular simulation procedure. These derivations are limited to the shown test cases and they are no longer valid with other sampling schemes. As a universal alternative, the correction coefficients can also be estimated numerically directly from the data taken, as shown in the next but one subsection.

### 2.3 Purely random sampling with independent sampling instances

For continuous distribution of sampling intervals and with independent sampling instances, all cross-products are independent, which leads to *β**k*′=1 for all *k*≠0. For *k*=0, a number of *n* samples in a time unit *Δ**τ* delivers *n*^{2} cross- and self-products, where the number *n* of samples in that time unit is Poisson distributed with the probability

$$ P(n)=\frac{\alpha'^{n}}{n!}\mathrm{e}^{-\alpha'} $$

(10)

and with the mean value of *α*^{′}. The mean number of cross- and self-products for *k*=0 then becomes

$$ \beta_{0}=\sum\limits_{n=0}^{\infty} n^{2} P(n)=\alpha'+\alpha'^{2}, $$

(11)

which, after normalization with *α*^{′}^{2}, finally leads to

$$ \beta'_{k}=\left\{\begin{array}{ll}1+\frac{1}{\alpha'}&\text{for}\ k=0\\ 1&\text{otherwise.}\end{array}\right. $$

(12)

For discrete distribution of sampling intervals and with independent randomly occurring missed samples, again all cross-products are independent, which leads to *β**k*′=1 for all *k*≠0 also here. In contrast to the previous case, where multiple sampling instances were possible within one time unit, in nominally equidistant sampling, only one or zero samples can occur per time interval. Therefore, *R*_{s}(0) can include only self-products, which occur with the mean rate of *α*^{′}. After normalization with *α*^{′}^{2}, this finally leads to

$$ \beta'_{k}=\left\{\begin{array}{ll}\frac{1}{\alpha'}&\text{for}\ k=0\\ 1&\text{otherwise.}\end{array}\right. $$

(13)

In both cases, continuous and discrete distribution of sampling intervals without correlations, only *β*0′ deviates from unity. The corresponding correlation function can be corrected at lag time zero by

$$ \hat{R}_{s}(0)=\frac{1}{\beta'_{0}}R_{s}(0) $$

(14)

Due to the correspondence between the power spectral density and the correlation function given in Eq. (7), a correction of *R*_{s}(0) as in Eq. (14) leads to an offset correction of the power spectral density

$$ \hat{S}_{s}(f_{j})=S_{s}(f_{j})-\Delta\tau\left(1-\frac{1}{\beta'_{0}}\right)R_{s}(0) $$

(15)

for all frequencies *f*_{j}.

Fortunately, the procedure directly corrects the spectral estimates, which are the focus of this investigation. On the other hand, the procedure involves the corresponding correlation function. However, only the value *R*_{s}(0) is needed. Since the spectral estimates have been obtained directly from the data, a procedure without values of the correlation function would be preferable. In that case, the transformation of the spectra into corresponding correlation functions could be dropped. For this purpose, the mean signal power

$$ P_{s}=\frac{1}{N}\sum\limits_{i=1}^{N} x_{i}^{2} $$

(16)

is used instead, where \(\hat {R}_{s}(0)=\frac {R_{s}(0)}{\beta '_{0}}\approx P_{s}\). A deviation results from the fact that in *R*_{s}(0) or \(\hat {R}_{s}(0)\) in addition to self-products also cross-products of samples within single time units of *Δ**τ* may occur, while only self-products are counted in *P*_{s}. However, this deviation occurs only for continuous distributions of sampling intervals, and it becomes significant only for large values of *α*^{′} above one. Fortunately, for continuous distributions of sampling intervals, the interval *Δ**τ* can be chosen arbitrarily small, ensuring *α*^{′} to be sufficiently smaller than one. In the limit of infinitesimal small *Δ**τ*,*R*_{s}(0) and *β*0′*P*_{s} become the same and the correction of the spectrum becomes

$$ \hat{S}(f)=S(f)-\Delta\tau \left(\beta'_{0}-1\right)P_{s}. $$

(17)

Using the derivation of *β*0′ from above, for random sampling with continuous distribution of sampling intervals this reduces to

$$ \hat{S}_{s}(f)=S_{s}(f)-\frac{P_{s}}{\alpha} $$

(18)

and for randomly and independent missing samples with discrete distribution of sampling intervals to

$$ \hat{S}_{s}(f)=S_{s}(f)-P_{s}\left(\frac{1}{\alpha}-\Delta\tau\right). $$

(19)

### 2.4 Empirical estimates of the correction coefficients

For unknown spectral composition of the sampling function or that of the data gaps, the correction procedure with theoretical values of *α*^{′} and *β**k*′ is not practicable because the values *β*_{k} resp. *β**k*′ are not known a priori. In that case, the number of self- and cross-products and finally *β**k*′ can be estimated for each lag time *τ*_{k} individually by directly estimating the correlation function of the sampling function. For that, Eq. (1) for the primary estimate of the spectrum and Eq. (8) for the appropriate correlation function can be reused with the sampling times *t*_{i} from the observed signal, this time with all values *x*_{i} replaced by a constant value of one. Since Lomb-Scargle’s method has problems with signals of constant value, instead the Fourier spectrum is generally used for the empirical estimation of *β**k*′, yielding

$$\begin{array}{*{20}l} \beta'_{k}&=\text{IDFT}\left\{\frac{M}{N^{2}}\left|{\sum\limits_{i=1}^{N} \mathrm{e}^{-2\pi\mathbf{i} f_{j} t_{i}}}\right|^{2}\right\}\\ &=\frac{1}{N^{2}} \cdot \sum\limits_{j=-\left\lfloor \frac{M}{2}\right\rfloor}^{\left\lfloor \frac{M-1}{2}\right\rfloor} \left|{\sum\limits_{i=1}^{N} \mathrm{e}^{-2\pi\mathbf{i} f_{j} t_{i}}}\right|^{2} \cdot \mathrm{e}^{2\pi \mathbf{i} f_{j}\tau_{k}} \end{array} $$

(20)

again with \(f_{j}=j\Delta f, j=-\left \lfloor \frac {M}{2}\right \rfloor \ldots \left \lfloor \frac {M-1}{2}\right \rfloor \) and \(\tau _{k}=k\Delta \tau, k=-\left \lfloor \frac {M}{2}\right \rfloor \ldots \left \lfloor \frac {M-1}{2}\right \rfloor \).

### 2.5 Mean value

The procedures given here can be applied with no changes to data with or without a mean value. However, in contrast to equidistant sampling, a non-zero mean value in combination with irregular sampling increases the estimation variance of the derived spectra. Therefore, a possible mean value in real data should be estimated and removed from the data before spectral analysis. Unfortunately, the estimation and removal of the mean value is another bias source for the estimated correlation function and finally for the spectrum, as has been analyzed in [47, 48] including appropriate corrections. To avoid interference with additional bias sources, the following test simulations are done with mean-free processes only. Accordingly, only Lomb-Scargle’s method is used for direct comparison with the Fourier spectrum, while generalized Lomb-Scargle’s method has not been considered further.

### 2.6 Simulation

To demonstrate the ability of the estimation routines to derive bias-free estimates of the power spectral density, a moving-average stochastic process of order 200 is generated from white noise, sampled in four different ways and analyzed in Monte-Carlo simulation runs. The coefficients of the moving-average process are chosen such that the generated signals have an artificial spectrum with an exponentially increasing slope and with a distinct dip in the observed frequency range. Each run of the Monte-Carlo simulation generates a signal of such spectral characteristics with a total length of 200 tu (time units). To avoid the influence of further bias sources like incorrect assumptions of a periodic continuation of the signals, each signal fits together on both ends, justifying the assumption made above. The signals have no mean value and a standard deviation of 2 au (amplitude units). Then, four sampling schemes are applied to the time series: For the two cases with sampling instances from continuous time, interpolated values between the discrete samples of the simulated series are obtained by Whittaker-Shannon interpolation formula [49, 50]. (a) Purely random samples are taken from the interpolated time series independent from each other with a mean rate of *α*=0.5 tu^{−1}. (b) Random samples are taken from the interpolated time series including a minimum distance of *d*=0.5 tu between successive samples roughly mimicking processor dead times in laser Doppler applications. The mean sampling rate remains *α*=0.5 tu^{−1}. For the two test cases with nominally equidistant sampling, each sample of the simulated time series gets a corresponding weight of either one or zero from a random process to mimic outliers and data gaps. (c) Individual samples are taken out independently, with a probability of 50 *%*. (d) Series of valid and invalid samples are specified, where the state of validity changes with a probability of 20 *%* at each time step. The last procedure also yields 50 *%* invalid samples on average, where the length of valid data or that of invalid data has an exponential distribution with a mean of five samples and the sequence of weights gets correlated. However, in all four cases a mean sampling rate of *α*=0.5 tu^{−1} is set. Figure 2 illustrates the sampling schemes for the four test cases and the appropriate classification according to the definition in the abstract. The arrangement of the four test cases will remain the same for all following figures in the Section 3.

### 2.7 Appropriate correction coefficients for the test cases

For the two cases (a) and (c) with purely random sampling and independent sampling instances, the offset of the spectrum can be determined from the mean data rate and the spectrum can be corrected directly by removing the predicted offset as given in Eqs. (18) resp. (19). More effort is needed to derive the correction coefficients *β*_{k} and *β**k*′ for the two cases (b) and (d) with correlation between the sampling intervals. Since all properties of the signal generation including the rules of the sampling process are known, particular bias correction coefficients can be derived analytically. The following derivations are valid for the particular sampling schemes used as examples in the present simulation only. Other sampling schemes need their own derivations.

For case (b) with random sampling in continuous time with a continuous distribution of sampling intervals and with a minimum time *d* between successive samples, let *P*_{0}(*n*) be the probability to have *n* samples within one time unit *Δ**τ*. For *n* samples in a time unit, the number of pairs of samples (self- and cross-products) is *n*^{2}. The mean number *β*_{0} of pairs between all samples within a time unit is then derived by summation over all possible numbers *n* of samples as

$$ \beta_{0}=\sum\limits_{n=0}^{\left\lfloor\frac{1}{d'}\right\rfloor+1} n^{2} P_{0}(n) $$

(21)

with the normalized minimum time between samples \(d'=\frac {d}{\Delta \tau }\). To derive the probabilities *P*_{0}(*n*), also samples before the investigated time unit must be considered, because their delay times influence the probabilities of following samples within the actually investigated time unit. The probability to have no preceding sample effecting the actually investigated time unit is *P*_{e}=1−*α*^{′}*d*^{′} (*e* - empty). The probability to have the actually investigated time unit fully covered by a delay time of a preceding event is *P*_{f}=*α*^{′} max{0,*d*^{′}−1} (*f* - full). In this case, no events can occur within the actually investigated time unit. This case occurs only, if *d*^{′}>1. The probability to have the actually investigated time unit partially covered by the delay time from a preceding sample event is *P*_{p}=*α*^{′} min{1,*d*^{′}} (*p* - partial). The probability to have *n* samples in a time unit *Δ**τ* consists of the sum of these three cases, yielding

$$ P_{0}(n)=P_{e}(n)+P_{f}(n)+P_{p}(n), $$

(22)

where *P*_{e}(*n*),*P*_{f}(*n*), and *P*_{p}(*n*) are the probabilities to obtain *n* samples within the investigated time unit with either no preceding sample to be considered, time unit fully covered by the delay time of a preceding sample or partially covered. The probabilities of these three cases finally are

$$\begin{array}{*{20}l} P_{e}(n)&=\left(1-\alpha' d'\right) \left[P'\left(n,1-d'\max\left\{0,n-1\right\}\right)\right.\\ &\left.\qquad\qquad\qquad- P'\left(n+1,1-nd'\right)\right] \end{array} $$

(23)

$$ P_{f}(n)=\left\{\begin{array}{ll}\alpha' \max\{0,d'-1\}&\text{for}\ n=0\\ 0&\text{otherwise}\end{array}\right. $$

(24)

$$\begin{array}{*{20}l} P_{p}(n)&=\int\limits_{0}^{\min\{1,d'\}} \alpha' \left[ P'\left(n,x-d'\max\left\{0,n-1\right\}\right)\right.\\&\left.\qquad\qquad\qquad- P'\left(n+1,x-nd'\right)\right]\,\mathrm{d} x \end{array} $$

(25)

where *P*^{′}(*n*,*x*) is the probability to have at least *n* samples within the (normalized) fraction *x* of a time unit, which is

$$ P'(n,x)=1-\sum\limits_{j=0}^{n-1}\frac{(\alpha^{\prime\prime} x)^{j}}{j!}\mathrm{e}^{-\alpha^{\prime\prime} x} $$

(26)

with the normalized rate of further samples *α*^{′′} in the remaining time between dead time intervals after assumed samples

$$ \alpha^{\prime\prime}=\frac{\alpha'}{1-\alpha' d'} $$

(27)

For the mean number *β*_{k} of pairs of samples between two different time units (*k*≠0), only cross-products between any sample within one time unit and any sample in the other time unit can contribute. Therefore, one sample is assumed at time *t*_{a} in one time unit and another sample is assumed at time *t*_{b} in a different time unit, which is *k* time units away from the initial one. Because the autocorrelation is symmetric, only *k*>0 is discussed. For *k*<0,|*k*| can be used instead of *k* to derive all required parameters. The sample at *t*_{a} can occur at any time within its time unit with a mean rate of *α*. Normalization with the time unit *Δ**τ* yields \(t^{\prime }_{a}=\frac {t_{a}}{\Delta \tau }\). Then, the sample at \(t^{\prime }_{a}\) occurs with the mean rate of *α*^{′}. The occurrence of the sample at \(t^{\prime }_{b}=\frac {t_{b}}{\Delta \tau }\) depends on the number *n* of further samples between \(t^{\prime }_{a}\) and \(t^{\prime }_{b}\). However, reducing the time to the remaining fraction between all dead times of other samples, the mean rate of the sample at *t*_{b} is *α*^{′′} as in the case for *k*=0 above. The number of further samples *n* between \(t^{\prime }_{a}\) and \(t^{\prime }_{b}\) and their dead times *n**d*^{′} plus the dead time of the initial sample at \(t^{\prime }_{a}\), finally leads to the dependence between the sampling instances. Without limitation of generality, \(t^{\prime }_{a}\) shall occur in time unit number zero. The mean number of pairs of samples between the two time units then becomes

$$\begin{array}{*{20}l} \beta_{k}&=\int\limits_{-0.5}^{0.5}\,\int\limits_{\left|{k}\right|-0.5}^{\left|{k}\right|+0.5} \mathrm{d} t'_{a} \mathrm{d} t'_{b} \alpha' \alpha^{\prime\prime} \sum\limits_{n=0}^{\left\lfloor\frac{t'_{b}-t'_{a}}{d'}-1\right\rfloor}\\ &\qquad{P(n,t'_{b}-t'_{a}-(n+1)d')} \end{array} $$

(28)

where *P*(*n*,*x*) is the probability to have *n* further samples within the (normalized) time interval *x*, which is

$$ P(n,x)=\frac{(\alpha^{\prime\prime} x)^{n}}{n!}\mathrm{e}^{-\alpha^{\prime\prime} x} $$

(29)

From the mean number of pairs of samples *β*_{k} the correction coefficients *β**k*′ (for all *k*, including *k*=0) can be obtained as

$$ \beta_{k}'=\frac{\beta_{k}}{\alpha'^{2}} $$

(30)

For case (d) with correlated data gaps in discrete time and with a discrete distribution of sampling intervals, the probability is derived that a valid sample occurs at a given time unit and another one occurs |*k*| time units away. The probability of having a valid sample at the first instance is *α*^{′}=0.5 in this simulation. For a given valid sample at the first instance, the probability of having another valid sample at the second instance depends on the number of possible changes *n* from valid data to invalid data and vice versa. Between the two instances up to |*k*| changes can occur with a mean number of changes per time unit of *c*^{′}=0.2 in this simulation. A change occurs at a given time instance with a probability of *c*^{′}, no change occurs with a probability of 1−*c*^{′}. An odd number of changes yields an invalid sample at the second instance, while an even number of changes yields a valid sample. The changes can occur at any time instance between 1*Δ**τ* and |*k*|*Δ**τ*, where the result depends on their number but not on their order. The mean number *β*_{k} of pairs of samples then becomes

$$\begin{array}{*{20}l} \beta_{k}&=\alpha'\sum\limits_{n'=0}^{\left\lfloor\frac{\left|{k}\right|}{2}\right\rfloor} \binom{\left|{k}\right|}{n} \left(1-c'\right)^{\left|{k}\right|-n} c{\prime}^{n}\\&=\frac{\alpha'}{2}\left[1+\left(1-2c'\right)^{\left|{k}\right|}\right] \end{array} $$

(31)

with *n*=2*n*^{′}, which finally leads to the correction coefficients

$$ \beta_{k}'=\frac{\beta_{k}}{\alpha{\prime}^{2}}=1+\left(1-2c'\right)^{\left|{k}\right|} $$

(32)