Rapid Binary Gage Function to Extract a Pulsed Signal Buried in Noise

The type of signal studied in this paper is a periodic pulse, with the pulse length short compared to the period, and the signal is buried in noise. If standard techniques such as the fast Fourier transform are used to study the signal, the data record needs to be very long. Additionally, there would be a very large number of calculations. The rapid binary gage function was developed to quickly determine the period of the signal, and the start time of the first pulse in the data. Once these two parameters are determined, the pulsed signal can be recovered using a standard data folding and adding technique.


INTRODUCTION
The motivation for this research was the need to extract pulsed signals that were buried in noise (signal-to-noise ratio less than one). The signals of interest have a "short" pulse duration compared to the period of the signal, and they are "weak" compared to the background noise. Another consideration when analyzing these signals was that the length of the recorded signal be as short as possible.
Classic techniques for recovering information about noisy signals are usually based on the Fourier transform [1], the periodogram [2], or simply folding and adding the data. When dealing with digitized data, certain precautions must be taken when the data are analyzed. The signal must be digitized based on the Nyquist sampling rate, which in turn is based on the highest desired frequency component of the signal. This is to avoid the appearance of false aliases in the Fourier transform. Generally, to avoid this, the analog signal is first lowpass filtered before being digitized [3].
Difficulties arise when recording and analyzing signals which have a pulse duration that is short when compared to the pulse period. Such a signal contains frequency components that are very high compared to the fundamental frequency of the signal. For pulsed signals, the digitizing rate must be sufficiently high so that the pulse is well defined. This means that the sampling rate is dictated by the pulse duration, and not the period of the signal. The use of filters to precondition this type of signal when it is buried in noise might in fact result in the pulse being completely filtered out of the data. Additionally, when the signal is buried in noise, the classic techniques, when applied to high frequency signals, require long data records. This in turn, necessitates very long times to acquire and analyze the data.
As an example, consider the problem of using signals measured from pulsars as the primary navigation aid for interplanetary travel, which could include the problem of fixing the Earth in its orbit around the sun. One navigational principle is based on using the Doppler shift of the pulsar  pulse rate measured from a number of pulsars. These shifts can be used to identify the direction of travel, and ultimately the position of the Earth or any other interplanetary space vehicle. One candidate pulsar is PSR B0329 + 54, which has a period of 0.715 seconds and a pulse width of approximately 8.7 milliseconds. An example of data captured from this pulsar using a 12-meter parabolic antenna is shown in Figure 1. Data capture for this data set started on 3/3/98 at 21:20:46 GMT, and the sample rate was 1000 samples per second. The time record shown in the figure should include several pulses, but these cannot be seen because the pulsed signal is completely buried in noise.
The spectral resolution of the pulse rate required for satisfactory interplanetary navigation using this pulsar is about 3 µHz. If classical FFT analysis were selected for this problem, the Nyquist sampling rate and FFT theory control the length of an individual time record to 1/(required spectral resolution). For this pulsar the required record length is 330 000 seconds, or nearly 4 days. Clearly, a single data record would be insufficient to extract the pulse from the noise. It is estimated that at least 100 averages would be required for satisfactory extraction. Thus, the total data capture would take at least 33 × 10 6 seconds, or 385 days. It is impossible to maintain the alignment of a single earthbound dish for this length of time since the rotation of the Earth typically limits data acquisition to less than half a day at a time. This aside, the process would eventually produce an average position of the Earth during data acquisition and the ability to locate the Earth within its orbit is lost. Thus, FFT-based methods cannot be used for this navigational application because the fix time of 385 days is comparable to the journey time! These and other considerations led to the development of a non-FFT-based technique to recover a pulsed signal buried in noise. The procedure developed in this paper can determine the pulse period using data acquired in only a few minutes, thus permitting close to real-time analysis.
The paper is divided into two parts. First, a method is presented that allows the rapid determination of the period of the signal, and the start time of the first pulse in the data. Second, the time-averaged wave form shape of the pulsed signal is recovered from the noisy data.

NOISY DATA
Consider a data signal D(t) which is the sum of a periodic pulsed signal F(t) and noise N(t). Example signals are shown in Figures 2a and 2b.
Notice that the vertical scales of Figures 2a and 2b are different. The pulsed signal, F(t), has a period of T F . The pulse width, T F /K F , is given in terms of the period, and a constant K F , with K F being called the pulse width parameter. It is expected that the beginning of the first pulse, in the recorded data, does not start at t = 0. The time t SF shifts the pulse in time to account for this. The signal F(t) shown in Figure 2a can be generated using the following equations: outside the pulse, In this equation, C N is the cycle number, C is the maximum number of cycles, and A F is the maximum amplitude of F(t). The pulse width parameter takes on the values K F ≥ 1. If K F = 1, then the pulse width is equal to the period. It is to be noted that K F probably will not be an integer.
The mean value of a pulsed signal is not necessarily equal to zero. The mean value of the example F(t) isf = 2A F /πK F , and is nonzero. Becausef may not be equal to zero, we do not average out the mean value of D(t) before analyzing the signal as is usually done [4]. This also means that a nonzero mean value of the noise would not be removed.
Mathematically, define where f (t) is the zero-mean time-varying part of the pulsed signal F(t), andf is the mean value. The zero-mean timevarying part of the noise, N(t), is ν(t), andν is the nonzero mean value. Using the above definitions, the observed data signal is given by For storage and analysis purposes, the data signal is digitized. The inverse of the sampling rate is h, with units of seconds/sample. The total number of data points is n = t R /h. Here, n is an integer, and t R is the total time of the record length.

DISCRETE CROSS-CORRELATION FUNCTION
The cross-correlation function gives the correlation between two signals, x(t) and y(t). When x and y are digitized, the discrete form of the cross-correlation function is employed. The digitized signals are x i and y i with i = 1, 2, . . . , n. The biased discrete cross-correlation function is then given bŷ In this equation, r is the lag number. Then rh is the lag time between y and x. The lag time rh is the digital equivalent of the continuous lag time τ. Notice, as r approaches n − 1 there are fewer and fewer terms which are added together. This results in a loss of accuracy for high lag times [1]. Two important properties of the cross-correlation function, [1,5], are In these relationships,R x andR y are the autocorrelation functions of x and y, respectively. Another property arises when both x and y are periodic, with the same period. For this special case, the cross-correlation function is also periodic, with the same period as x and y [1]. A significant disadvantage of using the autocorrelation function is the large number of calculations that need to be performed. It requires (n 2 + n)/2 multiplications, and (n 2 + n)/2 additions for the direct method, although FFT methods can reduce the number of multiplications to n log n.

MODIFIED DISCRETE CROSS-CORRELATION FUNCTION
For the purposes of this research, the discrete crosscorrelation function, (4), was modified. The x i term was replaced with the data, D i , with i = 1, 2, . . . , n. The y i term was replaced with a pulsed periodic reference function called the binary gage. It is defined as follows: The pulses occur during the times with Here, T is the period, and K is the pulse width parameter of the binary gage. The binary gage is illustrated in Figure 3. Notice that the binary gage is not a Walsh or related function [1,6] which are square waves with values of ±1. The binary gage only takes on the values 0 and 1 and has variable period T and pulse width parameter K.
The binary gage is digitized with the same h value as was used for the data. However, to increase accuracy, it is twice as long as the length of the data set. The digitized binary gage is G j , with j = 1, 2, . . . , n, . . . , 2n. Since the binary gage is twice as long as D i , the summation upper limit in (4) becomes n. Using these substitutions, and multiplying through by n, (4) becomes The effect of the noise on (9) can be illustrated by substituting D i = F i + ν i +ν. This gives G i+r , r = 0, 1, . . . , n − 1.
The first term on the right is the cross-correlation of the pulsed signal with the binary gage. When the period of the binary gage is the same as the period of the pulsed signal, the cross-correlation will have the same period. The second term on the right is the cross-correlation of the noise with the binary gage. The pieces of equipment used to receive, digitize, and store the data signal have finite band widths. Thus, ν i is band-limited white noise (pink noise). This means that the second term on the right will not exactly sum to zero. The third term is a result of the mean value of the noise not being zero. In this equation, it is implied that the noise is stationary and ergodic. Strictly speaking, the noise only has to have these features during the time of the data record.

RAPID BINARY GAGE FUNCTION
The purpose of the rapid binary gage function (RBGF) is to rapidly determine the period, T F , of the signal, and the start time, t SF , of the first pulse. The properties of the binary gage are used to reduce the number of computations. During the pulse, the binary gage has a value of one. Thus, there is no need to perform these multiplications. Outside the pulse, the binary gage has a value of zero. Thus, there is no need to perform these multiplications or adds. Equation (9) can now be written as The notation "i(G i+r = 1)" means to perform the adds only during the pulse of the binary gage. Otherwise, increment i to the next value.
There are significant computational savings for (11) compared to (4) and (9). First, there are no multiplications. Second, the RBGF will be periodic, with the period of F(t), when G(t) has the period of F(t). This means that the max-imum lag number r only needs to equal T/h. This results in a further savings of computations. The resulting number of adds is nT/Kh.
The use of the RBGF requires an estimate of K, the pulse width parameter of the binary gage. It also requires estimates of the low, T L , and high, T H , values of the period of the binary gage. The behavior of the RBGF is seen when a surface plot is generated. For each K, plot the values of RBGF versus T and rh. When T L and T H bracket T F , there will be a peak in the plot when T = T F and rh = t SF . An algorithm for calculating the RBGF is given in Algorithm 1.
Example RBGF plots are shown in Figures 4, 5, 6, and 7. These plots use the F(t) equation that is shown in Figure 2a. For all of the plots the parameters of F(t) were A F = 7, T F = 40 seconds, K F = 4, t R = 393 seconds; and the parameters of the binary gage were 30 ≤ T ≤ 50 seconds, 0 ≤ rh ≤ 30 seconds. Notice that in all four plots, the vertical axis scale is shortened compared to the other two axes. The purpose of this was to clarify the plots.
For Figure 4, the first pulse of the signal started at t SF = 0 seconds, and K = K F = 4. The maximum value of the RBGF is at T = 40 seconds, and rh = 0 seconds, which correspond to the parameters of F(t). The peak value of the RBGF is over 300, which is about 3 times the values of the lower regions of the plot. Thus, T K and t SF can easily be found from the location of the peak value of the RBGF using a simple peak finding program.
For Figure 5, the first pulse of the signal started at t SF = 20 seconds, and K = K F = 4. The maximum value of the RBGF is at T = 40 seconds, and rh = 20 seconds, which correspond to the parameters of F(t).
Since the binary gage pulse width parameter, K, is estimated, it may turn out that it is higher or lower than the signal pulse width parameter, K F . The effects of this are shown in the next two figures.
For Figure 6, K = 2 which is one half the value of K F = 4. This means that the binary gage pulse is twice that of the signal pulse. The first pulse of the signal started at t SF = 20 seconds. A ridge now appears in the contour plot. The ridge ends at T = 40 seconds, and rh = 20 seconds, which correspond to the parameters of F(t). Notice that the maximum value is still the same as Figures 4 and 5. This agrees with the theory.
For Figure 7, K = 8 which is twice the value of K F = 4, indicating that the binary gage pulse is one half that of the signal pulse. The first pulse of the signal started at t SF = 20 seconds. A ridge also appears in this contour plot. The maximum value of the ridge begins at T = 40 seconds, and rh = 20 seconds, which correspond to the parameters of F(t). Since there are fewer points to add together, the maximum value is lower than in the other three plots. However, the maximum value is still about 3 times larger than the lower values. The ridge can easily be located by using a ridge finding program.
The ridge introduces an ambiguity: which end of the ridge gives the period, T F , of the signal? The ambiguity can be resolved by doubling the pulse width parameter, K, of the binary gage, and replotting. Figures 4, 5, and 6 show that the RBGF takes on a maximum value for K ≤ K F . D( j): read in the magnitudes of the data, put them into the array D( j), t R = numeric value of the length of the data record (s), h = numeric value of the inverse of the signal digitization rate(s/sample), T L = numeric value of the lowest period of the binary gage (s), T H = numeric value of the highest period of the binary gage (s), K = numeric value of the binary gage pulse-duration-parameter, C = int(t R /T H ), minimum number of whole cycles in the data record. int(· · · ) = integer, r max = int(T L /h), maximum value of the lag number r, for T = T L to T H step h, set the period of the binary gage, for r = 0 to r max, set the value of the lag number, SUM = 0, SUM = temporary variable for summing, for C N = 1 to C, C N = the cycle number,

NUMERICAL EXAMPLE
The procedure is demonstrated on a numerically generated signal using parameters comparable to pulsar PSR B0329 + 54. Figure 8 shows part of the 60-second long signal that includes an 8.7-millisecond long half-sine pulse being repeated every 0.7415 seconds, with random noise added. The signal has a pulse width parameter K F = 85.2. Notice that the pulse width is very small compared with the period of the signal. The results of applying the RBGF to the full data set are shown in Figure 9. This figure shows data for a broad range of time delays and periods. The original pulse is identifiable as the single large peak. For comparison with earlier work in this paper, Figure 10 shows the same results, but with the area of attention focused around the peak of Figure 9. The noise in the signal has caused the peak to lose some of the straightforward form seen in the earlier figures. However, the maximum peak is correctly located at rh = 0.7415 seconds and T = 0.12 seconds. Thus the RBGF has correctly located the narrow, pulsed signal hidden in significant noise as was shown in Figure 8.

RECOVERED AVERAGE WAVE FORM
The signal-to-noise ratio of periodic signals can be increased by summing successive periods of the signal [1]. This is not a correlation process, but an averaging one. In order for the noise to average out, the statistical properties of the noise need to be independent of time during the time of the data record. That is, the noise process must be stationary and ergodic [7].   The time averaging process can be illustrated with the help of Figure 11. The wavy line is the amplitude of the recorded data, which includes both the noise and the signal of interest. The RBGF was used to determine the period, T F , of the periodic signal. Successive cycles in the data are indicated by the vertical dashed lines. The data in each cycle are added to each other, point-by-point. The first data points in each cycle are added to each other. Then the second data points in each cycle are added to each other. This process continues until the last data points in each cycle are added to each other. The results of the averaging process, as applied to the signal shown in Figure 2b, are plotted in Figure 12. D( j): read in the magnitudes of the data, put it into the array D( j), h = numeric value of the inverse of the signal digitization rate(s/sample), T F = numeric value of the period of F(t) in the data (s) as determined by the RBGF, t R = numeric value of the length of the data record (s), C = int(t R /T F ), the integer number of whole cycles of F(t) in the data; for i = 0 to int(T F /h), time of the data point is t = i * h (s), SUM = 0, temporary variable for summing, for C N = 1 to C, C N = the cycle number, sum = D(i + (C N − 1) * T F /h) + SUM, sum the data points, next C N Y (i) = SUM, magnitude of Y (t), t(i) = i * h, time, next i Algorithm 2: Algorithm to recover Y (t) from the noisy data.  Figure 11: Data D(t) versus time t.
The amplitude, Y (t), of the plot is not yet F(t), since C cycles of F(t) were added together, along with C times the mean value of the noise,ν. Since the data record is of finite length and the noise is band limited, the noise does not completely average to zero. This results in the slight waviness in the plot. An algorithm to compute Y (t) is given in Algorithm 2. The desired F(t) is derived from Y (t) as follows. The number of whole cycles of F(t) in the data is C = t r /T F , C being an integer. The magnitude of Cν is determined from the data or the plot. This is subtracted from Y (t) which leaves CF(t). This in turn is divided by C which gives the final signal F(t).

CONCLUSIONS
Noisy, pulsed signals of the type studied in this paper require very long data records if conventional techniques are used to analyze them. Additionally, the analysis requires a very large number of calculations. The RBGF was developed to very quickly determine the period of the signal, and the start time of the first pulse in the data, using a relatively short data set. This method works well even when the signal-to-noise ratio is much less than one. Once the period of the signal is determined, a conventional time averaging technique can be used to recover the wave form shape of the signal. Figure 12: Y (t) versus time t.