Prediction on widely factorizable signals

This article extends the solutions of the prediction problem for factorizable real random signals to the class of improper complex-valued random signals. To that end, the concept of widely factorizable signals is introduced and several real examples of signals having a widely factorizable correlation function are presented. Widely linear processing is used in the design of both linear and nonlinear prediction algorithms that are computationally feasible from a practical standpoint. These algorithms are valid for stationary as well as nonstationary signals, and they can be applied with knowledge of only the second-order properties of the augmented vectors involved; it is not necessary to know whether the signal satisfies a state-space model.


Introduction
In recent research, estimation theory has become very relevant within the field of complex random signals. In fact, although the treatment of this type of problem has traditionally consisted of straightforward extensions of vector real-valued estimation algorithms to the complex plane, a complex formalism has special value in the description of some physical systems in fields as diverse as communications, oceanography, meteorology, and optics, among others (see, for example, [1] and the references therein).
The classic processing of such signals, called strictly linear (SL), assumes that the signals involved are proper. a However, this assumption is not always justified. The improper nature of some signals requires that the pseudocorrelation function be taken into account in order to describe and characterize their second-order properties completely [2][3][4].
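The distinction matters numerically: a signal is improper precisely when its pseudocorrelation E[s(t)s(t)] (no conjugate) is nonzero, while the ordinary correlation E[s(t)s*(t)] alone cannot detect this. A minimal sketch with made-up signals, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# A proper signal: independent real/imaginary parts of equal variance.
proper = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)

# An improper signal: correlated real and imaginary parts (illustrative choice).
x = rng.standard_normal(n)
improper = (1 + 0.5j) * x + 0.5j * rng.standard_normal(n)

def correlation(z):
    """Sample correlation E[z z*] at lag zero."""
    return np.mean(z * np.conj(z))

def pseudocorrelation(z):
    """Sample pseudocorrelation E[z z] at lag zero (no conjugate)."""
    return np.mean(z * z)

print(abs(pseudocorrelation(proper)))    # near 0: proper
print(abs(pseudocorrelation(improper)))  # clearly nonzero: improper
```

Both signals have unit correlation at lag zero, so only the pseudocorrelation reveals the impropriety.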
With this motivation, [2] introduced a new methodology called widely linear (WL) whose most notable characteristic is that it utilizes not only the observed signal but also its conjugate, obtaining estimators that behave better than the conventional ones in the sense of reducing the mean square estimation error. Moreover, this kind of processing has become very common in the last decade for designing linear and nonlinear estimation algorithms from a discrete-time [1,[5][6][7][8][9]] as well as a continuous-time perspective [10]. Specifically, focusing our attention on the discrete case, the recent books by Mandic and Goh [1] and Adali and Haykin [11] on WL adaptive systems can be considered two reference texts in this area, providing a unified treatment of linear and nonlinear complex-valued adaptive filters.
On the other hand, knowledge of second-order statistics is a key assumption in solving estimation problems (see, for example, [12]). In practice, this knowledge is available because the second-order statistics of the problem have been measured experimentally or are sufficiently well understood from the physical mechanism [13]. In this framework, WL estimation algorithms for computing all types of estimates (filtering, prediction, and smoothing) were devised in [8] for second-order stationary (SOS) signals, i.e., signals with constant mean function and with correlation and pseudocorrelation functions that depend only on the difference between time instants. Although the WL estimation problem considered there is very general, its applicability is limited to SOS signals.
In the real field, an alternative estimation methodology based on correlation information has been developed to solve very general linear and nonlinear estimation problems for the class of factorizable signals (see, for example, [14,15]). This type of signal is characterized by having a factorizable kernel, a very general condition valid for both stationary and nonstationary signals. The formulation does not require postulating a dynamical model for the signal and thus is useful whenever the physical mechanism generating the signal of interest is unknown or impossible to determine (for instance, when the signal does not satisfy a state-space model).
In this article, our objective is to extend this last methodology to the complex plane. To that end, we introduce a new class of signals, called widely factorizable, for which the correlation function of the augmented vector formed by the signal and its conjugate is a factorizable kernel. Then, WL processing is employed in the development of both linear and nonlinear prediction algorithms for widely factorizable signals observed in the presence of noise. With this aim, widely factorizable signals are defined in Section 2, where the basic notation and concepts concerning complex-valued signals are also summarized. Next, the linear augmented complex prediction problem is addressed in Section 3, where a recursive algorithm is provided for computing the optimal WL prediction estimate as well as its associated error. This section also includes three numerical examples which show the improvement of the proposed WL predictor over the SL solution. Finally, by following a reasoning similar to that of the extended Kalman filter (EKF), the nonlinear augmented complex prediction problem is solved in Section 4, where a numerical example is also developed in order to compare the proposed WL algorithm with two conventional nonlinear augmented complex techniques, the WL EKF and the WL unscented Kalman filter (UKF) suggested in [1]. Furthermore, these solutions are also compared with the SL EKF and SL UKF.

Preliminaries
This section covers some of the more basic notions associated with the complex-valued random signals field and introduces a new class of signals, called widely factorizable. Moreover, the notation and hypotheses held throughout the article are also established in this section.
First of all, note that all vectors will be denoted by bold lowercase letters and matrices by bold capital letters. Also, row k of any matrix A(·) will be denoted by a_[k](·). Furthermore, 0_p and 0_{p×q} represent the p-vector and the p × q matrix, respectively, whose elements are all zeros.
Unless indicated to the contrary, throughout this article we consider an improper complex-valued signal {s(t_i), t_i ∈ T}, T = {t_1, t_2, ...}, with zero mean and correlation function r_s(t_i, t_j) = E[s(t_i)s*(t_j)], where the superscript '*' denotes the complex conjugate.
The signal {s(t_i), t_i ∈ T} is said to be factorizable if there exist two l-vectors a(t_i) and b(t_i) such that its correlation function r_s(t_i, t_j) can be expressed in the form

r_s(t_i, t_j) = a^T(t_i) b(t_j), t_j ≤ t_i   (1)

where the superscript 'T' denotes the transpose. Note that this type of signal is very general and includes both stationary and nonstationary signals. However, a new class of signals arises by imposing the factorizable-kernel condition on the correlation function R_s(t_i, t_j) = E[s(t_i)s^H(t_j)] of the augmented vector s(t_i) = [s(t_i), s*(t_i)]^T, where the superscript 'H' denotes the conjugate transpose. This type of signal, called widely factorizable, is introduced next.
Definition 2.1 A signal {s(t_i), t_i ∈ T} is said to be widely factorizable if and only if there exist two 2 × m matrices A(t_i) and B(t_i) such that the correlation function R_s(t_i, t_j) of the augmented vector s(t_i) can be expressed in the form

R_s(t_i, t_j) = A(t_i) B^H(t_j), t_j ≤ t_i   (2)
Note that condition (2) implies (1); however, condition (1) does not ensure that the correlation function of the augmented vector satisfies (2). As a consequence, all widely factorizable signals are also factorizable, but the converse does not hold.
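The implication (2) ⇒ (1) can be checked mechanically: building R_s(t_i, t_j) = A(t_i)B^H(t_j) from any factorization matrices, the ordinary correlation r_s(t_i, t_j) is the (1,1) entry of R_s, and the first rows of A and B play the roles of a and b in (1). A minimal numerical sketch with arbitrary illustrative matrices (not taken from the article):

```python
import numpy as np

# Arbitrary illustrative 2 x m factorization matrices (m = 2), functions of t.
def A(t):
    a = np.exp(-t)
    return np.array([[a, 0.3 * a], [0.3 * a, a]], dtype=complex)

def B(t):
    b = np.exp(t)
    return np.array([[b, 0.0], [0.0, b]], dtype=complex)

ti, tj = 0.7, 0.4  # t_j <= t_i

# Augmented correlation from the widely factorizable condition (2).
R = A(ti) @ B(tj).conj().T

# Its (1,1) entry is the ordinary correlation r_s(t_i, t_j); the factorization
# in the sense of (1) uses the first rows of A and B.
r = R[0, 0]
a_row, b_row = A(ti)[0], B(tj)[0]
print(np.isclose(r, a_row @ b_row.conj()))  # (2) implies (1)
```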
Some illustrative examples of widely factorizable signals are the following:

i) The rotation of a real factorizable zero-mean signal x(t_i) by an independent random phase θ, s(t_i) = e^{jθ}x(t_i), with j = √−1. Assume

r_x(t_i, t_j) = a^T(t_i) b(t_j), t_j ≤ t_i   (3)

with a(t_i) and b(t_i) two real l-vectors. Thus,

A(t_i) = [ a^T(t_i)   φ_θ(2) a^T(t_i) ; φ_θ(−2) a^T(t_i)   a^T(t_i) ],   B(t_i) = [ b^T(t_i)   0_l^T ; 0_l^T   b^T(t_i) ]   (4)

where φ_θ(·) is the characteristic function of θ.
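A Monte Carlo sketch of example i): rotating a real signal with Ornstein-Uhlenbeck kernel e^{−|t_i−t_j|} by an independent standard normal phase θ leaves the correlation unchanged and scales the pseudocorrelation by φ_θ(2) = e^{−2}, consistently with (4). All parameters below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths = 200_000
ti, tj = 0.8, 0.3

# Real factorizable kernel r_x(t_i, t_j) = e^{-|t_i - t_j|}  (a = e^{-t}, b = e^{t}).
r_x = np.exp(-abs(ti - tj))

# Jointly Gaussian samples x(t_i), x(t_j) with that correlation.
cov = np.array([[1.0, r_x], [r_x, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=n_paths)

theta = rng.standard_normal(n_paths)   # standard normal phase, independent of x
s_i = np.exp(1j * theta) * x[:, 0]
s_j = np.exp(1j * theta) * x[:, 1]

corr = np.mean(s_i * np.conj(s_j))     # E[s(t_i)s*(t_j)] = r_x(t_i, t_j)
pseudo = np.mean(s_i * s_j)            # E[s(t_i)s(t_j)]  = e^{-2} r_x(t_i, t_j)

print(np.round(corr.real, 2), np.round(pseudo.real, 2))
```

The phase cancels exactly in the correlation, while the pseudocorrelation picks up the factor E[e^{2jθ}] = e^{−2} for θ ~ N(0, 1).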
ii) The seismic ground acceleration can be represented by a uniformly modulated nonstationary process given by s(t_i) = d(t_i)x(t_i), where x(t_i) is a stationary process with zero mean and known second-order statistics, and d(t_i) is the time modulating function. Both x(t_i) and d(t_i) can be either real or complex-valued [16]. Thus, if d(t_i) is complex and x(t_i) is real and factorizable with correlation function given by (3), then s(t_i) is widely factorizable with

A(t_i) = [ d(t_i) a^T(t_i) ; d*(t_i) a^T(t_i) ],   B(t_i) = [ d(t_i) b^T(t_i) ; d*(t_i) b^T(t_i) ]

Common choices are d(t_i) = e^{j2πω_0 t_i}, with ω_0 a constant frequency, and x(t_i) the Ornstein-Uhlenbeck process. On the other hand, an analogous factorization holds if both d(t_i) and x(t_i) are complex and x(t_i) is itself widely factorizable.

iii) In electromagnetic theory, the time-varying position of the electric field vector can be represented as the signal s(t_i) = Ae^{jω_0 t_i} + Be^{−jω_0 t_i}, where the expressions for the random coefficients A and B can be found in [[17], p. 7]. Hence, s(t_i) is widely factorizable, with factorization matrices built from the exponentials e^{±jω_0 t_i} and the second-order moments of A and B (5).

iv) A signal widely used in many areas of signal processing is a linear frequency modulation, or chirp, with random phase. The chirp process can be expressed as s(t_i) = c(t_i)e^{jθ}, with c(t_i) = e^{jπ(2αt_i + βt_i²)}, where θ is the random phase, α determines the starting instantaneous frequency of the chirp, and β is the chirp rate. Then, the chirp process is widely factorizable with

A(t_i) = [ c(t_i)   φ_θ(2) c(t_i) ; φ_θ(−2) c*(t_i)   c*(t_i) ],   B(t_i) = [ c(t_i)   0 ; 0   c*(t_i) ]

where φ_θ(·) is the characteristic function of θ.
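For the chirp of example iv), the structure can be verified directly: the correlation is c(t_i)c*(t_j) and the pseudocorrelation is φ_θ(2)c(t_i)c(t_j). The sketch below uses illustrative values of α and β, and θ ~ N(0, 1) so that φ_θ(±2) = e^{−2}:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 1.5, 4.0          # illustrative starting frequency and chirp rate
c = lambda t: np.exp(1j * np.pi * (2 * alpha * t + beta * t**2))

n = 300_000
theta = rng.standard_normal(n)  # random phase, one draw per sample path
ti, tj = 0.6, 0.2
s_i = np.exp(1j * theta) * c(ti)
s_j = np.exp(1j * theta) * c(tj)

# The phase cancels in the correlation, so it equals c(t_i)c*(t_j) exactly;
# the pseudocorrelation approaches phi_theta(2) c(t_i)c(t_j) = e^{-2} c(t_i)c(t_j).
corr = np.mean(s_i * np.conj(s_j))
pseudo = np.mean(s_i * s_j)

print(np.allclose(corr, c(ti) * np.conj(c(tj))))
print(abs(pseudo - np.exp(-2) * c(ti) * c(tj)) < 0.01)
```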
v) An application of the complex Ornstein-Uhlenbeck process is the description of the motion of the instantaneous axis of the Earth's rotation [18]. This motion has a 1-year period and, if it is removed, there remains the so-called Chandler wobble, which has a period of about 435 days (14 months). Kolmogorov proposed the complex stochastic process s(t_i) = x(t_i) + jy(t_i) to describe the Chandler wobble, i.e., the motion of the pole, where x(t_i) and y(t_i) are the coordinates of the deviation of the instantaneous pole from the North Pole. In that model, s(t_i) is the sum of a periodical component and a complex Ornstein-Uhlenbeck process ξ(t_i). It is not difficult to check that the signal s(t_i) is proper and, hence, that its augmented correlation function admits a factorization of the form (2), where λ > 0 is the drift parameter and ω ∈ ℝ is the period.
On the other hand, a factorizable signal need not be widely factorizable: a simple example is a zero-mean complex-valued signal whose correlation function satisfies (1) but whose pseudocorrelation function prevents the augmented correlation function from admitting a factorization of the form (2).

In the following, we also assume that the signal of interest s(t_i) is widely factorizable in the sense of Definition 2.1.
Finally, R_sy(t_i, t_j) = E[s(t_i)y^H(t_j)] denotes the cross-correlation function between any two augmented signals s(t_i) and y(t_i), and r_sy(t_i, t_j) = E[s(t_i)y^H(t_j)] represents the cross-correlation function between s(t_i) and the augmented vector y(t_i).

Linear augmented complex prediction
Assume that the signal s(t_i) established in Section 2 is observed through the following linear equation:

y(t_i) = g(t_i)s(t_i) + v(t_i)   (6)

where g(t_i) is a deterministic complex-valued function and v(t_i) is a doubly white noise b correlated with s(t_i). Moreover, the cross-correlation function between the augmented signal s(t_i) and the augmented noise v(t_i) is assumed to admit a factorization of the form (7), analogous to (2).

We consider the problem of obtaining the optimal (in the sense of minimizing the WL mean square error) estimator c of the signal s(t_k) as a function of the information given by the observations {y(t_1), ..., y(t_n), y*(t_1), ..., y*(t_n)}, for t_k ≥ t_n. It is known that such an estimator can be expressed as a linear function of the set of augmented observations {y(t_1), ..., y(t_n)} as follows [2]:

ŝ(t_k|t_n) = Σ_{j=1}^{n} h^T(t_k, t_j, t_n) y(t_j)   (8)

where the 2D vector h(t_k, t_j, t_n), called the impulse response function, satisfies the equation

r_sy(t_k, t_j) = Σ_{i=1}^{n} h^T(t_k, t_i, t_n) R(t_i, t_j), j = 1, ..., n   (9)

From (2) and (7), it is easy to check that r_sy(t_k, t_j) and R(t_i, t_j) can be written in factorized form as well (Equations (10) and (11)). Although the problem is completely determined by computing the impulse response function from Equation (9), our aim here is to provide a recursive algorithm for its computation. Next, the recursive formulas for computing the estimator (8) and its associated error p(t_k|t_n) = E[|s(t_k) − ŝ(t_k|t_n)|²] are devised.
Theorem 3.1 The optimal WL estimate ŝ(t_k|t_n) defined in (8) can be recursively computed as follows:

ŝ(t_k|t_n) = ψ_[1](t_k) ε(t_n), t_k ≥ t_n   (12)

where the q-vector ε(t_n) is recursively computed from expression (13), with the q × 2 matrix J(t_n, t_n) given by Equation (14), the 2 × 2 matrix Ω(t_n) = Σ + [Γ(t_n) − F(t_n)Q(t_{n−1})]F^H(t_n), and the q × q matrix Q(t_n) satisfying the recursive Equation (15). Moreover, the associated error is given by the expression

p(t_k|t_n) = r_s(t_k, t_k) − ψ_[1](t_k) Q(t_n) ψ^H_[1](t_k), t_k ≥ t_n   (16)

Proof. From (10) and (11), Equation (9) can be rewritten as h^T(t_k, t_j, t_n) = ψ_[1](t_k) J(t_j, t_n), once a function J(t_j, t_n) satisfying Equation (17) is introduced. Substituting this expression in (8) and defining the function ε(t_n) as in (19), Equation (12) for the optimal estimator is obtained. Now, subtracting Equation (17) at t_{n−1} from (17) at t_n and taking (11) into account, relation (20) follows. As a consequence, subtracting Equation (19) at t_{n−1} from (19) at t_n and using (20) in the resulting equation, the recursive expression (13) is obtained.
Next, we derive expression (14) for J(t_n, t_n). Taking t_j = t_n in (17) and using (11), we obtain (21), where we have introduced the q × q matrix Q(t_n) given by (22). Moreover, if we subtract Q(t_{n−1}) from Q(t_n) and use (20) and (22) in the resulting expression, the recursive Equation (15) for Q(t_n) is derived. Finally, using (15) in (21), it is easy to check that J(t_n, t_n) satisfies expression (14).
Finally, in order to derive expression (16) for the error p(t_k|t_n) associated with the above estimate, note that, from the orthogonal projection lemma, this function can be expressed as p(t_k|t_n) = r_s(t_k, t_k) − E[|ŝ(t_k|t_n)|²]. Then, substituting (12) in this equation and using (17) and (19), we check that

p(t_k|t_n) = r_s(t_k, t_k) − ψ_[1](t_k) E[ε(t_n)ε^H(t_n)] ψ^H_[1](t_k)

As a consequence, from (22), (16) is obtained.

Remark 1 When {s(t_i), t_i ∈ T} is a factorizable real-valued signal with correlation function of the form (1), and the observations of the signal satisfy the complex-valued linear Equation (6), with c(t_i) and e(t_i) vectors of dimensions l and l', respectively, and D(t_i) and F(t_i) matrices of respective dimensions 2 × l and 2 × l', Algorithm 3.1 still holds after replacing the involved 2 × q matrices accordingly.

Remark 2 The efficiency of Algorithm 3.1 is closely related to the dimensions l, l', and m of the matrices involved in the factorizations (2) and (7). Indeed, the computational complexity of this algorithm is of order q, with q = m + l + l', and thus implementation becomes more involved and the computational burden increases as q grows. Since the factorization of the covariance is not unique, the key question is choosing the one that minimizes the dimension q. There exist simple cases, as illustrated previously, where the factorization is easily obtained. Nevertheless, in more complex cases where the factorization is not trivial, one can use several methods available in the literature to obtain a factorization of minimum dimension (see, e.g., [19]).
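Besides the recursive Algorithm 3.1, Equations (8)-(9) can also be solved in batch form by stacking the augmented observations and solving the normal equations built from the augmented moments. The sketch below is not Algorithm 3.1 itself; it uses made-up kernels and sample moments purely to illustrate that the WL predictor, which also uses y*, can never do worse than the SL one:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.arange(1, 51) / 50.0
n = len(t)

# Illustrative improper signal: a rotated OU process observed in white noise.
r_x = np.exp(-np.abs(t[:, None] - t[None, :]))       # OU kernel on the grid
L = np.linalg.cholesky(r_x + 1e-10 * np.eye(n))

def wl_vs_sl_mse(t_pred_idx, n_obs, n_mc=20_000):
    """Sample-moment MSE of WL and SL linear predictors of s(t_k) from y(t_1..t_n)."""
    x = L @ rng.standard_normal((n, n_mc))           # real OU paths
    theta = rng.standard_normal(n_mc)                # random rotation per path
    s = np.exp(1j * theta) * x                       # improper signal
    y = s[:n_obs] + 0.3 * (rng.standard_normal((n_obs, n_mc))
                           + 1j * rng.standard_normal((n_obs, n_mc)))
    target = s[t_pred_idx]

    def lmmse(obs):
        # Normal equations: h = R^{-1} E[obs * target*]; estimate = h^H obs.
        R = obs @ obs.conj().T / n_mc
        r = obs @ np.conj(target) / n_mc
        h = np.linalg.solve(R, r)
        return np.mean(np.abs(target - h.conj() @ obs) ** 2)

    # WL uses the augmented vector [y; y*]; SL uses y alone.
    return lmmse(np.vstack([y, y.conj()])), lmmse(y)

mse_wl, mse_sl = wl_vs_sl_mse(t_pred_idx=30, n_obs=25)
print(mse_wl <= mse_sl)   # the WL span contains the SL span
```

Because the augmented regressor set contains the SL one, the WL minimum mean square error is never larger; for improper signals it is typically strictly smaller.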

Numerical examples
The advantages of the proposed Algorithm 3.1 with respect to the SL solution are illustrated here through three numerical examples. The first involves real correlation matrices and analyzes the effectiveness of WL processing with respect to SL processing in terms of the impropriety degree of the observations. In the second example, complex correlation matrices are considered and the resulting WL and SL estimation errors are compared graphically. Finally, the third example shows a real application to seismic signal processing.

Example
Let {x(t_i), t_1 ≤ t_i ≤ t_100}, with t_i = i/100, i = 1, ..., 100, be an Ornstein-Uhlenbeck process with correlation function r_x(t_i, t_j) = e^{−|t_i−t_j|}, which is transmitted over a channel that rotates it by a standard normal phase θ and adds a doubly white Gaussian noise v(t_i) correlated with the signal, whose augmented variance matrix Σ, depending on a parameter τ, is given in (23). Thus, the signal of interest is s(t_i) = e^{jθ}x(t_i) and the observations y(t_i) are of the form (6) with g(t_i) = 1. Moreover, we assume that θ is independent of x(t_i) and v(t_i).
Note that the correlation matrix of the augmented signal s(t_i) can be expressed in the form (2), where A and B are as in (4) with l = 1, a(t_i) = e^{−t_i}, b(t_i) = e^{t_i}, and φ_θ(−2) = φ_θ(2) = e^{−2}. Moreover, the cross-correlation matrix between s(t_i) and v(t_i) is of the form (7). In this example, we consider the problem of computing the fixed-lead predictor ŝ(t_{k+10}|t_k). As a measure for comparing the performance of the WL and SL fixed-lead predictors, we use the one defined in [6], the mean square of the difference between both errors (24), for τ varying within the interval [1, 2), with p_τ(t_{k+10}|t_k) and p̃_τ(t_{k+10}|t_k) denoting the WL and SL fixed-lead prediction errors, respectively, for every value of τ. The results obtained are displayed in Figure 1, which shows that not only does the WL fixed-lead predictor behave better than the SL one, but the difference between both errors (in the mean square sense) also increases with τ, and hence the WL technique becomes more effective.

Example
Let {s(t_i), t_1 ≤ t_i ≤ t_100}, with t_i = i/100, i = 1, ..., 100, be a signal of the form s(t_i) = Ae^{jω_0 t_i} + Be^{−jω_0 t_i}, where A and B are complex random variables. In this example, the signal is assumed to be widely stationary, that is, the moments of A and B satisfy the conditions associated with (5); hence, the correlation matrix of the augmented signal s(t_i) can be expressed in the form (2). Moreover, we consider that the observations y(t_i) are of the form (6) with g(t_i) = 1, where the noise v(t_i) is uncorrelated with the signal and its augmented variance matrix is Σ as in (23). On the basis of the set of observations {y(t_1), y(t_2), ..., y(t_100)}, we consider the problem of computing the fixed-lead predictor ŝ(t_{k+10}|t_k). Then, Algorithm 3.1 is used to obtain the WL fixed-lead prediction error p_τ(t_{k+10}|t_k), which is compared with the SL fixed-lead prediction error p̃_τ(t_{k+10}|t_k) in Figure 2. As could be expected, this figure shows that the WL fixed-lead predictor behaves better than the SL one. Finally, Figure 3 depicts the performance measure given by (24) with τ ∈ [1, 2) and Σ as in (23). Again, the improved precision attained by the WL fixed-lead predictor with respect to the SL one is observed as τ increases.
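The widely stationary structure assumed in this example can be checked numerically: for suitable moments of A and B, the augmented correlation of s(t_i) = Ae^{jω_0 t_i} + Be^{−jω_0 t_i} depends only on t_i − t_j. The choice of moments below (A with uniform random phase and B = 0.5A*, so that E[A²] = E[B²] = 0 while E[AB] ≠ 0) is an illustrative assumption, not the one used in the article:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000
w0 = 2 * np.pi

# A has a uniform random phase (so E[A^2] = 0); B = 0.5 A* makes E[AB] != 0,
# giving an improper but still widely stationary signal.
A = rng.standard_normal(n) * np.exp(1j * rng.uniform(0, 2 * np.pi, n))
B = 0.5 * np.conj(A)

s = lambda t: A * np.exp(1j * w0 * t) + B * np.exp(-1j * w0 * t)

def aug_corr(ti, tj):
    """Sample correlation and pseudocorrelation between s(t_i) and s(t_j)."""
    si, sj = s(ti), s(tj)
    return np.array([np.mean(si * np.conj(sj)), np.mean(si * sj)])

# Same lag, different absolute times: the augmented correlations should agree.
d1, d2 = aug_corr(0.5, 0.2), aug_corr(0.9, 0.6)
print(np.allclose(d1, d2, atol=0.02))   # widely stationary
print(abs(d1[1]) > 0.1)                 # nonzero pseudocorrelation: improper
```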

Example
As indicated in Section 2, uniformly modulated nonstationary processes are often used to model seismic records, especially acceleration records. The modulated nonstationary process is given by s(t_i) = d(t_i)x(t_i), where x(t_i) is a stationary process with zero mean and known second-order statistics, and d(t_i) is the time modulating function. A stochastic earthquake model commonly used for x(t_i) is the Kanai-Tajimi process (see, e.g., [20]). It is well known that the Kanai-Tajimi earthquake model is covariance equivalent to the subset of ARMA(2,1) models corresponding to a unit value of the spring-dashpot input ratio [21]. For firm ground conditions at moderate epicentral distance, Kanai and Tajimi have suggested specific values for the parameters in the continuous-time equation of motion; the corresponding discrete ARMA(2,1) model is given in [21], with σ²_e = 39.08 and Δt = t_i − t_{i−1} = 0.02 s. This ARMA model has a factorizable correlation function of the form (3). Moreover, we have chosen the modulating function d(t_i) = e^{jt_i} and an augmented error covariance matrix depending on a parameter τ ∈ [0, 1). Here, we have studied the filtering problem. Figure 4 illustrates the enhancement of the WL filter in relation to the SL one by means of the quadratic mean of the difference between the WL and SL filtering errors.

Figure 3 Performance of fixed-lead predictors. Quadratic mean of the difference between the WL and SL fixed-lead prediction errors.
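The modulated seismic model can be sketched as follows. The ARMA(2,1) coefficients below are placeholders (the specific discrete Kanai-Tajimi values are given in [21] and are not reproduced here); only σ²_e = 39.08 and Δt = 0.02 s are taken from the text:

```python
import numpy as np

rng = np.random.default_rng(5)

# Placeholder ARMA(2,1) coefficients: illustrative and stable, NOT the actual
# Kanai-Tajimi discrete-model values (those are given in reference [21]).
phi1, phi2, theta1 = 1.2, -0.5, 0.4
sigma_e = np.sqrt(39.08)
dt = 0.02
n = 500

# ARMA(2,1) recursion: x_i = phi1 x_{i-1} + phi2 x_{i-2} + e_i + theta1 e_{i-1}.
e = sigma_e * rng.standard_normal(n)
x = np.zeros(n)
for i in range(2, n):
    x[i] = phi1 * x[i - 1] + phi2 * x[i - 2] + e[i] + theta1 * e[i - 1]

t = dt * np.arange(n)
d = np.exp(1j * t)          # modulating function d(t_i) = e^{j t_i}
s = d * x                   # uniformly modulated nonstationary process

# Pure phase modulation: the complex process keeps the magnitude of x.
print(np.allclose(np.abs(s), np.abs(x)))
```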
Similar to the previous examples, we observe the superiority of the WL estimate as τ increases.

Nonlinear augmented complex prediction
Given the same conditions on the signal established in Section 2, we suppose that the observation process is given by a nonlinear relation of the form

y(t_i) = z(s(t_i), t_i) + v(t_i)   (25)

where z(·) is a complex-valued nonlinear function and the signal s(t_i) is uncorrelated with the noise v(t_i). As in the linear case, the aim here is to estimate the signal s(t_k) on the basis of the observation set {y(t_1), ..., y(t_n), y*(t_1), ..., y*(t_n)}, with t_k ≥ t_n.
The solution to this problem is addressed by following a philosophy similar to that of the EKF [22]. Specifically, following the basic idea of the EKF, the nonlinear function z(s(t_n), t_n) is linearized at each time instant by a first-order Taylor series expanded about the estimated signal ŝ(t_n|t_{n−1}). Consequently, we can approximate the nonlinear observation Equation (25) by

ȳ(t_n) = g(t_n)s(t_n) + v(t_n)   (26)

where

ȳ(t_n) = y(t_n) − z(ŝ(t_n|t_{n−1}), t_n) + g(t_n)ŝ(t_n|t_{n−1})

and

g(t_n) = ∂z(s, t_n)/∂s |_{s=ŝ(t_n|t_{n−1})}   (27)

and thus, s(t_k) can be estimated in terms of the set of observations {ȳ(t_1), ..., ȳ(t_n), ȳ*(t_1), ..., ȳ*(t_n)} from relation (26). For that, the formulas given in Algorithm 3.1 for the WL predictor can be used; note that, in the case of the signal and observation noise being uncorrelated, Algorithm 3.1 holds with the corresponding cross-correlation terms set to zero. Then, as in the EKF, in the resulting formulas we also use the linearized observation function (27) in place of the previous function g(t_n), and the term G(t_n)A(t_n)ε(t_{n−1}) is replaced by the vector

z(ŝ(t_n|t_{n−1}), t_n) = [z(ŝ(t_n|t_{n−1}), t_n), z*(ŝ(t_n|t_{n−1}), t_n)]^T

Next, the formulas of the proposed algorithm are summarized.
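The linearization step can be sketched generically: z(s, t_n) is replaced by its first-order expansion about ŝ(t_n|t_{n−1}), producing the modified observation ȳ(t_n) and the gain g(t_n). The nonlinear function z(s, t) = s² below is a toy assumption, used only to check the first-order accuracy of the construction:

```python
import numpy as np

def linearize_observation(y_n, s_pred, z, dz_ds, t_n):
    """One EKF-style linearization of the observation y = z(s, t) + v.

    Returns the modified observation ybar and the gain g such that,
    to first order, ybar = g * s + v.
    """
    g = dz_ds(s_pred, t_n)                       # g(t_n) = dz/ds at the prediction
    ybar = y_n - z(s_pred, t_n) + g * s_pred     # ybar(t_n), as in (27)
    return ybar, g

# Toy nonlinear observation function (an assumption for this sketch).
z = lambda s, t: s**2
dz_ds = lambda s, t: 2 * s

s_true, s_pred = 1.1 + 0.2j, 1.0 + 0.1j
y_n = z(s_true, 0.0)                             # noise-free for the check

ybar, g = linearize_observation(y_n, s_pred, z, dz_ds, 0.0)

# First-order accuracy: the residual is O(|s_true - s_pred|^2).
print(abs(ybar - g * s_true) <= abs(s_true - s_pred) ** 2 + 1e-12)
```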
Theorem 4.1 A nonlinear augmented complex predictor of the signal s(t_k) based on the set of nonlinear observations {y(t_1), ..., y(t_n), y*(t_1), ..., y*(t_n)} of the form (25) can be determined through Equation (12), where the m-vector ε(t_n) is recursively computed from the expression

ε(t_n) = ε(t_{n−1}) + J(t_n, t_n)[y(t_n) − z(ŝ(t_n|t_{n−1}), t_n)]

with G(t_n) = diag(g(t_n), g*(t_n)), g(t_n) = ∂z(s, t_n)/∂s |_{s=ŝ(t_n|t_{n−1})}, and Q(t_n) satisfying the recursive Equation (15).

Remark 3 Similarly to Remark 1, if the signal {s(t_i), t_i ∈ T} is a real-valued signal with factorizable kernel of the form (1) which is observed through a complex-valued nonlinear equation of the form (25), Algorithm 4.1 holds with a_[1](t_i) = a^T(t_i), g(t_i) = [g(t_i), g*(t_i)]^T, and replacing the 2 × q matrices A(t_i) and B(t_i) by the vectors a(t_i) and b(t_i), respectively.

Numerical example
In this example, the estimation of a real signal on the basis of a set of complex-valued observations is considered. Specifically, let {s(t_i), t_1 ≤ t_i ≤ t_100} be a real Wiener process with unit variance parameter, observed through a nonlinear equation of the form (25).

Figure 4 Performance of filters. Quadratic mean of the difference between the WL and SL filtering errors.
where v(t_i) = e^{jθ}u(t_i), with θ a standard normal phase and u(t_i) a white Gaussian noise with unit spectral height, uncorrelated with the signal s(t_i).
In this case, the correlation function of the signal s(t i ) can be expressed in the form (1) with a(t i ) = 1 and b(t i ) = t i .
With the aim of examining the good behavior of the WL solution proposed in Algorithm 4.1, the estimation error of the WL filter is compared with the errors associated with conventional SL and WL algorithms. Specifically, the standard EKF and UKF, and the WL EKF and WL UKF proposed in [1], have been implemented. For that, we use the fact that the signal s(t_i) obeys the state equation s(t_i) = s(t_{i−1}) + w(t_i), with w(t_i) a centered Gaussian signal with variance parameter 10^{−2}.
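The Monte Carlo error computation can be sketched in skeleton form: generate sample paths of the real Wiener process, run any filter on them, and average squared errors across paths. The "filter" below is a trivial predict-by-previous-value placeholder, not any of the algorithms compared in the figure:

```python
import numpy as np

rng = np.random.default_rng(6)
n_steps, n_paths, dt = 100, 5000, 0.01

# Real Wiener process paths with unit variance parameter: var(s(t)) = t.
increments = np.sqrt(dt) * rng.standard_normal((n_steps, n_paths))
s = np.cumsum(increments, axis=0)

# Placeholder "filter": estimate s(t_i) by the previous value, standing in
# for any estimator under test (EKF, UKF, WL variants, Algorithm 4.1, ...).
s_hat = np.vstack([np.zeros((1, n_paths)), s[:-1]])

# Monte Carlo mean square error at each time instant, averaged over paths.
mse = np.mean((s - s_hat) ** 2, axis=1)
print(mse.shape)
```

For this placeholder, the error at each step is just the variance of one Wiener increment, so the MSE curve hovers around dt = 0.01; a real filter would be substituted for `s_hat`.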
On the other hand, the estimation errors have been computed by Monte Carlo simulation. Figure 5 shows the results obtained with 5000 sample paths, confirming the better behavior of WL processing in the nonlinear estimation problem. In fact, the dashed line represents the errors associated with the SL EKF and SL UKF (the differences between them are negligible), and the solid line depicts the errors associated with the WL UKF, WL EKF, and the filter given in Algorithm 4.1 (again, the differences between them are negligible). Obviously, the similar behavior shown here by the three WL filters need not be repeated in other examples. As occurs with standard estimation techniques, there is no single best nonlinear estimator. In each application, one has to choose the estimator that best trades off properties such as estimation accuracy, ease of implementation, numerical robustness, and computational burden [22]. Note that, unlike the UKF and EKF, Algorithm 4.1 does not require a state-space model but only knowledge of the second-order statistics of the processes involved.