Finite Sample FPE and AIC Criteria for Autoregressive Model Order Selection Using Same-Realization Predictions

A new theoretical approximation for the expectation of the prediction error is derived for same-realization predictions. This approximation is derived for the case that the Least-Squares-Forward (LSF) method (the covariance method) is used for estimating the parameters of the autoregressive (AR) model. This result is used to obtain modified versions of the AR order selection criteria FPE and AIC in the finite sample case. The performance of these modified criteria is compared with that of other same-realization AR order selection criteria using simulated data. The results of this comparison show that the proposed criteria perform better.


Introduction
Let a real AR process be given by x(n) = a_1 x(n − 1) + · · · + a_p x(n − p) + ε(n), (1) where p is the order of the process, a_1, . . . , a_p are the real coefficients of the process, and ε(n) is an independent and identically distributed (i.i.d.) random process with zero mean and variance σ²_ε. The process ε(n) is the input process to the AR model. It is assumed that x(·) is mean and covariance ergodic, so the poles of the AR model lie inside the unit circle.
Suppose that x(1), . . . , x(N) are consecutive samples of a sample function of the AR process x(·) given by (1). In addition, suppose that an arbitrary nonnegative integer q is considered as the true process order. The usual estimator for a nonrandom set of parameters is the maximum likelihood estimator (MLE) [1]. However, the exact solution of the MLE for the parameters of an AR(q) process is difficult to obtain [1]. If N ≫ q, each of the AR estimation methods Least-Squares-Forward (LSF), Least-Squares-Forward-Backward (LSFB), Burg, and Yule-Walker [1][2][3] is an approximation of the MLE. In this paper, we use the LSF method to estimate a model as x(n) = â_1 x(n − 1) + · · · + â_q x(n − q) + ε̂(n), (2) where the â_i's are the LSF estimates of the AR parameters. The residual variance, denoted by S²(q), is a measure of the fit of the above model to the data that have been used for estimating the parameters. It is defined as

S²(q) = (1/(N − q)) Σ_{n=q+1}^{N} (x(n) − â_1 x(n − 1) − · · · − â_q x(n − q))². (3)

In the literature, two different kinds of predictions under model (1) are considered. For independent-realization predictions, the aim is to predict the future of another independent series which has exactly the same probabilistic structure as the observed one. One special feature of this type of prediction is that its mathematical analysis is relatively easy. The independent-realization prediction error is defined as

PE_ind(q) = E_y{(y(n) − â_1 y(n − 1) − · · · − â_q y(n − q))²}, (4)

where y(·) is another independent series which has exactly the same probabilistic structure as x(·), and E_y{·} is the expectation operator over this independent series. However, for the practitioner, the emphasis is usually placed on same-realization predictions, that is, on the prediction of x(N + h), h ≥ 1, given x(1), . . . , x(N). The same-realization prediction error (for h = 1) is defined as

PE(q) = E{(x(N + 1) − â_1 x(N) − · · · − â_q x(N + 1 − q))²}. (5)

In (4) and (5), the â_i's are the LSF estimates of the a_i's using x(1), . . . , x(N). Note that in the expectation in (4) y(n) is independent of the â_i's, but in the expectation in (5) x(N + 1) depends on the â_i's.
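As a concrete illustration of the LSF (covariance) method and the residual variance S²(q) of (3), the following sketch estimates the AR parameters by least squares. This is a minimal sketch under the definitions above, not code from the paper; the helper name `lsf_estimate` is hypothetical.

```python
import numpy as np

def lsf_estimate(x, q):
    """Estimate AR(q) parameters by the Least-Squares-Forward
    (covariance) method and return them together with the residual
    variance S^2(q) of (3).

    Solves, in the least-squares sense, the forward prediction
    equations x(n) = a_1 x(n-1) + ... + a_q x(n-q), n = q+1..N."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if q == 0:
        # No parameters: the residual is the signal itself.
        return np.array([]), float(np.mean(x ** 2))
    # Row for equation n holds the regressors [x(n-1), ..., x(n-q)].
    X = np.column_stack([x[q - i - 1:N - i - 1] for i in range(q)])
    y = x[q:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ a
    s2 = float(np.sum(resid ** 2) / (N - q))  # residual variance S^2(q)
    return a, s2
```

For a long realization of a stationary AR process driven by unit-variance noise, the estimated coefficients approach the true ones and S²(q) approaches σ²_ε.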
This difference causes the prediction errors in (4) and (5) to be different. Almost all existing AR order selection criteria have been derived for independent-realization data. Independence is not a natural property for time series data, because when new observations of a time series become available, they are usually dependent on the previous data. So far, few time series model selection theories have been established without this unnatural assumption. In addition, the use of most of these model order selection criteria has not been justified for the same-realization case. The Akaike information criterion (AIC) [4], final prediction error (FPE) criterion [5], Bayes information criterion (BIC) [6], minimum description length (MDL) criterion [7], Kullback information criterion (KIC) [8], corrected AIC (AIC_C) [9], and corrected KIC (KIC_C) [10] are examples of criteria that have been derived for the large sample and independent-realization case. In [11,12], Ing and Wei justified the use of FPE and AIC as AR order selection criteria in the large sample and same-realization case. Ing and Wei [11] presented a theoretical verification that AIC and FPE are asymptotically efficient (in the sense of the mean square prediction error) for same-realization predictions. When the underlying AR model is known to be stationary and of infinite order, Ing and Wei [11] showed that the expectations of the squared prediction error for the independent- and same-realization cases, with the order selected by FPE or AIC, have the same asymptotic expressions. Recently, Ing [12] removed the infinite-order assumption and verified the asymptotic efficiency of several information criteria, including FPE and AIC, for both finite- and infinite-order stationary AR models.
The order selection criteria obtained in the large sample and independent-realization case are not dependent on the method of estimation of the AR parameters. In other words, these criteria are identical for all AR parameter estimation methods. However, it is well known that, in the finite sample and independent-realization case, the performance of the order selection criteria depends on the parameter estimation method and it is necessary to derive different order selection criteria for each parameter estimation method [13][14][15][16][17][18].
In this paper, we derive a new estimate of the same-realization prediction error in the finite sample case when the LSF parameter estimation method is used. We use this new estimate to derive same-realization versions of FPE (FPEF) and AIC (AICF) for the finite sample case.
The remainder of this paper is organized as follows. In Section 2, a new theoretical approximation is derived for the expectation of same-realization prediction error in the case that the LSF method is used. Based on this approximation, the FPEF criterion is introduced. In Section 3, the AICF criterion is introduced. In Section 4, simulated data are used for comparing the performance of the proposed order selection criteria with that of the existing criteria. In Section 5, the conclusions of this paper are discussed.

Estimation of the Same-Realization Prediction Error
Suppose that we have N observed samples x(1), x(2), . . . , x(N) of the AR model defined by (1). For the case that the candidate order q is greater than or equal to the true order (q ≥ p), we define the regression vectors and matrices of (6)–(10), where T stands for the transpose operation; in particular, x(N) = [x(N), x(N − 1), . . . , x(N − q + 1)]^T is the vector of the last q observations and β_A is the vector of true parameters. It follows from (1), (6), (7), (9), and (10) that the observations satisfy the linear model (11). The LSF method gives the least-squares estimate b_A of β_A as in (12).

The one-step same-realization prediction error PE(q) is given by (14), where x̂_q(N + 1) is the linear predictor of x(N + 1) given x(1), . . . , x(N), that is, x̂_q(N + 1) = x^T(N) b_A (15). It follows from (1), (6), and (8) that x(N + 1) = x^T(N) β_A + ε(N + 1) (16). Substituting (15) and (16) into (14), we can rewrite (14) as (17). Using (12) and (13), x^T(N) b_A can be written as (18). Combining (17) and (18), we obtain the three-term decomposition (19).

The first term on the right-hand side of (19) equals σ²_ε, as stated in (20). For the AR process, ε(N + 1) is independent of the past samples x(1), x(2), . . . , x(N) and ε(1), ε(2), . . . , ε(N); since the other factor of the cross term depends only on these past samples, the third term on the right-hand side of (19) is equal to zero, as stated in (21). It follows from (12) that the second term on the right-hand side of (19) is equal to the expression in (22), which simplifies to (23). Therefore, using (20), (21), and (23), we can rewrite (19) as (24).

Taking the expectation of PE(q) over the vector x, it follows from (24) that (25) holds. It can be seen from (7)–(10) that (26) holds; substituting (26) into (25), we obtain (27). It is assumed that x(·) is a zero-mean and covariance ergodic process, so we define the covariance matrix R and its estimate R̂ as in (28) and (29). Using (29) and (27), we obtain (30). To evaluate (30), we assume that, for sufficiently large N − q, ‖R̂R^{−1} − I_q‖ < 1, where ‖A‖ denotes the Euclidean norm of the matrix A. Under this assumption, it is shown in [19] that the expansion (32) holds. When N − q tends to infinity, the third term on the right-hand side of (32) tends to zero faster than the other terms.
So, for large enough values of N − q, we obtain the approximation (33) for E{PE(q)}. It is shown in [16,17] that E{S²(q)} satisfies (34). Combining (33) and (34), we obtain (35). This relation can be used for estimating PE(q) as in (36). It is reasonable to choose the integer q that minimizes this estimate of the prediction error as the appropriate order for the AR process. So, we propose the finite sample FPE (FPEF) criterion as (37). The FPE criterion, which is an asymptotic estimate of the independent-realization prediction error, is defined in the following way [5]:

FPE(q) = S²(q) (N + q)/(N − q). (38)

As we mentioned earlier, the criterion defined by (38) was derived for the independent-realization case. When N ≫ q, the FPE criterion given by (38) is an approximation of FPEF, and in the asymptotic case N → ∞ the criteria FPE and FPEF become identical.
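An order-selection search based on the classical FPE of (38) can be sketched as follows; it uses the large-sample factor (N + q)/(N − q), whereas FPEF of (37) would substitute the finite-sample same-realization correction derived above. The function names are illustrative, not from the paper.

```python
import numpy as np

def residual_variance(x, q):
    # S^2(q) of (3): LSF (covariance-method) residual variance of an AR(q) fit.
    x = np.asarray(x, dtype=float)
    N = len(x)
    if q == 0:
        return float(np.mean(x ** 2))
    X = np.column_stack([x[q - i - 1:N - i - 1] for i in range(q)])
    y = x[q:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ a) ** 2) / (N - q))

def select_order_fpe(x, q_max):
    # Score every candidate order 0..q_max with FPE(q) of (38) and
    # return the minimizer together with all scores.
    N = len(x)
    scores = [residual_variance(x, q) * (N + q) / (N - q)
              for q in range(q_max + 1)]
    return int(np.argmin(scores)), scores
```

For a strongly second-order process, the FPE scores drop sharply up to q = 2 and flatten afterwards, so the selected order captures the true dynamics.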

AICF Criterion
We now give a mathematical derivation of the finite sample AIC (AICF) criterion in the same-realization case, starting from the Kullback-Leibler (K-L) information. The K-L information is a measure of the distance between the true pdf and an approximating pdf of the data generated by the true pdf.
The K-L information for the approximating model g(x′ | θ) is given by [20]

I(f, g) = ∫ f(x′) ln [f(x′)/g(x′ | θ)] dx′, (39)

where x′ is a vector of data generated by the true pdf f. We look for a unique value of the parameter vector θ, denoted by θ_0, that minimizes the K-L information. So, θ_0 is the solution to the following optimization problem [20]:

θ_0 = arg min_{θ∈Θ} I(f, g), (40)

where Θ is the parameter space. In fact, we have to find the model order and the parameter values that minimize (39). The right-hand side of (39) does not depend on the observed data, and f is unknown. So, we cannot compute (39) for different values of θ and determine the value θ_0 that minimizes the K-L information. Therefore, we rewrite the K-L information as

I(f, g) = E_f{ln f(x′)} − E_f{ln g(x′ | θ)}. (41)

The first term on the right-hand side of (41) does not depend on θ. So, instead of minimizing (41), we can minimize the term

−E_f{ln g(x′ | θ)}. (42)

Note that the expression (42) does not depend on x′, but it depends on θ. The parameter vector θ is unknown. So, in order to obtain an estimate of (42), we can replace θ by an estimate. Suppose that N observations x(1), x(2), . . . , x(N) are available. Then, for each model order, it is reasonable to replace θ by its maximum likelihood estimate θ̂, which is a function of the observed data. As θ̂ is the maximum likelihood estimate of θ, this replacement minimizes (42) for each value of the model order. Now, in a slightly simplified notation, minimizing the K-L information reduces to minimizing the Kullback-Leibler index given by

Δ = E_x{E_{x′}{−2 ln g[x′ | θ̂]}}, (43)

where θ̂ is the maximum likelihood estimator of θ given the observed data vector x, and g[x′ | θ̂] is the likelihood function. Usually, instead of the exact maximum likelihood estimator of θ, an approximation of it is used as θ̂ in (43). We now solve our AR model order selection problem in the same-realization case by minimizing the Kullback-Leibler index Δ given by (43). We assume that the input process ε(n) to the AR model is white Gaussian noise (WGN).
In our problem, θ̂ (the approximation of the maximum likelihood estimator of the AR parameter vector θ), x′ (the future data of the AR model, which depend on the observed data), and x (the vector of observed data of the AR model) are defined as in (44)–(46). Substituting (44)–(46) into the likelihood function g(x′ | θ̂, x), we obtain (47). When the input process ε(n) to the AR model is white Gaussian noise, the likelihood (47) can be written as (48). Taking the logarithm of both sides of (48), we get (49). Substituting (45) and (49) into (43), we obtain (50). The parameter σ²_ε in (50) can be replaced by the residual variance S²(q) given by (3). Thus, combining (14), (15), and (50), we obtain (51). The prediction error PE(q) is unknown, so we replace it by its estimate given by (36) to obtain (52). The order q that minimizes (52) can be selected as the best AR model order. So, omitting the constant ln(2π) from (52), we obtain the finite sample AIC (AICF) criterion for AR model order selection in the same-realization case as (53). The AIC criterion, which was derived for the independent-realization case, is defined in the following way [4]:

AIC(q) = ln S²(q) + 2q/N. (54)
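The large-sample AIC of (54) can be applied in the same order-search fashion; the sketch below scores candidate orders by ln S²(q) + 2q/N, whereas AICF of (53) would replace the 2q/N penalty with the finite-sample same-realization term. Function names are illustrative.

```python
import numpy as np

def residual_variance(x, q):
    # S^2(q) of (3): LSF (covariance-method) residual variance of an AR(q) fit.
    x = np.asarray(x, dtype=float)
    N = len(x)
    if q == 0:
        return float(np.mean(x ** 2))
    X = np.column_stack([x[q - i - 1:N - i - 1] for i in range(q)])
    y = x[q:]
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ a) ** 2) / (N - q))

def select_order_aic(x, q_max):
    # Score candidate orders with AIC(q) = ln S^2(q) + 2q/N of (54)
    # and return the minimizer together with all scores.
    N = len(x)
    scores = [np.log(residual_variance(x, q)) + 2.0 * q / N
              for q in range(q_max + 1)]
    return int(np.argmin(scores)), scores
```

Since ln(·) is monotone, AIC and FPE agree to first order for N ≫ q; differences appear only when q/N is not small, which is exactly the regime the finite-sample criteria target.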

Simulation Results
To investigate the effectiveness of the proposed criteria FPEF and AICF, we consider the problem of autoregressive model order selection for simulated data. The simulated data have been produced by AR(0), AR(1), AR(2), AR(3), and AR(7) processes defined by (55), where ε(n) is WGN with zero mean and variance σ²_ε = 1. We define the SNR parameter for each AR model as the ratio of the average output power of the AR model to σ²_ε, in dB. The SNRs of the AR(0), AR(1), AR(2), AR(3), and AR(7) models are 0 dB, 10 dB, 10.13 dB, 5.11 dB, and 23 dB, respectively.
For each of the five models given by (55), 5000 independent simulation runs of 20 consecutive samples of the AR process were generated. In each simulation run, the first nineteen samples were used for estimating the coefficients of the AR model, and the last sample was used for computing the prediction error. Candidate orders were considered from 0 to 8. For each candidate order, the prediction error was approximated by averaging over the 5000 simulation runs. This true prediction error and the estimated prediction errors given by FPE and FPEF are shown for the AR(0), AR(1), AR(2), AR(3), and AR(7) models in Figures 1, 2, 3, 4, and 5, respectively. These figures illustrate that when q/N is not small, FPEF gives a much better estimate of the prediction error than FPE. Table 1 gives the mean of the prediction error for the AR order selection criteria FPE, FPEF, AIC, and AICF. The mean of the prediction error is computed for each AR order selection criterion by averaging the prediction error over the 5000 intervals of simulated data for the orders that the criterion selected. The maximum and minimum candidate orders are q_max = 8 and q_min = 0, respectively, and the number of generated samples in each simulation run is 20 (the first nineteen samples are used for estimating the coefficients of the AR model and the last sample is used for computing the prediction error). The mean of the minimum prediction errors (MPE) possible in each run is also computed. It can be seen from Table 1 that AICF performs better than the other criteria.
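The Monte Carlo experiment described above can be sketched as follows, assuming illustrative AR coefficients (the exact processes of (55) are not reproduced in this text): short realizations are generated, a model of each candidate order is fitted on the first N − 1 samples by LSF, and the squared one-step prediction error at the last sample is averaged over the runs.

```python
import numpy as np

def simulate_ar(coefs, n, rng):
    # Generate n samples of an AR process driven by unit-variance WGN,
    # discarding a burn-in so the output is approximately stationary.
    p = len(coefs)
    burn = 100
    e = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(p, n + burn):
        x[t] = sum(c * x[t - 1 - k] for k, c in enumerate(coefs)) + e[t]
    return x[burn:]

def one_step_pe(coefs, N=20, q_max=8, runs=1000, seed=1):
    # Empirical same-realization one-step prediction error per candidate
    # order: fit LSF on x(1..N-1), predict x(N), average the squared error.
    rng = np.random.default_rng(seed)
    pe = np.zeros(q_max + 1)
    for _ in range(runs):
        x = simulate_ar(coefs, N, rng)
        train, target = x[:-1], x[-1]
        M = len(train)
        for q in range(q_max + 1):
            if q == 0:
                pred = 0.0
            else:
                X = np.column_stack(
                    [train[q - i - 1:M - i - 1] for i in range(q)])
                y = train[q:]
                a, *_ = np.linalg.lstsq(X, y, rcond=None)
                pred = float(np.dot(a, train[-1:-q - 1:-1]))
            pe[q] += (target - pred) ** 2
    return pe / runs
```

With N = 20, the resulting curve reproduces the qualitative shape of Figures 1–5: the error drops at the true order and then grows again as q/N becomes large, which is the regime where FPEF improves on FPE.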

Conclusion
So far, few time series model selection theories have been established for same-realization prediction. In this paper, a new theoretical approximation was derived for the expectation of the same-realization prediction error when the LSF method is used for estimating the AR parameters. Using this approximation and the approximation given in [16,17] for the expectation of the residual variance, the FPE and AIC criteria for AR order selection were modified for the same-realization case. The modified FPE and AIC criteria were called FPEF and AICF, respectively. Simulation results show that the estimates of the prediction error given by FPEF are less biased than those given by FPE. The performance of FPEF and AICF in AR model order selection was compared with that of FPE and AIC in the finite sample case. The results of this comparison showed that FPEF performs better than FPE, and AICF performs better than all the other criteria. In the large sample case, the performance of FPEF and AICF is approximately identical to that of FPE and AIC, respectively.