On the solution of the rational matrix equation X

We study numerical methods for finding the maximal symmetric positive definite solution of the nonlinear matrix equation, where is symmetric positive definite and is nonsingular. Such equations arise for instance in the analysis of stationary Gaussian reciprocal processes over a finite interval. Its unique largest positive definite solution coincides with the unique positive definite solution of a related discrete-time algebraic Riccati equation (DARE). We discuss how to use the butterfly algorithm to solve the DARE. This approach is compared to several fixed-point and doubling-type iterative methods suggested in the literature.


Introduction
The nonlinear matrix equation where Q = Q T ∈ R n×n is positive definite and L ∈ R n×n is nonsingular, arises in the analysis of stationary Gaussian reciprocal processes over a finite interval. The solutions of certain 1-D stochastic boundary-value problems are reciprocal processes. For instance, the steady-state distribution of the temperature along a heated ring or beam subjected to random loads along its length can be modeled in terms of such reciprocal processes. A different example is a ship surveillance problem: given a Gauss-Markov state-space model of the ship's trajectory, it is desired to assign a probability distribution not only to the initial state, but also to the final state, corresponding to some predictive information about the ship's destination. This has the effect of modeling the trajectory as a reciprocal process. For references to these examples see, e.g., [24]. The problem considered here is to find the (unique) largest positive definite symmetric solution X + of (1). This equation has been considered, e.g., in [14,17,20,25,28,30]. In [14], the set of Hermitian solutions of (1) is characterized in terms of the spectral factors of the matrix Laurent polynomial L(z) = Q + Lz − L T z −1 . These factors are related to the Lagrangian deflating subspace of the matrix pencil In particular, one can conclude from the results in [14,Section 2] that this matrix pencil does not have any eigenvalues on the unit circle and that the spectral radius ρ(X −1 + L T ) is less than 1 as I X + spans the stable Lagrangian deflating subspace of G − λH. Alternatively, one could rewrite (1) as the discrete Lyapunov equation X + − (X −1 + L T ) T X + (X −1 + L T ) = Q. As Q and X + are positive definite, we get ρ(X −1 + L T ) < 1 from the discrete version of the Lyapunov stability theorem (see, e.g., [22, p. 451]). Moreover, it is shown in [14], that the unique largest positive definite solution of (1) coincides with the unique positive definite solution of a related Riccati equation. For this, it is noted in [14] that if X solves (1), then it also obeys the equation X = f (f (X)) = Q + F (R −1 + X −1 ) −1 F T with F = LL −T and R = L T Q −1 L = R T positive definite. Using the Sherman-Morrison-Woodbury formula to derive an expression for (R −1 +X −1 ) −1 , we obtain a discrete-time algebraic Riccati equation (DARE). Because (F, I) is controllable and (F, Q) is observable, a unique stabilizing positive definite solution X * exists [21,Theorem 13.1.3]. This unique solution coincides with that solution of (1) one is interested in. DAREs appear not only in the context presented, but also in numerous procedures for analysis, synthesis, and design of control and estimation systems with H 2 or H ∞ performance criteria, as well as in other branches of applied mathematics and engineering, see, e.g., [1,3,4,21,35].
In [14] essentially three ideas for solving (1) have been proposed. The straightforward one is a basic iterative algorithm that converges to the desired positive definite solution X + of (1). Essentially, the algorithm interprets equation (1) as a fixed point equation and iterates X i+1 = f (X i ); see Section 2.1 for more details.
The second idea is to compute the desired solution from the stable Lagrangian deflating subspace of G − λH. If we can compute Y 1 , Y 2 ∈ R n×n such that the is the desired solution of (1). (In order to distinguish the unique largest positive definite symmetric solution of (1) obtained by the different algorithms discussed, we will use different subscripts for each approach.) The third idea is to compute the desired solution via the unique solution X * of the DARE. The solution X * can be found by direct application of Newton's method for DAREs [17,18,21,27]. However, comparison with the basic fixed point iteration is not favorable [17,Section 5]. Therefore, this approach of solving the DARE is not considered here. Instead we will compute its solution via the stable deflating subspace of an associated matrix pencil. As R is positive definite, we can define As (F, I) is controllable, (F, Q) is observable, and Q and R −1 are positive definite, M − λN has no eigenvalues on the unit circle; see, e.g., [21]. It is then easily seen that M − λN has precisely n eigenvalues in the open unit disk and n outside. Moreover, the Riccati solution X * can be given in terms of the deflating subspace of M − λN corresponding to the n eigenvalues λ 1 , . . . , λ n inside the unit disk using the relation where Λ ∈ R n×n with the spectrum σ(Λ) = {λ 1 , . . . , λ n }. Therefore, if we can compute Y 1 , Y 2 ∈ R n×n such that the columns of Y 1 Y 2 span the desired deflating subspace of M − λN , then is the desired solution of the DARE (3). See, e.g., [21,23,27], and the references therein.
Hence, two of the ideas stated in [14] how to solve (1) can be interpreted as the numerical computation of a deflating subspace of a matrix pencil A − λB. This is usually carried out by a procedure like the QZ algorithm. Applying the numerically backward stable QZ algorithm to a matrix pencil results in a general 2n × 2n matrix pencil in generalized Schur form from which the eigenvalues and deflating subspaces can be determined.
Both matrix pencils to be considered here (G − λH and M − λN ) have a symplectic spectrum, that is, their eigenvalues appear in reciprocal pairs λ, λ −1 . They have exactly n eigenvalues inside the unit disk, and n outside. Sorting the eigenvalues in the generalized Schur form such that the eigenvalues inside the unit disk are contained in the upper left n × n block, the desired deflating subspace can easily be read off and the solution X , resp. X * , can be computed. (This method results in the popular generalized Schur vector method for solving DAREs [29].) Due to roundoff errors unavoidable in finite-precision arithmetic, the computed eigenvalues will in general not come in pairs {λ, λ −1 }, although the exact eigenvalues have this property. Even worse, small perturbations may cause eigenvalues close to the unit circle to cross the unit circle such that the number of true and computed eigenvalues inside the open unit disk may differ. Moreover, the application of the QZ algorithm to a 2n × 2n matrix pencil is computationally quite expensive. The usual initial reduction to Hessenberg-triangular form requires about 70n 3 flops plus 24n 3 for accumulating the Z matrix; each iteration step requires about 88n 2 flops for the transformations and 136n 2 flops for accumulating Z; see, e.g., [31]. An estimated 40n 3 flops are necessary for ordering the generalized Schur form. This results in a total cost of roughly 415n 3 flops, employing standard assumptions about convergence of the QZ iteration (see, e.g., [16,Section 7.7]).
The use of the QZ algorithm is prohibitive here not only due to the fact that it does not preserve the symplectic spectra, but also due to the costly computation. More efficient methods have been proposed which make use of the following observation: M − λN of the form (5)  Hence, as we are dealing with real symplectic pencils, the finite generalized eigenvalues always occur in pairs if they are real or purely imaginary or in quadruples otherwise. Although G − λH as in (2) is not a symplectic matrix pencil, it can be transformed into a very special symplectic pencil G − λ H as noted in [25]. This symplectic pencil G − λ H allows the use of a doubling algorithm to compute the solution X . These methods originate from the fixed point iteration derived from the DARE. Instead of generating the usual sequence {X k }, doubling algorithms generate {X 2 k }. This class of methods attracted much interest in the 1970s and 80s, see [31] and the references therein. After having been abandoned for the past decade, they have recently been revived by a series of papers, e.g. [12,25].
To be more specific, define The transformation G − λ H →G − λH is called a doubling transformation. The doubling algorithm consists of applying the doubling transformation repeatedly. An important feature of this kind of transformation is that it is structure-preserving [9], eigenspace-preserving [6,9,26], and eigenvalue-squaring. In [25], an appropriate doubling transformation for the symplectic pencil G − λ H is given. The resulting algorithm has very nice numerical behavior, with a quadratic converge rate, low computational cost and good numerical stability. Essentially the same algorithm was proposed in [28] using a different motivation. See Section 2.2 for more details. Alternatively, a doubling algorithm could be applied directly to the DARE (4). This is discussed in Section 2.2.1.
Here we propose to compute the desired solution X * via an approximate solution of the DARE (3) by the (butterfly) SZ algorithm applied to the corresponding symplectic pencil [10,11,13]. This algorithm is a fast, reliable and structure-preserving algorithm for computing the stable deflating subspace of the symplectic matrix pencil M − λN (5) associated with the DARE. The matrix pencil M − λN is first reduced to the so called symplectic butterfly form, which is determined by only 4n − 1 parameters. By exploiting this special reduced form and the symplecticity, the SZ algorithm is fast and efficient; in each iteration step only O(n) arithmetic operations are required instead of O(n 2 ) arithmetic operations for a QZ step. We thus save a significant amount of work. Of course, the accumulation of the Z matrix requires O(n 2 ) arithmetic operations as in the QZ step. Moreover, by forcing the symplectic structure, the abovementioned problems of the QZ algorithm are avoided. See Section 3 for more details.
Any approximate solution X computed, e.g., with one of the methods described above, can be improved via defect correction. This is considered in Section 4. Finally, in Section 5 we compare the different algorithms for solving (1) discussed here.

The Fixed Point Iteration
As suggested in [14], the equation (1) can be solved directly by turning it into a fixed point iteration with initial condition X 0 = Q. In [14], it is shown that the sequence {X i } converges to the unique positive definite solution X + of (1). This convergence is robust as for any positive there exists a neighborhood Υ of X + such that for any initial condition X 0 ∈ Υ, the sequence generated by (6) remains in a ball of radius centered in X + and converges to X + . Moreover, the sequence generated by (6) converges to X + for any positive definite initial condition X 0 as well as for any initial condition such that X 0 ≤ −LQ −1 L T . The convergence rate is related to the spectral radius ρ(X −1 + L T ). The convergence is linear, but, if ρ(X −1 + L T ) is close to 1, the convergence may be very slow. See also [17,Section 2].
An inverse free variant of the fixed point iteration is possible. However, the algorithm is not always convergent, [17, last paragraph, Section 3].
Our implementation of the fixed point iteration first computes the Cholesky The total flop count for one iteration step is therefore 7 3 n 3 flops, as the first step involves about n 3 3 , the second one n 3 , and the last one n 3 flops.
In many applications, rather than the solutions of matrix equations themselves, their factors (such as Cholesky or full-rank factors) are needed; see, e.g., [5,31]. Moreover, subsequent calculations can often be performed using the factors which usually have a much better condition number. Therefore, it may be desirable to have such a method that computes such a factor directly also for (1) without ever forming the solution explicitly. Such a method can also easily be derived based on the fixed point iteration (1). As all iterates are positive definite, it is natural here to use their Cholesky factors. Assuming we have a Cholesky Note that the Q-factor is not needed as it cancels: An LQ factorization for the specially structured matrix in (7) is implemented in the SLICOT 1 subroutine MB04JD. Employing this, the factorized fixed point iteration yielding the sequence Y i of Cholesky factors of X i requires 3n 3 flops per iteration and is thus slightly more expensive than the fixed point iteration itself. Additionally, n 3 3 flops for the initial Cholesky factorization of Q are needed.

The Doubling Algorithm
As already observed in [14], the solution X of (1), Hence, the desired solution X can be computed via an appropriate deflating subspace of G − λH. This could be done by employing the QZ algorithm. But the following idea suggested in [25] achieves a much faster algorithm.
Assume that X is the unique symmetric positive definite solution of (1). Then it satisfies (8) Then it follows that where The pencil G − λ H is symplectic as GJ G T = HJ H T . (As G and H are not symplectic themselves, the butterfly SZ algorithm described in the next section can not be employed directly in order to computed the desired deflating subspace of G − λ H.) It is easy to see that X satisfies (9) if and only if the equation has a symmetric positive definite solution X.
In [25], it is suggested to use a doubling algorithm to compute the solution X of (9). An appropriate doubling transformation for the symplectic pencil (9) is given. Applying this special doubling transformation repeatedly the following structure-preserving doubling algorithm (SDA) arises: As the matrix Q i − P i is positive definite for all i [25], the iterations above are all well defined. The sequence Q i+1 will converge to X. Thus, the unique symmetric positive definite solution to (1) can be obtained by computing Essentially the same algorithm was proposed in [28] using a different motivation. Both papers [25,28] point out that this algorithm has very nice numerical behavior, with a quadratic converge rate, low computational cost and good numerical stability. The algorithm requires about 6.3n 3 arithmetic operations per iteration step when implemented as follows: first a Cholesky decomposi- and C −T i L T i are computed (both steps require n 3 arithmetic operations), finally L i+1 , Q i+1 , P i+1 are computed using these products (4n 3 arithmetic operations if the symmetry of Q i+1 and P i+1 is exploited). Hence, one iteration step requires 19 3 n 3 arithmetic operations.
Despite the fact that a factorized version of the doubling iteration for DAREs has been around for about 30 years, see [31] and the references therein, the SDA (11) for (1) can not easily be rewritten to work on a Cholesky factor of Q i due to the minus sign in the definition of the Q i 's.

A doubling algorithm for (5)
As explained in the introduction, the solution X of (1) can also be obtained from the deflating subspace of the pencil (5). In [2] a doubling algorithm for computing this solution has been developed as an acceleration scheme for the fixed-point iteration from (4), Using the notation introduced here, that algorithm (here called SDA-DARE) can be stated as follows (see [12]): The algorithm requires 44 3 n 3 flops: the matrix multiplications in (13) and (18) require about 2n 3 flops each, the computation of the symmetric matrices in (16) and (17) comes at about 3n 3 flops, the decomposition of W costs 2 3 n 3 flops, and the computations in (14) and (15) require 2n 3 flops each. Its quadratic convergence properties are analyzed in [12]. Compared to the doubling algorithm discussed in the previous section, this algorithm is more costly: 19 3 n 3 flops versus 44 3 n 3 flops, but it avoids using the inverse of Q. The inverse of L is used instead. Like the fixed point iteration, the doubling algorithm for DAREs can be rewritten in terms of (Cholesky) factors so that the iterates resulting from (17) in factorized form converge to a (Cholesky) factor of the solution. This has been known for decades (see [31] and the references therein), a slightly refined variant that computes a low-rank factor of the solution in case of rank-deficiency of X has recently been proposed in [7]. In contrast to the usual situation for DAREs where G and Q are often of low rank, no efficiency gain can be expected from such an implementation in our situation as G, Q, and X are all full rank matrices.

The butterfly SZ algorithm
As shown in [14], instead of solving the equation (1) one can solve the related DARE (3), One approach to solve this equation is via computing the stable deflating subspace of the matrix pencil from (5), i.e., Here we propose to use the butterfly SZ algorithm for computing the deflating subspace of M − λN . The butterfly SZ algorithm [11,13] is a fast, reliable and efficient algorithm especially designed for solving the symplectic eigenproblem for a symplectic matrix pencil M − λ N in which both matrices are symplectic; that is M J M T = N J N T = J. The above symplectic matrix pencil can be rewritten (after premultiplying by L 0 0 L −1 ) as where both matrices M = N T are symplectic. In [11,13] it is shown that for the symplectic matrix pencil M − λ N there exist numerous symplectic matrices Z and nonsingular matrices S which reduce M − λ N to a symplectic butterfly pencil A − λB: where C and D are diagonal matrices, and T is a symmetric tridiagonal matrix. (More generally, not only the symplectic matrix pencil in (19), but any symplectic matrix pencil M − λ N with symplectic matrices M , N can be reduced to a symplectic butterfly pencil). This form is determined by just 4n − 1 parameters. The symplectic matrix pencil A − λB is called a symplectic butterfly pencil. If T is an unreduced tridiagonal matrix, then the butterfly pencil is called unreduced. If any of the n − 1 subdiagonal elements of T are zero, the problem can be split into at least two problems of smaller dimension, but with the same symplectic butterfly structure.
Once the reduction to a symplectic butterfly pencil is achieved, the SZ algorithm is a suitable tool for computing the eigenvalues/deflating subspaces of the symplectic pencil A − λB [11,13]. The SZ algorithm preserves the symplectic butterfly form in its iterations. It is the analogue of the SR algorithm (see [10,13]) for the generalized eigenproblem, just as the QZ algorithm is the analogue of the QR algorithm for the generalized eigenproblem. Both are instances of the GZ algorithm [34].
Each iteration step begins with an unreduced butterfly pencil A−λB. Choose a spectral transformation function q and compute a symplectic matrixZ such thatZ −1 q(A −1 B)e 1 = αe 1 for some scalar α. Then transform the pencil to

This introduces a bulge into the matrices A and B. Now transform the pencil to
where A − λ B is again of symplectic butterfly form. S and Z are symplectic, and Ze 1 = e 1 . This concludes the iteration. Under certain assumptions, it can be shown that the butterfly SZ algorithm converges cubically. The needed assumptions are technically involved and follow from the GZ convergence theory developed in [34]. The convergence theorem says roughly that if the eigenvalues are separated, and the shifts converge, and the condition numbers of the accumulated transformation matrices remain bounded, then the SZ algorithm converges. For a detailed discussion of the butterfly SZ algorithm see [11,13]. Hence, in order to compute an approximate solution of the DARE (3) by the butterfly SZ algorithm, first the symplectic matrix pencil M − λ N as in (19) has to be formed, then the symplectic matrix pencil A − λB as in (20) is computed. That is, symplectic matrices Z 0 and S 0 are computed such that is a symplectic butterfly pencil. Using the butterfly SZ algorithm, symplectic matrices Z 1 and S 1 are computed such that is a symplectic butterfly pencil and the symmetric tridiagonal matrix T in the lower right block of S −1 1 BZ 1 is reduced to quasi-diagonal form with 1 × 1 and 2×2 blocks on the diagonal. The eigenproblem decouples into a number of simple 2 × 2 or 4 × 4 generalized symplectic eigenproblems. Solving these subproblems, finally symplectic matrices Z 2 , S 2 are computed such that where the eigenvalues of the matrix pencil φ 11 − λψ 11 are precisely the n stable generalized eigenvalues. Let Z = Z 0 Z 1 Z 2 . Partitioning Z conformably, the Riccati solution X * is found by solving a system of linear equations: This algorithm requires about 195n 3 arithmetic operations in order to compute the solution of the Riccati equation (and is therefore cheaper than the QZ algorithm which requires about 422n 3 arithmetic operations). The cost of the different steps of the approach described above are given as follows. The computation of L −1 Q and L −1 using an LU decomposition of L requires about 14 3 n 3 arithmetic operations. A careful flop count reveals that the initial reduction of M − λ N to butterfly form A − λB requires about 75n 3 arithmetic operations. For computing Z 0 , an additional 28n 3 arithmetic operations are needed. The butterfly SZ algorithm requires about O(n 2 ) arithmetic operations for the computation ofȂ − λB and additional 85n 3 arithmetic operations for the computation of Z (this estimate is based on the assumption that 2 3 iterations per eigenvalue are necessary as observed in [11]). The solution of the final linear system requires 14 3 n 3 arithmetic operations. Hence, the entire algorithm described above requires about 586 3 n 3 arithmetic operations.
However, it should be noted that in the SZ algorithm non-orthogonal equivalence transformations have to be used. These are not as numerically stable as the orthogonal transformations used by the QZ algorithm. Therefore, the approximate DARE solution computed by the SZ algorithm is sometimes less accurate than the one obtained from using the QZ algorithm. A possibility to improve the computed solution is defect correction as discussed in the next section.

Defect Correction
Any approximate solution X computed, e.g., with one of the methods described above, can be improved via defect correction. Let where X is the exact solution of (1), X = Q + LX −1 L T . Then Assume that ||E|| < 1/|| X −1 ||.
With the residual we thus have R( X) ≈ E + LE L T . By dropping terms of order O(||E|| 2 ), we obtain the defect correction equation Hence, the approximate solution X can be improved by solving (23) for E. The improved X is then given by X = X − E. Proof. Note that (23) is equivalent to the linear system of equations where ⊗ denotes the Kronecker product and vec(A) = [a 11 , . . . , a n1 ,  a 12 , . . . , a n2 , . . . , a 1n , . . . , a nn ] T is the vector that consists of the columns of A = [a ij ] n i,j=1 stacked on top of each other from left to right [19, Section 4.2]. As ρ( L) < 1, the assertion follows from σ( Note that Lemma 4.1 also follows from a more general existence result for linear matrix equations given in [30,Proposition 3.1]. In [17], essentially the same defect correction was derived by applying Newton's method to (1). Written in the notation used here, the defect correction equation derived in [17] reads It is easy to see that this is equivalent to (23). In [17], it is suggested to solve the defect correction equation with a general Sylvester equation solver as in [15]. In that case, the computational work for solving the defect correction equation would be roughly 18 times that for the basic fixed point iteration. But a more efficient algorithm which makes use of the special structure of (23) can be easily devised: first, note that (23) looks very similar to a Stein (discrete Lyapunov) equation. The only difference is the sign in front of E. With this observation and a careful inspection of the Bartels-Stewart type algorithm for Stein equations suggested in [8] and implemented in the SLICOT Basic Control Toolbox 2 function slstei (see also [32]), equation (23) can be solved efficiently with this algorithm when only a few signs are changed. This method requires about 14 times the cost for one fixed point iteration as the Bartels-Stewart type algorithm requires 32n 3 flops [31].

Numerical Experiments
Numerical experiments were performed in order to compare the four different approaches for solving (1) discussed here. All algorithms were implemented in Matlab Version 7.2.0.232 (R2006a) and run on an Intel Pentium M processor.
In particular, we implemented • the fixed point iteration as described in Section 2.1 which requires 7 3 n 3 arithmetic operations per iteration, • the doubling algorithm SDA as described in Section 2.2 which requires 19 3 n 3 arithmetic operations per iteration and uses the inverse of Q, • the doubling algorithm SDA-DARE as described in Section 2.2.1 which requires 44 3 n 3 arithmetic operations per iteration and uses the inverse of L, • the SZ algorithm as described in Section 3 which requires 586 3 n 3 arithmetic operations and uses the inverse of L.
Slow convergence of the fixed point iteration has been observed in, e.g., [14,17]. The convergence rate depends on the spectral radius ρ(X + L −T ). One iteration of the doubling algorithm SDA costs as many as 2.7 iterations of the fixed point iteration. In [25], no numerical examples are presented, in [28] only one example is given (see Example 2 below) in which the doubling algorithm is much faster than the fixed point iteration. Our numerical experiments confirm that this is so in general. The SZ algorithm costs as many as 84 iterations of the fixed point iteration, as many as 31 iterations of the doubling algorithm SDA and as many as 13 iterations of the doubling algorithm SDA-DARE.
Example 1. First, the fixed point equation approach as described in Section 2.1 was compared to the SZ approach as described in Section 3. For this, each example was first solved via the SZ approach. The so computed solution X * was used to determine the tolerance tol to which the fixed point iteration is run. That is, the fixed point iteration was stopped as soon as For the first set of examples Q and L were constructed as follows (using Matlab notation): [Q,R]= qr(rand(n)); Q = Q'*diag(rand(n,1))*Q; L = rand(n); 100 examples of size n = 5, 6, 7, . . . , 20 and n = 30, 40, 50, . . . , 100 were generated and solved as described above. The fixed point iteration was never run for more than 300 steps. The accuracy of the residual (24) achieved by the SZ approach was in general of the order of 10 −12 for smaller n and 10 −8 for larger n. But, as nonorthogonal transformation have to be used, occasionally, the accuracy can deteriorate to 10 −3 . In that case, defect correction as described in Section 4 or the fixed point iteration with starting matrix X 0 = X * can be used to increase the accuracy of the computed solution.
Next the doubling algorithm SDA was used to solve the same set of examples. Its iteration solves the equation (10), the desired solution X is obtained from the computed solution via (12). The iteration was run until the residuum was less than n · ||Q|| F · eps, where eps is Matlab's machine epsilon. This does not imply the same accuracy for the solution X of (1). Due to the back substitution (12), the final solution X may have a larger residual error. For these examples, only about 7 iterations where needed to determine an X which has about the same (or better) accuracy than the solution X * computed via the SZ algorithm. Therefore, for these examples, the doubling algorithm is certainly more efficient than the fixed-point iteration or the SZ algorithm.
Finally, the SDA-DARE algorithm was used to solve the same examples. As the iterates X i converge not only to the solution of the DARE (4), but also to the solution of the equation (1), the iteration is run until The average number of iterations needed for convergence is similar to that of the SDA algorithm, but each iteration here is more expensive than for the SDA algorithm. Hence, overall, for these examples, the SDA algorithm is the most efficient algorithm. For each n, for about two or three examples out of the 100 examples generated, the SDA-DARE did not quite achieve the same accuracy as the SZ algorithm: after about 5 iteration steps, the achieved accuracy just stagnated, usually only slightly larger than the accuracy achieved by the SZ algorithm. The matrices Q generated for these tests had a fairly small condition number 1 < κ 2 (Q) < 10 5 , and a small norm 0.3 < ||Q|| 2 < 1.
As can be seen from Table 2, the fixed point iteration performed much better for these examples, but the number of iterations necessary for convergence seems to be unpredictable. The doubling iteration SDA performs better than before, less iterations were needed for convergence. But while the iteration is run until the residual is less than n · ||Q|| F · eps, it is clearly seen here that this does not imply the same accuracy for the solution X of (1). The larger n is chosen, the worse the residual becomes compared to the residual tol obtained by the SZ algorithm. Hence, the SZ algorithm may require more arithmetic operations, but usually it generates more accurate solutions. For most examples, the SDA-DARE algorithm converges in about 5 iterations to the same accuracy as the SZ algorithm, hence it is much more efficient. But as before, for few examples, the SDA-DARE algorithm did not achieve the same accuracy as the SZ algorithm as it stagnated at an accuracy 10 · tol. Rarely, the algorithm stagnated after about 5 iteration at a much larger error.
In case, examples with ill-conditioned L are solved, the SDA-DARE and the SZ algorithm obviously will be a bad choice, while the fixed-point iteration and the SDA algorithm do not have any (additional) problems with ill-conditioned L.
Slow convergence for the fixed point iteration was already observed in [17], after 400 iteration steps one obtains the residual norm   and ||X + − X || F = 6.92 · 10 −11 .
Hence, the doubling iterations outperform the SZ algorithm here.

Conclusions
We have discussed several algorithms for a rational matrix equation that arises in the analysis of stationary Gaussian reciprocal processes. In particular, we have described the application of the SZ algorithm for symplectic pencils to solve this equation. Moreover, we have derived a defect correction equation that can be used to improve the accuracy of a computed solution. Several examples comparing the iterative methods with the SZ approach show that none of the methods discussed is superior. Usually, both doubling-type algorithms SDA and SDA-DARE compute the approximate solution very fast, but due to the back transformation step, the accuracy of the SDA algorithm can deteriorate significantly. On the other hand, the fixed point iteration is often very slow. The SZ approach needs a predictable computing time which is most often less than that of the fixed point iteration when a comparable accuracy is requested, but is usually much higher than for the doubling algorithms. The accuracy of the SZ approach is not always the best compared to the other methods, but in a number of examples, the doubling algorithms are unable to attain the same accuracy while the fixed point iteration is significantly slower.