Sensitivity-Based Pole and Input-Output Errors of Linear Filters as Indicators of the Implementation Deterioration in Fixed-Point Context

Input-output or pole sensitivity is widely used to evaluate the resilience of a filter realization to coefficient quantization in a finite word length (FWL) implementation process. However, these measures do not account exactly for the various implementation schemes and are not accurate in the general case. This paper generalizes the classical transfer function sensitivity and pole sensitivity measures by taking into consideration the exact fixed-point representation of the coefficients. Working in the general framework of the specialized implicit descriptor representation, it shows how a statistical quantization error model may be used to define stochastic sensitivity measures that are pertinent and normalized. The general framework of MIMO filters and controllers is considered. All the results are illustrated through an example.


Introduction
The majority of control or signal processing systems are implemented on digital general-purpose processors, DSPs (Digital Signal Processors), FPGAs (Field-Programmable Gate Arrays), and so forth. Since these devices cannot compute with infinite precision and approximate real-number parameters with a finite binary representation, the numerical implementation of controllers (filters) leads to a deterioration in characteristics and performance. This deterioration has two separate origins, corresponding to the quantization of the embedded coefficients and to the round-off errors occurring during the computations. They can be formalized as parametric errors and numerical noises, respectively. This paper focuses on parametric errors; for round-off noises, one can refer to [1][2][3][4], where measures with fixed-point consideration already exist, or to [5] for an interval-based characterization.
It is also well known that these Finite Word Length (FWL) effects depend on the structure of the realization. In state-space form, the realization depends on the choice of the basis of the state vector. This motivates us to investigate the coefficient sensitivity minimization problem. It has been well studied with the $L_2$-measure [1,6]. However, this measure only considers how sensitive the transfer function is to the coefficients and does not take into account the coefficient quantization, which depends on the fixed-point representation used. In [6], the transfer function error is exhibited for the first time, but only for quantized coefficients with the same binary-point position.
A common assumption in FWL error analysis is that the perturbations on the coefficients are independent and uniformly distributed random variables in the interval $[-\varepsilon/2; \varepsilon/2]$, with $\varepsilon$ some constant depending on the wordlength. As shown in Section 4.1, this range can be different for each coefficient: it depends on the coefficient itself and on some fixed-point choices made for the implementation. In that sense, this paper takes into consideration the different binary-point positions of the coefficients in order to define a new stochastic error measure.
Making use of the Specialized Implicit Framework proposed by the authors in [7], this paper extends the stochastic approach of [8] to a much larger class of realizations, in order to define and compute the transfer function and pole sensitivities (in the context of both open- and closed-loop schemes).
The classical sensitivity analysis is introduced in Section 2 whereas the Specialized Implicit Framework is presented in Section 3. Section 4 exhibits the fixed-point implementation scheme and the new transfer function error, and Section 5 presents the pole error. A brief extension to closed-loop cases is shown in Section 6. The optimal realization problem is discussed in Section 7 with an example to illustrate theoretical results. Finally, some concluding remarks are given in Section 8.
Notations. Throughout this paper, real numbers are in lowercase, column vectors in lowercase boldface, and matrices in uppercase boldface. $A^*$ denotes the conjugate, $A^\top$ the transpose, $A^H$ the transpose-conjugate, $\operatorname{tr}(A)$ the trace operator, $E\{A\}$ the mean operator, $\operatorname{Re}(A)$ the real part, and $A \times B$ the Schur product of $A$ and $B$.

Classical Sensitivity Analysis
Classically, in the literature, the sensitivity analysis is performed on a state-space realization. Some other extended structures (like direct form, ρ-modal, δ-operator state-space, etc.) have been also studied, and specific sensitivity analysis has been performed for each structure.
Let $(A, b, c, d)$ be a stable, controllable, and observable linear discrete-time Single Input Single Output (SISO) state-space system, that is,

$$x(k+1) = A x(k) + b u(k), \qquad y(k) = c x(k) + d u(k), \tag{1}$$

where $A \in \mathbb{R}^{n\times n}$, $b \in \mathbb{R}^{n\times 1}$, $c \in \mathbb{R}^{1\times n}$, and $d \in \mathbb{R}$. $u(k)$ is the scalar input, $y(k)$ is the scalar output, and $x(k) \in \mathbb{R}^{n\times 1}$ is the state vector at time $k$.
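The recursion of the state-space system can be sketched in a few lines of Python. This is only an illustration: the function name `simulate_siso` and the list-based matrix layout are assumptions made here, and the numerical values reuse the first-order filter $100/(z - 0.8)$ discussed later in the paper, realized as $(0.8, 1, 100, 0)$.

```python
def simulate_siso(A, b, c, d, u):
    """Run x(k+1) = A x(k) + b u(k), y(k) = c x(k) + d u(k) from x(0) = 0.

    A is a list of rows, b and c are lists, d is a scalar, u is the input sequence.
    """
    n = len(A)
    x = [0.0] * n
    y = []
    for uk in u:
        # Output equation: uses the current state x(k).
        y.append(sum(c[i] * x[i] for i in range(n)) + d * uk)
        # State equation: builds x(k+1) from x(k) and u(k).
        x = [sum(A[i][j] * x[j] for j in range(n)) + b[i] * uk for i in range(n)]
    return y


# Impulse response of the (assumed) realization (0.8, 1, 100, 0) of 100/(z - 0.8).
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(simulate_siso([[0.8]], [1.0], [100.0], 0.0, impulse))
```

The impulse response samples are $d$ followed by $c A^{k-1} b$, which is what the transfer function encodes.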
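Its input-output relationship is given by the scalar transfer function $h : \mathbb{C} \to \mathbb{C}$ defined by

$$h(z) \triangleq c (zI_n - A)^{-1} b + d. \tag{2}$$

2.1. Transfer Function Sensitivity Measure. The quantization of the coefficients $A$, $b$, $c$, and $d$ introduces some uncertainties leading to $A + \Delta A$, $b + \Delta b$, $c + \Delta c$, and $d + \Delta d$, respectively. It is common to consider the sensitivity of the transfer function with respect to the coefficients [1,9,10], based on the following definitions.

Definition 1 (Transfer Function Derivative). Consider $X \in \mathbb{R}^{m\times n}$ and $f : \mathbb{R}^{m\times n} \to \mathbb{C}$ differentiable with respect to all the entries of $X$. The derivative of $f$ with respect to $X$ is defined by the matrix $S_X \in \mathbb{R}^{m\times n}$ such that

$$(S_X)_{i,j} \triangleq \frac{\partial f}{\partial X_{i,j}}.$$

Applied to a scalar transfer function $h$ where $h(z)$ depends on a given matrix $X$, $\partial h/\partial X$ is a Multiple Inputs Multiple Outputs (MIMO) transfer function, defined by

$$\frac{\partial h}{\partial X} : z \mapsto \frac{\partial (h(z))}{\partial X}.$$

Definition 2 ($L_2$-Norm). Let $H : \mathbb{C} \to \mathbb{C}^{k\times l}$ be a function of the scalar complex variable $z$ (i.e., a MIMO transfer function). Its $L_2$-norm, denoted $\|H\|_2$, is defined by

$$\|H\|_2 \triangleq \sqrt{\frac{1}{2\pi} \int_0^{2\pi} \left\| H(e^{j\omega}) \right\|_F^2 \, d\omega},$$

where $\|Y\|_F$ is the Frobenius norm of the matrix $Y$ defined by

$$\|Y\|_F \triangleq \sqrt{\operatorname{tr}(Y^H Y)}.$$

Definition 3 (Transfer Function Sensitivity Measure). The $L_2$ transfer function sensitivity measure is defined by

$$M_{L_2} \triangleq \left\| \frac{\partial h}{\partial A} \right\|_2^2 + \left\| \frac{\partial h}{\partial b} \right\|_2^2 + \left\| \frac{\partial h}{\partial c} \right\|_2^2 + \left\| \frac{\partial h}{\partial d} \right\|_2^2. \tag{7}$$

It can be computed with Proposition 4 and the following equations:

$$\frac{\partial h}{\partial A} = G^\top F^\top, \qquad \frac{\partial h}{\partial b} = G^\top, \qquad \frac{\partial h}{\partial c} = F^\top, \qquad \frac{\partial h}{\partial d} = 1,$$

where $F : z \mapsto (zI_n - A)^{-1} b$ and $G : z \mapsto c (zI_n - A)^{-1}$. $F$ and $G$ can be seen as the MIMO state-space systems $(A, b, I_n, 0)$ and $(A, I_n, c, 0)$, respectively.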

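Proposition 4.
If $H$ is the MIMO state-space system $(K, L, M, N)$, then its $L_2$-norm can be computed by

$$\|H\|_2^2 = \operatorname{tr}(N N^\top + M W_c M^\top) = \operatorname{tr}(N^\top N + L^\top W_o L),$$

where $W_c$ and $W_o$ are the controllability and observability Gramians, respectively. They are solutions to the Lyapunov equations

$$W_c = K W_c K^\top + L L^\top, \qquad W_o = K^\top W_o K + M^\top M.$$

Proof. See [1].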
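As an illustration of Proposition 4, the following sketch approximates the controllability Gramian by iterating the Lyapunov recursion $W \leftarrow K W K^\top + L L^\top$ and then evaluates $\sqrt{\operatorname{tr}(N N^\top + M W_c M^\top)}$. It uses plain Python lists and a fixed number of iterations; a dedicated Lyapunov solver would normally be preferred. The helper names (`matmul`, `controllability_gramian`, `l2_norm`) are chosen here for the example and do not come from the paper.

```python
import math


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def controllability_gramian(K, L, iters=2000):
    """Approximate W_c by iterating W <- K W K^T + L L^T (assumes K is stable)."""
    LLt = matmul(L, transpose(L))
    Kt = transpose(K)
    W = [row[:] for row in LLt]
    for _ in range(iters):
        KWKt = matmul(matmul(K, W), Kt)
        W = [[KWKt[i][j] + LLt[i][j] for j in range(len(W))] for i in range(len(W))]
    return W


def l2_norm(K, L, M, N, iters=2000):
    """L2-norm of the state-space system (K, L, M, N) via the controllability Gramian."""
    Wc = controllability_gramian(K, L, iters)
    MWMt = matmul(matmul(M, Wc), transpose(M))
    NNt = matmul(N, transpose(N))
    return math.sqrt(sum(MWMt[i][i] + NNt[i][i] for i in range(len(MWMt))))


# First-order example 100/(z - 0.8), realized here as (0.8, 1, 100, 0).
print(l2_norm([[0.8]], [[1.0]], [[100.0]], [[0.0]]))
```

For this first-order case the Gramian is $1/(1 - 0.8^2)$, so the norm is $100/0.6$, which gives a simple way to check the iteration.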
Remark 5. This measure is an extension of the more tractable but less natural $L_1/L_2$ sensitivity measure proposed by Tavsanoglu and Thiele [10] ($\|\partial h/\partial A\|_1^2$ instead of $\|\partial h/\partial A\|_2^2$ in (7)).
Applying a coordinate transformation, defined by $\bar{x}(k) \triangleq U^{-1} x(k)$, to the state-space system $(A, b, c, d)$ leads to a new equivalent realization $(U^{-1}AU, U^{-1}b, cU, d)$.
Since these two realizations are equivalent in infinite precision but are no longer equivalent in finite precision (fixed-point arithmetic, floating-point arithmetic, etc.), the $L_2$-sensitivity depends on $U$ and is denoted $M_{L_2}(U)$.
It is natural to define the following problem.
Problem 1 (Optimal $L_2$-sensitivity problem). Considering a state-space realization $(A, b, c, d)$, the optimal $L_2$-sensitivity problem consists of finding the coordinate transformation $U_{\mathrm{opt}}$ that minimizes the transfer function sensitivity measure:

$$U_{\mathrm{opt}} = \arg\min_{U \text{ invertible}} M_{L_2}(U).$$

In [1], it is shown that the problem has a unique solution, and that a gradient method can be used to solve it.

Pole Sensitivity Measure.
In addition to the transfer function sensitivity measure, some other sensitivity-based measures have been developed; in particular, the perturbation of the system poles has been studied [11][12][13][14]. Poles are not only structuring parameters, but also indicators of stability.
Let $(\lambda_k)_{1 \le k \le n}$ denote the poles of the system (they are the eigenvalues of $A$). The partial pole sensitivity measure $\Psi_k$ is defined as follows:

$$\Psi_k \triangleq \left\| \frac{\partial |\lambda_k|}{\partial A} \right\|_F^2. \tag{13}$$

Remark 6. The eigenvalues $\lambda_k$ do not depend on $b$, $c$, and $d$, so the terms $\partial|\lambda_k|/\partial b$, $\partial|\lambda_k|/\partial c$, and $\partial|\lambda_k|/\partial d$ are not considered in the definition (13) (they are null).
Moreover, the moduli of the poles are considered because whether an FWL error can cause a stable system to become unstable is determined by how close the pole moduli are to 1 and by how sensitive they are to the parameter perturbations. So, the partial pole sensitivities are combined in a global Pole Sensitivity Measure [15]:

$$\Psi \triangleq \sum_{k=1}^{n} \omega_k \Psi_k,$$

where $(\omega_k)_{1 \le k \le n}$ are weighting coefficients, generally chosen to give more weight to the poles close to the unit circle [15]. The pole sensitivity measure is also used in the closed-loop context, in some stability-related measures [14,16]; see Section 6.

Limitations.
The classical measures are based on the sensitivity with respect to the coefficients. Since it was classically assumed [1,6,12] that the perturbations on the coefficients were independent and uniformly distributed random variables in the interval $[-\varepsilon/2; \varepsilon/2]$, with $\varepsilon$ some positive constant depending on the wordlength only, it was natural to consider the sensitivity as a good evaluation of the overall deterioration (shift of the transfer function or of the poles). But this is reasonable only if the coefficients all have the same order of magnitude, which is generally not the case in practice.
To illustrate this point, let us consider the first-order transfer function $h : z \mapsto 100/(z - 0.8)$. The three following realizations are state-space realizations of this transfer function, with coefficients quantized in 8-bit fixed-point (in bold are the integer values coding for the coefficients, the exponent part being implicit, see Section 4.1). One can remark that the coefficients do not all have the same exponent (these realizations are classical realizations, that is, balanced, arbitrarily scaled, and $L_2$-scaled, respectively). The quantization errors of these coefficients will be completely different, since each quantization error is proportional to the power-of-2 part of the coefficient. So, for the same sensitivity, the quantization of coefficients with higher magnitude will affect the transfer function and the poles more. But the sensitivity measures previously presented cannot take this into consideration. Table 1 exhibits the transfer function sensitivity measure and the transfer function error $\|h - h^\dagger\|_2$ (where $h^\dagger$ is the transfer function with quantized coefficients) for these three different realizations. In that case, $X_2$ has the highest $L_2$-sensitivity, but is nevertheless the most resilient to the fixed-point implementation considered.
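The effect described above can be reproduced on a toy scale. The sketch below quantizes hypothetical first-order realizations $(a, b/u, cu)$ of $100/(z - 0.8)$ on 8 bits and evaluates $\|h - h^\dagger\|_2$ in closed form. The scalings $u$ are arbitrary choices made for this example and are not the three realizations of Table 1. The binary-point rule is the one detailed in Section 4.1, and Python's `round` (round-half-to-even) stands in for the rounding quantizer.

```python
import math


def quantize(value, wordlength):
    """Round value to the fixed-point grid with fractional wordlength
    wordlength - 2 - floor(log2|value|) (see Section 4.1)."""
    if value == 0:
        return 0.0
    gamma = wordlength - 2 - math.floor(math.log2(abs(value)))
    return round(value * 2.0 ** gamma) / 2.0 ** gamma


def first_order_error(a, b, c, wordlength):
    """L2-norm of h - h_q for h(z) = c*b/(z - a), using the closed form
    <k1/(z - a1), k2/(z - a2)> = k1*k2/(1 - a1*a2) for stable real poles."""
    aq, bq, cq = (quantize(v, wordlength) for v in (a, b, c))
    k, kq = c * b, cq * bq
    sq = k * k / (1 - a * a) - 2 * k * kq / (1 - a * aq) + kq * kq / (1 - aq * aq)
    return math.sqrt(max(sq, 0.0))  # guard against tiny negative round-off


# Hypothetical scalings u of (0.8, 1, 100): each (a, b/u, c*u) realizes 100/(z - 0.8).
for u in (1.0, 0.1, 0.03):
    print(u, first_order_error(0.8, 1.0 / u, 100.0 * u, 8))
```

Realizations whose $b$ and $c$ happen to fall on the 8-bit grid only suffer from the quantization of $a$, while others add a gain error; the sensitivity alone does not reveal this difference.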

Specialized Implicit Framework
3.1. Definitions. Many controller/filter forms, such as lattice filters and δ-operator controllers, make use of intermediate variables, and hence cannot be expressed in the traditional state-space form. The SIF has been proposed in order to model a much wider class of discrete-time linear time-invariant controller implementations than the classical state-space form. It is presented here for MIMO filters/controllers.
The model takes the form of an implicit state-space realization [17] specialized according to

$$\begin{pmatrix} J & 0 & 0 \\ -K & I_n & 0 \\ -L & 0 & I_p \end{pmatrix} \begin{pmatrix} t(k+1) \\ x(k+1) \\ y(k) \end{pmatrix} = \begin{pmatrix} 0 & M & N \\ 0 & P & Q \\ 0 & R & S \end{pmatrix} \begin{pmatrix} t(k) \\ x(k) \\ u(k) \end{pmatrix}, \tag{18}$$

where $u(k) \in \mathbb{R}^m$ is the input, $y(k) \in \mathbb{R}^p$ the output, $x(k) \in \mathbb{R}^n$ the state, and $t(k) \in \mathbb{R}^l$ the vector of intermediate variables, and the matrix $J$ is lower triangular with 1's on the main diagonal. Note that $x(k+1)$ is the state vector and is stored from one step to the next, whilst the vector $t$ plays a particular role, as $t(k+1)$ is independent of $t(k)$ (it is defined here as the vector of intermediate variables). The particular structure of $J$ makes it possible to express how the computations are decomposed, with intermediate results that can be reused.

Remark 8.
In that sense, the SIF can be seen as an extension of the factored state-space representation (FSSR) proposed by Roberts and Mullis [18]. Indeed, a factored expression can be rewritten by decomposing the computation of $M_0 w$ and introducing an intermediate vector (in the left-hand term). So, the left-hand term of the implicit state space (18) can represent a factored state space. But it can also represent affine expressions such as $v = M_1(M_0 w + n_0) + n_1$, and more. In fact, all the algorithms with additions, shifts, and multiplications by a constant can be represented.
It is implicitly assumed throughout the paper that the computations associated with the realization (18) are executed in row order, giving the following algorithm:

(i) $J\, t(k+1) = M x(k) + N u(k)$,
(ii) $x(k+1) = K t(k+1) + P x(k) + Q u(k)$,
(iii) $y(k) = L t(k+1) + R x(k) + S u(k)$.

Note that in practice, steps (ii) and (iii) could be exchanged to reduce the computational delay. Also note that there is no need to compute $J^{-1}$ because the computations are executed in row order and $J$ is lower triangular with 1's on the main diagonal. Equation (18) is equivalent in infinite precision to the state-space system

$$x(k+1) = A_Z x(k) + B_Z u(k), \qquad y(k) = C_Z x(k) + D_Z u(k),$$

with

$$A_Z = K J^{-1} M + P, \quad B_Z = K J^{-1} N + Q, \quad C_Z = L J^{-1} M + R, \quad D_Z = L J^{-1} N + S. \tag{23}$$

This state-space system corresponds to a different parametrization than (18) (the finite-precision implementation of the state-space $(A_Z, B_Z, C_Z, D_Z)$ will cause a different numerical deterioration than that of (18)). The associated system transfer function $H$ is given by

$$H : z \mapsto C_Z (zI_n - A_Z)^{-1} B_Z + D_Z. \tag{24}$$

A complete framework for the description of all digital controller implementations can be developed by using the following definitions. For further details, see [7].
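The row-order execution can be sketched as follows, assuming the usual SIF convention in which $J$ acts on $t(k+1)$, the blocks $K$ and $L$ multiply $t(k+1)$ in the state and output rows, and $M$, $N$, $P$, $Q$, $R$, $S$ multiply $x(k)$ and $u(k)$. The function `sif_step` and its list-based layout are illustrative choices, not code from the FWR Toolbox.

```python
def sif_step(J, K, L, M, N, P, Q, R, S, x, u):
    """One sample of a SIF realization, executed in row order.

    Matrices are lists of rows, vectors are lists. J must be lower
    triangular with 1's on its diagonal, so no inverse is needed.
    """
    def mv(A, v):
        return [sum(a * vi for a, vi in zip(row, v)) for row in A]

    def add(*vectors):
        return [sum(items) for items in zip(*vectors)]

    # (i) Solve J t = M x + N u by forward substitution.
    rhs = add(mv(M, x), mv(N, u))
    t = []
    for i, row in enumerate(J):
        t.append(rhs[i] - sum(row[j] * t[j] for j in range(i)))
    # (ii) State update and (iii) output, both reusing the intermediate vector t.
    x_next = add(mv(K, t), mv(P, x), mv(Q, u))
    y = add(mv(L, t), mv(R, x), mv(S, u))
    return x_next, y


# Toy realization with l = n = m = p = 1: t = 0.5 x + u, x+ = t, y = 2 t.
x = [0.0]
for u in ([1.0], [0.0], [0.0]):
    x, y = sif_step([[1.0]], [[1.0]], [[2.0]], [[0.5]], [[1.0]],
                    [[0.0]], [[0.0]], [[0.0]], [[0.0]], x, u)
    print(y)
```

In this toy case the equivalent state-space system has $A_Z = 0.5$, $B_Z = 1$, $C_Z = 1$, $D_Z = 2$, but the coefficients actually stored (and quantized) are those of the SIF blocks.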

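Definition 9.
A realization of a transfer matrix $H$ is entirely defined by the data $Z$, $l$, $m$, $n$, and $p$, where $Z \in \mathbb{R}^{(l+n+p)\times(l+n+m)}$ is partitioned according to

$$Z \triangleq \begin{pmatrix} -J & M & N \\ K & P & Q \\ L & R & S \end{pmatrix} \tag{25}$$

and $l$, $m$, $n$, and $p$ are the matrix dimensions given previously.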
The notation Z is introduced to make the further developments more compact (see (44), (70), etc.).

Equivalent Realizations.
In order to exploit the potential offered by the specialized implicit form in improving implementations, it is necessary to describe sets of equivalent system realizations. The Inclusion Principle introduced by Ikeda and Siljak [19] in the context of decentralized control, has been extended to the Specialized Implicit Form in order to characterize equivalent classes of realizations [7]. Although this extension gives the formal description of equivalent classes, it is of practical interest to consider only realizations with the same dimensions, where transformation from one realization to another is only a similarity transformation.

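Proposition 10. Consider a realization $Z_0$.
All the realizations $Z_1$ with

$$Z_1 = \begin{pmatrix} Y & & \\ & U^{-1} & \\ & & I_p \end{pmatrix} Z_0 \begin{pmatrix} W & & \\ & U & \\ & & I_m \end{pmatrix}, \tag{26}$$

where $U$, $W$, and $Y$ are nonsingular matrices, are equivalent to $Z_0$ and share the same complexity (i.e., generically the same amount of computation).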
It is also possible to just consider a subset of similarity transformations that preserve a particular structure, by adding specific constraints on U, W , or Y.
This will allow us to consider all the realizations Z with a given transfer function as input-output relationship and a given structure, and find the most suitable for the implementation.

Examples.
Here are some examples of structured realizations expressed with the SIF.

Cascaded State-Space.
The cascade form is a common realization for filter implementation. It generally has good FWL properties compared to the direct forms. For cascade form, the filter is decomposed into a number of lower order (usually first-and second-order) transfer function blocks connected in series. For the next example, we consider two standard q-operator state-space blocks connected in series as shown in Figure 1.
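If two state-space realizations $(A_1, B_1, C_1, D_1)$ and $(A_2, B_2, C_2, D_2)$ are cascaded together, then it leads to the realization with blocks

$$J = I, \quad K = \begin{pmatrix} 0 \\ B_2 \end{pmatrix}, \quad L = D_2, \quad M = \begin{pmatrix} C_1 & 0 \end{pmatrix}, \quad N = D_1, \quad P = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}, \quad Q = \begin{pmatrix} B_1 \\ 0 \end{pmatrix}, \quad R = \begin{pmatrix} 0 & C_2 \end{pmatrix}, \quad S = 0.$$

The output of the first block is computed in the intermediate variable and used as the input of the second block.
The main point is that if we consider the equivalent state-space realization, with parameters

$$A_Z = \begin{pmatrix} A_1 & 0 \\ B_2 C_1 & A_2 \end{pmatrix}, \quad B_Z = \begin{pmatrix} B_1 \\ B_2 D_1 \end{pmatrix}, \quad C_Z = \begin{pmatrix} D_2 C_1 & C_2 \end{pmatrix}, \quad D_Z = D_2 D_1,$$

the parametrization is not the one used in the computations, and the FWL effects will not be those of the implemented version.
Remark 11. The cascade structure can be easily extended to a series of specialized implicit forms and to general multiple cascaded systems.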
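The difference between the cascaded computation and its equivalent state-space form can be made concrete with two scalar first-order blocks. In this sketch (function names and block values are assumptions made for the example), `cascade_step` follows the implemented computation with the intermediate variable, whereas `equivalent_state_space` builds the merged parameters, which contain products such as $B_2 C_1$ that never appear as stored coefficients in the cascade.

```python
def cascade_step(blk1, blk2, x1, x2, u):
    """One sample of two SISO first-order blocks in series.

    Each block is a tuple (a, b, c, d) of scalars.
    Returns (x1_next, x2_next, y).
    """
    a1, b1, c1, d1 = blk1
    a2, b2, c2, d2 = blk2
    t = c1 * x1 + d1 * u      # intermediate variable: output of block 1
    y = c2 * x2 + d2 * t      # output of block 2, driven by t
    return a1 * x1 + b1 * u, a2 * x2 + b2 * t, y


def equivalent_state_space(blk1, blk2):
    """Merged state-space parameters (same input-output map in infinite precision)."""
    a1, b1, c1, d1 = blk1
    a2, b2, c2, d2 = blk2
    A = [[a1, 0.0], [b2 * c1, a2]]
    B = [b1, b2 * d1]
    C = [d2 * c1, c2]
    D = d2 * d1
    return A, B, C, D


print(equivalent_state_space((0.5, 1.0, 2.0, 1.0), (-0.3, 1.0, 1.0, 0.5)))
```

Both forms give the same output in exact arithmetic, but quantizing `b2 * c1` is not the same as quantizing `b2` and `c1` separately, which is why the FWL analysis has to be carried out on the implemented parametrization.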

δ-Realizations.
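Consider the δ-state-space realization

$$\delta x(k) = A_\delta x(k) + B_\delta u(k), \qquad y(k) = C_\delta x(k) + D_\delta u(k),$$

with $\delta = (q - 1)/\Delta$, $\Delta \in \mathbb{R}_+^*$, and $q$ the shift operator [1,20,21]. This operator was introduced as a unifying time operator between discrete and continuous time, but it is used in practice for its interesting numerical properties in the FWL context. This realization should be implemented with the following algorithm:

(i) $t \leftarrow A_\delta x(k) + B_\delta u(k)$,
(ii) $x(k+1) \leftarrow x(k) + \Delta\, t$,
(iii) $y(k) \leftarrow C_\delta x(k) + D_\delta u(k)$,

where $t$ is an intermediate variable. This can be modelled with the specialized implicit form by taking $J = I$, $K = \Delta I$, $L = 0$, $M = A_\delta$, $N = B_\delta$, $P = I$, $Q = 0$, $R = C_\delta$, and $S = D_\delta$.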
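A minimal sketch of the δ-operator algorithm is given below for a single-input single-output system. The function name `delta_step` and the numerical values are assumptions for illustration: the shift-operator pole 0.8 is mapped to $A_\delta = (0.8 - 1)/\Delta$ and $B_\delta = 1/\Delta$ with $\Delta = 2^{-3}$, so that the filter $100/(z - 0.8)$ is reproduced.

```python
def delta_step(A_d, B_d, C_d, D_d, delta, x, u):
    """One sample of a SISO delta-operator realization.

    A_d is a list of rows, B_d and C_d are lists, D_d and u are scalars.
    Returns (x_next, y).
    """
    n = len(x)
    # Intermediate variable t = A_delta x(k) + B_delta u(k).
    t = [sum(A_d[i][j] * x[j] for j in range(n)) + B_d[i] * u for i in range(n)]
    # Output uses the current state.
    y = sum(C_d[i] * x[i] for i in range(n)) + D_d * u
    # State update x(k+1) = x(k) + Delta * t.
    x_next = [x[i] + delta * t[i] for i in range(n)]
    return x_next, y


x = [0.0]
for u in (1.0, 0.0, 0.0, 0.0):
    x, y = delta_step([[-1.6]], [8.0], [100.0], 0.0, 0.125, x, u)
    print(y)
```

Choosing $\Delta$ as a power of 2 keeps the multiplication by $\Delta$ a simple shift, which matches the kind of operations the SIF is meant to describe.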

ρ Direct-Form II Transposed (ρDFIIt).
Li et al. [22][23][24] have presented a new sparse structure called ρDFIIt. This is a generalization of the transposed direct-form II structure with the conventional shift and the δ-operator and is similar to that of [25]. It is a sparse realization (with $3n + 1$ parameters, where $n$ is the order of the controller), leading to an economical implementation (few computations) that can be very efficient numerically. As we will see later, this realization has $n$ extra degrees of freedom that can be used to find an optimal realization within its particular structure. Let us define

$$\rho_i(z) \triangleq \frac{z - \gamma_i}{\Delta_i}, \qquad \varrho_i(z) \triangleq \prod_{j=1}^{i} \rho_j(z), \qquad 1 \le i \le n,$$

where $(\gamma_i)_{1 \le i \le n}$ and $(\Delta_i > 0)_{1 \le i \le n}$ are two sets of constants. Let $(a_i)_{1 \le i \le n}$ and $(b_i)_{0 \le i \le n}$ be the coefficient sets of the transfer function, using the shift operator:

$$h(z) = \frac{b_0 + b_1 z^{-1} + \cdots + b_n z^{-n}}{1 + a_1 z^{-1} + \cdots + a_n z^{-n}}.$$

Therefore, $h$ can be reparametrized with $(\alpha_i)_{1 \le i \le n}$ and $(\beta_i)_{0 \le i \le n}$ as follows:

$$h(z) = \frac{\beta_0 + \beta_1 \varrho_1^{-1}(z) + \cdots + \beta_n \varrho_n^{-1}(z)}{1 + \alpha_1 \varrho_1^{-1}(z) + \cdots + \alpha_n \varrho_n^{-1}(z)}. \tag{34}$$

The parameters $(a_i)_{1 \le i \le n}$, $(b_i)_{0 \le i \le n}$, $(\alpha_i)_{1 \le i \le n}$, and $(\beta_i)_{0 \le i \le n}$ are related [23] through $\kappa \triangleq \prod_{i=1}^{n} \Delta_i$ and a lower triangular matrix $\Omega \in \mathbb{R}^{(n+1)\times(n+1)}$ whose $i$th column is determined by the coefficients of the $z$-polynomial $\prod_{j=i}^{n} \rho_j(z)$ for $1 \le i \le n$, and with $\Omega_{n+1,n+1} = 1$.
Equation (34) can be, for example, implemented with a transposed direct form II (see Figure 2), and each operator $\rho_i^{-1}$ can be implemented as shown in Figure 3 (each $\varrho_k^{-1}$ is obtained by cascading the $(\rho_i^{-1})_{1 \le i \le k}$). Clearly, when $\gamma_i = 0$ and $\Delta_i = 1$, Figure 2 is the conventional transposed direct form II. When $\gamma_i = 1$ and $\Delta_i = \Delta$ ($1 \le i \le n$), one gets the δ transposed direct form II. This form was first proposed as a unification of the shift-direct form II transposed and the δ-direct form II transposed. It is now used to exploit the $n$ extra degrees of freedom given by the choice of the parameters $(\gamma_i)_{1 \le i \le n}$.
By introducing the intermediate variables needed to realize the $\rho_i^{-1}$ operators (with the multiplication by $\Delta_i$ done last, see Figure 3), the corresponding algorithm can be rewritten and described within the SIF framework.
Remark 12. Thanks to the SIF, there is no need to use any operator other than the shift operator.

Fixed-Point Implementation.
In this article, the notation $(\beta, \gamma)$ is used for the fixed-point representation of a variable or coefficient (2's complement scheme), according to Figure 4. $\beta$ is the total wordlength of the representation in bits, whereas $\gamma$ is the wordlength of the fractional part (it determines the position of the binary point). They are fixed for each variable (input, states, output) and each coefficient, and implicit (unlike in the floating-point representation). $\beta$ and $\gamma$ will be suffixed by the variable/coefficient they refer to. These parameters can be scalars, vectors, or matrices, according to the variables they refer to.
Let us suppose that the coefficient wordlength $\beta_Z$ is given. (In FPGA or ASIC, it is of interest to consider the wordlengths as optimization variables, in order to find hardware realizations that minimize hardware criteria like power consumption or surface, under certain numerical accuracy constraints, like $L_2$-sensitivity ones [26]. This is not considered here.) Then, the coefficient $Z_{ij}$ is represented in fixed point by $(\beta_{Z_{ij}}, \gamma_{Z_{ij}})$ with

$$\gamma_{Z_{ij}} = \beta_{Z_{ij}} - 2 - \lfloor \log_2 |Z_{ij}| \rfloor, \tag{40}$$

where the $\lfloor a \rfloor$ operation rounds $a$ to the nearest integer less than or equal to $a$ (for positive numbers, $\lfloor a \rfloor$ is the integer part).
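The choice of the binary-point position can be illustrated with a short sketch. It assumes the rule $\gamma = \beta - 2 - \lfloor \log_2 |Z_{ij}| \rfloor$ and uses Python's built-in `round` (round-half-to-even) as the rounding quantizer; the function names are chosen for this example.

```python
import math


def fractional_bits(z, beta):
    """gamma = beta - 2 - floor(log2|z|) for a nonzero coefficient z."""
    if z == 0:
        raise ValueError("binary-point position is undefined for a null coefficient")
    return beta - 2 - math.floor(math.log2(abs(z)))


def to_fixed_point(z, beta):
    """Return (integer code, gamma) such that z is approximated by code * 2**-gamma."""
    gamma = fractional_bits(z, beta)
    return round(z * 2.0 ** gamma), gamma


for coeff in (0.8, 10.0, 100.0):
    code, gamma = to_fixed_point(coeff, 8)
    print(coeff, code, gamma, code * 2.0 ** -gamma)
```

With this rule the integer code of a nonzero coefficient always has a magnitude between $2^{\beta-2}$ and $2^{\beta-1}$ (up to rounding at the upper edge), so every coefficient uses all of its significant bits, at the price of a quantization step that depends on its magnitude.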
Remark 13. The binary-point position is not defined for null coefficients; however, this is not a problem because these coefficients will not be represented in the final algorithm (the null multiplications are removed). So, in order to take into account the coefficients that will be quantized without error, we introduce a weighting matrix $\delta_Z$ such that

$$(\delta_Z)_{ij} \triangleq \begin{cases} 0 & \text{if } Z_{ij} \text{ is exactly implemented}, \\ 1 & \text{otherwise}. \end{cases}$$

The exactly implemented coefficients are 0 and the positive and negative powers of 2 (including ±1).

Remark 14.
In some specific computational cases, the fixed-point representation chosen for the coefficients is not always the best one as defined in (40). For example, in the Roundoff Before Multiplication scheme, some extra quantizations are added to the coefficients, in order to avoid shift operations after multiplications [2]. Only the classical case (corresponding to the Roundoff After Multiplication) is considered here, as defined by (40).
Remark 15. It is also possible to choose any $\gamma_{Z_{ij}}$ such that $\gamma_{Z_{ij}} \le \beta_{Z_{ij}} - 2 - \lfloor \log_2 |Z_{ij}| \rfloor$ (e.g., choose the same binary-point position for all the coefficients, given by the binary-point position of the coefficient with the highest magnitude). But in that case, the coefficients are coded with fewer meaningful bits and have a higher relative error. When the ratio between the greatest and lowest magnitudes is too high, underflows occur for the lowest coefficients, which cannot be represented. For example, this is common for the Direct Form realizations with high (or low) $L_2$-gain.
During the quantization process, the coefficients are changed from $Z$ into $Z^\dagger \triangleq Z + \Delta Z$. For a rounding quantization, the $(\Delta Z_{ij})$ are independent centered random variables uniformly distributed [27,28] within the ranges $-2^{-\gamma_{Z_{ij}}-1} \le \Delta Z_{ij} < 2^{-\gamma_{Z_{ij}}-1}$, so their second-order moments are given by

$$\sigma^2_{\Delta Z_{ij}} \triangleq E\{(\Delta Z_{ij})^2\} = \frac{2^{-2\gamma_{Z_{ij}}}}{12} (\delta_Z)_{ij} \tag{42}$$

(exactly implemented coefficients are not changed by the quantization).
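The uniform error model can be checked numerically. The sketch below compares the theoretical second-order moment $2^{-2\gamma}/12$ with an empirical estimate obtained by rounding randomly drawn coefficients; the sampling interval, sample count, and seed are arbitrary choices made for this illustration.

```python
import random


def quantization_variance(gamma):
    """Second-order moment of a rounding error uniform in [-2**(-gamma-1), 2**(-gamma-1))."""
    return 2.0 ** (-2 * gamma) / 12.0


def empirical_variance(gamma, low, high, samples=20000, seed=0):
    """Mean squared rounding error for coefficients drawn uniformly in [low, high)."""
    rng = random.Random(seed)
    scale = 2.0 ** gamma
    total = 0.0
    for _ in range(samples):
        z = rng.uniform(low, high)
        err = round(z * scale) / scale - z
        total += err * err
    return total / samples


# Coefficients in [0.5, 1) coded on 8 bits have gamma = 7.
print(quantization_variance(7), empirical_variance(7, 0.5, 1.0))
```

The two values agree to within the sampling noise, which supports using the variance as a per-coefficient weight in the error measures that follow.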

Sensitivity-Based Transfer Function Error.
As a consequence, the sensitivity of each coefficient should not be considered with the same weight, since there is no special reason for the $(\Delta Z_{ij})$ to be all in the same range and share the same binary-point position. So it is interesting to evaluate how the transfer function is changed from $H$ to $H^\dagger \triangleq H + \Delta H$ by the coefficient quantization, rather than to evaluate only its sensitivity.
By an extension of the SISO state-space definition given in [6], this degradation can be evaluated in a statistical way with the following definition.

Definition 16 (Sensitivity-Based Transfer Function Error). A measure of the transfer function error can be statistically defined by

$$\sigma^2_{\Delta H} \triangleq \frac{1}{2\pi} \int_0^{2\pi} E\left\{ \left\| \Delta H(e^{j\omega}) \right\|_F^2 \right\} d\omega.$$

Remark 17. This definition was introduced by Hinamoto et al. in [6], but under the assumption that the $\Delta Z_{ij}$ all share the same variance. See Section 4.3.

EURASIP Journal on Advances in Signal Processing
The transfer function error is a tractable measure that can be evaluated with the two following propositions.

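Proposition 18. The sensitivity-based transfer function error of a realization $Z$, with $H$ as a transfer function, can be computed by

$$\sigma^2_{\Delta H} = \left\| \frac{\delta H}{\delta Z} \times \Xi_Z \right\|_F^2, \tag{44}$$

where

(i) $\delta H/\delta Z \in \mathbb{R}^{(l+n+p)\times(l+n+m)}$ is the transfer function sensitivity matrix (previously introduced in [7]) defined by

$$\left( \frac{\delta H}{\delta Z} \right)_{i,j} \triangleq \left\| \frac{\partial H}{\partial Z_{i,j}} \right\|_2,$$

(ii) $\Xi_Z \in \mathbb{R}^{(l+n+p)\times(l+n+m)}$ is defined by

$$(\Xi_Z)_{i,j} \triangleq \frac{2^{-\beta_{Z_{ij}}+1}}{\sqrt{3}} \lfloor Z_{ij} \rfloor_2 \, (\delta_Z)_{ij}, \tag{46}$$

(iii) $\lfloor x \rfloor_2$ is the nearest power of 2 lower than $|x|$:

$$\lfloor x \rfloor_2 \triangleq 2^{\lfloor \log_2 |x| \rfloor}.$$

Proof. A first-order approximation gives

$$\Delta H \simeq \sum_{i,j} \frac{\partial H}{\partial Z_{ij}} \Delta Z_{ij}.$$

Hence, for all $\omega \in [0, 2\pi]$, since the $(\Delta Z_{ij})$ are independent and centered,

$$E\left\{ \left\| \Delta H(e^{j\omega}) \right\|_F^2 \right\} = \sum_{i,j} \left\| \frac{\partial H}{\partial Z_{ij}}(e^{j\omega}) \right\|_F^2 \sigma^2_{\Delta Z_{ij}}.$$

Finally, considering (40) and (42) for nonnull coefficients, we get

$$\sigma^2_{\Delta Z_{ij}} = \frac{2^{-2\beta_{Z_{ij}}+4}}{12} \lfloor Z_{ij} \rfloor_2^2 \, (\delta_Z)_{ij} = (\Xi_Z)_{i,j}^2.$$

Remark 19. This proposition is the extension of Proposition 2 in [10] to the SIF and MIMO transfer functions.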
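The weighting of the sensitivities by the quantization error of each coefficient can be sketched as follows. The code assumes that the entry of the weighting matrix attached to a coefficient is its error standard deviation, $2^{-\beta+1} \lfloor Z_{ij} \rfloor_2 / \sqrt{3}$, set to zero for exactly implemented coefficients; this form follows from the binary-point rule and the uniform error variance of Section 4.1. The sensitivity values passed to `weighted_error` are placeholders, not values computed from a real filter.

```python
import math


def floor_pow2(x):
    """Largest power of 2 not exceeding |x| (x nonzero)."""
    return 2.0 ** math.floor(math.log2(abs(x)))


def is_exact(z):
    """True for 0 and for +/- powers of 2, which are quantized without error."""
    return z == 0 or math.frexp(abs(z))[0] == 0.5


def xi(z, beta):
    """Standard deviation of the quantization error of coefficient z coded
    on beta bits (0 for exactly implemented coefficients)."""
    if is_exact(z):
        return 0.0
    return 2.0 ** (1 - beta) * floor_pow2(z) / math.sqrt(3.0)


def weighted_error(sensitivity, Z, beta):
    """Sum over (i, j) of (sensitivity_ij * xi_ij)**2, that is, the squared
    Frobenius norm of the Schur product of the two matrices."""
    return sum((s * xi(z, beta)) ** 2
               for s_row, z_row in zip(sensitivity, Z)
               for s, z in zip(s_row, z_row))


# Placeholder sensitivities for a 1 x 2 coefficient matrix coded on 8 bits.
print(weighted_error([[1.0, 2.0]], [[0.8, 0.5]], 8))
```

In this toy call the coefficient 0.5 is a power of 2 and contributes nothing, even though its placeholder sensitivity is the larger one, which is exactly the kind of information the unweighted sensitivity measure ignores.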

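Proposition 20. The transfer function sensitivity $\partial H/\partial Z$ can be written explicitly as

$$\frac{\partial H}{\partial Z} = H_1 \circledast H_2,$$

where $\circledast$ is the operator defined by

$$G \circledast H \triangleq \operatorname{Vec}(G) \cdot \left( \operatorname{Vec}(H^\top) \right)^\top,$$

$\operatorname{Vec}(\cdot)$ is the classical operator that vectorizes a matrix, and $H_1$ and $H_2$ are defined by

$$H_1 : z \mapsto C_Z (zI_n - A_Z)^{-1} M_1 + M_2, \qquad H_2 : z \mapsto N_1 (zI_n - A_Z)^{-1} B_Z + N_2,$$

with

$$M_1 \triangleq \begin{pmatrix} K J^{-1} & I_n & 0 \end{pmatrix}, \quad M_2 \triangleq \begin{pmatrix} L J^{-1} & 0 & I_p \end{pmatrix}, \quad N_1 \triangleq \begin{pmatrix} J^{-1} M \\ I_n \\ 0 \end{pmatrix}, \quad N_2 \triangleq \begin{pmatrix} J^{-1} N \\ 0 \\ I_m \end{pmatrix}. \tag{55}$$

The dimensions of $M_1$, $M_2$, $N_1$, and $N_2$ are, respectively, $n \times (l + n + p)$, $p \times (l + n + p)$, $(l + n + m) \times n$, and $(l + n + m) \times m$. The transfer function sensitivity matrix $\delta H/\delta Z$ can be computed by

$$\left( \frac{\delta H}{\delta Z} \right)_{i,j} = \left\| H_1 E_{i,j} H_2 \right\|_2, \tag{56}$$

where $E_{i,j}$ is the matrix of appropriate size with all elements being 0 except the $(i, j)$th element, which is unity.
The system $H_1 E_{i,j} H_2$ can be seen as the following state-space system, so that Proposition 4 can be used in order to compute the $L_2$-norm:

$$\left( \begin{pmatrix} A_Z & M_1 E_{i,j} N_1 \\ 0 & A_Z \end{pmatrix}, \begin{pmatrix} M_1 E_{i,j} N_2 \\ B_Z \end{pmatrix}, \begin{pmatrix} C_Z & M_2 E_{i,j} N_1 \end{pmatrix}, M_2 E_{i,j} N_2 \right).$$

Proof. The proof is based on the following lemma and can be found in [29].

Lemma 21.
Let $X$ be a matrix in $\mathbb{R}^{p\times l}$ while $G$ and $H$ are two transfer matrices independent of $X$ with values in $\mathbb{C}^{m\times p}$ and $\mathbb{C}^{l\times n}$, respectively. Then,

$$\frac{\partial (G X H)}{\partial X} = G \circledast H, \qquad \frac{\partial (G X^{-1} H)}{\partial X} = -(G X^{-1}) \circledast (X^{-1} H).$$

By expanding (23) in (24), and using Lemma 21, all the derivatives $\partial H/\partial X$ with $X \in \{J, K, \ldots, S\}$ can be obtained and then gathered using (59). Equation (56) is quite straightforward and comes from the definition of the operator $\circledast$.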

Remark 22.
In order to simplify the expressions, matrix extensions of $\log_2$, of the floor operator $\lfloor \cdot \rfloor$, and of the power of 2 can be used. For example, if $M \in \mathbb{R}^{p\times q}$, then $\log_2(M) \in \mathbb{R}^{p\times q}$ is such that $(\log_2(M))_{i,j} \triangleq \log_2(M_{i,j})$. The binary-point positions of the coefficients can then be computed by

$$\gamma_Z = \beta_Z - 2 \cdot \mathbb{1}_Z - \lfloor \log_2 |Z| \rfloor,$$

where $\mathbb{1}_Z$ represents the matrix with all coefficients set to 1 and with the same size as $Z$.
Also, the $\Xi_Z$ matrix is expressed by

$$\Xi_Z = \frac{2}{\sqrt{3}} \, 2^{-\beta_Z} \times \lfloor Z \rfloor_2 \times \delta_Z.$$

Remark 23. In the classical case where the wordlengths of the coefficients are all the same (equal to $\beta$), we can define a normalized transfer function error by factoring the wordlength-dependent term out of $\sigma^2_{\Delta H}$. This measure is independent of the wordlength and can be used for comparisons.

Comparison with the Classical $M_{L_2}$ Measure.
It is of interest to remark on the relationship with the classical $M_{L_2}$ measure. In [6], where the transfer function error appears for the first time (applied to a SISO state-space system), the coefficients are supposed to have the same fixed-point representation, so their second-order moments ($\sigma^2_{Z_{ij}}$) are all equal and denoted $\sigma_0^2$. So, in that case, the $M_{L_2}$ measure satisfies

$$M_{L_2} = \frac{\sigma^2_{\Delta H}}{\sigma_0^2}.$$

Here, the transfer function error $\sigma^2_{\Delta H}$ can be seen as an extension of the $M_{L_2}$ measure with fixed-point considerations. The sensitivity is weighted according to the variance of the quantization noise of each coefficient. More details on this comparison can be found in [8].

Sensitivity-Based Pole Error
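The same considerations apply to the poles. It is interesting to evaluate how the pole moduli are changed from $|\lambda_k|$ to $|\lambda_k|^\dagger \triangleq |\lambda_k| + \Delta|\lambda_k|$ by the coefficient quantization.
In the same way as in Definition 16, the degradation can be evaluated in a stochastic way.
Definition 24 (Sensitivity-Based Pole Error). The sensitivity-based pole error is defined by

$$\sigma^2_{\Delta|\lambda|} \triangleq \sum_{k=1}^{n} \omega_k \, \sigma^2_{\Delta|\lambda_k|},$$

where $\sigma^2_{\Delta|\lambda_k|} \triangleq E\{(\Delta|\lambda_k|)^2\}$ is the second-order moment of the random variable $\Delta|\lambda_k|$. This measure is tractable thanks to the two following propositions.

Proposition 25. It can be computed with

$$\sigma^2_{\Delta|\lambda|} = \sum_{k=1}^{n} \omega_k \left\| \frac{\partial |\lambda_k|}{\partial Z} \times \Xi_Z \right\|_F^2,$$

where $\Xi_Z$ is the matrix already defined in (46).

Proof. A first-order approximation gives

$$\Delta|\lambda_k| \simeq \sum_{i,j} \frac{\partial |\lambda_k|}{\partial Z_{ij}} \Delta Z_{ij}.$$

So,

$$\sigma^2_{\Delta|\lambda_k|} = \sum_{i,j} \left( \frac{\partial |\lambda_k|}{\partial Z_{ij}} \right)^2 \sigma^2_{\Delta Z_{ij}},$$

since the $(\Delta Z_{ij})$ are independent centered random variables.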

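Proposition 26.
The pole sensitivity, with respect to the coefficients, can be computed by

$$\frac{\partial |\lambda_k|}{\partial Z} = M_1^\top \frac{\partial |\lambda_k|}{\partial A_Z} N_1^\top, \qquad \frac{\partial |\lambda_k|}{\partial A_Z} = \frac{1}{|\lambda_k|} \operatorname{Re}\left( \lambda_k^* \, y_k^* x_k^\top \right),$$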

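where $(x_k)_{1 \le k \le n}$ are the right eigenvectors corresponding to the eigenvalues $(\lambda_k)_{1 \le k \le n}$ and $(y_k)_{1 \le k \le n}$ are the column vectors of the matrix $M_y = (y_1\ y_2\ \cdots\ y_n)$ defined by $M_y \triangleq M_x^{-H}$, with $M_x \triangleq (x_1\ x_2\ \cdots\ x_n)$. $M_1$ and $N_1$ are the matrices previously defined in (55).
Proof. The proof is based on the following lemmas, proved in [1,14].
Lemma 27. Let $V_0$, $V_1$, and $V_2$ be constant matrices of appropriate dimension. If $\lambda$ is an eigenvalue of $M = V_0 + V_1 X V_2$, then $\partial\lambda/\partial X = V_1^\top (\partial\lambda/\partial M) V_2^\top$; if $\lambda$ is an eigenvalue of $M = V_0 + V_1 X^{-1} V_2$, then $\partial\lambda/\partial X = -(V_1 X^{-1})^\top (\partial\lambda/\partial M) (X^{-1} V_2)^\top$.
This lemma can be applied to $J, K, L, \ldots, S$, and gives the first relation of Proposition 26. Then, the pole sensitivity matrix $\partial|\lambda_k|/\partial A_Z$ can be finally computed with the following lemma.

Lemma 28. The derivative of the eigenvalues (and their moduli) of a given matrix $M$ with respect to that matrix is given by

$$\frac{\partial \lambda_k}{\partial M} = y_k^* x_k^\top, \qquad \frac{\partial |\lambda_k|}{\partial M} = \frac{1}{|\lambda_k|} \operatorname{Re}\left( \lambda_k^* \frac{\partial \lambda_k}{\partial M} \right).$$

Remark 29. As in Remark 23, it is also possible to normalize the sensitivity-based pole error in the common case where the coefficients all have the same wordlength (equal to $\beta$), by factoring the wordlength-dependent term out of $\sigma^2_{\Delta|\lambda|}$. This normalized pole error is independent of the wordlength and can be used for comparisons.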
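The eigenvector-based expression for the derivative of a pole modulus with respect to a matrix can be tried on a small case. The sketch below is restricted to a real $2 \times 2$ matrix with distinct eigenvalues and a nonzero upper-right entry, so that closed-form eigenvectors can be used; it covers only the derivative with respect to the matrix itself, not the full chain rule back to $Z$. The function names are assumptions for this example, and a finite difference is suggested afterwards as an independent check.

```python
import cmath


def eig2(A):
    """Eigenvalues of a 2x2 real matrix (assumed distinct)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return (tr + root) / 2, (tr - root) / 2


def pole_modulus_sensitivity(A, k):
    """d|lambda_k|/dA for a 2x2 real matrix with A[0][1] != 0."""
    (a, b), (c, d) = A
    lams = eig2(A)
    # Right eigenvectors x_j = (b, lambda_j - a)^T, stored as columns of Mx.
    Mx = [[b, b], [lams[0] - a, lams[1] - a]]
    detx = Mx[0][0] * Mx[1][1] - Mx[0][1] * Mx[1][0]
    # Rows of Mx^{-1} are the y_j^H, so that y_j^H x_i = delta_ij.
    inv = [[Mx[1][1] / detx, -Mx[0][1] / detx],
           [-Mx[1][0] / detx, Mx[0][0] / detx]]
    yH = inv[k]
    x = [Mx[0][k], Mx[1][k]]
    lam = lams[k]
    # d lambda / dA_ij = conj(y_i) x_j, then d|lambda| = Re(conj(lambda) d lambda) / |lambda|.
    return [[(lam.conjugate() * yH[i] * x[j]).real / abs(lam) for j in range(2)]
            for i in range(2)]


A = [[0.5, 0.3], [-0.4, 0.6]]
print(pole_modulus_sensitivity(A, 0))
```

For this matrix the poles are complex conjugate with $|\lambda|^2 = \det A$, so the result can also be compared with $\frac{1}{2|\lambda|} \partial \det(A)/\partial A$, or with a finite difference on each entry.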

Extension to the Closed-Loop Control
In the previous sections, filtering problems were considered, and the open-loop context was implicitly assumed. In this section, we extend the previous results to the closed-loop case, where a filter (denoted here as controller) controls a plant in a feedback scheme. The problem is of important practical interest in the context of robust control theory [30], when considering the model uncertainties of the process or even of the controller in the sense of FWL implementation [1].
Let us consider a plant P (defined by its transfer function or equivalently by a state-space relationship) controlled by a controller C in a standard form [30], as shown in Figure 5. w(k) ∈ R p1 and z(k) ∈ R m1 are the exogenous p 1 inputs and m 1 outputs (to control), whereas u(k) ∈ R p2 and y(k) ∈ R m2 are the p 2 control and m 2 measure signals, respectively.
The controller is realized in the SIF form (see (18)), with l, m 2 , n, and p 2 as intermediate variable, input, state and output dimensions, respectively.
Unlike in the open-loop context, the whole system $S$ is considered here, with $w(k)$ and $z(k)$ as inputs and outputs, respectively. Its transfer function is of the form $C_Z (zI - A_Z)^{-1} B_Z + D_Z$, with $A_Z \in \mathbb{R}^{(n_P+n)\times(n_P+n)}$, $B_Z \in \mathbb{R}^{(n_P+n)\times p_1}$, $C_Z \in \mathbb{R}^{m_1\times(n_P+n)}$, and $D_Z \in \mathbb{R}^{m_1\times p_1}$ the closed-loop matrices built from the plant and from the controller realization $Z$. The closed-loop poles of the system, denoted $(\lambda_k)_{1 \le k \le n+n_P}$, are the eigenvalues of the matrix $A_Z$. Their moduli directly indicate the stability of the closed-loop system.
In order to evaluate the closed-loop transfer function degradation or the pole moduli deviation, two closed-loop measures are used, as a natural extension of the open-loop case.
Definition 30 (Closed-Loop Sensitivity-Based Error). A measure of the closed-loop sensitivity-based transfer function error can be statistically defined by The closed-loop sensitivity-based pole error is defined by They can be computed with Proposition 31.

Proposition 31. The closed-loop transfer function error is given by
where δH/δZ is obtained from the closed-loop transfer function sensitivity ∂H/∂Z given by In the same way, the sensitivity-based closed-loop pole error ∂|λ k |/∂Z is given by with U, Y, and W some invertible diagonal matrices. So Remark 32. This is similar to (26), but here U, Y, and W are diagonal. This only implies scaling.
Then, the operator $\lfloor \cdot \rfloor_2$ satisfies and hence By remarking that the similarity on $Z_0$ changes the transfer functions $H_1$ and $H_2$ in it comes that the sensitivity transfer function is changed in and then Now we can remark that $\Phi_{ij} \in \{1, 2, 4\}$ and $\Phi_{ij} = 1$ if powers of 2 are used for the scaling. Also, $\lfloor a \rfloor_2 / a = 1$ if $a$ is a power of 2.
The same proof can be applied on the pole error since

Optimal Problem.
Even if it is not the main goal of this paper, it is now possible to consider optimal realizations, according to an FWL criterion. Let $J$ be a given criterion (it could be the sensitivity-based transfer function error, the pole error, or a combination of these two criteria); then the problem consists of finding the optimal realization that minimizes $J$ or, equivalently, finding the optimal coordinate transform $(U, Y, W)$ that transforms a given realization into it. According to Proposition 33, $J$ is invariant to power-of-2 scaling, and this optimization problem has an infinite number of solutions. Thus, it could be of interest to normalize all the coordinate transforms with regard to an extra consideration. For example, this could be an $L_2$-scaling constraint, even if it is not necessary here.
The idea is to define and set the binary-point position of the states and the intermediate variables [8]. This gives us a bound on the $L_2$-gain of the transfer functions from the input $u$ to the states $x$ and intermediate variables $t$, respectively. One possible constraint is to ensure that these $L_2$-gains lie between two successive powers of 2 (96). These relaxed $L_2$-constraints were proposed in [32] as an extension of the strict $L_2$-scaling that still prevents the implementation from overflowing. Any other successive powers of 2 can be used for the boundaries. The inequalities (96) can also be expressed with the controllability Gramian $W_c$ of the realization.
With that normalization, the optimal problem is now a constrained optimization problem. One way to deal with it is to normalize each coordinate transform $(U, Y, W)$ before applying it. More details can be found in [8].
Since the sensitivity-based transfer function error $\sigma^2_{\Delta H}$ and pole error $\sigma^2_{\Delta|\lambda|}$ measures are nonsmooth, this optimization problem can be solved with a global optimization method such as Adaptive Simulated Annealing (ASA) [33,34]. Gradient-based methods such as the quasi-Newton algorithm lead to local optima and are not used here.
The FWR Toolbox (sources available at http://fwrtoolbox.gforge.inria.fr) was used for the numerical examples, and a few minutes of computation were required on a desktop computer.

Numerical Examples.
Let us consider the filter with coefficients given by the Matlab command butter(…, 0.12…).
We consider, in order to compare them, some equivalent (in infinite precision) realizations described below. The values of the measures are shown in Table 2.
$Z_2$: the balanced realization (it is often considered a good realization; the work in [1] shows that balanced realizations minimize the $L_1/L_2$ sensitivity measure).
Even if the goal of this paper is not multiobjective optimal realization, it is interesting to look for a realization that is good enough for the two measures. One possibility is to consider a tradeoff criterion $J_1$ that combines $\sigma^2_{\Delta H}$ and $\sigma^2_{\Delta|\lambda|}$, each normalized by its optimum value, where $(\sigma^2_{\Delta H})_{\mathrm{opt}}$ and $(\sigma^2_{\Delta|\lambda|})_{\mathrm{opt}}$ are the optimum values obtained for $\sigma^2_{\Delta H}$ and $\sigma^2_{\Delta|\lambda|}$ in realizations $Z_3$ and $Z_4$, respectively.
$Z_5$: the $J_1$-optimal realization. With this measure, we aim to have a realization that simultaneously has a low transfer function error and a low pole error.

ρDirect Form II Transposed
These different results can be compared to the a posteriori shift of the poles and transfer function, as presented in Table 3. This shift depends of course on how far the coefficients are from the closest fixed-point number, on the round-off mode, on the wordlengths, and on the sensitivities. The wordlengths used are 16, 12, and 8 bits. However, 8 bits are not enough to preserve the stability of $Z_1$.
The realizations $Z_5$ and $Z_9$ exhibit the lowest transfer function and pole errors estimated from the sensitivities. Their 16-bit fixed-point implementations are given by Algorithms 1 and 2, respectively. Table 3 confirms that minimizing the sensitivity-based transfer function and pole errors minimizes the probability that the shift of the poles and transfer function exceeds a given bound. The unpredictable part of the deterioration comes from the coefficient shift (how far the coefficients are from the closest fixed-point number), and only a stochastic approach can be used to evaluate it. Since the direct shift of poles and transfer function ($\|h - h^\dagger\|_2$ and $\big| |\lambda_k| - |\lambda_k^\dagger| \big|$) cannot be used in optimization (it is an a posteriori measure that requires the final hardware/software implementation to be evaluated), the sensitivity-based transfer function and pole errors $\sigma^2_{\Delta H}$ and $\sigma^2_{\Delta|\lambda|}$ exhibited here are important measures to evaluate the FWL deterioration.

Conclusion
After presenting the classical sensitivity analysis for the finite-precision implementation of linear filters or controllers, the paper has shown that its use sometimes leads to erroneous conclusions, as it does not take into consideration the exact fixed-point representation of the coefficients. The sensitivity-based pole and input-output errors are therefore better indicators.