Diagonal Kernel Point Estimation of n th-Order Discrete Volterra-Wiener Systems

The estimation of diagonal elements of a Wiener model kernel is a well-known problem. The new operators and notations proposed here aim at the implementation of e ﬃ cient and accurate nonparametric algorithms for the identiﬁcation of diagonal points. The formulas presented here allow a direct implementation of Wiener kernel identiﬁcation up to the n th order. Their e ﬃ ciency is demonstrated by simulations conducted on discrete Volterra systems up to ﬁfth order


INTRODUCTION
Among the identification techniques based on input-output correlations, the one proposed by Lee and Schetzen [1] is the most widely adopted due to its versatility, even if more recent techniques and up-to-date insights on these arguments can be found in [2] and more references in [3].The application of the Lee-Schetzen technique on discrete nonlinear systems is straightforward and also gains some validity advantages versus the continuous time version, as stated rigorously in [4] and in [5].In [6], the authors describe some characteristic behaviors of the Lee-Schetzen method for discrete systems and propose practical suggestions on its use.
The estimation of diagonal elements of a Wiener model kernel is a well-known problem.Such problem can be found documented in [6,7,8].It arises from the higher estimation error variance exhibited by the estimation process of the kernel points having at least two equal coordinates.In [6], some explanations for this phenomenon, which augments increasing the number of equal coordinates, are given.The original Lee-Schetzen identification technique was particularly subject to this kind of errors.Goussard et al., in [9], made a ma-jor contribution to the solution of the diagonal point estimation problem, although their work contains explicit solutions and proofs only up to the third order.
Koukoulas and Kalouptsidis, in [10], using the results on the calculation of cumulants due to the work of Leonov and Shiryaev [11], proposed a proof of the nth-order case valid also for inputs drawn from nonwhite Gaussian distributions.In the white Gaussian input case, the general formulas in [10] can be shown to reduce to Goussard's method.Other formulas using cumulants to estimate Wiener kernels directly have been proposed in [12].Unfortunately, no implementation problems or any simulated efficiency tests were considered in [10,12] because they were not among the purposes of the authors.
In this paper, we propose alternative formulas for the identification of nth-order Wiener kernels in the case of white Gaussian inputs, which avoid the explicit use of cumulants and are a useful shortcut to the proof of Goussard's method for higher orders.Moreover, the proposed formulas constitute an efficient way for the automatic generation of algorithm code for every order kernel identification, whereas the writing of efficient computer code is a very difficult task as the kernel order increases.Some results on implementation tests are supplied to show the efficiency of the proposed method.

THE LEE-SCHETZEN METHOD
The Volterra series constitute a model for systems which yield generalized Taylor series expansions [1].Under appropriate system class requirements [1,2,4,13,14,15,16,17,18], the input/output relationship for a discrete-time causal timeinvariant nonlinear system can be expressed as To enhance model convergence and to allow identification by Lee-Schetzen method, the series (1) must be rearranged in terms of nonhomogeneous G operators [1,2].An operator is said to be a Wiener G operator if it satisfies the following definitions and conditions [1,2]: is the Wiener kernel of pth order; H m is a homogeneous mth-order Volterra operator, defined as where x(n) must be a zero-mean Gaussian white process (i.e., an independent identically distributed (i.i.d.) sequence from a zero-mean Gaussian distribution) with E{x(n)x(n + t)} = Aδ(t), where E{•} is the statistical expectation operator, δ(t) is the unitary impulse sequence, and A is the second-order moment of the input x.The Lee-Schetzen method for nondiagonal point estimation of a pth-order Wiener kernel is described by [1,6]: (5) For the diagonal point case, a more complicated form is needed to account for the lower-order kernel contributions.The exact expressions for the second-and third-order Wiener kernels are [ where δ σiσj δ(σ i − σ j ) is the unitary impulse sequence delayed by σ i − σ j .For higher orders, this kind of explicit expression becomes unwieldy, due to the great number of correction terms in the diagonal point case.To overcome this difficulty, Lee and Schetzen [1] proposed the general identification formula [1,6] where G m [k m ; x(n)] is the mth G-functional of the white Gaussian input x(n) [1,2,6].Unfortunately, this way of proceeding results in poor performances of the identification algorithm.In a practical situation, the limitations due to the finite length of input sequences and the departure from ideal statistical properties bias the identification procedure.In the implementation of ( 7), the identification errors of every point of the lower-order identified kernels are summed up by the G m operator and they all contribute to the output error.On the contrary, only pointwise lower-order kernel errors affect expressions like (6).Indeed, we found that the development of nth-order compact expressions of the form (6) leads to some implementation advantages, while the numerical results remain the same with respect to the method of Goussard et al. in [9] which featured a similar kind of improvement of the original Lee-Schetzen method.

EFFICIENT nTH-ORDER FORMULAS FOR THE IDENTIFICATION OF DIAGONAL POINTS
In the major literature concerning the identification of Volterra systems, the examples supplied often do not exceed the third order.This is due to the fact that the identification algorithms become very cumbersome for higher orders.To extend the identification algorithms to higher orders in an easy way, we introduced new notations and operators which permit to handle, in a short and recursive form, the complicated expressions involved by algorithm generation.Actually, a manual generation of the code may be a very tedious and difficult task still for relatively low-order problems.

Preliminaries
Let M be a set of m distinct naturals, Q ⊆ M, and q = |Q| and m = |M| the cardinalities of Q and M, respectively.If P(M) is the power set of M (i.e., the set of all subsets of M) and M is the set of all n-tuples of formal variables of integer values, a relationship between the elements of P(M) and M follows: such that σ(Q) = (σ i1 , . . ., σ iq ) ∈ M, where Furthermore, it will come in handy to define σ M (Q) = σ(M −Q) as the function σ applied to the complementary set of Q with respect to M. Also, it will be necessary to generalize the definition of a qth-order Wiener kernel in the following way: with Q = ∅ and k(σ(∅)) k 0 , where k 0 is the Wiener zeroth-order kernel.Moreover, we define with We now give a definition analogous to that given by Schetzen for the homonym operator in [1].For our purposes, the operator will be redefined as In [1], Schetzen reported that when x(n) is a stationary zero-mean jointly Gaussian random sequence for odd q and it is equal to the sum of products of factors E{x(n − σ i )x(n − σ j )} with i, j ∈ Q for even q, resulting from all completely distinct ways of partitioning the set {x(n−σ h ) : h ∈ Q} into pairs.If x(n) is white Gaussian, under the ergodicity hypothesis, it holds that E{x(n − σ i )x(n − σ j )} = Aδ σiσj .
In particular, for q = 0, we have (∅) = 1, and for q = 2 and q = 4, we have, respectively, A new operator Π will now be introduced as where r = m q , Q ⊆ M, Q i are all the subsets generated by the combinations of q elements chosen from M, and f is a symmetrical mapping with respect to σ(Q).In particular, it holds that The properties ( 14), (15), and ( 16) are trivially verified using definitions ( 13) and ( 11).

Formulas for mth-order Wiener kernel estimation
From the above definitions, we have derived the following general formulas for the mth-order kernel estimates: from which where (•) denotes the integer part of (•).The formulas just presented allow the mth-order Wiener kernel to be identified.Note that for m = 2, 3 they reduce to (6).In the diagonal points, the estimation will be improved with respect to the classical Schetzen technique referred to here by (7).It must be noted that a real improvement is obtained only when the expectations are assessed by averages on finite-length sequences, as it is unavoidable in practice.A proof for ( 18) can be found in Appendix A.

Explicit generalization of Goussard's method to mth order
As previously pointed out, an improvement in the estimation of diagonal elements was also obtained by Goussard et al. [9].Although they proposed a method for the improvement of the diagonal points estimation, which is in principle analogous to the idea which resides behind the development of ( 17) and (18), in [9] they demonstrated only the expressions up to the third order.Actually, we aimed at the generalization of those formulas and proofs for higher orders in a compact and handy way.
It can be proved (see Appendix B) that the mth-order version of the original Goussard formulas assumes the following form: where the operator Ψ is defined as We also propose a recursive form of formula (21) which can be useful for generating the code which computes Ψ for higher orders: The preceding formula can also be given in a more compact implicit form: 20), ( 21), ( 22), and (23) has been supplied in Appendix B.
Interesting higher-order formulas for identification in a nonparametric approach can also be found in [10] or referenced in [2], where they are based on cross-cumulants rather than crossmoments.These formulas generalize the identification method avoiding an explicit Wiener-to-Volterra series conversion and they hold also for nonwhite Gaussian inputs.If the input is white, they simplify in a form equivalent to (19).Actually, the use of ( 19) and ( 21) can be found to be equivalent to the formula using the cumulant definitions in the white Gaussian case.
In [12] and references therein, a useful formula can be found which directly relates Wiener kernels to cumulants.The use of ( 18), after some manipulations, is equivalent to the formulation proposed in [12].The computation of the joint cumulants of mth-order requires, in principle, the knowledge of all the joint moments up to mth-order [2].In (18), only the mth moment is needed because the lowerorder moments are implicitly stored in lower-order previously estimated kernels.So the notations and formulas proposed here constitute mainly a handy tool for the straightforward implementation of cumulant calculus in the particular case of white Gaussian input.The implementation efficiency of (18) resides in the way the storing of lower-order moments is accomplished by accounting for similar terms generated by the symmetry properties of the lower-order moments (or cumulants themselves).
The main differences between the method related to (18) and the method of ( 19) and (21) reside in the application point of view: while the first needs the storage of lower-order kernels, the plain implementation of (19) permits to identify any kernel without knowing the others.This second technique obviously causes additional computation time in the complete estimation of a model, as will be shown in the next section.
The use of (18) gives also the most efficient way to access the lower-order moments needed by a smart implementation of ( 19) and (21).In [9], those general-order implementation issues were not covered.

IMPLEMENTATION TESTS
In the following, the results obtained by the implementation of (18) (which will be referred to as the straight method) are compared with the ones obtained by the formulas of Schetzen [1] (which will be referred to as the classic method1 ) and the ones by Goussard et al. [9] (referred to here as Goussard's method), which, in Section 3.3, have been extended to higher orders explicitly.The formulas have been tested identifying 100 discrete Volterra systems of the fifth order.For a significant implementation test, we needed a quite general set of test systems.The most general Volterra discrete causal system could have been created drawing the values of the kernels from a Gaussian distribution.Here, for the sake of simplicity of the implementation and of the exposition, only the constituent FIR filter taps have been drawn from a Gaussian distribution (independent from the input sequences).Indeed, the nth-order kernel was constituted by the cascade of an FIR filter and an nth-power nonlinear block.The system memory length for each order results from the ten taps of the FIR kernel generators.Besides this restriction, we retain that the test so conducted still maintained enough generality.
It must be noted that the results coming from the straight method and Goussard's one differed only by round-off errors, so in Table 1 and Figure 1, only the results for the straight method will be reported but they hold for Goussard's method as well.Besides, the two methods differ considerably in computing times: Table 2 shows that the straight method is faster than Goussard's method (computation times are almost four times shorter for the fifth-order case).This happens because the straight method avoids some redundant computation of the moments of the input and output vectors by trading it for the storage of lower-order kernel values.The first test for the estimation efficiency has been performed using ten white Gaussian inputs of 10 5 samples for each of the 100 systems.The results of this test are shown in Table 1.Each cell of the table reports two values: the first one refers to one input of 10 5 samples.The second one is the value obtained with a mean on ten kernel identifications, with ten independent input sequences of the same length.Under the assumption of the ergodicity of the identification process, this procedure corresponds to a single experiment with an input length ten times longer than the first one.When the value of the desired kernel is nearly zero, the relative error tends to infinity.As a consequence, we established an arbitrary threshold for the relative error value equal to 10.Only the points with a relative error under threshold are consid-ered as valid points.Table 1 shows the percentage of valid points.
Results show, for all the methods, an improvement of identification accuracy as the input length increases.In the classic method, such improvement is less than in the straight one.This first test was a consistency test for the algorithms.
In a subsequent simulation, the percentage of acceptance of kernel points has been calculated increasing further the number of the input signals for only one of the test systems arbitrarily chosen.Figure 1 shows such results, evidencing how the straight method works better than the classic one.The off-diagonal estimates have been reported both in Table 1 and in Figure 1 as a reference, because they represent the best case, which is equivalent for all the methods considered so far.

CONCLUSION
The formulas proposed here with the use of well-suited notations permit to handle, in an efficient way, the nth-order identification of Wiener kernels.The proof of the formulas has been supplied and simulation has demonstrated their efficiency with respect to the classic Lee-Schetzen method.
The alternate method proposed here and referred to as the straight method has been shown to be considerably faster than previous improvements of the Lee-Schetzen method known in literature [9], especially as the order and the size of the kernels increase.

APPENDICES A. PROOF OF FORMULA (17)
We have to prove that when x(n) is a sequence of white Gaussian random variables (an i.i.d.Gaussian process), it holds that When the Wiener series expansion exists, we can write If we multiply the left and right members of (A.2) by D[x(n); σ(M)] and apply the expectation operator, then, exploiting the orthogonality of the G and D operators defined in ( 2) and (10) (it can be easily proved that D operators are a particular case of G operators [1,9]), it holds that From the properties of the expectation operator and of the operators G and D, it follows that in the sum of (A.3) for even (odd) m, the terms with indices h odd (even) are identically zero, then (A.3) can be simplified as follows: So (A.1) holds if the validity of the following can be verified: To prove (A.5), we have to consider the explicit general expression of a G operator in the discrete-time case [1,2]: (A.6) Using (A.6) and the definition of the operator and letting p = m − 2h, (A.5) can be simplified as with Now consider the general expression of a term of the sum over s in the first member of (A.7): We will show hereafter how the terms deriving from the expansion of (A.9) can be grouped in p/2 term typologies.We define as ν-type term the following expression: Note that in the expression (A.10), there are ν pairs of identical formal variables in the argument of k p and a corresponding number ν of sum operators, which explains the choice of the ν-type term name.The term (A.10) collects a group of addenda of (A.9), as we are going to point out next.
Recalling that an expression like E{x(n − σ i )x(n − σ j )} generates the sequence δ(σ i − σ j ), consider one of the products of unity pulse δ-sequences deriving from the complete expansion of (A.9): with N chosen such that 0 ≤ N ≤ (p − 2s)/2 .The term (A.11) is compounded by two types of factors: we will dub homogeneous those factors of the form δ τiτj and mixed those factors of the form δ τiσj , for all i, j ∈ N. Note that the product between the first part of (A.9), and a homogeneous factor δ τiτj collapses the two sums τi and τj into one, and at the same time makes indistinguishable the arguments of k p in the ith and jth positions.On the other hand, the product between (A.12) and a mixed factor δ τiσj cancels the sum τi and substitutes τ i with σ j in the ith position of the argument of k p .Using the sifting property of the δ-sequences and letting ξ s+i = τ i , from the product between (A.12) and (A.11), we obtain the following simplified expression of a group of addenda of (A.9): The expression (A.13) constitutes a part of a ν-type term like (A.10) with ν = s + N, and it is easy to show that all the terms obtained from (τ 1 , . . ., τ p−2s , σ 1 , . . ., σ m ), having their first p − 2s factors in common with the term (A.11), can be collected by the following expression: Hence, the term of type s collects all the addenda of the complete expansion of (A.9) which have in common the following p − 2s factors: Now, it can be observed that in the expansion of (A.9), there are other terms of the same kind of (A.15) which have the expression in common, but the argument of k p different for a permutation of the group of variables Using the symmetry hypothesis3 on k p , those terms become similar to (A.15).Hence we now aim at obtaining the coefficient to be multiplied by (A.15) which accounts for all those similar terms.This coefficient is actually the number of completely distinct permutations, in the sense of the definition of , among the P = (p − 2s)!/2 N permutations of the group of variables (A.18) with N pairs of repeated elements.Indeed, note that a position exchange of the variable σ from ith to jth position in the argument of k p corresponds to a distinct permutation.In fact, that position exchange derives from two distinct factor products of (•) which differ at least in the mixed factors δ στi and δ στj .On the other hand, a position exchange between the variable pairs (τ i , τ i ) and (τ j , τ j ) corresponds to a change of the order of factors in product terms.The product terms, where the homogeneous factors δ τiτi and δ τj τj differ only in position, will be indistinguishable for . N being the number of pairs of the τ variables in the argument of k p , for each of the allowed P permutations, there will be a group of N! indistinguishable corresponding permutations.So, in the expansion of (A.9), the number of indistinguishable terms from the term (A.15), due to the symmetry of k p , will be equal to The first subscript of U denotes the type of the term to which the coefficient is associated and the second is the index of the outer sum of (A.7).Up to this point, we have focused our attention on the fact that, s, N, and the n-tuple (σ 1 , σ 2 , . . ., σ p−2(s+N) ) being chosen, the term (A.15) is a representative of U s+N,s similar terms of (A.9).Now, we observe that, for the symmetry of k p and the definition of (τ 1 , . . ., σ m ), we have N C = m p−2(s+N) equivalence classes which have a term like (A.15) as a representative.Those N C classes constitute the quotient set of the terms of (A.9) under the symmetry of k p and the distinguishability rules of .Actually, each equivalence class corresponds to an unordered choice of (p − 2(s + N)) from a total of m σ-variables.
Henceforth, the definition (13) of the Π operator comes in handy to define the term: (A.20) This term collects all the representatives of the equivalence classes we can obtain from the set of terms of (A.9) for a certain choice of s and N. Now, by the previous arguments and definitions, it is straightforward to rewrite (A.7) in the following equivalent form: For the validity of (A.21), it suffices to verify p/2 + 1 equations, the first of which is and it is trivially verified as C 0 = 1/ p! and U 0,0 = p!.The remaining p/2 equations will be verified if it holds that Finally, (A.22) and (A.23) imply (A.21) which is equivalent to (A.7) and (A.5).The validity of (A.5), for all m, h ∈ N, h ≤ m/2 , implies (A.1).

B. PROOF OF FORMULAS (19), (20), (21), (22), AND (23)
Exploiting ( 18), we will prove that formulas (19), (20), ( 21), (22), and (23) are valid for every finite set of distinct naturals M with cardinality m.With the use of ( 18), the verification of ( 19) is equivalent to the verification of the following equation: We have to show that the definition of the Ψ operator given in (20), ( 21), (22), and (23) implies (B.1).This will be demonstrated using induction separately for the odd and even m cases.If we let the induction index equal to ν = m/2 + 1, then the cases ν = 1, 2 correspond to m = 0, 2 in the even m case and to m = 1, 3 in the odd m case.The cases with m = 0, 1, 2, 3 are verified by (18) or, alternatively, can be found proved in [9] or [10] in a different way.Hence, we consider m = 2ν − 2 for the even case and m = 2ν − 1 for the odd case, and suppose that (22) satisfies (19) when the induction index is ν − 1.For m > 3 and for 1 ≤ h ≤ m/2 , this is equivalent to supposing the following equation valid: Using (B.2), (B.1) can be rewritten as Further, for the properties of the expectation and the Π operators, (B.3) can be rewritten as follows: Due to the arbitrary choice of y(n), ( 22) and ( 23) guarantee a recursive definition of Ψ which is a solution for the ν-index case of the induction.So ( 22) or ( 23) is a solution for (B.1).
It is left to prove that (21) fits formulas ( 22) and (23), so (21) will be the explicit operative solution for (19).Exploiting ( 21) and ( 14) in the right member of (23), we get From the definition of the Π operator and from the property proved in Appendix C, it is easy to derive the rules that allow to rewrite (B.5) in this way: After collecting similar terms in (B.6) (i.e., the terms with equal + h), it can be stated that the validity of the following equation:

C. A PROPERTY OF THE Π OPERATOR
Let M be a finite set of positive distinct integers and R, Q such that R ⊆ Q ⊆ M with cardinalities m, r, and q, respectively, m, r, and q must be all odd or all even integers.Let f be a symmetrical mapping with respect to the argument σ(R).
Under this hypothesis, it holds that To prove this, let S be a mapping which associates a sum of terms with the set of the terms being summed.It must be observed that the first and the second member of (C.1) are actually made of sums of terms.So we can associate two sets to the sums in the left and right members of (C.1) in this way: Now, the proof of (C.1) can be made by proving that (1) for all a ∈ A, there exists b ∈ B such that b ≡ a; (2) for all b ∈ B, there exists We consider first the item (1).Using definition (13) of Π , the left and the right member of (C.1) can be made more explicit (it could be done also in the former definitions of A and B): Q j being a combination of q elements chosen from m elements of M, R i a combination of r elements chosen from q elements of Q j , and R h a combination of r elements chosen from m elements of M. We also need the definition of the sets of addenda associated to a particular choice of R i , Q j and R h in this way: Then, noting that for every i, j allowed by (C.4), R i is a combination of elements of Q j , Q j ⊆ M implies that R i is also a combination of elements of M. Hence there exists at least one h (among the ones allowed by (C.5)) such that R h = R i .With these arguments, it can be said that From the preceding expression and from (C.7), it trivially follows that A i j ⊆ B h , and then using (C.6), it follows that, for all a i j ∈ A i j , there exists b h ∈ B h such that a i j = b h .Item (1) has been proved.Now we will focus on item (2).If we choose a set R h allowed by (C.7) and the corresponding B h , an arbitrary element b ∈ B h would be described by the following expression: (C.9) It must be observed that the term b can be generated only by the inner sum of (C.4).In particular, it is generated only when Q j = I ∪ R h .Q j is an allowed choice of q elements among the m elements of M, and it also holds that I = M − Q j .From this, it follows that in the expansion of (C.4), there exists only one group of addenda as follows: This group, by the definition of , will surely contain once the addend (C.8).So we showed that for all b ∈ B and for all partitions of the factors of b in two groups of (q − r)/2 and of (m − q)/2 elements, there exists a choice for Q j which guarantees that there exists one and only one element of A which is congruent with b, and so A b = ∅.Moreover, |A b | is equal to the number of all possible such partitions of the factors of b.This number is obviously equal to the number of permutations of (m − r)/2 elements with the repetition of two elements (q − r)/2 and (m − q)/2 times, respectively.Hence we get |A b | = ((m − r)/2)!/((q − r)/2)!((m − q)/2)!.This proves item (2).

Figure 1 :
Figure 1: Percentage of valid points (see footnote a) versus number (or length) of input signals for only one of the test systems.An abscissa unit corresponds to 10 5 independent input samples.(a) 2nd order, (b) 3rd order, (c) 4th order, and (d) 5th order.
r−1 σi p−r , (C.8) with {i 1 , . . ., i m−r } = (M − R h ).Note that to every factor δ, a pair of subscripts is associated.The number of subscript pairs for the term b is equal to |M − R h |/2 = (m − r)/2.Now we choose a two-set partition of the factors of b, with (q − r)/2 and (m − q)/2 elements, respectively.To the two-set of factors just obtained will be associated the two corresponding sets having, as elements, the indices of the σ-variables in the subscripts I = {i 1 , . . ., i q−r }, I = {i 1 , . . ., i m−q }, I ∪ I = (M − R h ), and I ∩ I = ∅.Now we will pick up only the part of b having the factors belonging to the indices set I to form the term b :

Table 1 :
Mean values, over 100 independent systems, of percentage of kernel points which have an identification relative error less than threshold (these points are referred to as valid points 2 ).Simulations with 1 input (left) and 10 inputs (right).Every input is a 10 5 sample sequence from a zero-mean independent white Gaussian distribution.

Table 2 :
Mean computation time (seconds) over 10 identifications of a test system versus model order, for each of the three methods implemented.
(11)bviously holds that A = i, j A i j and B = h B h .If now we consider two sets of distinct positive integers α and β, exploiting the properties deriving from definition(11)of , it is easy to prove that h .(C.6)