Performance Analysis of Adaptive Volterra Filters in the Finite-Alphabet Input Case

This paper deals with the analysis of adaptive Volterra filters, driven by the LMS algorithm, in the finite-alphabet input case. An approach tailored to this input context is presented and used to analyze the behavior of this nonlinear adaptive filter. A complete and rigorous mean-square analysis is provided without any constraining independence assumption. Exact transient and steady-state performances, expressed in terms of critical step size, rate of transient decrease, optimal step size, excess mean square error in stationary mode, and tracking of nonstationarities, are deduced.


INTRODUCTION
Adaptive systems have been extensively designed and implemented in the area of digital communications. In particular, nonlinear adaptive filters, such as adaptive Volterra filters, have been used to model nonlinear channels encountered in satellite communications applications [1,2]. The nonlinearity is essentially due to the high-power amplifier used in the transmission [3]. When dealing with land-mobile satellite systems, the channels are time varying, and a general Mth-order Markovian model can be used to describe these variations [4]. Hence, to take into account the effects of the amplifier's nonlinearity and of the channel variations, one can model the equivalent baseband channel by a time-varying Volterra filter. In this paper, we analyze the behavior and parameter-tracking capabilities of adaptive Volterra filters driven by the generic LMS algorithm.
In the literature, convergence analysis of adaptive Volterra filters is generally carried out for small adaptation step sizes [5]. In addition, a Gaussian input assumption is used in order to take advantage of the Price theorem results. However, from a practical viewpoint, to maximize the rate of convergence or to determine the critical step size, one needs a theory that is valid over a large range of adaptation step sizes. To the best knowledge of the authors, no such exact theory exists for adaptive Volterra filters. It is important to note that the so-called independence assumption, well known to be a crude approximation for large step sizes, underlies all available results [6].
The purpose of this paper is to provide an approach tailored for the finite-alphabet input case. This situation is frequently encountered in many digital transmission systems. In fact, we develop an exact convergence analysis of adaptive Volterra filters, governed by the LMS algorithm. The proposed analysis, pertaining to the large step size case, is derived without any independence assumption. Exact transient and steady-state performances, that is, critical step size, rate of transient decrease, optimal step size, excess mean square error (EMSE), and tracking capability, are provided.
The paper is organized as follows. In the second section, we provide the needed background for the analysis of adaptive Volterra filters. In the third section, we present the signal input model. In the fourth section, we develop the proposed approach to analyze the adaptive Volterra filter. Finally, the fifth section presents some simulation results to validate the proposed approach.

BACKGROUND
The FIR Volterra filter's output may be characterized by a truncated Volterra series consisting of q convolutional terms. The baseband model of the nonlinear time-varying channel is described as follows:

y_k = \sum_{m=1}^{q} \sum_{i_1=0}^{L-1} \cdots \sum_{i_m=i_{m-1}}^{L-1} f_k^m(i_1, \ldots, i_m) x_{k-i_1} \cdots x_{k-i_m} + n_k,

where x_k is the input signal and n_k is the observation noise, assumed to be i.i.d. and zero mean. In the above equation, q is the Volterra filter order, L is the memory length of the filter, and f_k^m(i_1, \ldots, i_m) is a complex number, referred to as the mth-order Volterra kernel; this kernel may be a time-varying parameter.
The Volterra observation vector X_k is defined by

X_k = [x_k, x_{k-1}, \ldots, x_{k-L+1}, x_k^2, x_k x_{k-1}, \ldots, x_{k-L+1}^q]^T,

where only one permutation of each product x_{i_1} x_{i_2} \cdots x_{i_m} appears in X_k. It is well known [7] that the dimension of the Volterra observation vector is

\beta = \sum_{m=1}^{q} \binom{L+m-1}{m}.

The input/output recursion, corresponding to the above model, can then be rewritten in the following linear form:

y_k = F_k^T X_k + n_k,

where F_k = [f_k^1(0), \ldots, f_k^1(L-1), f_k^2(0,0), \ldots, f_k^q(L-1, \ldots, L-1)]^T is a vector containing all the Volterra kernels.
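As an illustrative aside (not from the paper), the construction of X_k and the dimension formula above can be sketched in a few lines of Python; `volterra_observation` is a hypothetical helper name:

```python
# Sketch: enumerate one representative of each product x_{i1}...x_{im}, m = 1..q,
# and check that the dimension matches beta = sum_{m=1}^{q} C(L+m-1, m).
from itertools import combinations_with_replacement
from math import comb

def volterra_observation(x, q):
    """x: the L most recent inputs [x_k, ..., x_{k-L+1}]; q: Volterra order."""
    L = len(x)
    X = []
    for m in range(1, q + 1):
        # combinations_with_replacement keeps exactly one index permutation
        for idx in combinations_with_replacement(range(L), m):
            prod = 1
            for i in idx:
                prod *= x[i]
            X.append(prod)
    return X

beta = lambda L, q: sum(comb(L + m - 1, m) for m in range(1, q + 1))

X = volterra_observation([1 + 1j, 1 - 1j], q=2)   # L = 2, q = 2
assert len(X) == beta(2, 2) == 5
```

For L = 2 and q = 2 the vector contains x_k, x_{k-1}, x_k^2, x_k x_{k-1}, x_{k-1}^2, that is, five entries.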
In this paper, we assume that the evolution of F_k is governed by an Mth-order Markovian model

F_k = \sum_{i=1}^{M} \Lambda_i F_{k-i} + \Omega_k,   (4)

where the \Lambda_i (i = 1, \ldots, M) are matrices which characterize the behavior of the channel, and \Omega_k = [\omega_{1k}, \omega_{2k}, \ldots, \omega_{\beta k}]^T is an unknown zero-mean process which characterizes the nonstationarity of the channel. It is to be noted that the process {\Omega_k} is independent of the input {X_k} as well as of the observation noise {n_k}.
In this paper, we consider the identification problem of this time-varying nonlinear channel. To this end, an adaptive Volterra filter driven by the LMS algorithm is considered. This analysis is general, and therefore includes the stationary case, that is, \Omega_k = 0, as well as the linear case, that is, q = 1. The coefficient update of the adaptive Volterra filter is given by

G_{k+1} = G_k + \mu (y_k - y_k^e) X_k^*,  with  y_k^e = G_k^T X_k,

where y_k^e is the output estimate, G_k is the vector of (nonlinear) filter coefficients at time index k, \mu is a positive step size, and (\cdot)^* stands for the complex conjugate operator. Moreover, we assume that the channel and the Volterra filter have the same length.
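The update above can be sketched as follows; the kernel values, noise level, and step size are illustrative choices, not the paper's experiment:

```python
# Minimal sketch (assumed setup) of the complex LMS update
# G_{k+1} = G_k + mu * (y_k - G_k^T X_k) * conj(X_k) for a Volterra channel.
import numpy as np

rng = np.random.default_rng(0)
S = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]            # QPSK alphabet
F = np.array([0.8, 0.3 + 0.1j, -0.05, 0.02j])     # "true" kernels (illustrative)
G = np.zeros(4, dtype=complex)                    # adaptive filter coefficients
mu = 0.05
x_prev = 1 + 1j
for k in range(2000):
    x = S[rng.integers(4)]                        # new QPSK symbol
    X = np.array([x, x_prev, x**2 * x_prev, x * x_prev**2])  # observation vector
    y = F @ X + 0.001 * (rng.standard_normal() + 1j * rng.standard_normal())
    e = y - G @ X                                 # a priori output error
    G = G + mu * e * np.conj(X)                   # LMS coefficient update
    x_prev = x

assert np.linalg.norm(G - F) < 0.1                # G has converged near F
```

With mu below the critical step size discussed later, the coefficient vector G settles close to F, up to a small noise-induced misadjustment.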
By considering the deviation vector V_k = G_k - F_k, that is, the difference between the adaptive filter coefficient vector G_k and the optimum parameter vector F_k, the behavior of the adaptive filter and the channel variations can be usefully described by an augmented vector \Phi_k that stacks V_k and the recent channel states F_k, \ldots, F_{k-M+1}. From (3)-(6), one can deduce that the dynamics of the augmented vector are described by the following linear time-varying recursion:

\Phi_{k+1} = C_k \Phi_k + \Xi_k,

where the matrix C_k is built from the block I^{(\beta)} - \mu X_k^* X_k^T and the transition matrices \Lambda_i, the vector \Xi_k gathers the noise and nonstationarity driving terms \mu n_k X_k^* and \Omega_k, and I^{(\beta)} is the identity matrix with dimension \beta.
Note that V_k is deduced from \Phi_k by the following simple relationship:

V_k = [I^{(\beta)}  0] \Phi_k,   (9)

where 0^{(l,m)} denotes a zero matrix with l rows and m columns. The behavior of the adaptive filter can be described by the evolution of the mean square deviation (MSD), defined by

MSD(k) = E(V_k^H V_k),

where (\cdot)^H is the transpose of the complex conjugate of (\cdot) and E(\cdot) is the expectation operator. To evaluate the MSD, we must analyze the behavior of E(\Phi_k \Phi_k^H). Since \Omega_k and n_k are zero mean and independent of X_k and \Phi_k, a nonhomogeneous recursion holds between E(\Phi_{k+1} \Phi_{k+1}^H) and E(C_k \Phi_k \Phi_k^H C_k^H):

E(\Phi_{k+1} \Phi_{k+1}^H) = E(C_k \Phi_k \Phi_k^H C_k^H) + R,   (11)

where R denotes the covariance contribution of the noise and nonstationarity terms. From the analysis of this recursion, all mean square performances in the transient and steady states of the adaptive Volterra filter can be deduced. However, (11) is hard to solve. In fact, since X_k and X_{k-1} share L - 1 components, they are dependent. Thus, C_k and C_{k-1} are dependent, which means that \Phi_k and C_k are dependent as well. Hence, (11) becomes difficult to solve. It is important to note that even when using the independence assumption between C_k and \Phi_k, equation (11) is still hard to solve due to its structure.
In order to overcome these difficulties, Kronecker products are required. Indeed, by applying the vec(\cdot) linear operator, which transforms a matrix into an augmented vector, to \Phi_k \Phi_k^H, and by using some properties of tensorial algebra [8], that is, vec(ABC) = (C^T \otimes A) vec(B), as well as the commutativity between the expectation and the vec(\cdot) operator, that is, vec(E(M)) = E(vec(M)), (11) becomes

E(vec(\Phi_{k+1} \Phi_{k+1}^H)) = E((C_k^* \otimes C_k) vec(\Phi_k \Phi_k^H)) + g,   (12)

where g is a constant vector accounting for the noise and nonstationarity terms, and \otimes stands for the Kronecker product [8].
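The vec/Kronecker identity invoked here is easy to verify numerically; this quick check is illustrative only:

```python
# Numerical check of the tensor-algebra identity vec(A B C) = (C^T kron A) vec(B),
# with vec(.) stacking the columns of its argument.
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))
vec = lambda M: M.reshape(-1, order="F")          # column-major (column stacking)

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)
assert np.allclose(lhs, rhs)
```

The same identity, applied with A = C_k and C = C_k^H, is what turns the matrix recursion (11) into a vector recursion.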
It is important to note that, due to the difficulty of the analysis, few concrete results have been obtained until now [9,10]. When the input signal is correlated, and even in the linear case, the analysis is usually carried out for a first-order Markov model and a small step size [11,12]. For a small step size, an independence assumption is made between C_k and \Phi_k, which simplifies (12): the expectation factorizes as

E((C_k^* \otimes C_k) vec(\Phi_k \Phi_k^H)) \approx E(C_k^* \otimes C_k) E(vec(\Phi_k \Phi_k^H)),   (13)

so that (13) becomes a linear equation which can be solved easily. However, the obtained results, being based on the independence assumption, are valid only for small step sizes. The aim of this paper is to propose a valid approach to solve (12) for all step sizes, that is, from the range of small step sizes to the range of large step sizes, including the optimal and critical step sizes. To do so, we consider the case of baseband channel identification, where the input signal is a symbol sequence belonging to a finite-alphabet set.

Input signal model
In digital transmission contexts, when dealing with baseband channel identification, the input signal x_k represents the transmitted symbols during a training phase. These symbols are known by the transmitter and by the receiver. The input signal belongs to a finite-alphabet set S = {a_1, \ldots, a_d} of cardinality d, and its evolution is described by a probability transition matrix P; for instance, a binary i.i.d. equiprobable source corresponds to S = {a_1, a_2} and

P = [1/2 1/2; 1/2 1/2].

This model for the transmitted signal is widely used, especially for the performance analysis of trellis-coded modulation techniques [13].
Consequently, the Volterra observation vector X_k also remains in a finite-alphabet set, with cardinality N = d^L. Thus, the matrix C_k, defined in (8) and which governs the adaptive filter, also belongs to a finite-alphabet set {\Psi_1, \Psi_2, \ldots, \Psi_N}. As a result, the matrix C_k can be modeled as an irreducible discrete-time Markov chain {\theta(k)} with finite state space {1, 2, \ldots, N} and probability transition matrix P = [p_{ij}], such that

p_{ij} = Pr(\theta(k+1) = j | \theta(k) = i).

By using the proposed model of the input signal, we will analyze the convergence of the adaptive filter in the next subsection.
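A small sketch (illustrative, using the QPSK example treated later in the paper) confirms that the observation vector visits at most N = d^L distinct states:

```python
# Sketch: for a finite input alphabet of size d and memory length L, the Volterra
# observation vector X_k = [x_k, x_{k-1}, x_k^2 x_{k-1}, x_k x_{k-1}^2] takes at
# most N = d^L distinct values (here d = 4 QPSK symbols, L = 2).
from itertools import product

S = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]            # QPSK alphabet, d = 4
L = 2
states = []
for x0, x1 in product(S, repeat=L):               # all (x_k, x_{k-1}) pairs
    states.append((x0, x1, x0**2 * x1, x0 * x1**2))

assert len(set(states)) == len(S) ** L == 16
```

Each distinct pair (x_k, x_{k-1}) yields a distinct observation vector, so the Markov chain {θ(k)} here has exactly 16 states.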

Exact performance evaluation
The main idea used to tackle (11) in the finite-alphabet input case is very simple. Since there are N possibilities for \Psi_{\theta(k)}, we may analyze the behavior of E(\Phi_k \Phi_k^H) through the following quantities, denoted Q_j(k), j = 1, \ldots, N, and defined by

Q_j(k) = E(vec(\Phi_k \Phi_k^H) 1_{(\theta(k)=j)}),

where 1_{(\theta(k)=j)} stands for the indicator function, which is equal to 1 if \theta(k) = j and to 0 otherwise. It is interesting to recall that at time k, \Psi_{\theta(k)} can take only one value among the N possibilities, which means that \sum_{j=1}^{N} 1_{(\theta(k)=j)} = 1.
From the last equation, it is easy to establish the relationship between E(\Phi_k \Phi_k^H) and the Q_j(k). In fact, we have

E(vec(\Phi_k \Phi_k^H)) = \sum_{j=1}^{N} Q_j(k).

Therefore, we can conclude that the LMS algorithm converges if and only if all of the Q_j(k) converge. The recursive relationship between Q_j(k+1) and all the Q_i(k) can then be established. In order to overcome the difficulty of the analysis found in the general context, we take into account the properties induced by the input characteristics, namely: (1) C_k belongs to a finite-alphabet set; (2) the \Psi_i are constant matrices, independent of \Phi_k.
Hence, the dependence difficulty found in (12) is avoided, and one can deduce that

Q_j(k+1) = (\Psi_j^* \otimes \Psi_j) \sum_{i=1}^{N} p_{ij} Q_i(k) + \gamma_j,

where \gamma_j is a constant vector gathering the noise and nonstationarity contributions of state j. From (18)-(24), along the same lines as in the linear case [10,14], and by stacking the N recursions into Q(k) = [Q_1(k)^T, \ldots, Q_N(k)^T]^T, we have proven, without any constraining independence assumption on the observation vector, that the terms Q_j(k) satisfy the following exact and compact recursion:

Q(k+1) = \Delta Q(k) + \Gamma,   (25)

where

\Delta = Diag_\Psi (P^T \otimes I_{\beta^2}),

and Diag_\Psi denotes the block diagonal matrix Diag_\Psi = blockdiag(\Psi_1^* \otimes \Psi_1, \ldots, \Psi_N^* \otimes \Psi_N). The vector \Gamma = [\gamma_1^T, \ldots, \gamma_N^T]^T depends on the power of the observation noise and on the input statistics. The compact linear and deterministic equation (25) will replace (11). From (25), we will deduce all adaptive Volterra filter performances.
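How such a compact linear recursion is iterated, and how its fixed point is obtained, can be sketched generically; Delta and Gamma below are random stand-ins with spectral radius below one, not the paper's matrices:

```python
# Illustrative sketch: iterate Q(k+1) = Delta @ Q(k) + Gamma and compare the
# transient against the steady-state solution Q_inf = (I - Delta)^{-1} Gamma.
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
Delta = 0.5 * A / np.max(np.abs(np.linalg.eigvals(A)))  # spectral radius = 0.5
Gamma = rng.standard_normal(n)

Q = np.zeros(n)
for _ in range(500):
    Q = Delta @ Q + Gamma                         # transient evolution

Q_inf = np.linalg.solve(np.eye(n) - Delta, Gamma) # steady-state fixed point
assert np.allclose(Q, Q_inf)
```

Convergence of the iteration to the fixed point is exactly the spectral-radius condition on Δ analyzed in the next subsections.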

Convergence conditions
Since the recursion (25) is linear, the convergence of the LMS algorithm is simply deduced from the analysis of the eigenvalues of \Delta. We assume that the general Markov model (4) describing the channel behavior is stable; the algorithm stability can then be deduced from the stationary case, where M = 1, \Omega_k = 0, and \Lambda_1 = I. In this case, since F_k is constant, we choose \Phi_k = V_k to analyze the behavior of the algorithm. Hence, the matrices \Psi_i reduce to

\Psi_i = I^{(\beta)} - \mu W_i^* W_i^T,

where W_i is the ith element of the observation-vector alphabet.

Proposition 1. The LMS algorithm converges only if the alphabet set A = {W_1, W_2, \ldots, W_N} spans the space C^\beta.
Physically, this condition means that, in order to converge to the optimal solution, the algorithm must be excited in all the directions that span the space.
Proof. If the alphabet set does not span the space, we can find a nonzero vector z orthogonal to the alphabet set; by constructing from z^* \otimes z an augmented vector Z, it is easy to show that \Delta Z = Z, so the matrix \Delta has an eigenvalue equal to one.
The matrix involved in this spanning condition is a Vandermonde matrix, and it is full rank if and only if d > q, which proves the excitation condition. It is easy to note that this result is similar to the one obtained in [7]. As a consequence of this proposition, we conclude that we cannot use a QPSK signal (d = 4) to identify a Volterra filter of order q = 5.
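The rank argument can be checked numerically; the Vandermonde-type matrix below, built from the d distinct alphabet symbols, is an illustrative stand-in for the matrix discussed above:

```python
# Hedged illustration of the excitation condition d > q: a Vandermonde matrix
# built from d distinct symbols has rank min(d, q+1), so QPSK (d = 4) cannot
# provide the rank q+1 = 6 needed for a Volterra filter of order q = 5.
import numpy as np

S = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])  # d = 4 distinct QPSK symbols

def vandermonde_rank(symbols, q):
    V = np.vander(symbols, q + 1)                 # rows [a^q, ..., a, 1]
    return np.linalg.matrix_rank(V)

assert vandermonde_rank(S, 3) == 4                # q = 3 < d: full rank
assert vandermonde_rank(S, 5) == 4                # q = 5 >= d: rank deficient
```

With q = 3 the 4x4 Vandermonde matrix is invertible (distinct nodes), while with q = 5 the 4x6 matrix can never reach rank 6.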

Critical step size
We provide, under the persistent excitation condition, a very useful sufficient bound on the critical step size in the following proposition.

Proposition 3. If the Markov chain {\theta(k)} is ergodic, the alphabet set A = {W_1, W_2, \ldots, W_N} spans the space C^\beta, and the noise n_k is a zero-mean, i.i.d. sequence independent of X_k, then there exists a critical step size \mu_c,

\mu_c = min_{1 \le i \le N} 2/(W_i^H W_i),   (32)

such that if \mu \le \mu_c, then the amplitudes of \Delta's eigenvalues are less than one, and the LMS algorithm converges exponentially in the mean square sense.
Proof. Using the tensorial algebra property (A \otimes B)(C \otimes D) = (AC) \otimes (BD), the matrix \Delta \Delta^H is given by

\Delta \Delta^H = Diag_\Psi (P^T \otimes I_{\beta^2})(P \otimes I_{\beta^2}) Diag_\Psi^H.

It is interesting to note that each diagonal block (\Psi_i^* \otimes \Psi_i)(\Psi_i^* \otimes \Psi_i)^H is a nonnegative symmetric matrix. By denoting {D_j, j = 1, \ldots, \beta - 1} the set of vectors orthogonal to the vector W_i, the eigenvalues of the matrix (I^{(\beta)} - \mu W_i^* W_i^T)^2 are equal to 1, associated with the D_j, and to (1 - \mu W_i^H W_i)^2, associated with W_i^*. Assuming that the Markov chain {\theta(k)} is ergodic, the probability transition matrix P is acyclic [15], and it has 1 as the unique largest-amplitude eigenvalue, corresponding to the vector u = [1, \ldots, 1]^T. This means that, for a nonzero vector R in C^{N\beta^2},

R^H (P^T \otimes I_{\beta^2})(P \otimes I_{\beta^2}) R = R^H R

if and only if R has the following structure:

R = u \otimes e,   (35)

where e is a nonzero vector in C^{\beta^2}. Now, for any nonzero vector R in C^{N\beta^2}, there are two possibilities: (1) there exists an e in C^{\beta^2} such that R = u \otimes e; (2) R does not have the structure described by (35).
In the first case, R = u \otimes e, and we can express R^H \Delta \Delta^H R in terms of the blocks \Psi_j^* \otimes \Psi_j acting on e. Since \mu \le \mu_c, each of these blocks is a contraction, and since the alphabet set spans C^\beta, the vector e cannot be left invariant by all of them simultaneously, which means that R^H \Delta \Delta^H R < R^H R. In the second case, it is easy to show that

R^H \Delta \Delta^H R \le R^H (P^T \otimes I_{\beta^2})(P \otimes I_{\beta^2}) R.

This is due to the fact that Diag_\Psi is a symmetric nonnegative matrix with largest eigenvalue equal to one. Now, using the fact that R does not have the structure (35), this leads to

R^H \Delta \Delta^H R < R^H R.

Summarizing the two cases, we conclude that, for any nonzero vector R in C^{N\beta^2}, R^H \Delta \Delta^H R < R^H R; hence all the singular values of \Delta, and thus the amplitudes of its eigenvalues, are less than one, which concludes the proof.
It is interesting to note that when the input signal is a PSK signal, which has a constant modulus, all the quantities 2/(W_i^H W_i) are equal, and they then coincide with the exact critical step size.
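Assuming the bound of Proposition 3 takes the form mu_c = min_i 2/(W_i^H W_i), consistent with the constant-modulus remark above, the QPSK example treated in the simulation section gives mu_c = 1/10:

```python
# Sketch: compute mu_c = min_i 2 / (W_i^H W_i) over the 16 QPSK observation
# vectors W = [x0, x1, x0^2 x1, x0 x1^2] (L = 2); the formula used here is an
# assumption matching the constant-modulus remark, not a quote of eq. (32).
import numpy as np
from itertools import product

S = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
norms = []
for x0, x1 in product(S, repeat=2):
    W = np.array([x0, x1, x0**2 * x1, x0 * x1**2])
    norms.append(np.vdot(W, W).real)              # W^H W = |x0|^2 + |x1|^2 + ...

mu_c = min(2.0 / n for n in norms)
assert np.allclose(norms, norms[0])               # constant-modulus input
assert np.isclose(mu_c, 0.1)                      # matches 1/10
```

Every observation vector has squared norm 2 + 2 + 8 + 8 = 20, hence the single value 2/20 = 1/10.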
Moreover, in the general case, the exact critical step size \mu_c and the optimum step size \mu_{opt} for convergence are deduced from the analysis of the eigenvalues of \Delta as a function of \mu. These important quantities depend on the transmitted alphabet and on the transition matrix P.

Steady-state performances
If the convergence conditions are satisfied, we determine the steady-state performances (k \to \infty) from the fixed point of (25), that is,

lim_{k \to \infty} Q(k) = (I - \Delta)^{-1} \Gamma.

From lim_{k \to \infty} Q_i(k), and using the relationship (9) between V_k and \Phi_k, we deduce lim_{k \to \infty} E(V_k^H V_k), and thus the exact value of the MSD. In the same manner, we can compute the exact EMSE, defined by

EMSE = lim_{k \to \infty} E(|X_k^T V_k|^2).

Using the relationship (9) between V_k and \Phi_k, the EMSE can be developed in terms of the Q_i(k): under the convergence conditions, E(vec(\Phi_k \Phi_k^H) 1_{(\theta(k)=i)}) converges to lim_{k \to \infty} Q_i(k), and the mean square error (MSE) is then given by

MSE = E(|n_k|^2) + EMSE.

In this section, we have proven that, without using any unrealistic assumption, we can compute the exact values of the MSD and the MSE. It is interesting to note that the proposed approach remains valid even when the model order of the adaptive Volterra filter is overestimated, that is, when the nonlinearity order and/or the memory length of the adaptive filter are greater than those of the real system to be identified. In fact, in this case the observation noise is still independent of the input signal, and the assumptions used remain valid. Indeed, this case is equivalent to identifying some coefficients which are set to zero. Of course, this decreases the rate of convergence and increases the MSE at the steady state.
In the next section, we confirm our analysis through a case study.

SIMULATION RESULTS
The exact analysis of adaptive Volterra filters in the finite-alphabet input case is illustrated in this section. We consider a case study where we identify a nonlinear time-varying channel, modeled by a time-varying Volterra filter of order q = 3 with memory length L = 2. The transmitted symbols are i.i.d. and belong to a QPSK constellation, that is, x_k \in {1 + j, 1 - j, -1 + j, -1 - j} (where j^2 = -1). In this case, d = 4, and x_k can be modeled by a discrete-time Markov chain whose transition matrix has all its entries equal to 1/4. The observation noise n_k is assumed to be i.i.d. complex Gaussian with power E(|n_k|^2) = 0.001. The parameter vector F_k = [f_k^1(0), f_k^1(1), f_k^3(0,0,1), f_k^3(0,1,1)]^T is assumed to be time varying, and its variations are described by a second-order Markovian model with parameters \gamma = 0.995 and \alpha = \pi/640, where \Omega_k is a complex Gaussian, zero-mean, i.i.d., spatially independent process with component power E(|\omega_k|^2) = 10^{-6}. We assume that the adaptive Volterra filter has the same length as the channel model. In this case, the input observation vector is equal to X_k = [x_k, x_{k-1}, x_k^2 x_{k-1}, x_k x_{k-1}^2]^T, and it belongs to a finite-alphabet set with cardinality equal to 16, which is the number of all (x_k, x_{k-1}) combinations.
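The exact second-order model is not spelled out in the text; a common choice with parameters gamma and alpha is F_k = 2 gamma cos(alpha) F_{k-1} - gamma^2 F_{k-2} + Omega_k, whose poles are gamma e^{±j alpha}. The sketch below (an assumption, not the paper's stated model) checks that such a recursion is stable for the given values:

```python
# Assumed AR(2) form of the second-order Markovian channel model:
# F_k = 2*gamma*cos(alpha)*F_{k-1} - gamma**2*F_{k-2} + Omega_k.
# Its characteristic polynomial z^2 - 2*gamma*cos(alpha)*z + gamma^2 has
# roots gamma*exp(+/- 1j*alpha), so stability holds whenever gamma < 1.
import numpy as np

gamma, alpha = 0.995, np.pi / 640
poles = np.roots([1.0, -2.0 * gamma * np.cos(alpha), gamma**2])

assert np.allclose(np.abs(poles), gamma)          # both poles at radius gamma
assert np.max(np.abs(poles)) < 1.0                # inside the unit circle
```

With gamma = 0.995 the channel parameters vary slowly, which is consistent with the tracking scenario studied here.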
The sufficient critical step size computed using (32) is equal to \mu_{cNL}^{min} = 1/10. To analyze the effect of the step size on the convergence rate of the algorithm, we report in Figure 1 the evolution of the largest absolute value of the eigenvalues of \Delta as a function of \mu. From this figure, we deduce that: (i) the critical step size \mu_c of the finite-alphabet analysis, corresponding to \lambda_{max}(\Delta) = 1, is equal to \mu_c = 0.100, which is the same value as \mu_{cNL}^{min} = 1/10; this result is expected since the amplitude of the input data x_k is constant; (ii) the optimal step size \mu_{opt}, corresponding to the minimum value of \lambda_{max}(\Delta), is \mu_{opt} = 0.062, and the optimal rate of convergence is min_\mu \lambda_{max}(\Delta) = 0.830.
In order to evaluate the evolution of the EMSE versus the iteration number, we compute the recursion (25), and we run a Monte Carlo simulation over 1000 realizations, for \mu = 0.06, an initial deviation vector V_0 = [1, 1, 1, 1]^T, and an initial channel parameter vector F_0 = [0, 0, 0, 0]^T. Figure 2 shows the superposition of the simulation results with the theoretical ones. Figure 3 shows the variations of the EMSE at convergence versus the step size, which varies from 0.001 to 0.100. The simulation results are obtained by averaging over 100 realizations.
The simulations of transient and steady-state performances are in perfect agreement with the theoretical analysis. Note from Figure 3 the degradation of the tracking capabilities of the algorithm for small step sizes. The optimum step size is large, and it cannot be deduced from classical small-step-size analyses.

CONCLUSION
In this paper, we have presented an exact and complete theoretical analysis of the generic LMS algorithm used for the identification of time-varying Volterra structures. The proposed approach is tailored to the finite-alphabet input case, and it was carried out without using any unrealistic independence assumption. It yields exact performance expressions for the transient and steady-state regimes of the adaptive nonlinear filter. All simulations of transient behavior and tracking capabilities are in perfect agreement with our theoretical analysis. Exact and practical bounds on the critical step size and the optimal step size for tracking are provided, which can be helpful in a design context. The exactness and the simplicity of the proofs are due to the input characteristics, which are commonly encountered in the digital communications context.