Stochastic analysis of neural network modeling and identification of nonlinear memoryless MIMO systems

Ibnkahla, Mohamed

doi:10.1186/1687-6180-2012-179

Research
Open access
Published: 21 August 2012

EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 179 (2012)


Abstract

Neural network (NN) approaches have been widely applied for modeling and identification of nonlinear multiple-input multiple-output (MIMO) systems. This paper proposes a stochastic analysis of a class of these NN algorithms. The class of MIMO systems considered in this paper is composed of a set of single-input nonlinearities followed by a linear combiner. The NN model consists of a set of single-input memoryless NN blocks followed by a linear combiner. A gradient descent algorithm is used for the learning process. Here we give analytical expressions for the mean squared error (MSE), explore the stationary points of the algorithm, evaluate the misadjustment error due to weight fluctuations, and derive recursions for the mean weight transient behavior during the learning process. The paper shows that in the case of independent inputs, the adaptive linear combiner identifies the linear combining matrix of the MIMO system (to within a scaling diagonal matrix) and that each NN block identifies the corresponding unknown nonlinearity to within a scale factor. The paper also investigates the particular case of linear identification of the nonlinear MIMO system. It is shown in this case that, for independent inputs, the adaptive linear combiner identifies a scaled version of the unknown linear combining matrix. The paper is supported with computer simulations which confirm the theoretical results.

Introduction

Neural network[1] approaches have been extensively used in the past few years for nonlinear MIMO system modeling, identification and control, where they have shown very good performance compared to classical techniques[2–6].

If these NN approaches are to be used in real systems, it is important for the algorithm designer and the user to understand their learning behavior and performance capabilities. Several authors have analyzed NN algorithms during the last two decades, which has considerably helped the neural network community to better understand the mechanisms of neural networks[1, 7–15]. For example, the authors in[13] studied a simple structure consisting of two inputs and a single neuron. The authors in[8] studied a memoryless single-input single-output (SISO) system identification model for the single neuron case. In[9] the authors proposed a stochastic analysis of gradient adaptive identification of nonlinear Wiener systems composed of a linear filter followed by a zero-memory nonlinearity. The model was composed of a linear adaptive filter followed by an adaptive parameterized version of the nonlinearity. This study was later generalized[16] to the analysis of stochastic gradient tracking of time-varying polynomial Wiener systems. In[12] the author analyzed NN identification of nonlinear SISO Wiener systems with memory for the case where the adaptive nonlinearity is a memoryless NN with an arbitrary number of neurons. The case of a nonlinear SISO Wiener-Hammerstein system (i.e., an adaptive filter followed by an adaptive zero-memory NN followed by an adaptive filter) has been analyzed in[11].

This paper deals with a typical class of nonlinear MIMO systems (Figure 1) which is composed of M inputs, M memoryless nonlinearities, a linear combiner, and L outputs. This corresponds, for example, to MIMO channels used in wireless terrestrial communications[17–22], satellite communications[23, 24], amplifier modeling[25], control of nonlinear MIMO systems[6], etc. Recently, a neural network approach has been proposed to adaptively identify the overall input–output transfer function of this class of MIMO systems and to characterize each component of the system (i.e., the memoryless nonlinearities and the linear combiner)[4]. The proposed NN model is composed of a set of memoryless NN blocks followed by an adaptive linear combiner. Each part of the adaptive system aims at identifying the corresponding part in the unknown MIMO system. The algorithm has been successfully applied to system modeling, channel tracking, and fault detection.

The purpose of this paper is to provide a stochastic analysis of NN modeling of this class of MIMO systems. The paper provides a general methodology that may be used to solve other problems in stochastic NN learning analysis. The methodology consists of splitting the study into simple structures before studying the complete structure. As a first step, we analyze a simple linear adaptive MIMO scheme (consisting of an adaptive matrix) that identifies the nonlinear MIMO system (i.e., the problem of linear identification of a nonlinear MIMO system). Then we analyze a nonlinear adaptive system in which the nonlinearities are assumed to be known and frozen during the learning process, so that only the linear combiner is adaptive. Finally, the complete adaptive scheme is analyzed, taking into account the insights given by the analysis of the simpler structures. In our analytical approach, we derive the general formulas and recursions, which we then apply to an illustrative case study.

The paper is organized as follows. The problem statement is given in Section 2. The study of the simple structures is detailed in Section 3. Section 4 presents the analysis for the complete structure. Simulation results and illustrations are given in Section 5. Finally, conclusions and future work are given in Section 6.

Problem statement

Nonlinear MIMO system

The class of nonlinear MIMO systems discussed in this paper is presented in Figure 1. Each input x_i (n) (i = 1,…,M) is nonlinearly transformed by a memoryless nonlinearity g_i (.). The outputs of these nonlinearities are then linearly combined by an L × M matrix H = [h_ji] (assumed in this paper to be constant). Matrix H is defined by the unknown system to be identified. For example, in wireless MIMO communication systems, H is the propagation matrix representing the channel between M transmitting antennas and L receiving antennas.

The j^th output of the MIMO system is expressed as:

y_{j} (n) = \sum_{i = 1}^{M} h_{ji} (n) g_{i} (x_{i} (n)) + N_{j} (n)

(1)

where N_j is a white Gaussian noise with variance σ₀². Let

$X (n) = [x_{1} (n) \; x_{2} (n) \dots x_{M} (n)]^{t}$, $g (X (n)) = [g_{1} (x_{1} (n)) \; g_{2} (x_{2} (n)) \dots g_{M} (x_{M} (n))]^{t}$, $Y (n) = [y_{1} (n) \; y_{2} (n) \dots y_{L} (n)]^{t}$, and $N (n) = [N_{1} (n) \; N_{2} (n) \dots N_{L} (n)]^{t}$.

The system input–output relationship can be expressed in a matrix form as:

Y (n) = H \times g (X (n)) + N (n) .

(2)
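The input–output relationship (2) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `hpa` and `mimo_output`, the use of plain lists, and the example values of H are assumptions made here, and `hpa` borrows the nonlinearity family that the paper introduces later in its case study.

```python
import math
import random

def hpa(x, alpha, beta):
    """Example memoryless nonlinearity: alpha * x * exp(-beta * x^2 / 2)."""
    return alpha * x * math.exp(-beta * x * x / 2.0)

def mimo_output(H, x, nonlinearities, noise_std=0.0, rng=random):
    """Return Y(n) = H g(X(n)) + N(n) for one time step.

    H is an L x M list of lists, x holds the M input samples, and
    nonlinearities holds M scalar functions g_1, ..., g_M.
    """
    gx = [g_i(x_i) for g_i, x_i in zip(nonlinearities, x)]
    return [sum(h * v for h, v in zip(row, gx)) + rng.gauss(0.0, noise_std)
            for row in H]

# Hypothetical 3 x 2 system (L = 3 outputs, M = 2 inputs), noise-free.
H = [[1.0, 0.5], [0.2, 1.0], [0.3, 0.3]]
gs = [lambda x: hpa(x, 1.0, 0.5), lambda x: hpa(x, 2.0, 1.0)]
y = mimo_output(H, [0.3, -1.2], gs)
```

Passing the nonlinearities as a list of functions keeps the sketch independent of any particular choice of g_i.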

Neural Network identification structure and algorithm

The neural network (Figure 2) is composed of M blocks. Each block k has a scalar input x_k (n) (k = 1,…,M), N neurons and a scalar output. The output of the k^th block is expressed as:

N N_{k} (n) = \sum_{i = 1}^{N} c_{ki} f (a_{ki} x_{k} (n) + b_{ki}), \quad k = 1, \dots, M

(3)

where f is the NN activation function, and a_ki, b_ki, c_ki are, respectively, the input weight, bias term, and output weight of the i^th neuron in the k^th block. The output NN_k of the k^th block is connected to the j^th output of the adaptive system through weight w_jk. The j^th output of the adaptive system is then expressed as:

s_{j} (n) = \sum_{k = 1}^{M} w_{jk} N N_{k} (n), \quad j = 1, \dots, L

(4)

Weights w_jk will be represented by an L × M matrix: W = [w_jk]. Let

$S (n) = [s_{1} (n) \; s_{2} (n) \dots s_{L} (n)]^{t}$ and $N N (n) = [N N_{1} (n) \; N N_{2} (n) \dots N N_{M} (n)]^{t}$.

Equations (4) can then be expressed in a matrix form as:

S (n) = W \times N N (n) .

(5)

For the learning process, the NN parameters are updated so as to minimize the sum of the squared errors between the unknown system outputs and the corresponding outputs of the model (Figure 3):

{‖e (n)‖}^{2} = \sum_{j = 1}^{L} e_{j}^{2} (n) .

(6)

Here $e_{j} (n) = y_{j} (n) - s_{j} (n)$ and $e (n) = [e_{1} (n) \; e_{2} (n) \dots e_{L} (n)]^{t}$.

The gradient descent recursions for weight adaptation are:

W (n + 1) = W (n) + 2 μ e (n) N N^{t} (n)

(7)

c_{ki} (n + 1) = c_{ki} (n) + 2 μ f (a_{ki} x_{k} (n) + b_{ki}) \sum_{l = 1}^{L} w_{lk} e_{l} (n)

(8)

\begin{array}{l} a_{ki} (n + 1) = a_{ki} (n) + 2 μ c_{ki} x_{k} (n) f^{'} (a_{ki} x_{k} (n) \\ + b_{ki}) \sum_{l = 1}^{L} w_{lk} e_{l} (n) \end{array}

(9)

b_{ki} (n + 1) = b_{ki} (n) + 2 μ c_{ki} f^{'} (a_{ki} x_{k} (n) + b_{ki}) \sum_{l = 1}^{L} w_{lk} e_{l} (n)

(10)

where μ is a small positive constant and $f^{'} ()$ represents the derivative: $f^{'} (x) = \frac{\partial f (x)}{\partial x} .$
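A single iteration of recursions (7)-(10) can be sketched as follows. This is a minimal illustration rather than the author's implementation: the function names, the nested-list data layout, and the choice to compute every increment from the weights at time n are assumptions. The activation is taken here as f(x) = erf(x/√2), which is an assumption about the normalization of the erf function; any differentiable activation could be substituted.

```python
import math

def f(u):
    """Assumed activation: scaled error function erf(u / sqrt(2))."""
    return math.erf(u / math.sqrt(2.0))

def f_prime(u):
    """Derivative of the assumed activation."""
    return math.sqrt(2.0 / math.pi) * math.exp(-u * u / 2.0)

def nn_forward(x, W, a, b, c):
    """Block outputs NN_k (Eq. 3) and model outputs s_j (Eq. 4)."""
    nn = [sum(c[k][i] * f(a[k][i] * x[k] + b[k][i]) for i in range(len(c[k])))
          for k in range(len(x))]
    s = [sum(W[j][k] * nn[k] for k in range(len(x))) for j in range(len(W))]
    return nn, s

def nn_step(x, y, W, a, b, c, mu):
    """Update W, a, b, c in place following Eqs. (7)-(10); return e(n)."""
    nn, s = nn_forward(x, W, a, b, c)
    e = [yj - sj for yj, sj in zip(y, s)]
    # Back-propagated error of block k, sum_l w_lk e_l, using W at time n.
    back = [sum(W[l][k] * e[l] for l in range(len(W))) for k in range(len(x))]
    for j in range(len(W)):
        for k in range(len(x)):
            W[j][k] += 2.0 * mu * e[j] * nn[k]            # Eq. (7)
    for k in range(len(x)):
        for i in range(len(c[k])):
            u = a[k][i] * x[k] + b[k][i]
            d = c[k][i] * f_prime(u)                      # uses c_ki at time n
            c[k][i] += 2.0 * mu * f(u) * back[k]          # Eq. (8)
            a[k][i] += 2.0 * mu * d * x[k] * back[k]      # Eq. (9)
            b[k][i] += 2.0 * mu * d * back[k]             # Eq. (10)
    return e
```

In an identification run, `nn_step` would be called once per sample with a new input vector x and the corresponding output y of the unknown system.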

Case study

After the derivation of the general formulas, it is important to apply them to special cases in order to obtain closed-form expressions of the different recursions that can be illustrated to the reader. We have chosen here a case study that illustrates our results. In this case study, the inputs x_i (n) are assumed to be uncorrelated zero-mean Gaussian variables with variance $σ_{x_{i}}^{2}$ . The NN activation function is taken as the erf function. The unknown nonlinear transfer functions are taken from a family of nonlinear functions of the form $g_{i} (x) = α_{i} x \exp (\frac{- β_{i} x^{2}}{2})$ , where α_i and β_i are positive constants. These nonlinear functions are reasonable models for amplitude conversions of nonlinear high power amplifiers (HPA) used in digital communications[12, 25, 26]. Other nonlinear functions may be considered; however, explicit closed-form solutions of the different derivations may not be possible.

Study of simplified structures: Linear adaptation

Before analyzing the full structure, we will analyze the following simplified schemes which will help us understand the complete structure:

1. The adaptive system is composed of an adaptive linear combiner W (Section 3.1).
2. The adaptive system is composed of W and scaled versions of the unknown nonlinearities (Section 3.2).

Linear adaptive system

This section studies the linear adaptive system that tries to model the nonlinear MIMO system (Figure 4).

Mean weight behavior and Wiener solution

Since matrix W is linear, it will not be able to identify the nonlinear blocks. However, we will see that it is able to identify matrix H to within a diagonal scaling matrix if the inputs are zero-mean and independent.

The gradient descent update of matrix W is expressed as:

\begin{array}{rl} W (n + 1) & = W (n) + 2 μ e (n) X^{t} (n) \\ & = W (n) + 2 μ (H g (X (n)) + N (n) - W (n) X (n)) X^{t} (n) \end{array}

(11)

Averaging both sides of (11) and using the standard LMS assumption of small μ[10], we obtain:

\begin{matrix} E (W (n + 1)) & \approx E (W (n)) + 2 μ (H R_{g (X) X} \\ - E (W (n)) R_{XX}) \\ = E (W (n)) (I - 2 μ R_{XX}) + 2 μ H R_{g (X) X} \end{matrix}

(12)

where $R_{XX} = E (X X^{t})$ and $R_{g (X) X} = E (g (X) X^{t})$.

By setting the updating gradient term to zero, it can be shown that this equation has a single stationary point (Wiener solution[10]) which is expressed by:

W = W_{0} = H U, \quad \text{where} \quad U = R_{g (X) X} R_{XX}^{- 1}

(13)

Following Equation (12), the mean weights can be expressed as a function of the initial condition as:

\begin{matrix} E (W (n)) & = W (0) {(I - 2 μ R_{XX})}^{n} \\ + 2 μ H R_{g (X) X} \sum_{p = 0}^{n - 1} {(I - 2 μ R_{XX})}^{p} \end{matrix}

(14)

If μ is sufficiently small, the first term converges to 0 and the second term converges to $H R_{g (X) X} R_{XX}^{- 1}$ .

Hence, the mean weights converge to the Wiener solution:

W (\infty) = W_{0} = H R_{g (X) X} R_{XX}^{- 1}

(15)

It can be easily shown that the stability condition on μ is[10]:

0 < μ < \frac{1}{λ_{max}}

(16)

where λ_max is the largest eigenvalue of the covariance matrix R_XX.

Note that for zero-mean independent inputs, U is a diagonal matrix:

U = R_{g (X) X} R_{XX}^{- 1} = \begin{bmatrix} \frac{E [g_{1} (x_{1}) x_{1}]}{σ_{x_{1}}^{2}} & 0 & \dots & 0 \\ 0 & \frac{E [g_{2} (x_{2}) x_{2}]}{σ_{x_{2}}^{2}} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \frac{E [g_{M} (x_{M}) x_{M}]}{σ_{x_{M}}^{2}} \end{bmatrix}

(17)

In this case, the linear adaptation allows the identification of matrix H to within the diagonal scaling matrix U, which depends on the nonlinearities and the input signals. As expected, the scaling matrix reduces to the identity matrix if g_k (x_k) = x_k.

Application to the case study:

For the particular nonlinear functions given in the case study (see Section 2.3), it is easy to show:

\begin{array}{l} E (x_{i} (n) g_{i} (x_{i} (n))) = \frac{α_{i} {σ_{x_{i}}}^{2}}{{(1 + β_{i} {σ_{x_{i}}}^{2})}^{\frac{3}{2}}} and \\ E (g_{i}^{2} (x_{i} (n))) = \frac{α_{i}^{2} {σ_{x_{i}}}^{2}}{{(1 + 2 β_{i} {σ_{x_{i}}}^{2})}^{\frac{3}{2}}} \end{array}

(18)
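The two moments in (18) can be checked numerically. The sketch below compares the closed forms with a simple trapezoidal integration against the Gaussian density; the helper names, the integration grid, and the example parameter values are assumptions chosen only for illustration.

```python
import math

def hpa(x, alpha, beta):
    """Case-study nonlinearity g(x) = alpha * x * exp(-beta * x^2 / 2)."""
    return alpha * x * math.exp(-beta * x * x / 2.0)

def gaussian_expectation(func, sigma, n=20001, span=12.0):
    """Trapezoidal estimate of E[func(x)] for x ~ N(0, sigma^2)."""
    lo = -span * sigma
    step = 2.0 * span * sigma / (n - 1)
    norm = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for m in range(n):
        x = lo + m * step
        weight = 0.5 if m in (0, n - 1) else 1.0
        total += weight * func(x) * math.exp(-x * x / (2.0 * sigma * sigma)) / norm
    return total * step

def moment_xg(alpha, beta, var):
    """Closed form of E[x g(x)] from Eq. (18)."""
    return alpha * var / (1.0 + beta * var) ** 1.5

def moment_gg(alpha, beta, var):
    """Closed form of E[g^2(x)] from Eq. (18)."""
    return alpha ** 2 * var / (1.0 + 2.0 * beta * var) ** 1.5

# Hypothetical parameters: alpha = 2, beta = 0.5, sigma_x = 1.3.
alpha, beta, sigma = 2.0, 0.5, 1.3
num_xg = gaussian_expectation(lambda x: x * hpa(x, alpha, beta), sigma)
num_gg = gaussian_expectation(lambda x: hpa(x, alpha, beta) ** 2, sigma)
```

For these values the numerical and closed-form moments should agree to within the integration error.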

The mean weight transient recursions are expressed as:

\begin{array}{l} E (w_{jk} (n + 1)) = E (w_{jk} (n)) (1 - 2 μ σ_{x_{k}}^{2}) \\ + 2 μ h_{jk} \frac{α_{k} σ_{x_{k}}^{2}}{{(1 + β_{k} σ_{x_{k}}^{2})}^{\frac{3}{2}}} \end{array}

(19)

Matrix U reduces to the following diagonal matrix:

U = \begin{bmatrix} \frac{α_{1}}{{(1 + β_{1} σ_{x_{1}}^{2})}^{\frac{3}{2}}} & 0 & \dots & 0 \\ 0 & \frac{α_{2}}{{(1 + β_{2} σ_{x_{2}}^{2})}^{\frac{3}{2}}} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \frac{α_{M}}{{(1 + β_{M} σ_{x_{M}}^{2})}^{\frac{3}{2}}} \end{bmatrix}

(20)
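The scalar recursion (19) can be iterated directly to see the mean weight approach the corresponding entry of the Wiener solution HU. The following sketch assumes hypothetical parameter values and a hypothetical function name; it only illustrates the recursion and is not a simulation of the stochastic algorithm itself.

```python
def mean_weight_trajectory(h, alpha, beta, var_x, mu, w0=0.0, steps=2000):
    """Iterate Eq. (19) and return the trajectory and the predicted limit."""
    gain = alpha / (1.0 + beta * var_x) ** 1.5   # diagonal entry of U, Eq. (20)
    w = w0
    trajectory = [w]
    for _ in range(steps):
        w = w * (1.0 - 2.0 * mu * var_x) + 2.0 * mu * h * gain * var_x
        trajectory.append(w)
    return trajectory, h * gain

# Hypothetical values: h_jk = 0.7, alpha_k = 2, beta_k = 0.5, unit input variance.
trajectory, limit = mean_weight_trajectory(0.7, 2.0, 0.5, 1.0, mu=0.01)
```

The deviation from the limit shrinks geometrically with ratio 1 − 2μσ², so a smaller μ gives a slower but still convergent mean trajectory.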

Transient MSE and Wiener MSE

The transient MSE is determined by:

\begin{array}{l} E ({‖e (n)‖}^{2}) & = E ({‖H g (X (n)) + N (n) - W (n) X (n)‖}^{2}) \\ = \sum_{j = 1}^{L} E [e_{j}^{2} (n)] \end{array}

(21)

where:

E (e_{j}^{2} (n)) = E ({‖H_{j} g (X (n)) + N_{j} (n) - W_{j} (n) X (n)‖}^{2})

(22)

where $W_{j} (n) = [w_{j 1} (n) \; w_{j 2} (n) \dots w_{jM} (n)]^{t}$ and $H_{j} = [h_{j 1} \; h_{j 2} \dots h_{jM}]^{t}$.

Using the independence of noise and weights at time n, we get:

\begin{array}{rl} E (e_{j}^{2} (n)) & = σ_{0}^{2} + E ({‖H_{j} g (X (n)) - W_{j} (n) X (n)‖}^{2}) \\ & = σ_{0}^{2} + H_{j}^{t} R_{g (X) g (X)} H_{j} - 2 H_{j}^{t} R_{g (X) X} E (W_{j} (n)) + E (W_{j}^{t} (n) R_{XX} W_{j} (n)) \end{array}

(23)

The total MSE is therefore expressed as:

E ({‖e (n)‖}^{2}) = L σ_{0}^{2} + \sum_{j = 1}^{L} [H_{j}^{t} R_{g (X) g (X)} H_{j} - 2 H_{j}^{t} R_{g (X) X} E (W_{j} (n)) + E (W_{j}^{t} (n) R_{XX} W_{j} (n))] .

(24)

Wiener MSE:

The Wiener MSE, $ζ_{0} = E ({‖e_{W_{0}} (n)‖}^{2})$ , is the minimum MSE that can be reached by the system if W is equal to the Wiener solution W₀ = HU. It can be easily shown that:

\begin{array}{l} ζ_{0} = E ({‖e_{W_{0}} (n)‖}^{2}) \\ = L σ_{0}^{2} + E ({‖H g (X (n)) - W_{0} X (n)‖}^{2}) \\ = L σ_{0}^{2} + E ({‖H (g (X (n)) - U X (n))‖}^{2}) \end{array}

(25)

It is clear from this equation that if the unknown functions are linear, then the Wiener MSE reduces to the noise power. The MSE is always larger than ζ₀ because of the misadjustment error introduced by the weight fluctuations.

Now we can write the MSE as a function of the Wiener MSE:

\begin{array}{rl} E ({‖e (n)‖}^{2}) & = E ({‖H g (X (n)) + N (n) - W (n) X (n)‖}^{2}) \\ & = E ({‖e_{W_{0}} (n) - (W (n) - W_{0}) X (n)‖}^{2}) \end{array}

(26)

Let the instantaneous deviation of the matrix weights with respect to the Wiener solution be denoted by:

V (n) = [v_{jk} (n)] = W (n) - W_{0} .

(27)

We have:

E ({‖e (n)‖}^{2}) = E ({‖e_{W_{0}} (n) - V (n) X (n)‖}^{2}) .

(28)

This expression is similar to that of the well-known LMS algorithm[10], and can be evaluated as the sum of the minimum error and excess error (or misadjustment) as:

E ({‖e (n)‖}^{2}) = ζ_{0} + \sum_{j = 1}^{L} \mathrm{tr} (R_{XX} K_{V_{j} V_{j}} (n))

(29)

where $V_{j} (n) = [v_{j 1} (n) \; v_{j 2} (n) \dots v_{jM} (n)]^{t}$ and $K_{V_{j} V_{j}} (n) = E (V_{j} (n) V_{j}^{t} (n))$ .

The misadjustment is expressed as:

Δ (n) = \mathrm{tr} (R_{XX} \sum_{j = 1}^{L} K_{V_{j} V_{j}} (n)) .

(30)

At the convergence, we have:

E ({‖e (\infty)‖}^{2}) = ζ_{0} + Δ (\infty) .

(31)

Derivation of the misadjustment:

From Equation (11) it is easy to show that the weight fluctuations follow the recursion:

V (n + 1) = V (n) + 2 μ (e_{W_{0}} (n) - V (n) X (n)) X^{t} (n)

(32)

Taking the mean of this equation and applying the orthogonality principle between the input vector and the Wiener error, we get:

E (V (n + 1)) = E (V (n)) (I - 2 μ R_{XX})

(33)

Thus, as expected, if μ is sufficiently small E( V (n)) converges to 0.

Similarly, for each vector V_j we can obtain the following recursion:

V_{j} (n + 1) = V_{j} (n) + 2 μ (e_{W_{0} j} (n) - X^{t} (n) V_{j} (n)) X (n)

(34)

The evaluation of the covariance matrix of the weight fluctuations is obtained by multiplying both sides of Equation (34) by $V_{j}^{t} (n + 1)$ and averaging:

\begin{array}{rl} K_{V_{j} V_{j}} (n + 1) & = K_{V_{j} V_{j}} (n) - 2 μ R_{XX} K_{V_{j} V_{j}} (n) - 2 μ K_{V_{j} V_{j}} (n) R_{XX} \\ & + 2 μ E [e_{W_{0} j} (n) X V_{j}^{t} (I - 2 μ X X^{t})] \\ & + 2 μ E {[e_{W_{0} j} (n) X V_{j}^{t} (I - 2 μ X X^{t})]}^{t} \\ & + 4 μ^{2} E [X X^{t} K_{V_{j} V_{j}} X X^{t}] \\ & + 4 μ^{2} E [e_{W_{0} j}^{2} (n) X X^{t}] \end{array}

(35)

These expectations are derived in Appendix III, which yields:

\begin{array}{rl} K_{V_{j} V_{j}} (n + 1) & = K_{V_{j} V_{j}} (n) - 2 μ R_{XX} K_{V_{j} V_{j}} (n) - 2 μ K_{V_{j} V_{j}} (n) R_{XX} \\ & + 4 μ^{2} (- E [H_{j}^{t} g (X) X E (V_{j}^{t} (n)) X X^{t}] + \mathrm{tr} (R_{XX} W_{0} E (V_{j}^{t} (n))) R_{XX} + R_{XX} W_{0} E (V_{j}^{t} (n)) R_{XX}) \\ & + 4 μ^{2} {(- E [H_{j}^{t} g (X) X E (V_{j}^{t} (n)) X X^{t}] + \mathrm{tr} (R_{XX} W_{0} E (V_{j}^{t} (n))) R_{XX} + R_{XX} W_{0} E (V_{j}^{t} (n)) R_{XX})}^{t} \\ & + 4 μ^{2} (\mathrm{tr} (R_{XX} K_{V_{j} V_{j}} (n)) R_{XX} + 2 R_{XX} K_{V_{j} V_{j}} (n) R_{XX}) \\ & + 4 μ^{2} (E [g (X) g^{t} (X) H_{j} H_{j}^{t} X X^{t}] + σ_{0}^{2} R_{XX} - \mathrm{tr} (R_{XX} W_{0 j} W_{0 j}^{t}) R_{XX} - 2 R_{XX} W_{0 j} W_{0 j}^{t} R_{XX}) \end{array}

(36)

Taking into account that E(V_j(∞)) = 0, K_VjVj can be obtained by solving the following equation:

\begin{array}{l} R_{XX} K_{V_{j} V_{j}} (\infty) + K_{V_{j} V_{j}} (\infty) R_{XX} - 2 μ \, \mathrm{tr} (R_{XX} K_{V_{j} V_{j}} (\infty)) R_{XX} - 4 μ R_{XX} K_{V_{j} V_{j}} (\infty) R_{XX} \\ \quad = 2 μ E [g (X) g^{t} (X) H_{j} H_{j}^{t} X X^{t}] + 2 μ σ_{0}^{2} R_{XX} - 2 μ \, \mathrm{tr} (R_{XX} W_{0 j} W_{0 j}^{t}) R_{XX} - 4 μ R_{XX} W_{0 j} W_{0 j}^{t} R_{XX} \end{array}

(37)

This expression holds for any input signal. It can be simplified if $R_{XX} = σ_{x}^{2} I$ . In this case we have:

\mathrm{tr} (R_{XX} K_{V_{j} V_{j}} (\infty)) = μ \frac{σ_{0}^{2} σ_{x}^{2} M + \mathrm{tr} (E (g (X) g^{t} (X) H_{j} H_{j}^{t} X X^{t})) - σ_{x}^{4} (M + 2) \mathrm{tr} (W_{0 j} W_{0 j}^{t})}{1 - μ σ_{x}^{2} (M + 2)}

(38)

It is now easy to determine the total misadjustment:

Δ (\infty) = \sum_{j = 1}^{L} \mathrm{tr} (R_{XX} K_{V_{j} V_{j}} (\infty)) = μ \frac{σ_{0}^{2} σ_{x}^{2} M L + \mathrm{tr} [E (g (X) g^{t} (X) \sum_{j = 1}^{L} H_{j} H_{j}^{t} X X^{t})] - σ_{x}^{4} (M + 2) \mathrm{tr} (W_{0} W_{0}^{t})}{1 - μ σ_{x}^{2} (M + 2)}

(39)

Note that, as expected, in the case of linear functions Δ(∞) reduces to:

{Δ (\infty) |}_{g (X) = X} = \frac{μ σ_{0}^{2} σ_{x}^{2} M L}{1 - μ σ_{x}^{2} (M + 2)} .

(40)

The additional terms are due to the nonlinearities and they should be calculated specifically for each nonlinearity.
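As a small worked example, the linear-case misadjustment (40) can be evaluated for given parameters. The function name, the argument names, and the guard on the denominator are assumptions added for this sketch; the guard simply rejects step sizes for which expression (40) would no longer be positive.

```python
def misadjustment_linear(mu, noise_var, input_var, M, L):
    """Evaluate Eq. (40): steady-state misadjustment when g(X) = X."""
    denom = 1.0 - mu * input_var * (M + 2)
    if denom <= 0.0:
        # Assumed sanity check: Eq. (40) is only meaningful for a positive denominator.
        raise ValueError("step size too large for Eq. (40) to be meaningful")
    return mu * noise_var * input_var * M * L / denom

# Hypothetical values: mu = 0.01, sigma_0^2 = 0.1, sigma_x^2 = 1, M = L = 2.
delta = misadjustment_linear(0.01, 0.1, 1.0, 2, 2)
```

With these values the numerator is 0.004 and the denominator is 0.96, which shows how close the misadjustment stays to μσ₀²σ_x²ML when μ is small.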

Application to the case study:

The MSE is expressed as:

E ({‖e (n)‖}^{2}) = L σ_{0}^{2} + \sum_{j = 1}^{L} \sum_{k = 1}^{M} σ_{x_{k}}^{2} [\frac{α_{k}^{2}}{{(1 + 2 β_{k} σ_{x_{k}}^{2})}^{\frac{3}{2}}} h_{jk}^{2} - \frac{2 α_{k}}{{(1 + β_{k} σ_{x_{k}}^{2})}^{\frac{3}{2}}} h_{jk} w_{jk} (n) + w_{jk}^{2} (n)]

(41)

The Wiener MSE is expressed in this case as:

ζ_{0} = L σ_{0}^{2} + \sum_{i = 1}^{M} α_{i}^{2} σ_{x_{i}}^{2} [\frac{1}{{(1 + 2 β_{i} σ_{x_{i}}^{2})}^{\frac{3}{2}}} - \frac{1}{{(1 + β_{i} σ_{x_{i}}^{2})}^{3}}] \sum_{j = 1}^{L} h_{ji}^{2}

(42)
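The Wiener MSE of the case study can also be evaluated directly from Equation (25), using the moments in (18) and the diagonal matrix U in (20). For independent zero-mean inputs, each input then contributes the residual power E[g_i²] − (E[x_i g_i])²/σ²_{x_i}, weighted by the squared entries of the i-th column of H. The sketch below follows that route; the function name and the example values are assumptions.

```python
def wiener_mse(H, alphas, betas, var_x, noise_var):
    """Wiener MSE from Eq. (25) with the case-study moments of Eq. (18)."""
    L = len(H)
    total = L * noise_var
    for i, (alpha, beta, var) in enumerate(zip(alphas, betas, var_x)):
        e_gg = alpha ** 2 * var / (1.0 + 2.0 * beta * var) ** 1.5   # E[g_i^2]
        e_xg = alpha * var / (1.0 + beta * var) ** 1.5              # E[x_i g_i]
        residual = e_gg - e_xg ** 2 / var   # part of g_i that a linear model cannot fit
        total += residual * sum(H[j][i] ** 2 for j in range(L))
    return total

# Hypothetical 2 x 2 system.
H = [[1.0, 0.5], [0.2, 1.0]]
zeta_linear = wiener_mse(H, [1.0, 2.0], [0.0, 0.0], [1.0, 0.5], 0.1)
zeta_nonlinear = wiener_mse(H, [1.0, 2.0], [0.5, 1.0], [1.0, 0.5], 0.1)
```

When every β_i is zero the nonlinearities are linear and the result reduces to the noise power Lσ₀², in line with the remark following Equation (25).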

Adaptive W, the nonlinearities are frozen and known with scale factors

In this section, matrix W is adaptive, while the nonlinearities are frozen and known to within scale factors (Figure 5).

Mean weight behavior and stationary points

The gradient descent update of matrix W is expressed as:

\begin{array}{rl} W (n + 1) & = W (n) + 2 μ e (n) {(Ω g (X (n)))}^{t} \\ & = W (n) + 2 μ [H g (X (n)) + N (n) - W (n) Ω g (X (n))] {(Ω g (X (n)))}^{t} \end{array}

(43)

where $Ω = \begin{bmatrix} η_{1} & 0 & \dots & 0 \\ 0 & η_{2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & η_{M} \end{bmatrix}$ .

Averaging both sides of (43) and using the standard LMS assumption of small μ, we obtain:

\begin{matrix} E (W (n + 1)) & \approx E (W (n)) + 2 μ (H Ω R_{g (X) g (X)} \\ - E (W (n)) Ω^{2} R_{g (X) g (X)}) \\ = E (W (n)) (I - 2 μ Ω^{2} R_{g (X) g (X)}) \\ + 2 μ H Ω R_{g (X) g (X)} \end{matrix}

(44)

These recursions have a single stationary point (Wiener solution) which is:

W = W_{0} = H \times Ω^{- 1}

(45)

Following Equation (44), the mean weight behavior can be expressed as a function of the initial condition as:

\begin{matrix} E (W (n)) & = W (0) {(I - 2 μ Ω^{2} R_{g (X) g (X)})}^{n} \\ + 2 μ H Ω R_{g (X) g (X)} \sum_{p = 0}^{n - 1} {(I - 2 μ Ω^{2} R_{g (X) g (X)})}^{p} \end{matrix}

(46)

Hence, if μ is sufficiently small, it can be shown that the mean weights converge to the Wiener solution:

W (\infty) = W_{0} = H Ω^{- 1} .

(47)

The stability condition on μ is: $0 < μ < \frac{1}{λ_{max}}$

where λ_max is the largest eigenvalue of the matrix Ω²R_{g(X)g(X)}.

Thus, if each nonlinear function g_k (.) is known to within a scaling factor η_k, then weight h_jk will be identified by w_jk to within the inverse of that scaling factor, i.e., w_jk converges in the mean to h_jk/η_k.

MSE

We have:

\begin{matrix} E ({‖e (n)‖}^{2}) & = E (‖ H g (X (n)) + N (n) \\ - W (n) Ω g (X (n)) ‖^{2}) \\ = \sum_{j = 1}^{L} E [e_{j}^{2} (n)] \end{matrix}

(48)

where:

\begin{matrix} E (e_{j}^{2} (n)) & = E (‖ H_{j} g (X (n)) + N_{j} (n) \\ - W_{j} (n) Ω g (X (n)) ‖^{2}) \end{matrix}

(49)

Using the independence of noise and weights at time n, we obtain:

\begin{matrix} E (e_{j}^{2} (n)) & = σ_{0}^{2} + E ({‖(H_{j} - Ω W_{j} (n)) g (X (n))‖}^{2}) \\ = σ_{0}^{2} + H_{j}^{t} R_{g (X) g (X)} H_{j} \\ - 2 H_{j}^{t} Ω R_{g (X) g (X)} E (W_{j} (n)) \\ + E (W_{j}^{t} (n) Ω^{2} R_{g (X) g (X)} W_{j} (n)) \end{matrix}

The MSE is therefore expressed as:

\begin{matrix} E ({‖e (n)‖}^{2}) & = L σ_{0}^{2} + \sum_{j = 1}^{L} {H_{j}}^{t} R_{g (X) g (X)} H_{j} \\ - 2 {H_{j}}^{t} Ω R_{g (X) g (X)} E (W_{j} (n)) \\ + E (W_{j}^{t} (n) Ω^{2} R_{g (X) g (X)} W_{j} (n)) \end{matrix}

(50)

Wiener MSE:

The Wiener MSE can be easily expressed as:

\begin{matrix} ζ_{0} & = E (e_{W_{0}}^{2} (n)) \\ = L σ_{0}^{2} + E ({‖(H - W_{0} Ω) g (X (n))‖}^{2}) = L σ_{0}^{2} \end{matrix}

(51)

Therefore the Wiener MSE is equal to the noise floor: there are no terms due to the nonlinearities. This is expected since the nonlinearities are known to within a scaling matrix Ω (we have seen that the scaling matrix is canceled by W₀ since W₀ = HΩ^{-1}).

Let Z (n) = Ω g(X(n)). We can then express the MSE as a function of ζ₀, the weight fluctuation matrix V(n) = W(n) − W₀, and Z(n):

\begin{array}{rl} E ({‖e (n)‖}^{2}) & = E ({‖e_{W_{0}} (n) - (W (n) - W_{0}) Z (n)‖}^{2}) \\ & = ζ_{0} + E ({‖V (n) Z (n)‖}^{2}) \\ & = ζ_{0} + \sum_{j = 1}^{L} \mathrm{tr} (R_{ZZ} K_{V_{j} V_{j}} (n)) \end{array}

(52)

Similarly to Equation (30), the misadjustment is expressed as:

Δ (n) = \mathrm{tr} (R_{ZZ} \sum_{j = 1}^{L} K_{V_{j} V_{j}} (n)) .

(53)

The steady state MSE is then expressed as:

E ({‖e (\infty)‖}^{2}) = ζ_{0} + Δ (\infty) .

(54)

Derivation of the misadjustment:

It is easy to show that the weight fluctuations follow the recursion:

V (n + 1) = V (n) + 2 μ (e_{W_{0}} (n) - V (n) Z (n)) Z^{t} (n) .

(55)

Taking the mean of this equation and applying the orthogonality principle between the input vector and the Wiener error, we obtain:

E (V (n + 1)) = E (V (n)) \times (I - 2 μ R_{ZZ}) .

(56)

Thus, as expected, if μ is sufficiently small, E(V(n)) converges to 0.

For each vector V_j we have similar recursions:

V_{j} (n + 1) = V_{j} (n) + 2 μ (e_{W_{0} j} (n) - Z^{t} (n) V_{j} (n)) Z (n) .

(57)

The evaluation of the covariance matrix of the weight fluctuations is obtained by multiplying both sides of Equation (57) by $V_{j}^{t} (n + 1)$ and averaging:

\begin{array}{rl} K_{V_{j} V_{j}} (n + 1) & = K_{V_{j} V_{j}} (n) - 2 μ R_{ZZ} K_{V_{j} V_{j}} (n) - 2 μ K_{V_{j} V_{j}} (n) R_{ZZ} \\ & + 2 μ E [e_{W_{0} j} (n) Z (n) V_{j}^{t} (I - 2 μ Z (n) Z^{t} (n))] \\ & + 2 μ E {[e_{W_{0} j} (n) Z (n) V_{j}^{t} (I - 2 μ Z (n) Z^{t} (n))]}^{t} \\ & + 4 μ^{2} E [Z (n) Z^{t} (n) K_{V_{j} V_{j}} Z (n) Z^{t} (n)] \\ & + 4 μ^{2} E [e_{W_{0} j}^{2} (n) Z (n) Z^{t} (n)] \end{array}

(58)

In a similar way as in Appendix III, K_VjVj (∞) can be obtained by solving the following equation:

\begin{array}{l} R_{ZZ} K_{V_{j} V_{j}} (\infty) + K_{V_{j} V_{j}} (\infty) R_{ZZ} - 2 μ \, \mathrm{tr} (R_{ZZ} K_{V_{j} V_{j}} (\infty)) R_{ZZ} - 4 μ R_{ZZ} K_{V_{j} V_{j}} (\infty) R_{ZZ} \\ \quad = 2 μ E [g (X) g^{t} (X) H_{j} H_{j}^{t} Z Z^{t}] + 2 μ σ_{0}^{2} R_{ZZ} - 2 μ \, \mathrm{tr} (R_{ZZ} W_{0 j} W_{0 j}^{t}) R_{ZZ} - 4 μ R_{ZZ} W_{0 j} W_{0 j}^{t} R_{ZZ} \end{array}

(59)

This expression cannot be further simplified because R_ZZ is not necessarily of the form $σ_{z}^{2} I$ .

Therefore, tr(R_ZZK_VjVj (∞)) should be calculated for each nonlinearity and for each Ω.

It is interesting to study the case where the nonlinearities are known with the same scaling factor, i.e., Ω = ηI. In this case, and if the inputs are independent and the outputs of the nonlinearities are zero-mean and of equal variance $σ_{g}^{2}$ , we have:

\mathrm{tr} (R_{ZZ} K_{V_{j} V_{j}} (\infty)) = μ \frac{σ_{0}^{2} η^{2} σ_{g}^{2} M}{1 - μ η^{2} σ_{g}^{2} (M + 2)}

(60)

As expected, the total misadjustment reduces to:

Δ (\infty) = \sum_{j = 1}^{L} \mathrm{tr} (R_{ZZ} K_{V_{j} V_{j}} (\infty)) = μ \frac{σ_{0}^{2} η^{2} σ_{g}^{2} M L}{1 - μ η^{2} σ_{g}^{2} (M + 2)} .

(61)

Here the value of the misadjustment is similar to that of linear identification of a linear system (LMS algorithm). This is expected since in this case there are no errors due to the approximation of the nonlinearities.

Case study

For the case study, it is easy to show that $G_{ii} = E (g_{i}^{2} (x_{i} (n))) = \frac{α_{i}^{2} σ_{x_{i}}^{2}}{{(1 + 2 β_{i} σ_{x_{i}}^{2})}^{\frac{3}{2}}} .$ This yields:

E (e_{j}^{2} (n)) = σ_{0}^{2} + \sum_{i = 1}^{M} \frac{α_{i}^{2} σ_{x_{i}}^{2}}{{(1 + 2 β_{i} σ_{x_{i}}^{2})}^{\frac{3}{2}}} {(h_{ji} - w_{ji} (n) η_{i})}^{2}

(62)
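Expression (62) makes the role of the scale factors easy to see numerically: the error of output j drops to the noise floor exactly when w_ji = h_ji/η_i for every i. The short sketch below evaluates (62) for one output; the function name and the example values are assumptions.

```python
def frozen_mse_output(h_row, w_row, etas, alphas, betas, var_x, noise_var):
    """Evaluate Eq. (62) for one output j with frozen, scaled nonlinearities."""
    total = noise_var
    for h, w, eta, alpha, beta, var in zip(h_row, w_row, etas, alphas, betas, var_x):
        G = alpha ** 2 * var / (1.0 + 2.0 * beta * var) ** 1.5   # E[g_i^2(x_i)]
        total += G * (h - w * eta) ** 2
    return total

# Hypothetical row of H and scale factors.
h_row = [0.8, -0.3]
etas = [2.0, 0.5]
w_wiener = [h / eta for h, eta in zip(h_row, etas)]   # w_ji = h_ji / eta_i
mse_at_wiener = frozen_mse_output(h_row, w_wiener, etas, [1.0, 2.0], [0.5, 1.0], [1.0, 1.0], 0.05)
mse_elsewhere = frozen_mse_output(h_row, [0.0, 0.0], etas, [1.0, 2.0], [0.5, 1.0], [1.0, 1.0], 0.05)
```

Any other choice of weights adds a positive term for each mismatched input, so the value at the Wiener solution is the minimum.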

Study of the full structure

This section deals with the full structure (Figures 2 and 3). All the NN and matrix W weights are updated.

Mean weight transient behavior

We take the following notations for the weights: $E (w_{jk} (n)) = {\bar{w}}_{jk} (n), E (c_{ki} (n)) = {\bar{c}}_{ki} (n), E (a_{ki} (n)) = {\bar{a}}_{ki} (n), E (b_{ki} (n)) = {\bar{b}}_{ki} (n)$ .

The update of matrix W is expressed as:

\begin{matrix} W (n + 1) & = W (n) + 2 μ e (n) N N {(X (n))}^{t} \\ = W (n) + 2 μ [H g (X (n)) + N (n) \\ - W (n) N N (X (n))] N N {(X (n))}^{t} \end{matrix}

(63)

Averaging both sides of (63) and using the standard LMS assumption of small μ, we obtain:

\begin{matrix} E (W (n + 1)) & \approx E (W (n)) + 2 μ (H R_{g (X) N N (X)} (n) \\ - E (W (n)) R_{N N (X) N N (X)} (n)) \\ = E (W (n)) (I - 2 μ R_{N N (X) N N (X)} (n)) \\ + 2 μ H R_{g (X) N N (X)} (n) \end{matrix}

(64)

where $R_{N N (X) N N (X)} (n) = E [N N (X (n)) N N {(X (n))}^{t}]$ and $R_{g (X) N N (X)} (n) = E [g (X (n)) N N {(X (n))}^{t}]$ .

These matrices are time-dependent since they depend on the NN block weights which are updated through time.

Using the scalar notation we have:

\begin{array}{rl} {\bar{w}}_{jk} (n + 1) & = {\bar{w}}_{jk} (n) + 2 μ E [(\sum_{i = 1}^{M} h_{ji} g_{i} (x_{i} (n)) - \sum_{l = 1}^{M} w_{jl} N N_{l} (n)) \times \sum_{m = 1}^{N} c_{km} f (a_{km} x_{k} (n) + b_{km})] \\ & \approx {\bar{w}}_{jk} (n) + 2 μ [\sum_{i, m}^{M, N} h_{ji} {\bar{c}}_{km} E (g_{i} (x_{i} (n)) f ({\bar{a}}_{km} x_{k} (n) + {\bar{b}}_{km})) \\ & \quad - \sum_{l, m, i}^{M, N, N} {\bar{w}}_{jl} {\bar{c}}_{li} {\bar{c}}_{km} E (f ({\bar{a}}_{li} x_{l} (n) + {\bar{b}}_{li}) f ({\bar{a}}_{km} x_{k} (n) + {\bar{b}}_{km}))] \end{array}

(65)

Let $K_{i} (a_{km}, b_{km}) = E (g_{i} (x_{i} (n)) f (a_{km} x_{k} (n) + b_{km}))$ and $F_{lk} (a_{li}, b_{li}, a_{km}, b_{km}) = E (f (a_{li} x_{l} (n) + b_{li}) f (a_{km} x_{k} (n) + b_{km}))$ .

With these notations we have:

\begin{array}{rl} {\bar{w}}_{jk} (n + 1) & = {\bar{w}}_{jk} (n) + 2 μ [\sum_{i, m}^{M, N} h_{ji} {\bar{c}}_{km} K_{i} ({\bar{a}}_{km}, {\bar{b}}_{km}) \\ & \quad - \sum_{l, m, i}^{M, N, N} {\bar{w}}_{jl} {\bar{c}}_{li} {\bar{c}}_{km} F_{lk} ({\bar{a}}_{li}, {\bar{b}}_{li}, {\bar{a}}_{km}, {\bar{b}}_{km})] \end{array}

(66)

For the NN block weights we have:

\begin{array}{rl} {\bar{c}}_{ki} (n + 1) & = {\bar{c}}_{ki} (n) + 2 μ E (\sum_{l = 1}^{L} w_{lk} e_{l} (n) f (a_{ki} x_{k} (n) + b_{ki})) \\ & \approx {\bar{c}}_{ki} (n) + 2 μ \sum_{l = 1}^{L} {\bar{w}}_{lk} E (\sum_{p = 1}^{M} h_{lp} g_{p} (x_{p} (n)) f ({\bar{a}}_{ki} x_{k} (n) + {\bar{b}}_{ki}) \\ & \quad - \sum_{m = 1}^{M} {\bar{w}}_{lm} (\sum_{q = 1}^{N} {\bar{c}}_{mq} f ({\bar{a}}_{mq} x_{m} (n) + {\bar{b}}_{mq})) f ({\bar{a}}_{ki} x_{k} (n) + {\bar{b}}_{ki})) \\ & = {\bar{c}}_{ki} (n) + 2 μ \sum_{l = 1}^{L} {\bar{w}}_{lk} (\sum_{p = 1}^{M} h_{lp} K_{p} ({\bar{a}}_{ki}, {\bar{b}}_{ki}) \\ & \quad - \sum_{m = 1}^{M} {\bar{w}}_{lm} \sum_{q = 1}^{N} {\bar{c}}_{mq} F_{mk} ({\bar{a}}_{mq}, {\bar{b}}_{mq}, {\bar{a}}_{ki}, {\bar{b}}_{ki})) \end{array}

(67)

\begin{array}{rl} {\bar{a}}_{ki} (n + 1) & = {\bar{a}}_{ki} (n) + 2 μ E [c_{ki} \sum_{l = 1}^{L} w_{lk} e_{l} (n) x_{k} (n) f^{'} (a_{ki} x_{k} (n) + b_{ki})] \\ & \approx {\bar{a}}_{ki} (n) + 2 μ {\bar{c}}_{ki} \sum_{l = 1}^{L} {\bar{w}}_{lk} (\sum_{p = 1}^{M} h_{lp} E (g_{p} (x_{p} (n)) x_{k} (n) f^{'} ({\bar{a}}_{ki} x_{k} (n) + {\bar{b}}_{ki})) \\ & \quad - \sum_{m = 1}^{M} {\bar{w}}_{lm} (\sum_{q = 1}^{N} {\bar{c}}_{mq} E (f ({\bar{a}}_{mq} x_{m} (n) + {\bar{b}}_{mq}) x_{k} (n) f^{'} ({\bar{a}}_{ki} x_{k} (n) + {\bar{b}}_{ki})))) \\ & = {\bar{a}}_{ki} (n) + 2 μ {\bar{c}}_{ki} \sum_{l = 1}^{L} {\bar{w}}_{lk} (\sum_{p = 1}^{M} h_{lp} \frac{\partial K_{p} ({\bar{a}}_{ki}, {\bar{b}}_{ki})}{\partial {\bar{a}}_{ki}} \\ & \quad - \sum_{m, q, (m, q) \neq (k, i)}^{M, N} {\bar{w}}_{lm} {\bar{c}}_{mq} \frac{\partial F_{mk} ({\bar{a}}_{mq}, {\bar{b}}_{mq}, {\bar{a}}_{ki}, {\bar{b}}_{ki})}{\partial {\bar{a}}_{ki}} \\ & \quad - \frac{1}{2} {\bar{w}}_{lk} {\bar{c}}_{ki} \frac{\partial F_{kk} ({\bar{a}}_{ki}, {\bar{b}}_{ki}, {\bar{a}}_{ki}, {\bar{b}}_{ki})}{\partial {\bar{a}}_{ki}}) \end{array}

(68)

\begin{array}{rl} {\bar{b}}_{ki} (n + 1) & = {\bar{b}}_{ki} (n) + 2 μ E [c_{ki} \sum_{l = 1}^{L} w_{lk} e_{l} (n) f^{'} (a_{ki} x_{k} (n) + b_{ki})] \\ & \approx {\bar{b}}_{ki} (n) + 2 μ {\bar{c}}_{ki} \sum_{l = 1}^{L} {\bar{w}}_{lk} (\sum_{p = 1}^{M} h_{lp} E (g_{p} (x_{p} (n)) f^{'} ({\bar{a}}_{ki} x_{k} (n) + {\bar{b}}_{ki})) \\ & \quad - \sum_{m = 1}^{M} {\bar{w}}_{lm} (\sum_{q = 1}^{N} {\bar{c}}_{mq} E (f ({\bar{a}}_{mq} x_{m} (n) + {\bar{b}}_{mq}) f^{'} ({\bar{a}}_{ki} x_{k} (n) + {\bar{b}}_{ki})))) \\ & = {\bar{b}}_{ki} (n) + 2 μ {\bar{c}}_{ki} \sum_{l = 1}^{L} {\bar{w}}_{lk} (\sum_{p = 1}^{M} h_{lp} \frac{\partial K_{p} ({\bar{a}}_{ki}, {\bar{b}}_{ki})}{\partial {\bar{b}}_{ki}} \\ & \quad - \sum_{m, q}^{M, N} {\bar{w}}_{lm} {\bar{c}}_{mq} \frac{\partial F_{mk} ({\bar{a}}_{mq}, {\bar{b}}_{mq}, {\bar{a}}_{ki}, {\bar{b}}_{ki})}{\partial {\bar{b}}_{ki}}) \end{array}

(69)

These equations hold for any nonlinearity. In the following, we will calculate them explicitly for the case study described in Section 2.3.

Application to the case study:

Since the inputs are independent and zero-mean, we have K_i(a_km, b_km) = 0 for k ≠ i, and (see Appendix I)

\begin{matrix} K_{i} (a_{im}, b_{im}) & = E (g_{i} (x_{i} (n)) f (a_{im} x (n) + b_{im})) \\ = \sqrt{\frac{2}{π}} \frac{α_{i} σ_{x}^{2}}{(1 + β_{i} σ_{x}^{2})} \frac{a_{im}}{\sqrt{σ_{x}^{2} (a_{im}^{2} + β_{i}) + 1}} \\ - \frac{1}{2} \sqrt{\frac{2}{π}} α_{i} σ_{x}^{2} b_{im}^{2} \frac{a_{im}}{{(1 + σ_{x}^{2} (β_{i} + a_{im}^{2}))}^{\frac{3}{2}}} \end{matrix}

(70)

On the other hand, we have $F_{lk} (a_{li}, b_{li}, a_{km}, b_{km}) = 0$ for $l \neq k$, and (see Appendix I)

\begin{matrix} F_{kk} (a_{ki}, b_{ki}, a_{km}, b_{km}) \\ = E (f (a_{ki} x (n) + b_{ki}) f (a_{km} x (n) + b_{km})) \\ = \frac{2}{π} {sin}^{- 1} (\frac{a_{ki} a_{km} σ_{x}^{2}}{\sqrt{1 + σ_{x}^{2} a_{ki}^{2} + σ_{x}^{2} a_{km}^{2} + σ_{x}^{4} a_{ki}^{2} a_{km}^{2}}}) \\ - \frac{1}{π} b_{ki}^{2} \frac{σ_{x}^{2} a_{ki} a_{km}}{(1 + σ_{x}^{2} a_{ki}^{2}) \sqrt{1 + σ_{x}^{2} (a_{ki}^{2} + a_{km}^{2})}} \\ - \frac{1}{π} b_{km}^{2} \frac{σ_{x}^{2} a_{ki} a_{km}}{(1 + σ_{x}^{2} a_{km}^{2}) \sqrt{1 + σ_{x}^{2} (a_{ki}^{2} + a_{km}^{2})}} \\ + b_{ki} b_{km} \frac{2}{π} \frac{1}{\sqrt{1 + σ_{x}^{2} (a_{ki}^{2} + a_{km}^{2})}} \end{matrix}

(71)
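The biasless parts of (70) and (71) can be checked numerically. The sketch below compares the closed forms with a direct numerical evaluation of the Gaussian expectations. It assumes f(x) = erf(x/√2) and g_k(x) = α_k x exp(−β_k x²/2), both inferred from the integrals in Appendix I rather than stated in this section, and the helper names are hypothetical.

```python
import math

def f(x):
    # Assumed activation (inferred from Appendix I)
    return math.erf(x / math.sqrt(2.0))

def gauss_expect(h, sigma, n=4001, span=10.0):
    """E[h(x)] for x ~ N(0, sigma^2), trapezoidal rule on [-span*sigma, span*sigma]."""
    lo = -span * sigma
    step = 2.0 * span * sigma / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * step
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * h(x) * math.exp(-x * x / (2.0 * sigma ** 2))
    return total * step / (sigma * math.sqrt(2.0 * math.pi))

def K_biasless(a, alpha, beta, sigma):
    # First term of Eq. (70), i.e. K_k(a, 0)
    s2 = sigma ** 2
    return (math.sqrt(2.0 / math.pi) * alpha * s2 / (1.0 + beta * s2)
            * a / math.sqrt(s2 * (a * a + beta) + 1.0))

def F_biasless(a1, a2, sigma):
    # First term of Eq. (71), i.e. F_kk(a1, 0, a2, 0)
    s2 = sigma ** 2
    den = math.sqrt((1.0 + s2 * a1 * a1) * (1.0 + s2 * a2 * a2))
    return 2.0 / math.pi * math.asin(a1 * a2 * s2 / den)

if __name__ == "__main__":
    sigma, alpha, beta, a1, a2 = 1.0, 1.0, 2.0, 0.8, -1.3
    k_num = gauss_expect(lambda x: alpha * x * math.exp(-0.5 * beta * x * x) * f(a1 * x), sigma)
    f_num = gauss_expect(lambda x: f(a1 * x) * f(a2 * x), sigma)
    print(k_num, K_biasless(a1, alpha, beta, sigma))
    print(f_num, F_biasless(a1, a2, sigma))
```

The denominator in `F_biasless` uses the factorization 1 + σ²a₁² + σ²a₂² + σ⁴a₁²a₂² = (1 + σ²a₁²)(1 + σ²a₂²), which is the same quantity as in (71).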

Inserting these expressions in equations (66)-(69), we obtain:

\begin{matrix} {\bar{w}}_{jk} (n + 1) & = {\bar{w}}_{jk} (n) + 2 μ [h_{jk} \sum_{m = 1}^{N} c_{km} K_{k} ({\bar{a}}_{km}, {\bar{b}}_{km}) \\ - {\bar{w}}_{jk} \sum_{m, i}^{N} {\bar{c}}_{ki} {\bar{c}}_{km} F_{kk} ({\bar{a}}_{ki}, {\bar{b}}_{ki}, {\bar{a}}_{km}, {\bar{b}}_{km})] \end{matrix}

(72)

\begin{matrix} {\bar{c}}_{ki} (n + 1) & = {\bar{c}}_{ki} (n) + 2 μ \sum_{l = 1}^{L} {\bar{w}}_{lk} (h_{lk} K_{k} ({\bar{a}}_{ki}, {\bar{b}}_{ki}) \\ - {\bar{w}}_{lk} (\sum_{q = 1}^{N} {\bar{c}}_{kq} F_{kk} ({\bar{a}}_{kq}, {\bar{b}}_{kq}, {\bar{a}}_{ki}, {\bar{b}}_{ki}))) \end{matrix}

(73)

\begin{matrix} {\bar{a}}_{ki} (n + 1) & = {\bar{a}}_{ki} (n) + 2 μ {\bar{c}}_{ki} \sum_{l = 1}^{L} {\bar{w}}_{lk} (h_{lk} \frac{\partial K_{k} ({\bar{a}}_{ki}, {\bar{b}}_{ki})}{\partial {\bar{a}}_{ki}} \\ - {\bar{w}}_{lk} \sum_{q \neq i}^{N} {\bar{c}}_{kq} \frac{\partial F_{kk} ({\bar{a}}_{kq}, {\bar{b}}_{kq}, {\bar{a}}_{ki}, {\bar{b}}_{ki})}{\partial {\bar{a}}_{ki}} \\ - \frac{1}{2} {\bar{w}}_{lk} {\bar{c}}_{ki} \frac{\partial F_{kk} ({\bar{a}}_{ki}, {\bar{b}}_{ki}, {\bar{a}}_{ki}, {\bar{b}}_{ki})}{\partial {\bar{a}}_{ki}}) \end{matrix}

(74)

\begin{matrix} {\bar{b}}_{ki} (n + 1) & = {\bar{b}}_{ki} (n) + 2 μ {\bar{c}}_{ki} \sum_{l = 1}^{L} {\bar{w}}_{lk} (h_{lk} \frac{\partial K_{k} ({\bar{a}}_{ki}, {\bar{b}}_{ki})}{\partial {\bar{b}}_{ki}} \\ - \sum_{q}^{N} {\bar{w}}_{lk} {\bar{c}}_{kq} \frac{\partial F_{kk} ({\bar{a}}_{kq}, {\bar{b}}_{kq}, {\bar{a}}_{ki}, {\bar{b}}_{ki})}{\partial {\bar{b}}_{ki}}) \end{matrix}

(75)

The explicit expressions of the different derivatives are detailed in Appendix II.

Stationary points

We obtain the stationary points by setting to zero the expectations of the gradient update terms in (64) and (4.5-7).

For W, we obtain:

\begin{matrix} W_{0} = H \times R_{g (X) N N (X)} R_{N N (X) N N (X)}^{- 1} = H \times U, \text{ where} \\ U = R_{g (X) N N (X)} R_{N N (X) N N (X)}^{- 1} . \end{matrix}

(76)

For c_ki we obtain the equations:

\begin{matrix} \sum_{l = 1}^{L} w_{lk} (h_{lk} K_{k} (a_{ki}, b_{ki}) \\ - w_{lk} (\sum_{q = 1}^{N} c_{kq} F_{kk} (a_{kq}, b_{kq}, a_{ki}, b_{ki}))) = 0 \end{matrix}

(77)

For a_ki we obtain the equations:

\begin{matrix} \sum_{l = 1}^{L} w_{lk} (h_{lk} \frac{\partial K_{k} (a_{ki}, b_{ki})}{\partial a_{ki}} \\ - w_{lk} \sum_{q \neq i}^{N} c_{kq} \frac{\partial F_{kk} (a_{kq}, b_{kq}, a_{ki}, b_{ki})}{\partial a_{ki}} \\ - \frac{1}{2} w_{lk} c_{ki} \frac{\partial F_{kk} (a_{ki}, b_{ki}, a_{ki}, b_{ki})}{\partial a_{ki}}) = 0 \end{matrix}

(78)

For b_ki we obtain:

\begin{matrix} \sum_{l = 1}^{L} w_{lk} (h_{lk} \frac{\partial K_{k} (a_{ki}, b_{ki})}{\partial b_{ki}} \\ - \sum_{q}^{N} w_{lk} c_{kq} \frac{\partial F_{kk} (a_{kq}, b_{kq}, a_{ki}, b_{ki})}{\partial b_{ki}}) = 0 \end{matrix}

(79)

The above equations are nonlinear in the NN variables. They can be solved numerically, but they are very difficult to solve analytically.

Convergence of the algorithm to the stationary points:

It is always interesting to show whether an algorithm is capable of converging to its stationary points or not. In our case it is difficult to establish this, since the updating equations of the weights are nonlinear, except for W.

In the case where the NN weights are frozen we can establish the convergence condition for W.

In this case we have:

\begin{matrix} E (W (n + 1)) & = E (W (n)) (I - 2 μ R_{N N (X) N N (X)}) \\ + 2 μ H R_{g (X) N N (X)} \end{matrix}

(80)

The covariance matrices are fixed, since in this case the NN weights are frozen.

E(W(n)) can be expressed as a function of the initial condition as:

\begin{matrix} E (W (n)) & = W (0) {(I - 2 μ R_{N N (X) N N (X)})}^{n} + 2 μ H R_{g (X) N N (X)} \\ \times \sum_{p = 0}^{n - 1} {(I - 2 μ R_{N N (X) N N (X)})}^{p} \end{matrix}

(81)

If μ is sufficiently small, the steady state solution to (81) is:

W (\infty) = W_{0} = H R_{g (X) N N (X)} R_{N N (X) N N (X)}^{- 1} .

(82)

Hence, the mean weights converge to the stationary point W₀, and the stability condition on μ is:

0 < μ < \frac{1}{λ_{max}}

(83)

where λ_max is the largest eigenvalue of the correlation matrix R_NN(X)NN(X).
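The frozen-weight recursion (80) and the stability bound (83) can be illustrated with a small deterministic iteration. The matrices `R_nn` and `R_gnn` below are hypothetical placeholder values, not taken from the paper, and the helper names are assumptions. Only H is the matrix used in the simulation section.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inv2(A):
    # Inverse of a 2x2 matrix
    (p, q), (r, s) = A
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def lambda_max_sym2(A):
    # Largest eigenvalue of a symmetric 2x2 matrix
    (p, q), (_, s) = A
    return 0.5 * (p + s) + math.sqrt(0.25 * (p - s) ** 2 + q * q)

def mean_weight_recursion(W_init, H, R_nn, R_gnn, mu, n_iter):
    """Iterate E(W(n+1)) = E(W(n)) (I - 2 mu R_nn) + 2 mu H R_gnn, as in Eq. (80)."""
    A = [[(1.0 if i == j else 0.0) - 2.0 * mu * R_nn[i][j] for j in range(2)] for i in range(2)]
    drive = [[2.0 * mu * v for v in row] for row in matmul(H, R_gnn)]
    W = W_init
    for _ in range(n_iter):
        WA = matmul(W, A)
        W = [[WA[i][j] + drive[i][j] for j in range(2)] for i in range(2)]
    return W

if __name__ == "__main__":
    H = [[1.0, 0.3], [0.3, 1.0]]
    R_nn = [[0.5, 0.1], [0.1, 0.3]]    # hypothetical R_NN(X)NN(X)
    R_gnn = [[0.4, 0.0], [0.0, 0.2]]   # hypothetical R_g(X)NN(X)
    lam = lambda_max_sym2(R_nn)
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(mean_weight_recursion(zeros, H, R_nn, R_gnn, 0.5 / lam, 500))  # inside the bound
    print(matmul(matmul(H, R_gnn), inv2(R_nn)))                          # W_0 of Eq. (82)
```

The bound follows because each eigenvalue of I − 2μR_NN(X)NN(X) equals 1 − 2μλ, which has magnitude below one exactly when 0 < μ < 1/λ.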

Application to the case study:

For the case study it can be shown that U reduces to a diagonal matrix:

\begin{matrix} U & = R_{g (X) N N (X)} R_{N N (X) N N (X)}^{- 1} \\ = [\begin{matrix} γ_{1} & 0 \dots & 0 \\ 0 & γ_{2} \dots & 0 \\ 0 & 0 \dots & γ_{M} \end{matrix}], \end{matrix}

(84)

where:

γ_{k} = \frac{\sum_{m}^{N} c_{km} K_{k} (a_{km}, b_{km})}{\sum_{m, i}^{N} c_{ki} c_{km} F_{kk} (a_{ki}, b_{ki}, a_{km}, b_{km})}

(85)

This indicates that the weights w_jk are scaled versions of the unknown weights h_jk. The scale factor γ_k is the same for all the weights connecting the k^th NN block to the outputs, and it depends only on the weights of block k. If the error is sufficiently small, the k^th NN block approximates the k^th nonlinearity to within the inverse of this scale factor.

MSE expression

The transient MSE is determined by:

\begin{matrix} E ({‖e (n)‖}^{2}) & = \sum_{j = 1}^{L} E [e_{j}^{2} (n)] \\ = E (‖ H g (X (n)) + N (n) \\ - W (n) N N (X (n)) ‖^{2}) \end{matrix}

(86)

where:

\begin{matrix} E (e_{j}^{2} (n)) & = E (‖ H_{j} g (X (n)) + N_{j} (n) \\ - W_{j} (n) N N (X (n)) ‖^{2}) \\ = E ({(\sum_{i = 1}^{M} h_{ji} g_{i} (x_{i} (n)) + N_{j} (n) \\ - \sum_{k = 1}^{M} w_{jk} \sum_{i = 1}^{N} c_{ki} f (a_{ki} x_{k} (n) + b_{ki}))}^{2}) \end{matrix}

(87)

which can be expressed as:

\begin{matrix} E (e_{j}^{2} (n)) = {σ_{0}}^{2} + \sum_{i, l}^{M} h_{ji} h_{jl} E (g_{i} (x_{i} (n)) g_{l} (x_{l} (n))) \\ - 2 \sum_{i, k}^{M} h_{ji} w_{jk} (\sum_{m = 1}^{N} c_{km} E (g_{i} (x_{i} (n)) f (a_{km} x_{k} (n) + b_{km}))) \\ + \sum_{k, l}^{M} \sum_{i, m}^{N} w_{jl} w_{jk} c_{li} c_{km} E (f (a_{li} x_{l} (n) + b_{li}) f (a_{km} x_{k} (n) + b_{km})) \end{matrix}

(88)

Let G_il = E(g_i(x_i(n)) g_l(x_l(n))). Using the notation of Section 4.1, we have:

\begin{matrix} E (e_{j}^{2} (n)) & = {σ_{0}}^{2} + \sum_{i, l}^{M} h_{ji} h_{jl} G_{il} \\ - 2 \sum_{i, k}^{M} h_{ji} w_{jk} (\sum_{m = 1}^{N} c_{km} K_{i} (a_{km}, b_{km})) \\ + \sum_{k, l}^{M} \sum_{i, m}^{N} w_{jl} w_{jk} c_{li} c_{km} F_{lk} (a_{li}, b_{li}, a_{km}, b_{km}) \end{matrix}

(89)

Application to the case study:

It can be easily shown that:

\begin{matrix} E (e_{j}^{2} (n)) & = {σ_{0}}^{2} + \sum_{i}^{M} h_{ji}^{2} G_{ii} \\ - 2 \sum_{k}^{M} h_{jk} w_{jk} (\sum_{m = 1}^{N} c_{km} K_{k} (a_{km}, b_{km})) \\ + \sum_{k}^{M} w_{jk}^{2} \sum_{i, m}^{N} c_{ki} c_{km} F_{kk} (a_{ki}, b_{ki}, a_{km}, b_{km}) . \end{matrix}

(90)

The first term of $E (e_{j}^{2} (n))$ represents the noise power. The second term is the signal power of the j^th MIMO output. The third term is the sum of the individual contributions of the neurons, weighted by the W and H weights. The fourth term is the sum of the coupling terms between neurons inside the same block, weighted by W. Note that since the inputs are zero-mean and independent, there are no coupling terms between neurons in different blocks (such terms are present in the general expression, Eq. (89)).

The total MSE is then expressed as:

\begin{matrix} E ({‖e (n)‖}^{2}) & = L {σ_{0}}^{2} + \sum_{j = 1}^{L} \sum_{i = 1}^{M} h_{ji}^{2} G_{ii} \\ - 2 \sum_{j = 1}^{L} \sum_{k = 1}^{M} h_{jk} w_{jk} (\sum_{m = 1}^{N} c_{km} K_{k} (a_{km}, b_{km})) \\ + \sum_{j = 1}^{L} \sum_{k = 1}^{M} w_{jk}^{2} \sum_{i, m}^{N} c_{ki} c_{km} F_{kk} (a_{ki}, b_{ki}, a_{km}, b_{km}) . \end{matrix}

(91)
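As a rough numerical illustration of the case-study MSE, the sketch below evaluates the per-output error with all biases set to zero and checks that it is quadratic in w_jk with its minimum at w_jk = γ_k h_jk. Several things here are assumptions: the nonlinearity g_k(x) = α_k x exp(−β_k x²/2) and the activation erf(x/√2) are inferred from Appendix I, the closed form used for G_kk is derived under that assumption and is not given in the text, the coupling term uses the product c_ki c_km as in (89), and all parameter values and function names are hypothetical.

```python
import math

def K0(a, alpha, beta, s2):
    # Biasless K_k(a, 0), Appendix I
    return (math.sqrt(2.0 / math.pi) * alpha * s2 / (1.0 + beta * s2)
            * a / math.sqrt(s2 * (a * a + beta) + 1.0))

def F0(a1, a2, s2):
    # Biasless F_kk(a1, 0, a2, 0), first term of Eq. (71)
    den = math.sqrt((1.0 + s2 * a1 * a1) * (1.0 + s2 * a2 * a2))
    return 2.0 / math.pi * math.asin(a1 * a2 * s2 / den)

def G(alpha, beta, s2):
    # G_kk = E[g_k(x)^2] for the assumed g_k(x) = alpha * x * exp(-beta * x^2 / 2)
    return alpha ** 2 * s2 / (1.0 + 2.0 * beta * s2) ** 1.5

def block_moments(a_k, c_k, alpha, beta, s2):
    # cross = sum_m c_km K_k(a_km, 0); power = sum_{i,m} c_ki c_km F_kk(a_ki, 0, a_km, 0)
    cross = sum(c * K0(a, alpha, beta, s2) for a, c in zip(a_k, c_k))
    power = sum(ci * cm * F0(ai, am, s2)
                for ai, ci in zip(a_k, c_k) for am, cm in zip(a_k, c_k))
    return cross, power

def mse_output_j(h_j, w_j, a, c, alphas, betas, s2, noise_var):
    # Per-output MSE for independent zero-mean inputs, biases set to zero
    total = noise_var
    for k in range(len(h_j)):
        cross, power = block_moments(a[k], c[k], alphas[k], betas[k], s2)
        total += (h_j[k] ** 2 * G(alphas[k], betas[k], s2)
                  - 2.0 * h_j[k] * w_j[k] * cross + w_j[k] ** 2 * power)
    return total

def gamma(a_k, c_k, alpha, beta, s2):
    # Scale factor of block k: ratio of the cross term to the block output power
    cross, power = block_moments(a_k, c_k, alpha, beta, s2)
    return cross / power
```

Because the expression is a sum of independent quadratics in the w_jk, minimizing each one separately gives w_jk = h_jk γ_k, which is the diagonal scaling described for the case study.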

Case of frozen NN weights:

It is interesting to see the behavior of the MSE in the case where the NN weights are frozen.

In this case we have:

\begin{matrix} ζ_{0} & = E ({‖e_{W_{0}} (n)‖}^{2}) \\ = L σ_{0}^{2} + E ({‖H g (X (n)) - W_{0} N N (X (n))‖}^{2}) . \end{matrix}

(92)

Here the minimum MSE depends on the noise floor and on the NN approximation error of the nonlinearities. It is clear from this equation and from Section 3.2 that, if the NN blocks ideally identify the nonlinearities (to within scale factors), then ζ₀ reduces to the noise floor.

The MSE can be written as a function of ζ₀ as:

\begin{matrix} E ({‖e (n)‖}^{2}) & = E ({‖e_{W_{0}} (n) - (W (n) - W_{0}) N N (X (n))‖}^{2}) \\ = E ({‖e_{W_{0}} (n) - V (n) N N (X (n))‖}^{2}) \end{matrix}

(93)

The steady state MSE is in this case:

E ({‖e (\infty)‖}^{2}) = ζ_{0} + \sum_{j = 1}^{L} t r (R_{N N (X) N N (X)} K_{V_{j} V_{j}} (\infty)) .

(94)

The misadjustment can be derived as in Sections 3.1 and 3.2. We obtain an equation similar to (53), with R_ZZ replaced by R_NN(X)NN(X). The equation cannot be simplified further.

It is interesting to notice, however, that if the NN blocks perfectly identify the nonlinearities and if the conditions above equation (60) are fulfilled, then:

\begin{matrix} Δ (\infty) & = \sum_{j = 1}^{L} t r (R_{N N (X) N N (X)} K_{V_{j} V_{j}} (\infty)) \\ = μ \frac{σ_{0}^{2} σ_{g}^{2} M L}{1 - μ σ_{g}^{2} (M + 2)} \end{matrix}

(95)
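The dependence of (95) on the learning rate can be made concrete with a direct evaluation. This is a minimal sketch: the function name and the numerical values in the usage note are hypothetical, and σ_g² is the quantity defined with the conditions above equation (60).

```python
def misadjustment(mu, noise_var, sigma_g_sq, M, L):
    """Steady-state excess MSE of Eq. (95): mu s0^2 sg^2 M L / (1 - mu sg^2 (M + 2))."""
    denom = 1.0 - mu * sigma_g_sq * (M + 2)
    if denom <= 0.0:
        raise ValueError("Eq. (95) requires mu * sigma_g^2 * (M + 2) < 1")
    return mu * noise_var * sigma_g_sq * M * L / denom
```

For instance, with the hypothetical values `noise_var=1e-6`, `sigma_g_sq=0.2` and `M = L = 2`, doubling `mu` from 0.001 to 0.002 slightly more than doubles the result, since the denominator also shrinks as `mu` grows.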

Simulation examples

In this section we present simulation results for the case study described in Section 2.3. We consider a 2 × 2 MIMO system (i.e., M = L = 2). For the parameterized nonlinearities we have chosen α₁=α₂=1, β₁=1, β₂=2. Unless otherwise specified, the inputs are uncorrelated zero-mean white Gaussian processes with σ_xi = 1. In the simulations, the unknown combining matrix was fixed to $H = [\begin{matrix} 1 & 0.3 \\ 0.3 & 1 \end{matrix}]$ . For example, in a MIMO communication system, H can be seen as the propagation matrix between 2 transmitting antennas and 2 receiving antennas.

Linear adaptation

For the linear adaptation case (Section 3.1), the adaptive system is composed of a 2×2 matrix W. For the noise we have taken σ₀ = 0.001. The mean weight recursions and the MSE transient behaviors (Figures 6 and 7) have been estimated over 20 Monte Carlo (MC) simulations and compared to the theoretical derivations (Equations (19) and (41)). Even with this small number of MC runs, the theory and the MC estimates fit closely, which supports the validity of the assumptions made. A larger number of MC simulations gives smoother curves, but the conclusions remain the same.

Matrix W converges to a scaled version of H: $W_{0} = [\begin{matrix} 0.3536 & 0.0577 \\ 0.1061 & 0.1925 \end{matrix}] = H U, U = [\begin{matrix} 0.3536 & 0.0000 \\ 0.0000 & 0.1925 \end{matrix}] .$ Note the typical behavior of the LMS algorithm: a single time constant controls the transient part of the learning curve and of the mean weight curve. This is fundamentally different from the learning of the full NN system, which is governed by several time constants and presents plateau regions (Section 5.2). The steady state MSE is high because the nonlinearities are not approximated: they are in effect modeled by the identity function (Equation (25)).
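The reported values of U and W₀ can be reproduced with a short calculation. The sketch assumes the case-study nonlinearity has the form g_k(x) = α_k x exp(−β_k x²/2), which is inferred from the integrals in Appendix I. For a zero-mean Gaussian input of variance σ_x², the linear scale factor E[g_k(x) x]/E[x²] then equals α_k (1 + β_k σ_x²)^(−3/2). The function name is hypothetical.

```python
def linear_scale_factor(alpha, beta, sigma_sq=1.0):
    """E[g(x) x] / E[x^2] for the assumed g(x) = alpha * x * exp(-beta * x^2 / 2)."""
    return alpha / (1.0 + beta * sigma_sq) ** 1.5

H = [[1.0, 0.3], [0.3, 1.0]]
u = [linear_scale_factor(1.0, 1.0), linear_scale_factor(1.0, 2.0)]  # diagonal of U
W0 = [[H[j][k] * u[k] for k in range(2)] for j in range(2)]         # W0 = H U

for row in W0:
    print([round(v, 4) for v in row])
# [0.3536, 0.0577]
# [0.1061, 0.1925]
```

Under this assumption the diagonal of U is (2^(−3/2), 3^(−3/2)) ≈ (0.3536, 0.1925), which agrees with the values quoted above to four decimals.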

MSE surface for the full NN algorithm

We now move to the study of the full NN algorithm. In this simulation, we have taken N = 3 neurons in each of the two NN blocks. The learning rate was fixed to μ = 0.0045. Figure 8 shows the MSE surface (i.e., Eq. (90), with no time dependence) as a function of w₁₁ and w₁₂ (the other parameters were fixed). It is clear that the MSE is quadratic in w₁₁ and w₁₂. It presents a single global minimum (as shown in Equations (76) and (84)).

Figure 9 (resp. Figure 10) shows the MSE surface as a function of a₁₁ and c₁₁ (resp. a₁₁ and a₁₂) (the other parameters were fixed). Note the flat areas (plateau regions) around the minima of the MSE surface. They explain the slow evolution of the NN weights when the algorithm gets close to its convergence point.

The MSE evolution during the learning process (Equation (90)) has been compared to 20 MC estimations (Figure 11a): the theory fits the simulation results very well. In the figure, the MSE goes through several phases (each controlled by a time constant) that end in a plateau phase where the MSE decreases very slowly. This is a typical behavior of the backpropagation algorithm[1], which is fundamentally different from that of the linear adaptation scheme (Figure 6). Here the MSE is much smaller. This is expected, since the additional error due to the nonlinearities (Eq. 92) is highly reduced because the NN blocks have correctly identified the unknown nonlinearities (Figures 12 and 13). Here we are in a situation close to that of Section 3.2 (Equations 51-52).

Figure 11b shows the MSE evolution during the learning process for different values of the noise variance σ₀. It can be seen that, as the noise variance decreases, the MSE decreases. However, below a certain value of σ₀ (here σ₀=0.0005), the MSE curves are almost identical. This is because in this case, the weight misadjustment error (for the linear part) and the nonlinear approximation error (of the nonlinear memoryless part) are much higher than the error caused by the presence of noise (see Eqs. 92-93).

Figure 11c investigates the influence of the learning rate μ. It can be seen that as μ increases (up to μ=0.002), the algorithm is faster and the MSE is lower at the end of the simulation time. However, for μ>0.002, as μ increases, the algorithm is faster at the beginning of the learning process, but the MSE is higher at the end of the simulation time. This is due to the misadjustment error which is higher for higher μ (see, e.g., Eq. 95).

Mean weight transient behavior for the full NN algorithm

Here we keep the system described in Section 5.2. The mean weight recursions for the linear combiner W and the two NN blocks are shown in Figures 14, 15 and 16 for both theory and MC estimations. The theoretical and estimated curves are indistinguishable. This confirms the validity of the different assumptions made in Sections 4.1 and 4.2.

Notice that, in Figure 14, W weights have a fast evolution at the beginning of the learning process (with values approaching H×U(n) where U is a diagonal matrix). They then evolve slowly till the end of the learning process. The slow evolution is justified by the plateau regions presented by the MSE surface. At the end of this simulation, matrix U was close to a diagonal matrix: $U_{Sim} = [\begin{matrix} 1.2702 & 0.0001 \\ 0.0003 & 1.0946 \end{matrix}],$ (and $U_{Theory} = [\begin{matrix} 1.270 & 0 \\ 0 & 1.095 \end{matrix}]$ ). This result is expected since the inputs are uncorrelated (Equations 84-85).

Figures 12 and 13 show that the functions g₁(x) and g₂(x) have been correctly identified by the corresponding NN blocks (the NN functions are normalized by the scaling factors γ₁=1/1.2702 and γ₂=1/1.0946, respectively).

Impact of correlated inputs

In the simulations below we study the impact of correlated inputs. The input signal vector is chosen here as a 2D Gaussian process with covariance matrix of the form $R_{XX} = [\begin{matrix} 1 & ρ \\ ρ & 1 \end{matrix}] .$ Each NN block has N = 5 neurons, and the learning rate is μ = 0.0075. We have run several simulations for different values of the cross-correlation ρ. Note that ρ = 0 corresponds to independent inputs, and ρ = 1 corresponds to identical inputs (i.e., x₁ = x₂). The values of matrix U and the MSE after n = 2×10⁵ iterations are shown in Table 1.

Table 1 shows that, in practice, matrix U remains very close to a diagonal matrix, even for high correlation between the inputs. This indicates that the system is capable of correctly identifying the nonlinearities even when the inputs are highly correlated. The identification performances for the cases ρ = 0.6 and ρ = 0.99 are illustrated in Figures 17, 18 and 19, 20, respectively. As expected, the MSE increases as the correlation between inputs increases (Table 1).

When ρ = 1 (i.e., the two inputs are the same), the system is capable of correctly identifying the overall MIMO input–output transfer function. However, it cannot separate the nonlinearities (U is not diagonal). The reason is that the learning algorithm then sees the system as a 1×2 SIMO system, which has several equivalent structures. Figure 21 shows an example of two equivalent structures. Therefore, the adaptive system is structurally unable to separate the nonlinearities. It is worth noting that for ρ = 0.999 the inputs look like noisy versions of each other (i.e., this is equivalent to a 1×2 system identification problem with noisy inputs). Thus, the MSE for ρ = 0.999 is larger than the MSE for ρ = 1.
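One common way to draw input pairs with the covariance matrix R_XX above is to mix two independent standard Gaussian variables. The text does not say how the correlated inputs were generated, so the construction and the function names below are assumptions, shown only to make the role of ρ concrete.

```python
import math
import random

def correlated_pair(rho, rng):
    """Draw (x1, x2), zero-mean, unit variance, with E[x1 x2] = rho."""
    x1 = rng.gauss(0.0, 1.0)
    z = rng.gauss(0.0, 1.0)
    x2 = rho * x1 + math.sqrt(1.0 - rho * rho) * z
    return x1, x2

def sample_correlation(rho, n, seed=0):
    # Empirical correlation coefficient over n draws (seed fixed for repeatability)
    rng = random.Random(seed)
    s11 = s22 = s12 = 0.0
    for _ in range(n):
        x1, x2 = correlated_pair(rho, rng)
        s11 += x1 * x1
        s22 += x2 * x2
        s12 += x1 * x2
    return s12 / math.sqrt(s11 * s22)
```

With ρ = 1 the second term vanishes and x₂ coincides with x₁, which is the degenerate 1×2 SIMO situation discussed above.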

Table 1 Effect of correlated inputs


Conclusion and future work

The paper provides a statistical analysis of NN modeling and identification of a class of nonlinear MIMO systems. The study investigates the MSE error, mean weight behavior, stationary points, misadjustment error, and stability conditions. The unknown system is composed of a set of single-input memoryless nonlinearities followed by a combining matrix. The NN model is composed of a set of single-input memoryless NN blocks followed by an adaptive linear combiner. The paper is supported with simulation results which show good agreement between the theoretical recursions and MC simulations. Future work will focus on 3 research directions. The first will explore the theoretical findings in order to express the effect of the number of neurons on the transient and steady state behavior of the algorithm. The second research axis will investigate the case where matrix H is time-varying and/or with memory (this may have applications, for example, in adaptive control of nonlinear dynamical MIMO systems). Finally, we will study the algorithm behavior and performance for specific inputs (such as space-time coded signals used in wireless communications and their impact on the system capacity).

Appendix I

1)
Calculation of F_kk. Let x₁ and x₂ be two zero-mean Gaussian variables such that $σ_{x_{1}}^{2} = σ_{x_{2}}^{2} = σ_{x}^{2}$ and $E (x_{1} x_{2}) = ρ$. The case x₁ = x₂ = x_k corresponds to ρ = σ_x², so that $F_{kk} (a_{ki}, b_{ki}, a_{km}, b_{km}) = E {(f (a_{ki} x_{1} + b_{ki}) f (a_{km} x_{2} + b_{km}))}_{ρ = σ_{x}^{2}} .$ Using Price’s theorem we have: $E [\frac{\partial^{2} f (a_{ki} x_{1} + b_{ki}) f (a_{km} x_{2} + b_{km})}{\partial x_{1} \partial x_{2}}] = \frac{\partial E [f (a_{ki} x_{1} + b_{ki}) f (a_{km} x_{2} + b_{km})]}{\partial ρ} .$ Let $U (ρ) = E [\frac{\partial^{2} f (a_{ki} x_{1} + b_{ki}) f (a_{km} x_{2} + b_{km})}{\partial x_{1} \partial x_{2}}] .$ Then: $E {[f (a_{ki} x_{1} + b_{ki}) f (a_{km} x_{2} + b_{km})]}_{ρ = σ_{x}^{2}} - E {[f (a_{ki} x_{1} + b_{ki}) f (a_{km} x_{2} + b_{km})]}_{ρ = 0} = \int_{0}^{σ_{x}^{2}} U (ρ) d ρ .$

Since x₁ and x₂ are uncorrelated for ρ=0 (and therefore independent, being jointly Gaussian), we have: $E {[f (a_{ki} x_{1} + b_{ki}) f (a_{km} x_{2} + b_{km})]}_{ρ = 0} = E [f (a_{ki} x_{1} + b_{ki})] E [f (a_{km} x_{2} + b_{km})] .$ Thus: $F_{kk} (a_{ki}, b_{ki}, a_{km}, b_{km}) = E [f (a_{ki} x_{1} + b_{ki})] E [f (a_{km} x_{2} + b_{km})] + \int_{0}^{σ_{x}^{2}} U (ρ) d ρ .$ We have:

$\begin{matrix} U (ρ) = E [\frac{\partial^{2} f (a_{ki} x_{1} + b_{ki}) f (a_{km} x_{2} + b_{km})}{\partial x_{1} \partial x_{2}}] \\ = \frac{2}{π} a_{ki} a_{km} \frac{1}{2 π {|R|}^{\frac{1}{2}}} \int_{- \infty}^{+ \infty} \int_{- \infty}^{+ \infty} e^{- \frac{1}{2} {(a_{ki} x_{1} + b_{ki})}^{2}} e^{- \frac{1}{2} {(a_{km} x_{2} + b_{km})}^{2}} \\ e^{- \frac{1}{2} X^{t} R^{- 1} X} d x_{1} d x_{2} \end{matrix}$

where $X = {[x_{1} x_{2}]}^{t}$ and $R = [\begin{matrix} σ_{x}^{2} & ρ \\ ρ & σ_{x}^{2} \end{matrix}] .$ Combining the terms in the exponentials and completing the squares, the integrals can be calculated:

$\begin{matrix} U (ρ) = \frac{2}{π} \frac{a_{ki} a_{km}}{\sqrt{1 + σ_{x}^{2} a_{ki}^{2} + σ_{x}^{2} a_{km}^{2} + (σ_{x}^{4} - ρ^{2}) a_{ki}^{2} a_{km}^{2}}} \times \\ exp [\frac{1}{2} [- b_{ki}^{2} - b_{km}^{2} + \frac{1}{a_{ki}^{2} + \frac{σ_{x}^{2}}{σ_{x}^{2} - ρ^{2}}} [b_{ki}^{2} a_{ki}^{2} + \\ \frac{{(b_{ki} a_{ki} (a_{ki}^{2} + \frac{σ_{x}^{2}}{σ_{x}^{2} - ρ^{2}}) + b_{km} a_{km} \frac{ρ}{σ_{x}^{2} - ρ^{2}})}^{2}}{(a_{ki}^{2} + \frac{σ_{x}^{2}}{σ_{x}^{2} - ρ^{2}}) (a_{km}^{2} + \frac{σ_{x}^{2}}{σ_{x}^{2} - ρ^{2}}) a_{km}^{2} - \frac{ρ^{2}}{σ_{x}^{2} - ρ^{2}}}]]] . \end{matrix}$

Note that in the biasless case (i.e. all the bias terms are set to 0) this expression reduces to:
$U (ρ) = \frac{2}{π} \frac{a_{ki} a_{km}}{\sqrt{1 + σ_{x}^{2} a_{ki}^{2} + σ_{x}^{2} a_{km}^{2} + (σ_{x}^{4} - ρ^{2}) a_{ki}^{2} a_{km}^{2}}} .$

The integral is then simple to calculate:
$\int_{0}^{σ_{x}^{2}} U (ρ) d ρ = \frac{2}{π} {sin}^{- 1} (\frac{a_{ki} a_{km} σ_{x}^{2}}{\sqrt{1 + σ_{x}^{2} a_{ki}^{2} + σ_{x}^{2} a_{km}^{2} + σ_{x}^{4} a_{ki}^{2} a_{km}^{2}}})$

On the other hand, since $E [f (a_{ki} x_{1})] = E [f (a_{km} x_{2})] = 0$ , we obtain: $F_{kk} (a_{ki}, 0, a_{km}, 0) = \frac{2}{π} {sin}^{- 1} (\frac{a_{ki} a_{km} σ_{x}^{2}}{\sqrt{1 + σ_{x}^{2} a_{ki}^{2} + σ_{x}^{2} a_{km}^{2} + σ_{x}^{4} a_{ki}^{2} a_{km}^{2}}}) .$ When the bias terms are not set to 0, a Taylor series expansion in the bias terms can be used in order to avoid the calculation of the integral.
2)
Calculation of K
$\begin{matrix} K_{k} & (a_{km}, b_{km}) = E [g_{k} (x_{k}) f (a_{km} x_{k} + b_{km})] \\ = \frac{1}{\sqrt{2 π}} \frac{1}{σ_{x}} \int_{- \infty}^{+ \infty} α_{k} x e^{- \frac{β_{k} x^{2}}{2}} e^{- \frac{x^{2}}{2 σ_{x}^{2}}} \int_{0}^{a_{km} x + b_{km}} e^{- \frac{u^{2}}{2}} d u d x . \end{matrix}$

The inner integral can be eliminated by integrating by parts with respect to x. The remaining integral is then evaluated by combining the terms in the exponentials and completing the squares. This yields:
$\begin{matrix} K_{k} & (a_{km}, b_{km}) = \sqrt{\frac{2}{π}} \frac{α_{k}}{\frac{1}{σ_{x}^{2}} + β_{k}} \frac{a_{km}}{\sqrt{σ_{x}^{2} (a_{km}^{2} + β_{k}) + 1}} \\ \times exp (\frac{- b_{km}^{2}}{2} (1 - \frac{σ_{x}^{2}}{1 + σ_{x}^{2} (β_{k} + a_{km}^{2})})) . \end{matrix}$

Again, a Taylor series expansion can be used to simplify this expression. Note that in the biasless case we have:
$K_{k} (a_{km}, 0) = \sqrt{\frac{2}{π}} \frac{α_{k}}{\frac{1}{σ_{x}^{2}} + β_{k}} \frac{a_{km}}{\sqrt{σ_{x}^{2} (a_{km}^{2} + β_{k}) + 1}}$

Appendix II

The derivatives needed to compute the different recursions are expressed as follows:

\begin{matrix} \frac{\partial K_{k} (a_{km}, 0)}{\partial a_{km}} = \sqrt{\frac{2}{π}} \frac{α σ_{x}^{2}}{{(σ_{x}^{2} (a_{km}^{2} + β_{k}) + 1)}^{\frac{3}{2}}}, \frac{\partial K_{k} (a_{km}, b_{km})}{\partial a_{km}} = \frac{\partial K_{k} (a_{km}, 0)}{\partial a_{km}} - \sqrt{\frac{2}{π}} α_{k} σ_{x}^{2} \frac{b_{km}^{2}}{2} \frac{1 + σ_{x}^{2} β_{k} - 2 σ_{x}^{2} a_{km}^{2}}{{(1 + σ_{x}^{2} (β_{k} + a_{km}^{2}))}^{\frac{5}{2}}} \\ \frac{\partial F_{kk} (a_{ki}, 0, a_{ki}, 0)}{\partial a_{ki}} = \frac{4}{π} \frac{σ_{x}^{2} a_{ki}}{(σ_{x}^{2} a_{ki}^{2} + 1) \sqrt{1 + 2 σ_{x}^{2} a_{ki}^{2}}}, \\ \frac{\partial F_{kk} (a_{ki}, a_{ki}, b_{ki}, b_{ki})}{\partial a_{ki}} = \frac{\partial F (a_{ki}, 0, a_{ki}, 0)}{\partial a_{ki}} - \frac{2}{π} b_{ki}^{2} σ_{x}^{2} a_{ki} \frac{5 σ_{x}^{2} a_{ki}^{2} + 3}{{(1 + σ_{x}^{2} a_{ki}^{2})}^{2} {(1 + 2 σ_{x}^{2} a_{ki}^{2})}^{\frac{3}{2}}} \\ \frac{\partial F (a_{ki}, 0, a_{km}, 0)}{\partial a_{ki}} = \frac{2}{π} \frac{σ_{x}^{2} a_{km}}{(σ_{x}^{2} a_{ki}^{2} + 1) \sqrt{1 + σ_{x}^{2} (a_{ki}^{2} + a_{km}^{2})}} \\ \frac{\partial F (a_{ki}, a_{km}, b_{ki}, b_{km})}{\partial a_{ki}} = \frac{\partial F (a_{ki}, a_{ki}, 0, 0)}{\partial a_{ki}} - \frac{1}{π} b_{ki}^{2} \frac{σ_{x}^{2} a_{km}}{(1 + σ_{x}^{2} a_{ki}^{2}) \sqrt{1 + σ_{x}^{2} (a_{km}^{2} + a_{ki}^{2})}} (\frac{1 + σ_{x}^{2} a_{km}^{2}}{1 + σ_{x}^{2} (a_{km}^{2} + a_{ki}^{2})} - \frac{2 σ_{x}^{2} a_{ki}^{2}}{1 + σ_{x}^{2} a_{ki}^{2}}) \\ - \frac{1}{π} a_{km}^{2} \frac{σ_{x}^{2} a_{km}}{{(1 + σ_{x}^{2} (a_{km}^{2} + a_{ki}^{2}))}^{\frac{3}{2}}} - b_{ki} b_{km} \frac{2}{π} \frac{σ_{x}^{2} a_{ki}}{{(1 + σ_{x}^{2} (a_{km}^{2} + a_{ki}^{2}))}^{\frac{3}{2}}} \\ \frac{\partial F (a_{ki}, a_{km}, b_{ki}, b_{km})}{\partial b_{km}} = - \frac{2}{π} b_{ki} \frac{σ_{x}^{2} a_{ki} a_{km}}{(1 + σ_{x}^{2} a_{ki}^{2}) \sqrt{1 + σ_{x}^{2} (a_{km}^{2} + a_{ki}^{2})}} - b_{km} \frac{2}{π} \frac{1}{\sqrt{1 + σ_{x}^{2} (a_{km}^{2} + a_{ki}^{2})}} \end{matrix}

Appendix III

\begin{matrix} K_{V_{j} V_{j}^{t}} (n + 1) & = K_{V_{j}^{t} V_{j}^{t}} (n) - 2 μ R_{XX} K_{V_{j} V_{j}} (n) \\ - 2 μ K_{V_{j} V_{j}} (n) R_{XX} \\ + 2 μ E [e_{W_{0} j} (n) X V_{j}^{t} (I - 2 μ X X^{t})] \\ + 2 μ E {[e_{W_{0} j} (n) X V_{j}^{t} (I - 2 μ X X^{t})]}^{t} \\ + 4 μ^{2} E [X X^{t} K_{V_{j} V_{j}} X X^{t}] \\ + 4 μ^{2} E [e_{W_{0} j}^{2} (n) X X^{t}] \end{matrix}

(96)

The calculations are similar to those in the Appendix of [9]; the main difference is that here we deal with a multi-dimensional input. Therefore, we follow the same methodology as in [9]. Following [10], the second-to-last expectation in (96) can be calculated as:

\begin{matrix} E [X X^{t} K_{V_{j} V_{j}} (n) X X^{t}] & = t r (R_{XX} K_{V_{j} V_{j}} (n)) R_{XX} \\ + 2 R_{XX} K_{V_{j} V_{j}} (n) R_{XX} . \end{matrix}

(97)

The first expectation is expressed as:

\begin{matrix} E & [e_{W_{0 j}} (n) X V_{j}^{t} (I - μ X X^{t})] \\ = E [e_{W_{0 j}} (n) X V_{j}^{t}] - μ E [e_{W_{0 j}} (n) X V_{j}^{t} X X^{t}] \end{matrix}

(98)

The first term is zero (orthogonality principle). The second term is:

\begin{matrix} E [e_{W_{0 j}} (n) X V_{j}^{t} X X^{t}] & = E [(H_{j}^{t} g_{j} (x_{j}) + N_{j} (n) \\ - W_{0 j}^{t} X (n)) X V_{j}^{t} X X^{t}] \end{matrix}

(99)

The middle term is zero (zero-mean white noise). The last expectation is:

\begin{matrix} E [W_{0 j}^{t} X (n) X V_{j}^{t} X X^{t}] & = E [X (n) X W_{0} V_{j}^{t} (n) X X^{t}] \\ \approx t r (R_{XX} W_{0} E (V_{j}^{t} (n))) R_{XX} \\ + 2 R_{XX} W_{0} E (V_{j}^{t} (n)) R_{XX} \end{matrix}

(100)

The first expectation in Eq. (99) $E [H_{j}^{t} g (X) X V_{j}^{t} (n) X X^{t}] \approx E [H_{j}^{t} g (X) X E (V_{j}^{t} (n)) X X^{t}]$ involves the nonlinearity g_j (x_j) and should be evaluated explicitly. The remaining expectation in (96) is: $E [e_{oj}^{2} (n) X X^{t}] .$

E [e_{oj}^{2} (n) X X^{t}] = E [{(H_{j}^{t} g (X) - W_{0 j}^{t} X)}^{2} X X^{t}] + σ_{0}^{2} R_{XX}

(101)

We have

\begin{matrix} E [{(W_{0 j}^{t} X)}^{2} X X^{t}] & = E [X X^{t} W_{0 j}^{t} W_{0 j}^{t} X X^{t}] \\ = t r (R_{XX} W_{0 j}^{t} W_{0 j}^{t}) R_{XX} \\ + 2 R_{XX} W_{0 j}^{t} W_{0 j}^{t} R_{XX} \end{matrix}

(102)

The first term in (102) is:

\begin{matrix} E [{(H_{j}^{t} g (X))}^{2} X X^{t}] & = E [H_{j} H_{j}^{t} g (X) g^{t} (X) X X^{t}] \\ = E [g (X) g^{t} (X) H_{j} H_{j}^{t} X X^{t}] \end{matrix}

Equation (96) is then expressed as:
$\begin{matrix} K_{V_{j} V_{j}^{t}} (n + 1) = K_{V_{j}^{t} V_{j}^{t}} (n) - 2 μ R_{XX} K_{V_{j} V_{j}} (n) \\ - 2 μ K_{V_{j} V_{j}} (n) R_{XX} + 4 μ^{2} (- E [H_{j}^{t} g (X) X E (V_{j}^{t} (n)) X X^{t}] \\ + t r (R_{XX} W_{0} E (V_{j}^{t} (n))) R_{XX} + R_{XX} W_{0} E (V_{j}^{t} (n)) R_{XX})) \\ + 4 μ^{2} (- E [H_{j}^{t} g (X) X E (V_{j}^{t} (n)) X X^{t}] \\ {+ t r (R_{XX} W_{0} E (V_{j}^{t} (n))) R_{XX} + R_{XX} W_{0} E (V_{j}^{t} (n)) R_{XX}))}^{t} \\ + 4 μ^{2} (t r (R_{XX} K_{V_{j} V_{j}} (n)) R_{XX} + 2 R_{XX} K_{V_{j} V_{j}} (n) R_{XX}) \\ + 4 μ^{2} (E [g (X) g^{t} (X) H_{j} H_{j}^{t} X X^{t}] + σ_{0}^{2} R_{XX} \\ - t r (R_{XX} W_{0} t W_{0}^{t}) R_{XX} - 2 R_{XX} W_{0 j}^{t} W_{0 j}^{t} R_{XX}) \end{matrix}$

Endnotes

This work has been supported by The Natural Sciences and Engineering Research Council of Canada (NSERC).

The time index of the weights has been omitted from the right hand side of the equations to make them easier to read.

References

Haykin S: Neural Networks: A Comprehensive Foundation. Prentice Hall; 1999.
Gao Y, Er M: Online adaptive fuzzy neural identification and control of a class of MIMO nonlinear systems. IEEE Trans. Fuzzy Systems 2003, : 462-476.
Ge SS, Wang C: Adaptive neural control of uncertain nonlinear MIMO systems. IEEE Trans. Neural Networks 2004, 15: 674-692. 10.1109/TNN.2004.826130
Ibnkahla M, Al-Hinai A: Adaptive modeling and identification of nonlinear MIMO channels using neural networks. In Adaptive Signal Processing in Wireless Communications. Edited by: Ibnkahla M. CRC Press, Boca Raton, FL, USA; 2008.
Narendra KS, Parthasarathy K: Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Networks 1990, 1: 4-27. 10.1109/72.80202
Xu H, Ioannou P: Robust adaptive control for a class of MIMO nonlinear systems with guaranteed error bounds. IEEE Trans. Automatic Control 2003, : 718-742.
Amari S: Mathematical foundations of neurocomputing. Proc. IEEE September 1990, 78(9):1443-1463. 10.1109/5.58324
Bershad NJ, Ibnkahla M, Castanié F: Statistical analysis of a two-layer back propagation algorithm used for modeling non linear memoryless channels: The single neuron case. IEEE Trans. Signal Processing March 1997, 45(3):747-756. 10.1109/78.558493
Bershad N, Celka P, Vesin JM: Stochastic analysis of gradient adaptive identification of nonlinear systems with memory for Gaussian data and noisy input and output measurements. IEEE Trans. Signal Processing March 1999, 47(3):675-689. 10.1109/78.747775
Haykin S: Adaptive Filter Theory. Prentice Hall; 1996.
Ibnkahla M, Bershad NJ, Sombrin J, Castanié F: Neural network modeling and identification of non linear channels with memory: Algorithms, applications and analytic models. IEEE Trans. Signal Processing 1998, 46: 5.
Ibnkahla M: Statistical analysis of neural network modeling and identification of nonlinear channels with memory. IEEE Trans. Signal Processing 2002, : 1508-1517.
Shynk J, Roy S: Convergence properties and stationary points of a perceptron learning algorithm. Proc. IEEE Oct. 1990, 70: 1599-1604.
Taylor JG: Mathematical Approaches to Neural Networks. North-Holland, Amsterdam; 1993.
White H: learning in artificial neural networks: A statistical perspective. Neural Comput. 1989, 1: 425-464. 10.1162/neco.1989.1.4.425
Bershad N, Celka P, Vesin JM: Analysis of stochastic gradient tracking of time-varying polynomial Wiener systems. IEEE Trans. Signal Processing June 2000, 48(6):1676-1686. 10.1109/78.845925
Bolcskei H: MIMO systems: Principles and trends. In Signal Processing for Mobile Communications Handbook. 12th edition. Edited by: Ibnkahla M. CRC Press; 2004.
Javornik T, Kandus G, Plevel S, White G, Burr A: V-BLAST algorithm performance in non-linear channel. IEEE Computer as a Tool Conference September 2003, 1: 183-187.
Poitau G, Kouki A: Impact of realistic amplification models on dynamic VBLAST optimization. Proc. Vehicular Technology Conference Spring 2004, : 894-897.
Woo S, Lee D, Kim K, Hur H, Lee C, Laskar J: Combined effects of RF impairments in the future IEEE 802.11n WLAN systems. Proc. IEEE Vehicular Technology Conference Spring May 2005, 2: 1346-1349.
Yang S, Xi J, Mu X: Decision aided joint compensation of clipping noise and nonlinearity for MIMO-OFDM systems. Proc. IEEE International Symposium on Communications and Information Technology (ISCIT) 2005, 1: 725-728.
Yang S, Xi J, Wang F, Mu X, Kobayashi H: Decision aided compensation of residual frequency offset for MIMO-OFDM systems with nonlinear channel. Proc. International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS) 2005, : 113-116.
Sulyman A, Ibnkahla M: Performance of MIMO systems with antenna selection over nonlinear fading channels. IEEE Journal of Selected Topics in Signal Processing April 2008, 2: 159-170.
Sulyman A, Ibnkahla M: Performance analysis of nonlinearly amplified M-QAM signals in MIMO channels. European Transactions in Communications January 2008, 19(1):15-22.
Pedro J, Maas S: A comparative overview of microwave and wireless power amplifier behavioral modeling approaches. IEEE Trans. Microwave Theory and Techniques 2005, 53(4):1150-1163.
Saleh A: Frequency-independent and frequency-dependent nonlinear models of TWT amplifiers. IEEE Trans. Communications 1981, 29: 11. 10.1109/TCOM.1981.1094876

Author information

Authors and Affiliations

Electrical and Computer Engineering Department, Queen’s University, Kingston, Ontario, K7L 3N6, Canada
Mohamed Ibnkahla

Corresponding author

Correspondence to Mohamed Ibnkahla.

Additional information

Competing interests

The author declares that he has no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Ibnkahla, M. Stochastic analysis of neural network modeling and identification of nonlinear memoryless MIMO systems. EURASIP J. Adv. Signal Process. 2012, 179 (2012). https://doi.org/10.1186/1687-6180-2012-179

Received: 06 December 2011
Accepted: 13 July 2012
Published: 21 August 2012
DOI: https://doi.org/10.1186/1687-6180-2012-179
