An array of multivectors consists of a collection of multivectors. Given M multivectors \(\{ U_1, U_2, \ldots , U_M \}\) in \(\varvec{G}({\mathbb {R}}^3)\), the \(M \times 1\) array collects them as follows:
$$\begin{aligned} \varvec{u} = \left[ \begin{array}{ccc} U_1 \\ \vdots \\ U_M \end{array} \right] =\left[ \begin{array}{ccc} u(1,0)+u(1,1)e_1+\cdots +u(1,7)I\\ \vdots \\ u(M,0)+u(M,1)e_1+\cdots +u(M,7)I \end{array} \right] . \end{aligned}$$
(3)
The reverse transpose operation, denoted by \((\cdot )^*\), extends the reverse operation of a multivector to arrays of multivectors. For example, the reverse transpose of the array (3) is \(\varvec{u}^* = \left[ {\widetilde{U}}_1 \; {\widetilde{U}}_2 \; \ldots \; {\widetilde{U}}_M \right]\).
Consider the reference data D(k), a multivector observed at time k that obeys the linear model
$$\begin{aligned} \begin{aligned} D(k)&= \varvec{u}^*(k)\varvec{w}^o + V(k)\\&= \sum _{i=1}^M{\widetilde{U}}(k+1-i)W_i^o + V(k), \end{aligned} \end{aligned}$$
(4)
where \(\varvec{w}^o=[ W_1^o\; W_2^o\; \ldots \; W_M^o]^{{\text{T}}}\) is an unknown \(M \times 1\) array of multivectors to be estimated, \((\cdot )^{{\text{T}}}\) denotes transposition, V(k) accounts for measurement noise, U(k) denotes the input signal observed at time k, and \(\varvec{u}(k) = [ U(k)\; U(k-1)\; \ldots \; U(k+1-M)]^{{\text{T}}}\). The model allows one to assign heterogeneous signals from different sources \(s_i(k)\), \(i=0,1,\ldots ,2^n-1\), to the entries of the multivector, e.g., \(U(k)=s_0(k) + s_1(k)e_1 +s_2(k)e_2 + s_3(k)e_3+ s_4(k)e_{12}+s_5(k)e_{23}+ s_6(k)e_{31} + s_7(k)I\). For instance, in the fusion and linear prediction of aircraft parameters, the entries can be assigned as follows: \(s_0(k)\) is the angle of attack, \(s_1(k)\) the East-West wind, \(s_2(k)\) the North-South wind, \(s_3(k)\) the vertical wind, \(s_4(k)\) the roll, \(s_5(k)\) the yaw, \(s_6(k)\) the pitch, and \(s_7(k)\) the dynamic pressure.
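To make the signal packing concrete, the short Python sketch below (our own illustration; the signal names and array layout are assumptions, not part of the original model) stores each multivector U(k) as a length-8 coefficient vector ordered as \([1, e_1, e_2, e_3, e_{12}, e_{23}, e_{31}, I]\) and stacks M delayed samples into the array \(\varvec{u}(k)\).

```python
import numpy as np

# Coefficient order assumed here: [1, e1, e2, e3, e12, e23, e31, I].
def pack_multivector(aoa, wind_ew, wind_ns, wind_v, roll, yaw, pitch, q_dyn):
    # Pack the eight scalar sources s_0(k), ..., s_7(k) into one multivector U(k),
    # represented as a length-8 array of blade coefficients.
    return np.array([aoa, wind_ew, wind_ns, wind_v, roll, yaw, pitch, q_dyn])

def input_array(U_history, k, M):
    # Build u(k) = [U(k), U(k-1), ..., U(k+1-M)]^T as an (M, 8) coefficient array.
    return np.stack([U_history[k - i] for i in range(M)])

# Example with random data standing in for the measured aircraft signals.
rng = np.random.default_rng(0)
K, M = 100, 4
U_history = [pack_multivector(*rng.standard_normal(8)) for _ in range(K)]
u_k = input_array(U_history, k=K - 1, M=M)   # shape (M, 8); one column of U(k) in (8)
print(u_k.shape)
```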
In LA, the squared Euclidean norm \(||\varvec{u}||^2 \triangleq \varvec{u}^*\varvec{u}\), which is a scalar, provides a measure of distance. In GA, however, the result of the array product is a multivector rather than a scalar. Therefore, we take the scalar part of the array product, \(\langle \varvec{u}^*\varvec{u} \rangle\), as the distance measure for arrays of multivectors.
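The distance measure can be evaluated directly from the blade coefficients. The following self-contained Python sketch (illustrative only; it indexes the \(2^3=8\) blades of \(\varvec{G}({\mathbb {R}}^3)\) by bitmasks, so the coefficient order differs from (3)) implements the geometric product, the reverse, and the scalar part \(\langle \varvec{u}^*\varvec{u} \rangle\) of the array product.

```python
import numpy as np

DIM = 8  # blades of G(R^3), indexed by bitmask: bit0 -> e1, bit1 -> e2, bit2 -> e3

def _sign(a, b):
    # Sign from reordering the basis vectors of blade a past those of blade b
    # (Euclidean metric, so contracted basis vectors contribute +1).
    a >>= 1
    s = 0
    while a:
        s += bin(a & b).count("1")
        a >>= 1
    return -1.0 if s & 1 else 1.0

def gp(u, v):
    # Geometric product of two multivectors given as length-8 coefficient arrays.
    out = np.zeros(DIM)
    for i in range(DIM):
        if u[i]:
            for j in range(DIM):
                out[i ^ j] += _sign(i, j) * u[i] * v[j]
    return out

def rev(u):
    # Reverse: the grade-k part is multiplied by (-1)^(k(k-1)/2).
    out = np.array(u, dtype=float).copy()
    for i in range(DIM):
        k = bin(i).count("1")
        if (k * (k - 1) // 2) % 2:
            out[i] = -out[i]
    return out

def distance_measure(u):
    # <u* u>: scalar part of the array product of an (M, 8) multivector array.
    total = sum(gp(rev(u[m]), u[m]) for m in range(u.shape[0]))  # a multivector
    return total[0]                                              # keep the scalar blade

# For a single vector-valued entry the measure reduces to the squared Euclidean norm.
u = np.zeros((1, DIM))
u[0, 1], u[0, 2] = 3.0, 4.0          # U_1 = 3 e1 + 4 e2
print(distance_measure(u))           # 25.0
```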
3.1 GA affine projection algorithm
Like the standard APA, the GA-APA follows the principle of minimal disturbance and the orthogonal affine subspace theory. In mathematical terms, the criterion for designing the affine projection filter can be formulated as an optimization problem subject to multiple constraints. We minimize the scalar product of the change of the estimated weight array and its reverse transpose (the distance measure in the array space of \(\varvec{w}^o\)), which is defined as
$$\begin{aligned} \langle ||\delta _{\varvec{w}}||^2 \rangle = (\widehat{\varvec{w}}^*(k+1)-\widehat{\varvec{w}}^*(k))*(\widehat{\varvec{w}}(k+1)-\widehat{\varvec{w}}(k)), \end{aligned}$$
(5)
subject to the set of N constraints
$$\begin{aligned} D(k-n) = \varvec{u}^*(k-n)\widehat{\varvec{w}}(k+1) \; \text {for}\; n = 0,1,2,\ldots ,N-1, \end{aligned}$$
(6)
where N is smaller than or equal to the length M of the weight array. The number of constraints N can be viewed as the order of the affine projection algorithm.
We apply the method of Lagrange multipliers to solve this optimization problem. Combining (5) and (6), we obtain the cost function
$$\begin{aligned} J(k) = \langle ||\delta _{\varvec{w}}||^2 \rangle + \sum _{n=0}^{N-1}\langle {\widetilde{E}}(n)\lambda _n\rangle , \end{aligned}$$
(7)
where \(E(n) = D(k-n)-\varvec{u}^*(k-n)\widehat{\varvec{w}}(k+1)\) and \(\lambda _n\) is a multivector. For convenience of presentation, we introduce the following definitions:
-
An \(M \times N\) matrix \(\varvec{U}(k)\) defined by
$$\begin{aligned} \varvec{U}(k) = [\varvec{u}(k)\;\varvec{u}(k-1) \cdots \varvec{u}(k-N+1)]. \end{aligned}$$
(8)
-
An \(N \times 1\) array \(\varvec{d}(k)\) defined by
$$\begin{aligned} \varvec{d}(k) = [D(k)\;D(k-1)\; \cdots \;D(k-N+1)]^{{\text{T}}}. \end{aligned}$$
(9)
-
An \(N \times 1\) array \(\varvec{\lambda }\) defined by
$$\begin{aligned} \varvec{\lambda } = [\lambda _0 \;\lambda _1 \; \cdots \;\lambda _{N-1}]^{{\text{T}}}. \end{aligned}$$
(10)
Then, the second term of the cost function (7) can be represented as
$$\begin{aligned} \begin{aligned} \sum _{n=0}^{N-1}\langle {\widetilde{E}}(n)\lambda _n\rangle&= \langle (\varvec{d}(k)-\varvec{U}^*(k)\widehat{\varvec{w}}(k+1))^*\varvec{\lambda }\rangle \\&= (\varvec{d}^*(k)-\widehat{\varvec{w}}^*(k+1)\varvec{U}(k))*\varvec{\lambda }. \end{aligned} \end{aligned}$$
(11)
Now we take the derivative of the cost function J(k) with respect to the weight array \(\widehat{\varvec{w}}(k+1)\) following the rules of GC. In GA, the differential operator \(\partial _{w}= \partial _{\widehat{\varvec{w}}(k+1)}\) has the algebraic properties of a multivector in \(\varvec{G}({\mathbb {R}}^n)\). In other words, the gradient \(\partial _{w}J(k)\) can be calculated as the geometric product of the multivector-valued quantities \(\partial _{w}\) and J(k).
Any multivector \(A\in \varvec{G}({\mathbb {R}}^n)\) can be decomposed into blades [5, Eq. (3.20)] via
$$\begin{aligned} A = \sum _{i=0}^{2^n-1} e_i \langle e^iA\rangle = \sum _{i=0}^{2^n-1} e^i \langle e_iA\rangle = \sum _{i=0}^{2^n-1} e^iA^i, \end{aligned}$$
(12)
in which \(A^i\) is scalar valued, and \(\{e_i\}\) and \(\{e^i\},\; i=0,\ldots , 2^n-1\), are two different bases of \(\varvec{G}({\mathbb {R}}^n)\). \(\{e^i\}\) is the reciprocal blade basis, an important analytical tool for differentiation in GA. The reciprocal blade basis converts non-orthogonal vectors into orthogonal ones and vice versa; since orthogonal elements cancel each other out, the analytical procedure is simplified. It suffices to know that the relation \(e_i \cdot e^j = \delta _i^j\) holds for reciprocal bases, where \(\delta _i^j = 1\) for \(i=j\) and \(\delta _i^j = 0\) for \(i\ne j\) (Kronecker delta). In particular, applying (12) to \(\partial _w\) results in
$$\begin{aligned} \partial _w \triangleq \sum _{l=0}^{2^n-1} e^l\langle e_l\partial _w \rangle = \sum _{l=0}^{2^n-1} e^l\partial _{w,l}. \end{aligned}$$
(13)
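As a concrete check of the reciprocal relation above, note that for the Euclidean algebra \(\varvec{G}({\mathbb {R}}^3)\) with an orthonormal blade basis one admissible choice is \(e^i = {\widetilde{e}}_i\); for example, \(e_{12}\cdot e^{12} = \langle e_{12}{\widetilde{e}}_{12} \rangle = \langle -e_{12}e_{12} \rangle = 1\), while \(e_{12}\cdot e^{23} = \langle e_{12}{\widetilde{e}}_{23} \rangle = \langle e_{31} \rangle = 0\).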
The gradient \(\partial _wJ(k)\) is obtained by multiplying (13) and (7), yielding
$$\begin{aligned} \begin{aligned} \partial _wJ(k)&= \sum _{l=0}^{2^n-1} e^l\partial _{w,l}\left( \langle ||\delta _{\varvec{w}}||^2 \rangle + \langle (\varvec{d}^*(k)-\widehat{\varvec{w} }^*(k+1)\varvec{U}(k))\varvec{\lambda }\rangle \right) \\&= \sum _{l=0}^{2^n-1} e^l\left( \partial _{w,l}\langle \delta _{\varvec{w}}^*\delta _{\varvec{w}} \rangle + \partial _{w,l}\langle -\widehat{\varvec{w}}^*(k+1)\varvec{U}(k)\varvec{\lambda }\rangle \right) \\&= \sum _{l=0}^{2^n-1} e^l \left( \partial _{w,l}^1 + \partial _{w,l}^2 \right) , \end{aligned} \end{aligned}$$
(14)
in which \(\partial _{w,l}^1 = \partial _{w,l}\langle \delta _{\varvec{w}}^*\delta _{\varvec{w}} \rangle\) and \(\partial _{w,l}^2 = \partial _{w,l}\langle -\widehat{\varvec{w}}^*(k+1)\varvec{U}(k)\varvec{\lambda }\rangle\). Indeed, arrays of multivectors can also be decomposed into blades. Thus, employing (12) once again, we can rewrite \(\delta _{\varvec{w}}\) and \(\delta _{\varvec{w}}^*\) in terms of their \(2^n\) blades as follows:
$$\begin{aligned} \delta _{\varvec{w}} = \sum _{p=0}^{2^n-1} e_p\delta _{w,p} \ \text {and} \ \delta _{\varvec{w}}^* = \sum _{q=0}^{2^n-1} {\widetilde{e}}_q\delta _{w,q}^{{\text{T}}}. \end{aligned}$$
(15)
Plugging (15) into \(\partial _{w,l}^1\), we have
$$\begin{aligned} \begin{aligned} \partial _{w,l}^1&= \partial _{w,l} \langle \sum _{p,q=0}^{2^n-1} {\widetilde{e}}_q\delta _{w,q}^{{\text{T}}}e_p\delta _{w,p}\rangle \\&= \sum _{p,q=0}^{2^n-1} \langle {\widetilde{e}}_qe_p \rangle \partial _{w,l}(\delta _{w,q}^{{\text{T}}}\delta _{w,p})\\&= \sum _{p,q=0}^{2^n-1} \langle {\widetilde{e}}_qe_p \rangle (\dot{\partial _{w,l}}\dot{\delta _{w,q}^{{\text{T}}}}\delta _{w,p} + \dot{\partial _{w,l}}\delta _{w,q}^{{\text{T}}}\dot{\delta _{w,p}})\\&= \sum _{p,q=0}^{2^n-1} \langle {\widetilde{e}}_qe_p \rangle (\delta _l^q\delta _{w,p} + \delta _l^p\delta _{w,q}). \end{aligned} \end{aligned}$$
(16)
Thus, the first term of the gradient (14) can be obtained by
$$\begin{aligned} \begin{aligned} \sum _{l=0}^{2^n-1} e^l\partial _{w,l}^1&= \sum _{l=0}^{2^n-1} e^l \sum _{p,q=0}^{2^n-1} \langle {\widetilde{e}}_qe_p \rangle (\delta _l^q\delta _{w,p} + \delta _l^p\delta _{w,q}) \\&= \sum _{l=0}^{2^n-1} e^l(\sum _{p=0}^{2^n-1}\langle {\widetilde{e}}_le_p \rangle \delta _{w,p} + \sum _{q=0}^{2^n-1}\langle {\widetilde{e}}_qe_l \rangle \delta _{w,q})\\&= \sum _{l=0}^{2^n-1} e^l (\langle {\widetilde{e}}_l\delta _w \rangle + \langle {\widetilde{\delta }}_we_l \rangle )\\&= 2{\widetilde{\delta }}_{\varvec{w}}. \end{aligned} \end{aligned}$$
(17)
Then, calculating the second term of (14), we get
$$\begin{aligned} \begin{aligned} \sum _{l=0}^{2^n-1} e^l\partial _{w,l}^2&= -\sum _{l=0}^{2^n-1} e^l \partial _{w,l}\langle \sum _{p=0}^{2^n-1}{\widetilde{e}}_p\widehat{\varvec{w}}_p^{{\text{T}}}(k+1) \varvec{U}(k) \varvec{\lambda } \rangle \\&= -\sum _{l=0}^{2^n-1} e^l \sum _{p=0}^{2^n-1} \langle {\widetilde{e}}_p\partial _{w,l}\widehat{\varvec{w}}_p^{{\text{T}}}(k+1) \varvec{U}(k)\varvec{\lambda } \rangle \\&= -\sum _{l=0}^{2^n-1} e^l \langle {\widetilde{e}}_l\varvec{U}(k)\varvec{\lambda } \rangle \\&= -\widetilde{(\varvec{U}(k)\varvec{\lambda })}. \end{aligned} \end{aligned}$$
(18)
Taking the results of (17) and (18), and setting the gradient (14) equal to zero, we get \(2{\widetilde{\delta }}_w = \widetilde{(\varvec{U}(k)\varvec{\lambda })}\). Taking the reverse of both sides of the equation yields
$$\begin{aligned} \widehat{\varvec{w}}(k+1)-\widehat{\varvec{w}}(k) = \frac{1}{2}\varvec{U}(k)\varvec{\lambda }. \end{aligned}$$
(19)
Next, we eliminate the Lagrange multiplier array \(\varvec{\lambda }\) from (19). First, we use definitions (8) and (9) to rewrite (6) in the equivalent form
$$\begin{aligned} \varvec{d}(k) = \varvec{U}^*(k)\widehat{\varvec{w}}(k+1). \end{aligned}$$
(20)
Premultiplying both sides of (19) by \(\varvec{U}^*(k)\) and then using (20) to eliminate the updated weight array \(\widehat{\varvec{w}}(k+1)\) yields
$$\begin{aligned} \varvec{d}(k) = \varvec{U}^*(k)\widehat{\varvec{w}}(k) + \frac{1}{2}\varvec{U}^*(k)\varvec{U}(k)\varvec{\lambda }. \end{aligned}$$
(21)
Based on the data available at adaptation cycle k, the difference between \(\varvec{d}(k)\) and \(\varvec{U}^*(k)\widehat{\varvec{w}}(k)\) is an \(N \times 1\) error array denoted by
$$\begin{aligned} \varvec{e}(k) = \varvec{d}(k) - \varvec{U}^*(k)\widehat{\varvec{w}}(k). \end{aligned}$$
(22)
Assuming the array product \(\varvec{U}^*(k)\varvec{U}(k)\) to be invertible [23] allows us to solve (21) for \(\varvec{\lambda }\), yielding
$$\begin{aligned} \varvec{\lambda } = 2 (\varvec{U}^*(k)\varvec{U}(k))^{-1}\varvec{e}(k). \end{aligned}$$
(23)
Substituting this solution into (19), we obtain the optimum change of the weight array
$$\begin{aligned} \widehat{\varvec{w}}(k+1) - \widehat{\varvec{w}}(k) = \varvec{U}(k)(\varvec{U}^*(k)\varvec{U}(k))^{-1}\varvec{e}(k). \end{aligned}$$
(24)
Finally, we introduce the step-size parameter \(\mu\) into (24), yielding
$$\begin{aligned} \widehat{\varvec{w}}(k+1) = \widehat{\varvec{w}}(k) + \mu \varvec{U}(k)(\varvec{U}^*(k)\varvec{U}(k))^{-1}\varvec{e}(k), \end{aligned}$$
(25)
which is the desired update formula of the GA-APA.
The algorithm is summarized in Algorithm 1. Note that the GA-APA has the same form as standard APA adaptive filters. Since quaternions, complex numbers, and real numbers are subalgebras of geometric algebra, the Quaternion APA [24], the Complex APA [12], and the real-entries APA can all be recovered from the GA-APA. In other words, the GA-APA is a unified expression of these algorithms.
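To illustrate the update (25), the following self-contained Python sketch is one possible realization of the GA-APA (our own illustrative code, not the authors' implementation). Multivectors of \(\varvec{G}({\mathbb {R}}^3)\) are stored as length-8 coefficient arrays indexed by bitmasks, and the \(N \times N\) multivector matrix \(\varvec{U}^*(k)\varvec{U}(k)\) is inverted through its real \(8N \times 8N\) left-regular (block) matrix representation; the problem sizes and signals in the demo are made up.

```python
import numpy as np

DIM = 8  # blades of G(R^3), indexed by bitmask: bit0 -> e1, bit1 -> e2, bit2 -> e3

def _sign(a, b):
    # Reordering sign for the product of basis blades a and b (Euclidean metric).
    a >>= 1
    s = 0
    while a:
        s += bin(a & b).count("1")
        a >>= 1
    return -1.0 if s & 1 else 1.0

def gp(u, v):
    # Geometric product of two multivectors given as length-8 coefficient arrays.
    out = np.zeros(DIM)
    for i in range(DIM):
        if u[i]:
            for j in range(DIM):
                out[i ^ j] += _sign(i, j) * u[i] * v[j]
    return out

def rev(u):
    # Reverse: the grade-k part is multiplied by (-1)^(k(k-1)/2).
    out = np.array(u, dtype=float).copy()
    for i in range(DIM):
        k = bin(i).count("1")
        if (k * (k - 1) // 2) % 2:
            out[i] = -out[i]
    return out

def rep(a):
    # Left-regular 8x8 real representation: rep(a) @ x == coefficients of (a x).
    return np.column_stack([gp(a, np.eye(DIM)[j]) for j in range(DIM)])

def mv_mat_inv(G):
    # Invert an N x N multivector matrix (stored as an (N, N, 8) array) through
    # its (8N) x (8N) block representation; the first column of each 8x8 block
    # of the inverse is the coefficient vector of the corresponding entry.
    N = G.shape[0]
    B = np.linalg.inv(np.block([[rep(G[i, j]) for j in range(N)] for i in range(N)]))
    return np.array([[B[8 * i:8 * i + 8, 8 * j] for j in range(N)] for i in range(N)])

def ga_apa_update(w, U, d, mu):
    # One adaptation cycle of Eq. (25).
    #   w : (M, 8) current weight array, U : (M, N, 8) data matrix of Eq. (8),
    #   d : (N, 8) reference array of Eq. (9), mu : step size.
    M, N, _ = U.shape
    Ur = np.array([[rev(U[m, n]) for m in range(M)] for n in range(N)])   # U*(k)
    e = d - np.array([sum(gp(Ur[n, m], w[m]) for m in range(M))
                      for n in range(N)])                                 # Eq. (22)
    G = np.array([[sum(gp(Ur[n, m], U[m, p]) for m in range(M))
                   for p in range(N)] for n in range(N)])                 # U*(k) U(k)
    Gi = mv_mat_inv(G)
    Ge = [sum(gp(Gi[n, p], e[p]) for p in range(N)) for n in range(N)]
    step = np.array([sum(gp(U[m, n], Ge[n]) for n in range(N)) for m in range(M)])
    return w + mu * step

# Toy system-identification demo with hypothetical sizes and random multivector data.
rng = np.random.default_rng(1)
M, N, mu, K = 4, 2, 0.5, 300
w_true = rng.standard_normal((M, DIM))      # unknown weight array w^o
stream = rng.standard_normal((K + M, DIM))  # multivector input stream U(k)
w = np.zeros((M, DIM))
for k in range(M + N, K):
    U = np.stack([stream[k - n:k - n - M:-1] for n in range(N)], axis=1)  # Eq. (8)
    d = np.array([sum(gp(rev(U[m, n]), w_true[m]) for m in range(M))
                  for n in range(N)])
    d += 0.01 * rng.standard_normal((N, DIM))                             # noise V(k)
    w = ga_apa_update(w, U, d, mu)
print("final squared deviation:", float(np.sum((w - w_true) ** 2)))
```

Representing every multivector entry by its left-multiplication matrix turns the multivector matrix inverse into an ordinary real matrix inverse, which keeps the sketch short at the cost of extra multiplications compared with the count discussed in Sect. 3.3.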
Remark
In the LA domain, the APA reduces to the NLMS when the order \(N=1\). However, the update equation of the first-order GA-APA differs from that of the GA-NLMS proposed in [22]. Specifically, the update term of the first-order GA-APA is \(\mu \varvec{u}(k)(\varvec{u}^*(k)\varvec{u}(k))^{-1}e(k)\), which is similar to the update term of the GA-NLMS, \(\mu \varvec{u}(k)\langle \varvec{u}^*(k)\varvec{u}(k)\rangle ^{-1}e(k)\). We compare them in the simulation section.
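To see the difference on a concrete (made-up) input, take \(M=1\) and \(\varvec{u}(k) = U(k) = e_1 + 2e_{12}\). Then \(\varvec{u}^*(k)\varvec{u}(k) = (e_1 - 2e_{12})(e_1 + 2e_{12}) = 5 + 4e_2\), so the GA-NLMS scales its update by the scalar \(\langle \varvec{u}^*(k)\varvec{u}(k)\rangle ^{-1} = 1/5\), whereas the first-order GA-APA multiplies by the full multivector inverse \((5 + 4e_2)^{-1} = (5 - 4e_2)/9\); the two updates therefore differ whenever \(\varvec{u}^*(k)\varvec{u}(k)\) has non-scalar components.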
3.2 Stability of the GA-APA
The mismatch between \(\varvec{w}^o\) and \(\widehat{\varvec{w}}(k)\) is measured by the weight-error array
$$\begin{aligned} \varvec{\epsilon }(k) = \varvec{w}^o - \widehat{\varvec{w}}(k). \end{aligned}$$
(26)
Thus, subtracting (25) from \(\varvec{w}^o\), we get
$$\begin{aligned} \varvec{\epsilon } (k+1) = \varvec{\epsilon } (k) -\mu \varvec{U}(k)(\varvec{U}^*(k)\varvec{U}(k))^{-1}\varvec{e}(k). \end{aligned}$$
(27)
We base the stability analysis of the GA-APA on the mean-square deviation \(y(k) = {\mathbb {E}} [\langle ||\varvec{\epsilon }(k)||^2 \rangle ]\), where \({\mathbb {E}}[\cdot ]\) denotes expectation. Taking the distance measure of both sides of (27), rearranging terms, and taking expectations, we get
$$\begin{aligned} \begin{aligned} y(k+1) - y(k) =&\mu ^2 {\mathbb {E}} [\langle || \varvec{U}(k)(\varvec{U}^*(k)\varvec{U}(k))^{-1}\varvec{e}(k)||^2 \rangle ] \\&- 2\mu {\mathbb {E}} [\varvec{\epsilon }^* (k)*\varvec{U}(k)(\varvec{U}^*(k)\varvec{U}(k))^{-1}\varvec{e}(k)]. \end{aligned} \end{aligned}$$
(28)
From the equation above, we see that the GA-APA algorithm is stable in the mean-square-error sense provided that the mean-square deviation y(k) decreases as the number of adaptation cycles k increases. Therefore, the step-size parameter \(\mu\) is bounded as follows:
$$\begin{aligned} 0< \mu < \frac{2{\mathbb {E}} [\varvec{\epsilon }^* (k)*\varvec{U}(k)(\varvec{U}^*(k)\varvec{U}(k))^{-1}\varvec{e}(k)]}{{\mathbb {E}} [\langle || \varvec{U}(k)(\varvec{U}^*(k)\varvec{U}(k))^{-1}\varvec{e}(k)||^2 \rangle ]}. \end{aligned}$$
(29)
3.3 Computational complexity analysis
As can be seen from Algorithm 1, the main calculations are in Step 2 and Step 3. The number of real multiplications in Step 2 is \(NM\alpha ^2\), where \(\alpha =2^n\) is the number of basis blades, and N and M are the order of the GA-APA and the length of \(\widehat{\varvec{w}}(k)\), respectively. The computational complexity of the multivector matrix inversion is \(N^3\alpha ^2\beta\), where \(\beta\) represents the computational complexity of inverting a multivector. Therefore, the computation in Step 3 requires approximately \(\alpha ^2(N^3\beta +N^2(M+1)+NM)\) real multiplications, and the total number of multiplications in the GA-APA is \(\alpha ^2(N^3\beta +N^2(M+1)+2NM)\) per adaptation cycle.
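For instance, with \(n=3\) (\(\alpha =8\)), a weight array of length \(M=4\), and order \(N=2\), this count evaluates to \(64(8\beta + 20 + 16) = 512\beta + 2304\) real multiplications per adaptation cycle.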
3.4 Regularized GA-APA
Since the matrix inversion \((\varvec{U}^*(k)\varvec{U}(k))^{-1}\) is required within the GA-APA, ill-posed problems can occur, especially with noisy observation data. To avoid this problem, we regularize the matrix to be inverted, which yields the update equation of the regularized GA-APA (R-GA-APA)
$$\begin{aligned} \widehat{\varvec{w}}(k+1) = \widehat{\varvec{w}}(k) + \mu \varvec{U}(k)(\varvec{U}^*(k)\varvec{U}(k) + \gamma \varvec{I})^{-1}\varvec{e}(k), \end{aligned}$$
(30)
where \(\gamma\) is the regularization parameter, and \(\varvec{I}\) is the real-valued \(N \times N\) identity matrix.
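In the coefficient-array representation used in the earlier GA-APA sketch, \(\gamma \varvec{I}\) only affects the scalar blade of the diagonal entries of \(\varvec{U}^*(k)\varvec{U}(k)\). A minimal helper (illustrative, assuming the same \((N, N, 8)\) storage layout as that sketch) is:

```python
import numpy as np

def regularize(G, gamma):
    # Return G + gamma * I for an N x N multivector matrix G stored as an
    # (N, N, 8) coefficient array; gamma enters only the scalar blade (index 0)
    # of each diagonal entry, since I is the real-valued identity matrix.
    G = np.array(G, dtype=float, copy=True)
    for n in range(G.shape[0]):
        G[n, n, 0] += gamma
    return G
```

Inverting `regularize(G, gamma)` instead of `G` in the update step turns the GA-APA sketch into the R-GA-APA of (30).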