Orthogonal approach to independent component analysis using quaternionic factorization
EURASIP Journal on Advances in Signal Processing volume 2020, Article number: 39 (2020)
Abstract
Independent component analysis (ICA) is a popular technique for demixing multichannel data. The performance of a typical ICA algorithm strongly depends on the presence of additive noise, the actual distribution of source signals, and the estimated number of non-Gaussian components. Often, a linear mixing model is assumed and source signals are extracted by data whitening followed by a sequence of plane (Jacobi) rotations. In this article, we develop a novel algorithm based on the quaternionic factorization of rotation matrices and the Newton-Raphson iterative scheme. Unlike conventional rotational techniques such as the JADE algorithm, our method exploits 4×4 rotation matrices and uses approximate negentropy as a contrast function. Consequently, the proposed method can be adjusted to a given data distribution (e.g., super-Gaussians) by selecting a suitable nonlinear function that approximates the negentropy. Compared to the widely used symmetric FastICA algorithm, the proposed method does not require an orthogonalization step and is more accurate in the presence of multiple Gaussian sources.
Introduction
Blind source separation (BSS) problems occur in various engineering applications, including multichannel speech enhancement, image restoration, analysis of electroencephalographic (EEG) signals, and telecommunications [1]. BSS consists of recovering unobservable source signals from their observed mixtures. Most often, the following linear model [2] of the observation vector \(\mathbf {x} \in \mathbb {R}^{n}\) is assumed:
\[\mathbf {x} = \mathbf {A}\mathbf {s}, \tag{1}\]
where \(\mathbf {A} \in \mathbb {R}^{n \times d}\) is an unknown mixing matrix and s is a random vector of d unknown source signals. Please note that there exist other models that can deal, for example, with spherical noise [3], underdetermination, and mixture synchronization issues. In the underdetermined case, the number of source signals is greater than the number of mixtures, which makes the BSS problem ill-posed. In such cases, the concept of sparsity is often employed for a better representation of source signals [4, 5]. To tackle the synchronization issues, a compressively sensed BSS (CSBSS) model [6] was proposed, which combines compressive sensing theory [7] with a linear mixing model.
In this work, we consider the conventional model (1) due to its simplicity, and unless otherwise stated, assume that n=d. Thus, from a mathematical point of view, we are looking for some matrix \(\mathbf {W} \in \mathbb {R}^{n \times n}\) such that
where Λ and P are scaling and permutation matrices, respectively. This means that the BSS problem can be solved only up to sign, scale, and permutation of the output signals [8].
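The scale and permutation ambiguity can be illustrated numerically. The following is a minimal sketch, assuming Laplacian sources and a uniformly distributed mixing matrix chosen only for illustration; the names `S`, `Lam`, and `P` are hypothetical. It shows that any demixing matrix for which WA is a scaled permutation recovers the sources equally well.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 1000                          # channels and samples (illustrative values)
S = rng.laplace(size=(n, m))            # hypothetical non-Gaussian sources
A = rng.uniform(-1.0, 1.0, size=(n, n)) # unknown mixing matrix
X = A @ S                               # observed mixtures, x = A s

# Any W of the form Lam @ P @ inv(A) is an equally valid solution.
P = np.eye(n)[[2, 0, 3, 1]]             # permutation matrix
Lam = np.diag([1.5, -0.5, 2.0, -1.0])   # scaling matrix (signs included)
W = Lam @ P @ np.linalg.inv(A)

Y = W @ X                               # recovered signals
G = W @ A                               # global matrix, equals Lam @ P
```

Each row of `G` contains a single nonzero entry, so every output is one source, rescaled and possibly sign-flipped.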
Independent component analysis (ICA) is the most natural approach to solving the BSS problem. Namely, ICA aims at transforming multivariate data into components that are as statistically independent from each other as possible. ICA can also be viewed as an optimization method that explicitly maximizes some measure of statistical independence between the sources. Most frequently, it is based on the Central Limit Theorem and attempts to find directions in which some measure of non-Gaussianity is maximized [9]. Theoretically, it is possible to extract all components as long as no more than one source is Gaussian and all of the non-Gaussian sources are mutually independent. In other words, only non-Gaussian components can be separated.
In fact, many ICA algorithms are known. For general overviews, see for example [1, 10–12]. The classical ICA methods include InfoMax [13, 14], FastICA [9], and JADE [15]. In this study, we are interested in the so-called orthogonal approaches. Since independence implies uncorrelatedness, these methods directly constrain the unmixed components to be uncorrelated, i.e., for zero-mean and unit-variance signals, it is assumed that E{yy^{T}}=I. This constraint is usually satisfied by whitening the data before rotating them [9, 16, 17]:
where the matrix B is constrained to be orthogonal. The whitening transformation can be computed as follows:
where U is the matrix of eigenvectors of C_{xx}=E{xx^{T}}, and Σ is the diagonal matrix of the corresponding eigenvalues.
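As an illustration of the whitening step, here is a minimal sketch. It assumes the common form in which the whitening matrix is the inverse square root of the eigenvalue matrix times the transposed eigenvector matrix; the exact expression (4) did not survive in this text, so the symbol `V` and this form are assumptions. The data are centered explicitly so that the sample covariance matches E{xx^{T}}.

```python
import numpy as np

def whiten(X):
    """Return (Z, V) such that Z = V @ (X - mean) has identity sample covariance."""
    Xc = X - X.mean(axis=1, keepdims=True)     # enforce zero mean per channel
    C = Xc @ Xc.T / Xc.shape[1]                # sample estimate of C_xx
    eigvals, U = np.linalg.eigh(C)             # C = U diag(eigvals) U^T
    V = np.diag(1.0 / np.sqrt(eigvals)) @ U.T  # assumed whitening matrix
    return V @ Xc, V

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(4, 4)) @ rng.laplace(size=(4, 5000))
Z, V = whiten(X)
C_zz = Z @ Z.T / Z.shape[1]                    # should be close to the identity
```

After whitening, only an orthogonal matrix B remains to be estimated.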
Among the orthogonal approaches, the FastICA method [9] and its variants [18] are probably the most popular techniques. They are based on maximizing some nonlinear contrast functions that approximate negentropy. It was shown in [8] that finding a demixing matrix W that minimizes the mutual information [13] is roughly equivalent to finding directions in which the negentropy is maximized. Therefore, such contrast functions can also be viewed as measures of non-Gaussianity. There are two basic variants of the FastICA method: a deflation-based and a symmetric algorithm. In the deflation approach, the independent components are estimated successively, one by one, and after every iteration step, the Gram-Schmidt orthogonalization is performed. The major drawback of this approach is that estimation errors of the first vectors are accumulated in the subsequent ones by the orthogonalization [19]. The symmetric variant of the FastICA algorithm [20] estimates the components in parallel. This consists in computing the one-unit updates for each component, followed by symmetric orthogonalization of the estimated demixing matrix after each iteration step.
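For reference, the symmetric orthogonalization step used by symmetric FastICA can be sketched as follows. This is the textbook form, in which W is multiplied by the inverse square root of WW^T, computed here by eigendecomposition; the function name `sym_orth` is hypothetical and not taken from any FastICA package.

```python
import numpy as np

def sym_orth(W):
    """Symmetric orthogonalization: (W W^T)^(-1/2) W."""
    d, E = np.linalg.eigh(W @ W.T)               # W W^T = E diag(d) E^T
    return E @ np.diag(1.0 / np.sqrt(d)) @ E.T @ W

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))                  # arbitrary non-orthogonal estimate
W_orth = sym_orth(W)                             # rows are now orthonormal
```

Unlike Gram-Schmidt, this treats all rows equally, but it requires a matrix square root in every iteration.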
Symmetric orthogonalization in its classical form involves calculating the square root of the inverse matrix in (4), which could be computationally expensive for large n. Alternatively, simpler iterative methods can be used [9]. Nevertheless, the orthogonalization presents some difficulties, especially when FastICA is implemented in hardware architectures [21]. Moreover, as we will show, the algorithms with symmetric orthogonalization could be more vulnerable to performance degradation when the assumption regarding the maximum number of Gaussian sources is not met.
The ICA literature also contains examples of methods that do not require orthogonalization, for instance, Jacobi algorithms [2, 16, 22]. In these approaches, the matrix B is considered to be a product of orthogonal matrices, specifically the Jacobi rotations. ICA is carried out by optimizing a contrast function for each pair of components that corresponds to a given rotation plane. Thus, by sweeping the local optimization over all possible pairs, the globally independent sources can be extracted. The most important advantage of the Jacobi algorithms is that the local optimization can be solved analytically if the contrast function is given as a 4th-order polynomial (e.g., kurtosis). An example is the JADE technique [15], which consists in the joint approximate diagonalization of a set of eigenmatrices. The eigenmatrices are estimated from 4th-order moments [23]. In fact, this is a variant of the Jacobi algorithm, because the diagonalization is performed using Jacobi rotations. Unfortunately, the use of 4th-order moments as a contrast function may be discouraged due to poor asymptotic efficiency for super-Gaussians and lack of robustness to outliers [24]. Also, the computational complexity of Jacobi-like algorithms can be relatively high, since the number of required data rotations increases quadratically with n. Theoretically, this number can be reduced by replacing the conventional Jacobi rotations with higher-dimensional rotations. For instance, 4×4 orthogonal matrices that represent rotations in R^{4} can be used. It has been shown in [25] that such a matrix can be described (uniquely up to a common sign) by only two unit quaternions.
In our preliminary work [26], we proposed a novel four-channel ICA algorithm that uses quaternion-based factorizations of 4×4 orthogonal matrices and exploits a contrast function based on negentropy approximation. A solution based on the gradient method with the L2-norm constraint was developed. In this article, we propose to use the Newton-Raphson iterative scheme for solving the optimization task, which results in a faster convergence rate compared to the previous solution. Moreover, we extend our algorithm to the case of n≥4 channels, where n is even, by breaking the n-dimensional problem up into a set of four-dimensional subproblems. For this purpose, a new sweep pattern for data-driven Jacobi-like algorithms is developed. Also, a more detailed comparative evaluation is performed that involves not only the symmetric but also the deflation-based FastICA method and the JADE algorithm.
This paper is organized as follows. Section 2 describes quaternionic factorizations of 4×4 orthogonal matrices. In Section 3, a novel ICA algorithm is presented, and the solution based on the Newton-Raphson iterative scheme is derived. Also, a new sweep pattern for data-driven algorithms is developed. Section 4 investigates the performance of the proposed method via numerical simulations. Finally, the conclusions are given in Section 5.
Quaternions and rotations in \(\mathbb {R}^{4}\)
Most frequently, a quaternion \(Q \in \mathbb {H}\) is represented in the rectangular form:
\[Q = q_{0} + \boldsymbol {i}q_{1} + \boldsymbol {j}q_{2} + \boldsymbol {k}q_{3}, \tag{5}\]
where i,j,k denote imaginary units. The real part of Q is q_{0} and the pure quaternion part is iq_{1}+jq_{2}+kq_{3}. The multiplication of quaternions is determined by the following rules:
\[\boldsymbol {i}^{2} = \boldsymbol {j}^{2} = \boldsymbol {k}^{2} = \boldsymbol {i}\boldsymbol {j}\boldsymbol {k} = -1. \tag{6}\]
It is associative and distributes over addition, but it is not commutative. As for ordinary complex numbers, the conjugate of Q is given by \(\bar {Q} = q_{0} - \boldsymbol {i}q_{1} - \boldsymbol {j}q_{2} - \boldsymbol {k}q_{3}\), and the norm (modulus) is defined as
\[|Q| = \sqrt {Q\bar {Q}} = \sqrt {q_{0}^{2} + q_{1}^{2} + q_{2}^{2} + q_{3}^{2}}. \tag{7}\]
Since any quaternion Q can also be represented by the vector q=[q_{0} q_{1} q_{2} q_{3}]^{T}, the set \(\mathbb {H}\) can be identified with the \(\mathbb {R}^{4}\) vector space.
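The quaternion operations above can be sketched with plain 4-vectors. The function names `qmul`, `qconj`, and `qnorm` are hypothetical helpers introduced only for this illustration; the component formulas follow from the standard Hamilton product.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as [q0, q1, q2, q3]."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([
        p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
        p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
        p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
        p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0,
    ])

def qconj(q):
    """Conjugate: negate the pure quaternion part."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def qnorm(q):
    """Norm (modulus), identical to the Euclidean norm of the 4-vector."""
    return float(np.sqrt(np.dot(q, q)))

one = np.array([1.0, 0.0, 0.0, 0.0])
i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
k = np.array([0.0, 0.0, 0.0, 1.0])
Q = np.array([1.0, 2.0, 3.0, 4.0])
```

For example, `qmul(i, j)` gives `k` while `qmul(j, i)` gives `-k`, which shows that the product is not commutative.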
Let {P,Q} be a pair of unit quaternions, i.e., |P|=|Q|=1, corresponding to the vectors \(\mathbf {p}, \mathbf {q} \in \mathbb {R}^{4}\), and W be a quaternion corresponding to an arbitrary vector \(\mathbf {w} \in \mathbb {R}^{4}\). Consider a real linear transformation from \(\mathbb {H}\) to \(\mathbb {H}\) that maps W to \(PW\bar {Q}\). It can be shown that this transformation can be uniquely represented by a product of the 4×4 orthogonal matrices [25]:
In particular
This means that for an arbitrary 4×4 rotation matrix R (an orthogonal matrix with unit determinant), there exists a pair of unit quaternions P and Q, unique up to a common change of sign, such that R=M^{+}(p)M^{−}(q). The proof of this little-known theorem can be found in [27]. It should be noted that the two factors of this product commute, and that the factorization is satisfied by the negative quaternions −P,−Q as well. In fact, the pairs {P,Q} and {−P,−Q} map to the same rotation of \(\mathbb {R}^{4}\). The possibility of describing a rotation by two unit quaternions is a special feature of the group of rotations in four dimensions.
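The properties listed above can be checked numerically. In this sketch the matrices are built column by column from the map W → PW\bar{Q} rather than from the explicit matrix layouts, which are not reproduced in this text; `M_plus` and `M_minus` are therefore assumed stand-ins for M^{+} and M^{−}, and `qmul` is a hypothetical helper.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as [q0, q1, q2, q3]."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([
        p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
        p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
        p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
        p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0,
    ])

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def M_plus(p):
    """Matrix of left multiplication W -> P W (columns are images of basis vectors)."""
    return np.column_stack([qmul(p, e) for e in np.eye(4)])

def M_minus(q):
    """Matrix of right multiplication by the conjugate, W -> W conj(Q)."""
    return np.column_stack([qmul(e, qconj(q)) for e in np.eye(4)])

rng = np.random.default_rng(0)
p = rng.standard_normal(4); p /= np.linalg.norm(p)   # unit quaternion P
q = rng.standard_normal(4); q /= np.linalg.norm(q)   # unit quaternion Q
w = rng.standard_normal(4)                           # arbitrary vector

R = M_plus(p) @ M_minus(q)                           # 4x4 rotation matrix
```

Under these assumptions, `R` is orthogonal with determinant one, the two factors commute, and replacing `p`, `q` by `-p`, `-q` yields the same `R`.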
Methods
In this section, we introduce a recently proposed four-channel ICA method [26] and develop a new solution to an optimization problem based on the Newton-Raphson iterative scheme. We also propose a new ICA algorithm for n≥4 channels and give some illustrative examples of possible applications of this approach.
Jacobi-based ICA estimation framework
Our method strictly follows the Jacobi-based estimation framework. Namely, the independent components are estimated by recursively applying the so-called sweep operations:
where R_{k} is an orthogonal matrix and k is a sweep number. In the conventional Jacobi-based approaches [2, 16, 22], the matrix R_{k} is the product of the rotation matrices:
where
represents the Jacobi rotation by the angle θ in the plane determined by the p and q axes. Please note that the matrix (14) is also known as the Givens matrix, and the corresponding Givens rotations are no different from Jacobi rotations [28]. By using the factorization (13), an n-dimensional ICA problem can be decomposed into a sequence of n(n−1)/2 one-dimensional subproblems. Each subproblem consists in searching for the angle that maximizes some contrast function [16] in the corresponding rotation plane. The data are rotated and the process is repeated cyclically until all rotation angles converge to zero.
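A Jacobi (Givens) rotation and the set of rotation planes visited in one sweep can be sketched as follows. The sign convention of the sine entries is an assumption, since the explicit matrix (14) is not reproduced in this text, and `givens` is a hypothetical helper name.

```python
import numpy as np
from itertools import combinations

def givens(n, p, q, theta):
    """n x n rotation by angle theta in the plane spanned by axes p and q."""
    R = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    R[p, p] = c
    R[q, q] = c
    R[p, q] = s      # sign convention assumed for illustration
    R[q, p] = -s
    return R

n = 4
pairs = list(combinations(range(n), 2))   # all rotation planes of one sweep
R = givens(n, 0, 2, 0.3)                  # example rotation in the (0, 2) plane
```

For n=4 the list `pairs` has six entries, which matches n(n−1)/2.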
Four-channel algorithm using quaternionic factorization
The key idea of the proposed method is to replace the Jacobi rotations with their four-dimensional quaternion-based counterparts. For simplicity, suppose that n=4. Then, six ordinary Jacobi rotations can be replaced with only two quaternion-based rotations as follows:
Thus, in each sweep, two vectors p_{k}, \(\mathbf {q}_{k} \in \mathbb {R}^{4}\) have to be estimated that, for example, maximize some measure of non-Gaussianity of the rotated data. We propose to optimize these parameters in two consecutive steps, in a similar way as the optimal rotation angles are determined for the Jacobi rotations. First, we look for the vector q_{k} and rotate the data using the matrix M^{−}(q_{k}). Then, we look for p_{k} and rotate the result of the first step by using M^{+}(p_{k}). These steps have to be repeated until both rotation matrices are approximately equal to the identity matrix.
It was shown in [29] that measures of non-Gaussianity based on some nonlinear functions are often considerably more robust than cumulant-based approximations. Moreover, by properly choosing the nonlinear function, we can adjust the ICA algorithm to work for practically any distribution of source signals. The one-unit contrast function for measuring the non-Gaussianity of any random variable y in the negentropy-based ICA framework is given by [9]:
\[J(y) = \left [ E\{g(y)\} - E\{g(v)\} \right ]^{2}, \tag{16}\]
where v is a standardized Gaussian variable, and g is any nonlinear function. Examples of such functions can be found in [30]. The random variable y is assumed to be zero-mean and unit-variance. Please note that the measure (16) is always non-negative, and it equals zero if y is Gaussian. As suggested in [9], a contrast function for several channels (units) can be obtained by simply maximizing the sum of the one-unit functions. Thus, by taking into account the unit-norm constraints (which enforce orthogonality), we define the following optimization problems:
where the operator [A]_{i∗} denotes the ith row of an arbitrary matrix A, \(\bar {g}(y) = g(y) - E\{g(v)\}\) is a centered version of the nonlinear function g(y), and
are the rotated random vectors for k≥1. We assume that the expectations in (17), (18) exist and are finite. Therefore, they can be approximated by sums, and the vector x in (19) can be replaced with an observation matrix X of size 4×m consisting of m mixture vectors stacked in column-wise order. The vectors (19), (20) can be replaced with transformed data matrices \(\mathbf {Y}_{k}^{-}, \mathbf {Y}_{k}^{+} \in \mathbb {R}^{4 \times m}\) as well. Thus, by using the matrix-vector notation, we can redefine the contrast functions in (17) and (18) as follows:
where the terms {M, Y_{k}} stand either for {M^{+}, \(\mathbf {Y}_{k}^{+}\)} or {M^{−}, \(\mathbf {Y}_{k}^{-}\)}. Please note that the problems (17) and (18) are similar, and the solution depends only on the current sweep data. Therefore, in the rest of the article, the subscripts +, −, and the sweep number k are dropped to simplify the notation.
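The one-unit contrast can be estimated from samples as sketched below. This assumes the squared-difference form of the negentropy approximation with g(y)=log cosh(y), and estimates E{g(v)} by Monte Carlo; the function name `negentropy_approx` and all sample sizes are assumptions made for illustration.

```python
import numpy as np

def logcosh(y):
    return np.log(np.cosh(y))

def negentropy_approx(y, g=logcosh, n_ref=200_000, seed=0):
    """Sample estimate of (E{g(y)} - E{g(v)})^2 with v ~ N(0, 1)."""
    v = np.random.default_rng(seed).standard_normal(n_ref)  # Gaussian reference
    return float((np.mean(g(y)) - np.mean(g(v))) ** 2)

def standardize(y):
    return (y - y.mean()) / y.std()

rng = np.random.default_rng(1)
y_lap = standardize(rng.laplace(size=50_000))            # super-Gaussian signal
y_gauss = standardize(rng.standard_normal(50_000))       # Gaussian signal

J_lap = negentropy_approx(y_lap)
J_gauss = negentropy_approx(y_gauss)
J_zero = negentropy_approx(np.zeros(50_000))             # all-zero signal
```

The Gaussian signal gives a value close to zero, whereas the Laplacian and the all-zero signals give clearly positive values.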
Solution based on the Newton-Raphson method
In accordance with the Kuhn-Tucker conditions, any stationary point of (22) must satisfy the following equation:
where λ is the Lagrange multiplier, and
denotes the gradient of J(u). The partial derivatives can be computed as follows:
with
where ∘ denotes the Hadamard product, \(\bar {g}'\) is the first-order derivative of the nonlinear function \(\bar {g}\), and \(\frac {\partial \mathbf {M}(\mathbf {u})}{\partial u_{i}}\) is the ith partial derivative of a given quaternionic matrix. These matrices take the form of simple signed permutation matrices containing only zeros and ±1 entries (see the Appendix for details), so that, in practice, the multiplication by Y can be avoided.
One can try to solve Eq. 23 numerically by using, for example, a quasi-Newton method in a similar way as in the FastICA approach [9]. However, in this case, it could be a more difficult task, as it involves deriving the second-order derivative (Hessian matrix) of the quadratic form (25) or finding its approximation. Even if an explicit expression for this matrix were obtained, its computation could be very expensive.
In our previous work [26], we proposed a much simpler solution based on a gradient method with an L_{2}-norm constraint [31]. A geometric interpretation of this method is given in Fig. 1. In such an approach, a prescaled gradient of J(u), say
with μ>0 being an empirically chosen scaling factor, is projected onto the hyperplane orthogonal to the unit vector u:
The set of all such vectors at a given point u is known as the tangent space of the constraint surface ∥u∥=1. In our case, the constraint surface is a unit four-dimensional hypersphere (also known as the 3-sphere). Then, the solution vector is updated by using the following rule:
which is equivalent to rotating the current solution vector u in the direction of δ by some angle θ along the geodesic, i.e., the curve on the constraint surface that connects two points by an arc of shortest length.
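The projection and geodesic update can be sketched as follows. The explicit update equations are not reproduced in this text, so the forms used here (tangent projection δ = h − (u^{T}h)u and the rotation u cos θ + (δ/∥δ∥) sin θ) are assumptions consistent with the description above. The objective is a toy linear function on the unit sphere, not the ICA contrast, and μ=0.2 is an arbitrary choice.

```python
import numpy as np

def tangent_project(u, h):
    """Project h onto the hyperplane orthogonal to the unit vector u."""
    return h - np.dot(u, h) * u

def geodesic_step(u, delta, theta):
    """Rotate u towards delta by angle theta along a great circle."""
    norm = np.linalg.norm(delta)
    if norm == 0.0:
        return u.copy()                      # already at a stationary point
    return np.cos(theta) * u + np.sin(theta) * delta / norm

# Toy problem: maximize J(u) = c^T u subject to ||u|| = 1 (gradient is c).
c = np.array([1.0, 2.0, -1.0, 0.5])
mu = 0.2                                     # empirically chosen scaling factor
u = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(100):
    delta = tangent_project(u, mu * c)       # projected, prescaled gradient
    u = geodesic_step(u, delta, np.linalg.norm(delta))  # theta = ||delta||
```

The iterate stays on the unit sphere by construction, so no renormalization is required, and it converges to c/∥c∥ for this toy objective.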
It is a well-known fact that the convergence rate of gradient methods depends on the value of the scaling factor μ. In particular, these methods do not necessarily converge for large values of μ. On the other hand, for small values of μ, the algorithm may converge very slowly. In [26], we assumed that θ=∥δ∥, and the scaling factor μ was empirically set to some positive value that depends on a given nonlinear function \(\bar {g}(y)\). Here, we propose a more sophisticated approach that follows the Newton-Raphson iterative scheme and does not require empirically determined parameters. Please note that instead of searching for the optimal μ, we can equivalently set μ=1 and maximize the following cost function with respect to the rotation angle θ:
where
and u is any vector that satisfies ∥u∥=1. Thus, each constrained optimization task (17), (18) can be simplified to a one-dimensional angle search on the unit circle. Now, the Newton-Raphson iterative procedure can be used for finding the optimal rotation angle:
where the first and second derivatives of (31) are given by
respectively, and
In the above expression, the operator ∘2 denotes the element-wise (Hadamard) power, and \(\bar {g}''\) is the second-order derivative of the nonlinear function \(\bar {g}\). Please note that we slightly modified the Newton-Raphson iteration (33) by using the plus sign and taking the absolute value of the denominator. As suggested in [32], in this way, we can force the algorithm to find local maxima only.
In the case of the Jacobi-based algorithms, the rotation hyperplanes could overlap each other. As a result, local optimizations may not preserve (and often destroy) the global optimality. Therefore, during the early sweeps, it may not be necessary to compute the rotation angles with the highest accuracy [25]. Moreover, each rotation is performed in a local coordinate system. For these reasons, in each sweep, we perform only one iteration of the Newton-Raphson method per optimization task (17), (18), assuming that
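The effect of the modification can be seen on a one-dimensional toy problem. This sketch assumes the update θ ← θ + f′(θ)/|f″(θ)|, as described in the text; the objective f(θ)=cos(θ−0.3) and the small constant `eps` guarding the division are illustrative assumptions, not the ICA cost function.

```python
import math

def newton_max_step(theta, d1, d2, eps=1e-12):
    """Modified Newton-Raphson step: always moves uphill along d1."""
    return theta + d1(theta) / (abs(d2(theta)) + eps)

# Toy objective f(theta) = cos(theta - 0.3): maximum at 0.3, minimum at 0.3 + pi.
d1 = lambda t: -math.sin(t - 0.3)   # first derivative
d2 = lambda t: -math.cos(t - 0.3)   # second derivative

theta = 0.0
for _ in range(10):
    theta = newton_max_step(theta, d1, d2)

# Near the minimum, the plain Newton step is attracted to the minimum,
# whereas the modified step moves away from it.
t_min = 0.3 + math.pi
t0 = t_min - 0.2
t_plain = t0 - d1(t0) / d2(t0)
t_modified = newton_max_step(t0, d1, d2)
```

Close to a maximum, the modified step coincides with the ordinary Newton-Raphson step, so the fast local convergence is retained.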
Please note that a similar simplification can also be found in [33]. It can be easily verified that
Since \(\hat {\mathbf {M}}(0) = \mathbf {I}\) and \(\frac {\partial \hat {\mathbf {M}}(\theta =0) }{\partial \theta } = \mathbf {M} \left (\frac {\boldsymbol \delta }{\| \boldsymbol \delta \|} \right)\), the expressions (36)-(38) can be simplified to:
and the optimal rotation angle can be estimated as follows:
where ε is some small positive constant introduced to prevent division by zero. Once the rotation angle is computed, the current data matrix is transformed as \(\mathbf {Y} \rightarrow \hat {\mathbf {M}}(\theta ^{*}) \mathbf {Y}\). Recalling the factorization (15), we see that in each sweep, up to two such transformations are needed. This procedure is repeated cyclically until convergence is achieved, i.e., for both transformations, we have |θ^{∗}|<ζ, where ζ is a sufficiently small positive constant.
Quaternionic ICA for n≥4 channels
An implementation of the four-channel algorithm is rather straightforward, as it comprises only two four-dimensional optimization subproblems per sweep. As shown in the previous section, these subproblems can be solved numerically using the Newton-Raphson scheme. An algorithm for n≥4 channels can be developed once we determine how to design a sweep pattern consisting of a complete set of four-dimensional subproblems. Intuitively, every component (row of the data matrix Y) should be part of some four-dimensional subproblem at least once during the sweep, so that no component of the whole data set is exempt from direct participation in the local optimization, which preserves a drive towards the global optimum. This idea follows the conventional Jacobi-based ICA estimation framework implemented as a sequence of plane rotations.
It turns out that sweep patterns comprising four-dimensional subproblems are not difficult to construct and have already been studied in the context of structured eigenproblems [34, 35]. In these approaches, the location of the four-dimensional subproblem was controlled by the position of the element a_{ij} of an arbitrary symmetric or skew-symmetric matrix \(\mathbf {A} \in \mathbb {R}^{2r \times 2r}\), chosen from the r×r upper diagonal block, where r≥2. This element uniquely determines the 4×4 principal submatrix located in the rows and columns i,j,r+i and r+j as shown below:
where 1≤i<r and i<j≤r. Please note that such submatrices preserve the structure of the parent matrix A, i.e., symmetry or skew-symmetry. Furthermore, a sweep composed entirely of four-dimensional subproblems can be generated by using any cyclic or quasi-cyclic sweep of the upper-left block of the matrix A. As reported in [35], such a sweep pattern is inherently parallelizable, is numerically stable, and shows asymptotic quadratic convergence.
In this work, we propose to extend this idea to a data-driven ICA algorithm for even n=2r. Instead of diagonalizing a set of eigenmatrices as in the JADE algorithm, we operate directly on the data by following the aforementioned sweep pattern. Namely, our four-dimensional subproblems are constructed by cyclically selecting the components with indexes i,j,r+i, and r+j. All steps of the proposed method are summarized as pseudocode in Algorithm 1. In order to generate the pairs {i,j}, a typical two-dimensional row-cycling ordering was used, but similarly to conventional Jacobi-based methods, other arrangements are also possible [36].
A rigorous theoretical proof of convergence of the algorithm seems to be a challenging task and is out of the scope of this article. However, we observed empirically that the proposed method is asymptotically quadratically convergent. Please note that our algorithm requires only n(n−2)/4 quaternion-based rotations per sweep in place of n(n−1)/2 Jacobi rotations as in the JADE method. This suggests faster convergence of the proposed method, especially for large n. However, the rotations performed in the JADE method have lower computational complexity, as they are used to transform cumulant matrices only and the optimal rotation angles are computed analytically. These issues will be examined in more detail in the experimental section.
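The sweep pattern can be enumerated as sketched below, using a row-cycling order of the pairs {i,j} and zero-based indexes. The function name `sweep_quadruples` is hypothetical; Algorithm 1 itself is not reproduced in this text.

```python
def sweep_quadruples(n):
    """Index quadruples (i, j, r+i, r+j) of one sweep for even n = 2r, zero-based."""
    if n < 4 or n % 2 != 0:
        raise ValueError("n must be an even number not smaller than 4")
    r = n // 2
    return [(i, j, r + i, r + j)
            for i in range(r - 1)            # row-cycling over the pairs {i, j}
            for j in range(i + 1, r)]

n = 8
quads = sweep_quadruples(n)                  # six four-dimensional subproblems
quaternion_rotations = 2 * len(quads)        # two rotations (M+, M-) per subproblem
jacobi_rotations = n * (n - 1) // 2          # plane rotations of a Jacobi sweep
```

For n=8 this yields 12 quaternion-based rotations per sweep compared with 28 Jacobi rotations, in agreement with the counts n(n−2)/4 and n(n−1)/2.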
Illustrative examples
Figure 2 demonstrates an example of source recovery using the proposed method. As input data, a linear mixture of eight basic signals has been used. The original and randomly mixed waveforms are depicted in Fig. 2a and b, respectively. Figure 2c presents the source signals recovered by using the proposed approach. In order to illustrate the convergence characteristics, in Fig. 2d, we also show the negentropy approximation computed after each sweep for all channels as well as for local four-dimensional subproblems. It can be seen that the method converges in fewer than six sweeps, and the source signals have been successfully recovered (up to permutation and sign).
We also considered a linear mixture of five images of natural scenes and Gaussian noise. The test images were selected from the BSDS300 database [37]. They were of size 481×321 pixels with an 8-bit grayscale. The original images and mixtures are presented in Fig. 3a and b. Figure 3c and d present recovered images after k=2 and k=10 sweeps, respectively. These experiments show that the proposed algorithm may have good convergence properties in general settings, i.e., for n>4.
In the next example (see Fig. 4), we show that the proposed method can also be useful if the number of sources is smaller than four. Suppose that we have only three independent sources: two speech signals and some noise. The sources were mixed using a random matrix of size 4×3 so that there is a linear dependence between the rows of the matrix. In the case of conventional ICA approaches, the input signals are usually whitened using principal component analysis (PCA), and the dimensionality of the data is decreased at the same time. Only components that correspond to sufficiently large eigenvalues are taken into account. However, in the case of our method, we cannot reduce the dimensionality of the data as easily, as it is closely related to the size of the rotation matrices. Therefore, for n<4, the whitening matrix (4) must be properly regularized:
where Σ_{s} is a diagonal matrix of strictly positive eigenvalues of C_{xx} and U_{s} is a (possibly rectangular) matrix of corresponding eigenvectors. In other words, the whitening stage is performed in the signal subspace only, while the minor components are replaced by zero vectors. Although such a solution is suboptimal from the computational point of view, it makes the implementation much easier. As can be seen in Fig. 4c, the original sources have been properly extracted, and the 4th component has been estimated as the zero vector. These results suggest that for n<4 the minor components that correspond to the null space can be treated as non-Gaussian components. Indeed, the cost function (16) takes positive values for y being a zero signal. Therefore, by maximizing this function, we can recover the zero vectors as well as the original sources.
Results and discussion
Experimental setup
The proposed algorithm has been implemented in the MATLAB environment. For the purpose of empirical evaluation, we denote our method as QICA (Quaternionic ICA). It has been compared to the symmetric FastICA, the deflation-based FastICA [9], and the JADE algorithm [15]. For the FastICA implementations, we used freely available MATLAB packages from [38]. The implementation of the JADE algorithm is a part of the EEGLAB open-source package [39]. A recent implementation of the JADE algorithm in the R language, together with some joint diagonalization functions, is also described in [40].
In the case of the QICA and FastICA methods, the experiments were conducted for the three nonlinear functions that are frequently considered in the publications [9, 30]:
We use these names in order to match the literature, even though one can consider them to be misleading. Namely, these names are related to the first-order derivatives, not to the functions themselves.
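For orientation, the three nonlinearities named POW3, TANH, and GAUSS later in this section are sketched below in the forms commonly used in the FastICA literature, with all scaling constants set to one. The exact expressions and constants of the article's list are not reproduced in this text, so these definitions are assumptions; the dictionary name `NONLINEARITIES` is hypothetical.

```python
import numpy as np

# Each entry maps a name to (g, g'); the names refer to the derivative g'.
NONLINEARITIES = {
    "POW3": (lambda y: y ** 4 / 4.0,
             lambda y: y ** 3),
    "TANH": (lambda y: np.log(np.cosh(y)),
             lambda y: np.tanh(y)),
    "GAUSS": (lambda y: -np.exp(-y ** 2 / 2.0),
              lambda y: y * np.exp(-y ** 2 / 2.0)),
}

y = np.linspace(-2.0, 2.0, 9)
h = 1e-6
# Central-difference derivative of each g, to compare with the stated g'.
numeric_derivatives = {
    name: (g(y + h) - g(y - h)) / (2.0 * h)
    for name, (g, _) in NONLINEARITIES.items()
}
```

The comparison with the numerical derivative makes the naming convention explicit: for example, TANH denotes the function whose derivative is tanh(y), not tanh itself.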
Theoretical stability analysis of the proposed method is out of the scope of this article. Therefore, the effectiveness of the QICA method was evaluated numerically. In particular, its convergence characteristics, source separation quality, and computation time were measured for several conditions involving synthetic and real data. Please note that it makes little sense to measure convergence speed in terms of the iteration number or the average computation time taken by an algorithm to converge, as the reference methods (except JADE) use different stop criteria. Therefore, in order to make the comparison meaningful, we decided to disable the stop conditions in all methods and perform the simulation for a fixed number of iterations, measuring the separation quality in each iteration step. In this way, we were also able to measure the convergence characteristics of the evaluated algorithms.
Synthetic sources were generated for the following distributions: uniform, BPSK, Laplacian, and generalized Gaussians (GG(α) with parameter α=3 and 0.5) [30]. The length of each source signal was m=1000 samples. In our experiments, we also considered a “speech” distribution that was represented by data frames randomly selected from real speech recordings. These recordings were sourced from the publicly available database that was used to develop ITU-T Recommendation [41]. The database consists of short sentences, a few seconds long, spoken by female and male speakers in various languages. Depending on the scenario, we selected from 4 up to 12 sentences from the American English set. Original samples were available in the WAV 32-bit floating-point format with a sampling frequency of 16 kHz, but for our purposes, they were downsampled to 8 kHz. Both synthetic and real sources were normalized to have zero mean and unit variance.
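The synthetic sources could be generated along the following lines. This is a sketch rather than the article's MATLAB code: the function `make_source` is hypothetical, and the generalized Gaussian samples are drawn with the standard gamma-transform method, which is an assumption about how such data may be produced.

```python
import numpy as np

def standardize(s):
    """Normalize a signal to zero mean and unit variance."""
    return (s - s.mean()) / s.std()

def make_source(kind, m, rng, alpha=None):
    """Draw m samples from one of the synthetic source distributions."""
    if kind == "uniform":
        s = rng.uniform(-1.0, 1.0, size=m)
    elif kind == "bpsk":
        s = rng.choice([-1.0, 1.0], size=m)          # random binary symbols
    elif kind == "laplacian":
        s = rng.laplace(size=m)
    elif kind == "gg":
        # |s|^alpha ~ Gamma(1/alpha, 1) gives a density proportional to exp(-|s|^alpha)
        mag = rng.gamma(shape=1.0 / alpha, scale=1.0, size=m) ** (1.0 / alpha)
        s = rng.choice([-1.0, 1.0], size=m) * mag
    else:
        raise ValueError(f"unknown source kind: {kind}")
    return standardize(s)

rng = np.random.default_rng(0)
m = 1000
sources = {
    "uniform": make_source("uniform", m, rng),
    "bpsk": make_source("bpsk", m, rng),
    "laplacian": make_source("laplacian", m, rng),
    "gg3": make_source("gg", m, rng, alpha=3.0),      # sub-Gaussian
    "gg05": make_source("gg", m, rng, alpha=0.5),     # super-Gaussian
}
excess_kurtosis = {name: float(np.mean(s ** 4) - 3.0) for name, s in sources.items()}
```

The sign of the excess kurtosis separates the sub-Gaussian cases (uniform, BPSK, GG(3)) from the super-Gaussian ones (Laplacian, GG(0.5)).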
In a basic scenario, we linearly mixed n=4 sources of the same non-Gaussian distribution. In order to evaluate the robustness of the compared algorithms to Gaussian sources, we also considered scenarios in which one or two sources were Gaussian and the others were non-Gaussian but of the same distribution. Additionally, experiments with n=8 and n=12 non-Gaussian sources have been conducted to examine the global convergence of the QICA algorithm.
In all scenarios, the coefficients of the mixing matrix were generated from the uniform distribution. Ill-conditioned matrices (with a condition number greater than 1000) were excluded from evaluation. The performance indexes were averaged over 1000 random realizations of the sources and the mixing matrices.
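The rejection of ill-conditioned mixing matrices can be sketched as below. The interval [−1, 1] of the uniform distribution is an assumption, since the text does not state it, and `random_mixing_matrix` is a hypothetical helper name.

```python
import numpy as np

def random_mixing_matrix(n, rng, max_cond=1000.0):
    """Draw uniform n x n matrices until the condition number is acceptable."""
    while True:
        A = rng.uniform(-1.0, 1.0, size=(n, n))   # assumed range of the coefficients
        if np.linalg.cond(A) <= max_cond:
            return A

rng = np.random.default_rng(0)
mixing_matrices = [random_mixing_matrix(4, rng) for _ in range(20)]
```

Rejection sampling keeps the coefficient distribution unchanged apart from removing the nearly singular realizations.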
Separation quality
The separation quality was estimated using signal-to-interference ratios (SIRs) [30, 42] and averaged over all non-Gaussian components. In particular, for the kth source signal, the SIR index is given by:
where \(\mathbf {G} = \hat {\mathbf {W}} \mathbf {A} \mathbf {D}^{1/2}\) is the gain matrix, with \(\hat {\mathbf {W}}\) being the estimated demixing matrix of the corresponding algorithm and \(\mathbf {D} = \text {diag}\{\sigma _{1}^{2}, \sigma _{2}^{2},...,\sigma _{n}^{2}\}\) being the diagonal matrix containing the variances of the original signals. Since the BSS problem can be solved only up to permutation, the rows of the matrix \(\hat {\mathbf {W}}\) must be appropriately reordered, which involves a pairing process [43]. For this purpose, we followed [44] and used a greedy algorithm. In practice, for some distributions, the off-diagonal elements of the gain matrix can be close to zero. Therefore, it is often necessary to add some small positive constant to the denominator of (48) in order to prevent division by zero. In our experiments, a constant of value 10^{−9} was used.
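One way to compute such an index is sketched below. Since expression (48) is not reproduced in this text, the sketch assumes the commonly used definition in which the squared diagonal gain is divided by the sum of the squared off-diagonal gains of the same row, reported in decibels; the greedy pairing shown here is likewise an illustrative variant, and both function names are hypothetical.

```python
import numpy as np

def greedy_pairing(G):
    """Return perm such that row perm[c] of G is paired with source c."""
    n = G.shape[0]
    M = np.abs(G).astype(float)
    perm = np.empty(n, dtype=int)
    for _ in range(n):
        r, c = np.unravel_index(np.argmax(M), M.shape)  # largest remaining gain
        perm[c] = r
        M[r, :] = -1.0                                  # exclude this row
        M[:, c] = -1.0                                  # and this column
    return perm

def sir_db(W_hat, A, variances, eps=1e-9):
    """Per-source SIR in dB computed from the gain matrix (assumed definition)."""
    G = W_hat @ A @ np.diag(np.sqrt(variances))
    G = G[greedy_pairing(G)]                            # reorder rows of the gain matrix
    power = G ** 2
    signal = np.diag(power)
    interference = power.sum(axis=1) - signal
    return 10.0 * np.log10(signal / (interference + eps))

# Example: 10% amplitude leakage between two unit-variance sources.
sir_leak = sir_db(np.array([[1.0, 0.1], [0.1, 1.0]]), np.eye(2), np.ones(2))

# Example: perfect separation up to permutation and scale.
rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, size=(4, 4))
W_perfect = np.diag([1.5, -0.5, 2.0, -1.0]) @ np.eye(4)[[2, 0, 3, 1]] @ np.linalg.inv(A)
sir_perfect = sir_db(W_perfect, A, np.ones(4))
```

With the constant 10^{−9} in the denominator, a perfect separation yields a large but finite SIR instead of a division by zero.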
The results presented in Tables 1, 2, and 3 refer to the scenarios with n=4 components containing from 0 up to 2 Gaussian sources. The maximum number of iterations was set to k=40 for all methods. In the case of the deflationbased FastICA, the components were estimated, one by one, enforcing 40 iterations per component. The best results obtained for each distribution were denoted using bold fonts. It can be observed that the performance strongly depends on the distribution of source signals and the number of Gaussian sources.
In the first scenario (Table 1—no Gaussians), we see that the QICA and symmetric FastICA with the POW3 function gives similar results as the JADE algorithm. It is rather not surprising since the cost function of the JADE algorithm is based on 4thorder cumulants and the POW3 function is closely related to the kurtosis.
In opposition to the JADE algorithm, the performance of the QICA and FastICA methods can be improved for some distributions by selecting appropriate nonlinear functions. It can be observed that if the TANH or GAUSS function is selected, we get higher SIR indexes for mixtures with the Laplacian and GG (0.5) distributions.
By analyzing Tables 1, 2, and 3, it is also clear that the performance of all methods degrades as the number of Gaussian sources increases. However, the QICA method appeared to be more robust in the presence of Gaussian sources. In the scenario where one Gaussian source was used (Table 2), our algorithm provides substantially better separation quality in almost all cases. This observation is even more evident in the last scenario (Table 3), where two Gaussians were used. A similar observation can be made for the JADE method, but the proposed method outperforms the JADE algorithm for the Laplacian and GG(0.5) distributions.
Note that if there are no Gaussian sources, then each local minimum or maximum of the one-unit contrast function corresponds to one independent component. Since the contrast function (16) is blind to Gaussian sources, in the presence of one or more Gaussians, the number of local optima is usually lower than n, so that symmetric orthogonalization, as in the FastICA approach, may pose some stability issues. Most frequently, this problem is mitigated by reducing the dimensionality of the observed data, which in fact involves the identification of the non-Gaussian subspace. The simplest methods for dimensionality reduction assume that the information contained in the high-dimensional data is essentially non-Gaussian and is located on a lower-dimensional manifold. Furthermore, the Gaussian components are assumed to be one order of magnitude smaller than the non-Gaussian components. In such a case, the dimensionality of the observed data can be reduced by using, for example, the PCA technique. However, this is often too strict an assumption on the data, and in general, the identification of the non-Gaussian subspace is not a trivial task [45, 46].
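The PCA-based reduction mentioned above can be sketched as follows. This is only an illustration under the stated assumption that the Gaussian components carry much less variance than the non-Gaussian ones, so that keeping the d leading principal directions retains the non-Gaussian subspace; the function name `pca_whiten` and the choice of d are hypothetical and not part of the compared implementations.

```python
import numpy as np

def pca_whiten(X, d):
    """Project n x m data onto its d leading principal directions and whiten.

    Returns the d x m whitened data and the d x n whitening matrix.
    """
    Xc = X - X.mean(axis=1, keepdims=True)         # remove the mean of each channel
    cov = Xc @ Xc.T / Xc.shape[1]                  # n x n sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:d]            # indices of the d largest ones
    V = (eigvecs[:, idx] / np.sqrt(eigvals[idx])).T  # scale to unit variance
    return V @ Xc, V
```

The returned signals have an identity sample covariance, so any subsequent orthogonal ICA stage only has to search over rotations of the d-dimensional space.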
Unlike the FastICA method, the QICA as well as the JADE approaches are inherently orthogonal and thus more robust to signal model mismatch, i.e., a situation in which there is more than one Gaussian source. It can be verified empirically that the data rotations are performed in the non-Gaussian subspace only. Thus, these methods are able to successfully recover non-Gaussian components even if the observation mixtures contain multiple Gaussians.
Convergence rate
Figure 5 presents average SIR indexes versus iteration number obtained for various data distributions and the number of components ranging from 4 to 12. In this experiment, no Gaussian sources were used. In the case of the QICA and symmetric FastICA algorithms, the GAUSS nonlinear function was selected. The deflation-based FastICA is not considered here, as it works differently by recovering the components one by one. We observed empirically that, for the first component, the accuracy of this method is similar to that of the other approaches. However, for the remaining components, we noticed a substantial performance drop due to deflationary orthogonalization. Thus, on average, this method performs rather poorly compared to the other algorithms (see Table 1).
In the case of n=4 channels, we do not observe any significant difference in the convergence rate of the QICA and symmetric FastICA algorithms for most distributions. However, for n>4, the proposed method clearly converges faster than FastICA. The only exception is the BPSK distribution, for which the QICA algorithm converges more slowly than the reference methods. Although the SIR decreases as the number of components increases, this degradation is similar for all approaches and can be compensated by increasing the number of samples [30].
It can be observed that the JADE algorithm needs the smallest number of sweeps to converge, but its accuracy is considerably lower than that of the QICA or symmetric FastICA for most distributions. This is because the JADE algorithm is based on fourth-order cumulants, which are known to be less sensitive for some mixtures, e.g., Laplacians and super-Gaussians.
Since there is a difference in the number of rotations per sweep between the QICA and JADE algorithms, and both methods can be configured to use the same stopping criteria, it is interesting to measure the total number of rotations needed to achieve convergence. Hence, in the next experiment, the maximum number of sweeps was set to k_{max}=100 and the minimum rotation angle was ζ=0.01 for both methods. Figure 6 presents the total number of rotations and the corresponding SIR index measured for the Laplacian distribution versus the number of components. As indicated in Section 3.4, the QICA method requires approximately 2 times fewer rotations per sweep than the JADE algorithm, but it also needs more sweeps to converge. Therefore, the total number of rotations is similar for both methods.
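The way such stopping criteria interact with the rotation count can be illustrated with a generic Jacobi-type loop. This is a sketch under assumptions, not the code used in the experiments: it assumes that a rotation is applied only when its angle exceeds ζ in magnitude and that the procedure stops after the first sweep in which no rotation is applied. The callbacks `rotation_angle` and `apply_rotation`, as well as the function name `jacobi_sweeps`, are hypothetical placeholders for the method-specific steps.

```python
def jacobi_sweeps(rotation_angle, apply_rotation, subproblems,
                  k_max=100, zeta=0.01):
    """Run Jacobi-type sweeps and count the rotations actually applied.

    rotation_angle(sub) -> angle proposed for a subproblem (hypothetical callback)
    apply_rotation(sub, angle) -> performs the rotation (hypothetical callback)
    Returns (number of sweeps executed, total number of applied rotations).
    """
    sweep = 0
    total = 0
    for sweep in range(1, k_max + 1):
        applied = 0
        for sub in subproblems:
            angle = rotation_angle(sub)
            if abs(angle) > zeta:       # skip negligible rotations
                apply_rotation(sub, angle)
                applied += 1
        total += applied
        if applied == 0:                # nothing rotated in this sweep: converged
            break
    return sweep, total
```

Counting the applied rotations rather than the sweeps is what allows two methods with different numbers of rotations per sweep to be compared on the same scale.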
Computational complexity
The convergence rate measured by the number of iterations or data rotations is not the only factor that determines the speed of the algorithms. In fact, the computational cost of a single iteration, as well as the total execution time, is different for each method. Figure 7 presents the average execution times measured for various data sizes and algorithms. Due to significant differences in the implementation of the compared algorithms, we analyze both the execution time of the initialization stage and the time of a single iteration.
It can be seen in Fig. 7a that the initialization stage of the QICA algorithm is identical to that of the FastICA approach. Since it consists only of whitening the input data, it is much simpler and less memory-demanding than the initialization stage of the JADE method. In the case of the JADE method, apart from whitening the input data, it is also necessary to estimate n(n+1)/2 cumulant matrices, each of size n×n. Obviously, these operations involve the processing of the whole data set of size n×m samples, but they are performed only once. Thus, the memory requirement for the JADE method is of order O(nm+n^{4}) as opposed to O(nm) for the QICA method.
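The memory orders quoted above follow from counting stored entries, as the following back-of-the-envelope sketch shows. It counts only the dominant arrays (the n×m data and, for JADE, the n(n+1)/2 cumulant matrices of size n×n) and ignores any temporary buffers; the function names are hypothetical.

```python
def qica_memory_entries(n, m):
    """Dominant storage of QICA: the whitened n x m data only."""
    return n * m

def jade_memory_entries(n, m):
    """Dominant storage of JADE: the data plus n(n+1)/2 cumulant matrices."""
    cumulant_matrices = n * (n + 1) // 2
    return n * m + cumulant_matrices * n * n

# Example: n = 12 components, m = 10000 samples
print(qica_memory_entries(12, 10000))   # 120000
print(jade_memory_entries(12, 10000))   # 131232
```

The second term grows as n^{4}/2, so it dominates only when the number of components is large relative to the number of samples.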
It can be verified in Fig. 7b that the execution time of a single sweep of the QICA method increases quadratically with the number of components and linearly with the number of samples. Hence, a sweep of our method has a computational complexity of order O(n^{2}m). The computation time of a single iteration of the JADE algorithm also increases quadratically with the number of components, but it is constant with respect to the number of samples. This is because it consists of updating the cumulant matrices only, which does not involve any explicit transformation of the input signals. Thus, its complexity is of order O(n^{2}).
Although the QICA method requires fewer rotations per single sweep than the JADE algorithm (or the same number in total), these rotations are more complex, and similarly to the FastICA method, in every sweep all data samples must be transformed. As we see in Fig. 7b, for a relatively small number of samples and a large number of components, the average execution time of a single iteration of the proposed method is similar to or even smaller than that of the JADE method. On the other hand, we also see that even for a number of components as small as n=4, the single iteration time of the QICA method is about six times longer than that of the symmetric FastICA approach.
Although the contrasts for the QICA and FastICA are slightly different, both methods maximize the negentropy measure, which involves the evaluation of the same nonlinear functions. Regardless of the implementation, both methods must, at some point, compute these functions and/or their derivatives. In the case of the QICA approach, for each rotation, the three functions \(\bar {g}\), \(\bar {g}'\), and \(\bar {g}''\) have to be evaluated twice per four-dimensional subproblem. Thus, at each sweep, we have 3nm(n−2) function evaluations in total. Note that these operations can be redundant when the rotation angles are close to zero. The FastICA approach requires evaluating the functions g^{′} and g^{″} only once per component [9], so each iteration requires 2nm evaluations in total.
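The evaluation counts above can be checked with simple arithmetic. The sketch below assumes, for illustration, that a sweep visits every pair of channel pairs once, which gives n(n−2)/8 four-dimensional subproblems and two rotations per subproblem for even n; this assumption reproduces the total of 3nm(n−2) stated in the text. The function names are hypothetical.

```python
def qica_rotations_per_sweep(n):
    """Assumed count: n(n-2)/8 subproblems, two rotations each (even n >= 4)."""
    if n < 4 or n % 2:
        raise ValueError("n must be even and at least 4")
    return n * (n - 2) // 4

def qica_evals_per_sweep(n, m):
    """Three functions evaluated on 4 channels x m samples for every rotation."""
    return qica_rotations_per_sweep(n) * 3 * 4 * m   # equals 3nm(n-2)

def fastica_evals_per_iteration(n, m):
    """Two functions evaluated once per component on m samples."""
    return 2 * n * m

n, m = 4, 1000
print(qica_evals_per_sweep(n, m))         # 24000
print(fastica_evals_per_iteration(n, m))  # 8000
```

Under this assumption, the ratio of the two counts is 3(n−2)/2, so the gap in nonlinear function evaluations grows linearly with the number of components.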
The computational complexity of the proposed method can be reduced by taking into account the specific structure of the matrices M^{±}(u). In this case, the matrix-vector multiplication can be implemented using only 8 real multiplications [47]. On the other hand, multiplication of the 4×4 matrices can be implemented more efficiently on specific hardware architectures. Our method consists in decomposing a high-dimensional BSS problem into four-dimensional subproblems, so it is well suited to modern GPUs and CPUs, which support processing four-element vectors. In particular, typical GPUs are able to read 128 bits in one cycle and one instruction. Hence, accessing four-element vector data can be faster than accessing four scalar elements (i.e., 32-bit single-precision floating-point numbers). Moreover, for some architectures (AMD GPUs), it is essential to organize data into four-element vectors to maximize efficiency [48]. Note that in the case of a CPU-based implementation, it is also possible to use 128-bit SSE/AVX instructions. Other ICA algorithms can be expressed in terms of operations on four-element vectors, but this is unnatural and difficult and results in suboptimal implementations compared to our approach.
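The structure referred to above can be made concrete with a small sketch. It assumes the common convention in which M^{+}(u) and M^{−}(u) are the matrices of left and right multiplication by the quaternion u=(u_0,u_1,u_2,u_3); the sign convention used elsewhere in this article may differ. The function names `m_plus`, `m_minus`, and `rotate_block` are hypothetical, and the plain matrix products below use 16 multiplications per vector rather than the reduced scheme of [47].

```python
import numpy as np

def m_plus(u):
    """Matrix of left multiplication by quaternion u (assumed convention)."""
    u0, u1, u2, u3 = u
    return np.array([[u0, -u1, -u2, -u3],
                     [u1,  u0, -u3,  u2],
                     [u2,  u3,  u0, -u1],
                     [u3, -u2,  u1,  u0]], dtype=float)

def m_minus(u):
    """Matrix of right multiplication by quaternion u (assumed convention)."""
    u0, u1, u2, u3 = u
    return np.array([[u0, -u1, -u2, -u3],
                     [u1,  u0,  u3, -u2],
                     [u2, -u3,  u0,  u1],
                     [u3,  u2, -u1,  u0]], dtype=float)

def rotate_block(X4, u, v):
    """Rotate a 4 x m block of signals by M+(u) M-(v) with normalized u, v."""
    u = np.asarray(u, dtype=float) / np.linalg.norm(u)
    v = np.asarray(v, dtype=float) / np.linalg.norm(v)
    return m_plus(u) @ (m_minus(v) @ X4)
```

For unit quaternions, both matrices are orthogonal and commute with each other, so each four-dimensional rotation acts on all m samples of the block as one batched 4×4 product, which is the access pattern that maps onto four-element vector hardware.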
Conclusion
We have shown that the quaternionic factorizations of 4×4 orthogonal matrices can be successfully used in the Jacobi ICA estimation framework for even n≥4. Namely, a novel ICA method has been developed, which, on the one hand, follows the Jacobi-based framework and, on the other hand, utilizes a contrast function based on negentropy approximation. We have shown that the optimization task can be reduced to the problem of a one-dimensional search on a unit circle and solved using the Newton-Raphson method.
The proposed algorithm outperforms the FastICA approach in terms of separation quality if the observation mixtures contain one or more Gaussian sources. It also shows a faster convergence rate than the FastICA approach for most PDFs. Unlike the JADE algorithm, the proposed method can be adjusted to match a given distribution by selecting an appropriate nonlinear function that approximates the negentropy. The developed method tends to be more computationally demanding than the state-of-the-art approaches, i.e., the symmetric FastICA and JADE algorithms. However, we believe that the method can be optimized by taking into account the specific structure of the quaternion-based rotation matrices.
Future work includes a rigorous stability analysis of the designed algorithm, further optimizations, and the development of a GPU-optimized implementation, where the utilization of four-dimensional rotations seems to be especially promising.
Appendix
The derivatives of the quaternionic rotation matrices with respect to the components of the quaternion are given by:
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
ICA: Independent component analysis
JADE: Joint approximate diagonalization of eigenmatrices
BSS: Blind signal separation
EEG: Electroencephalography
PCA: Principal component analysis
QICA: Quaternionic independent component analysis
GG: Generalized Gaussian
SIR: Signal to interference ratio
GPU: Graphics processing unit
CPU: Central processing unit
SSE: Streaming SIMD extensions
AVX: Advanced vector extensions
PDF: Probability density function
References
G. Naik, W. Wang, Blind Source Separation: Advances in Theory, Algorithms and Applications (Springer, Berlin, Heidelberg, 2014).
P. Comon, Independent component analysis, a new concept? Signal Process. 36(3), 287–314 (1994).
J. Virta, K. Nordhausen, Estimating the number of signals using principal component analysis. Stat. 8(1), 231 (2019).
M. Zibulevsky, B. A. Pearlmutter, Blind source separation by sparse decomposition in a signal dictionary. Neural Comput. 13(4), 863–882 (2001).
Y. Li, A. Cichocki, S. Amari, Analysis of sparse representation and blind source separation. Neural Comput. 16(6), 1193–1234 (2004).
M. Kleinsteuber, H. Shen, Blind source separation with compressively sensed linear mixtures. IEEE Signal Process. Lett. 19(2), 107–110 (2012).
M. Rani, S. B. Dhok, R. B. Deshmukh, A systematic review of compressive sensing: concepts, implementations and applications. IEEE Access. 6, 4875–4894 (2018).
A. Hyvärinen, E. Oja, Independent component analysis: algorithms and applications. Neural Netw. 13(4), 411–430 (2000).
A. Hyvärinen, Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural Netw. 10(3), 626–634 (1999).
P. Comon, C. Jutten (eds.), Handbook of Blind Source Separation. Independent Component Analysis and Applications (Academic Press, Oxford, 2010).
X. Yu, D. Hu, J. Xu, Blind Source Separation  Theory and Applications (Wiley, Singapore, 2014).
G. Chabriel, M. Kleinsteuber, E. Moreau, H. Shen, P. Tichavsky, A. Yeredor, Joint matrices decompositions and blind source separation: a survey of methods, identification, and applications. IEEE Signal Process. Mag. 31(3), 34–43 (2014).
A. J. Bell, T. J. Sejnowski, An information-maximization approach to blind separation and blind deconvolution. Neural Comput. 7(6), 1129–1159 (1995).
T. W. Lee, M. Girolami, T. J. Sejnowski, Independent component analysis using an extended infomax algorithm for mixed subgaussian and supergaussian sources. Neural Comput. 11(2), 417–441 (1999).
J. F. Cardoso, A. Souloumiac, Blind beamforming for non-Gaussian signals. IEE Proc. F Radar Signal Process. 140(6), 362–370 (1993).
J. F. Cardoso, High-order contrasts for independent component analysis. Neural Comput. 11(1), 157–192 (1999).
P. Ablin, J. Cardoso, A. Gramfort, in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Faster ICA under orthogonal constraint (IEEE, Calgary, 2018), pp. 4464–4468.
J. Miettinen, K. Nordhausen, S. Taskinen, fICA: FastICA algorithms and their improved variants. R J. 10, 148–158 (2019).
K. Nordhausen, P. Ilmonen, A. Mandal, H. Oja, E. Ollila, in Proc. 19th European Signal Processing Conference (EUSIPCO). Deflation-based fastICA reloaded (IEEE, Barcelona, 2011), pp. 1854–1858.
A. Hyvärinen, The fixed-point algorithm and maximum likelihood estimation for independent component analysis. Neural Process. Lett. 10(1), 1–5 (1999).
M. Plauth, F. Feinbube, P. Tröger, A. Polze, in Proc. 15th International Conference on Parallel and Distributed Computing, Applications and Technologies. Fast ICA on modern GPU architectures (IEEE, Hong Kong, 2014), pp. 69–75.
E. G. Learned-Miller, J. W. Fisher, ICA using spacings estimates of entropy. J. Mach. Learn. Res. 4, 1271–1295 (2003).
J. Miettinen, S. Taskinen, K. Nordhausen, H. Oja, Fourth moments and independent component analysis. Stat. Sci. 30, 372–390 (2015).
A. Hyvärinen, in Proc. IEEE Signal Processing Society Workshop. Neural Networks for Signal Processing VII. One-unit contrast functions for independent component analysis: a statistical analysis (IEEE, Amelia Island, 1997), pp. 388–397.
N. Mackey, Hamilton and Jacobi meet again: quaternions and the eigenvalue problem. SIAM J. Matrix Anal. Appl. 16(2), 421–435 (1995).
A. Borowicz, in Proc. Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA). On using quaternionic rotations for indpendent component analysis (IEEE, Poznań, Poland, 2018), pp. 114–119.
H. G. Baker, Quaternions and orthogonal 4x4 real matrices (1996). http://archive.gamedev.net/archive/reference/articles/article428.html. Accessed 09 Jan 2020.
G. H. Golub, C. F. Van Loan, Matrix Computations (Johns Hopkins University Press, USA, 2013).
A. Hyvärinen, in Proc. Conference on Advances in Neural Information Processing Systems 10. New approximations of differential entropy for independent component analysis and projection pursuit (MIT Press, Denver, 1997), pp. 273–279.
P. Tichavsky, Z. Koldovsky, E. Oja, Performance analysis of the FastICA algorithm and Cramér-Rao bounds for linear independent component analysis. IEEE Trans. Signal Process. 54(4), 1189–1203 (2006).
S. C. Douglas, S. Amari, S. Y. Kung, On gradient adaptation with unit-norm constraints. IEEE Trans. Signal Process. 48(6), 1843–1847 (2000).
W. Murray, Newton-Type Methods, Wiley Encyclopedia of Operations Research and Management Science (Wiley, Hoboken, 2011).
W. Ouedraogo, A. Souloumiac, C. Jutten, in Proc. Latent Variable Analysis and Signal Separation (LVA/ICA). Nonnegative independent component analysis algorithm based on 2D Givens rotations and a Newton optimization (Springer, Berlin, Heidelberg, 2010), pp. 522–529.
H. Faßbender, D. S. Mackey, N. Mackey, Hamilton and Jacobi come full circle: Jacobi algorithms for structured Hamiltonian eigenproblems. Linear Algebra Appl. 332–334, 37–80 (2001).
D. S. Mackey, N. Mackey, D. M. Dunlavy, Structure preserving algorithms for perplectic eigenproblems. ELA. Electron. J. Linear Algebra. 13, 10–39 (2005).
M. Parfieniuk, in Proc. International Conference on Parallel Processing and Applied Mathematics (PPAM). A parallel factorization for generating orthogonal matrices (Springer, Bialystok, Poland, 2019), pp. 567–578.
D. Martin, C. Fowlkes, D. Tal, J. Malik, in Proc. 8th Int’l Conf. Computer Vision, vol. 2. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics (IEEE, Vancouver, 2001), pp. 416–423.
H. Gävert, J. Hurri, J. Särelä, A. Hyvärinen, Matlab FastICA v 2.5 (2005). http://research.ics.aalto.fi/ica/fastica/code/dlcode.shtml. Accessed 09 Jan 2020.
A. Delorme, S. Makeig, EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J. Neurosci. Methods. 134, 9–21 (2004).
J. Miettinen, K. Nordhausen, S. Taskinen, Blind source separation based on joint diagonalization in R: the packages JADE and BSSasymp. J. Stat. Softw. 76(2), 1–31 (2017).
(International Telecommunication Union  Telecommunication Standardization Sector, Geneva, 1998). http://handle.itu.int/11.1002/1000/4412. Accessed 09 Jan 2020.
Z. Koldovský, P. Tichavsky, E. Oja, Efficient variant of algorithm FastICA for independent component analysis attaining the Cramér-Rao lower bound. IEEE Trans. Neural Netw. 17, 1265–77 (2006).
P. Tichavsky, Z. Koldovský, Optimal pairing of signal components separated by blind techniques. IEEE Signal Process. Lett. 11, 119–122 (2004).
V. Zarzoso, P. Comon, Robust independent component analysis by iterative maximization of the kurtosis contrast with algebraic optimal step size. IEEE Trans. Neural Netw. 21(2), 248–261 (2010).
G. Blanchard, M. Kawanabe, M. Sugiyama, V. Spokoiny, K. Müller, In search of non-Gaussian components of a high-dimensional distribution. J. Mach. Learn. Res. 7, 247–282 (2006).
H. Sasaki, G. Niu, M. Sugiyama, in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, vol 51. Non-Gaussian component analysis with log-density gradient estimation (Proceedings of Machine Learning Research - PMLR, Cadiz, 2016), pp. 1177–1185. http://proceedings.mlr.press/v51/sasaki16.html.
T. D. Howel, J. C. Lafon, The complexity of quaternion product. Technical Report TR 75245, Cornell University, Department of Computer Science (1975).
L. Buatois, G. Caumon, B. Lévy, in High Performance Computing and Communications, Lecture Notes in Computer Science, 4782. Concurrent number cruncher: an efficient sparse linear solver on the GPU (Springer, Berlin, 2007), pp. 358–371.
Acknowledgements
The author would like to thank the anonymous reviewers for their insightful comments that helped to improve this article.
Funding
This work was supported by the Bialystok University of Technology under the Grant WZ/WIIIT/4/2020.
Contributions
AB is the sole author of this work and approved the final manuscript.
Ethics declarations
Consent for publication
Not applicable.
Competing interests
The author declares that there are no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Borowicz, A. Orthogonal approach to independent component analysis using quaternionic factorization. EURASIP J. Adv. Signal Process. 2020, 39 (2020). https://doi.org/10.1186/s13634020006970
DOI: https://doi.org/10.1186/s13634020006970
Keywords
 BSS
 ICA
 Jacobi rotations
 Negentropy
 Quaternions