Separation of phaselocked sources in pseudoreal MEG data
EURASIP Journal on Advances in Signal Processing, volume 2013, Article number: 32 (2013)
Abstract
This article addresses the blind separation of linear mixtures of synchronous signals (i.e., signals with locked phases), which is a relevant problem, e.g., in the analysis of electrophysiological signals of the brain such as the electroencephalogram and the magnetoencephalogram (MEG). Popular separation techniques such as independent component analysis are not adequate for phase-locked signals, because such signals have strong mutual dependency. Aiming at unmixing this class of signals, we have recently introduced the independent phase analysis (IPA) algorithm, which can be used to separate synchronous sources. Here, we apply IPA to pseudo-real MEG data. The results show that this algorithm is able to separate phase-locked MEG sources in situations where the phase jitter (i.e., the deviation from the perfectly synchronized case) is moderate. This represents a significant step towards performing phase-based source separation on real data.
1 Introduction
In recent years, the interest of the scientific community in synchrony has risen. This interest lies both in its physical manifestations and in the development of a theory unifying and describing those manifestations in various systems such as laser beams, astrophysical objects, and brain neurons [1].
It is believed that synchrony plays a relevant role in the way different parts of the human brain interact. For example, when humans engage in a motor task, several brain regions oscillate coherently [2, 3]. Also, several pathologies such as autism, Alzheimer's, and Parkinson's are associated with a disruption in the synchronization profile of the brain, whereas epilepsy is associated with an anomalous increase in synchrony (see [4] for a review).
To perform inference on the synchrony of networks present in the brain or in other real-world systems, one must have access to the phase dynamics of the individual oscillators (which we will call “sources”). Unfortunately, in brain electrophysiological signals such as electroencephalograms (EEG) and magnetoencephalograms (MEG), and in other real-world situations, individual oscillator signals are not directly measurable, and one only has access to a superposition of the sources^{a}. In fact, EEG and MEG signals measured in one sensor contain components coming from several brain regions [5]. In this case, spurious synchrony may occur, as we will illustrate later.
The problem of undoing this superposition is called blind source separation (BSS). Typically, one assumes that the mixing is linear and instantaneous, which is a valid approximation in brain signals [6]. One must also make some assumptions on the sources, as in independent component analysis (ICA), where the assumption is mutual statistical independence of the sources [7]. ICA has seen multiple applications in EEG and MEG processing (for recent applications see, e.g., [8, 9]). Different BSS approaches use criteria other than statistical independence, such as nonnegativity of the sources [10, 11] or time-dependent frequency spectrum criteria [12, 13]. In our case, independence of the sources is not a valid assumption, because phase-locked sources are highly mutually dependent. Also, phase-locking is not equivalent to frequency coherence: in fact, two signals may have a severe overlap between their frequency spectra but still exhibit low or no phase synchrony at all [14]. In this article, we address the problem of how to separate such phase-locked sources using a phase-specific criterion.
Recently, we presented a two-stage algorithm called independent phase analysis (IPA), which performed very well on noiseless simulated data [15] and with moderate levels of added Gaussian white noise [14]. The separation algorithm we proposed there uses temporal decorrelation separation [16] as a first step, followed by the maximization of an objective function involving the phases of the estimated sources. In [14], we presented a “proof of concept” of IPA, laying down the theoretical foundations of the algorithm and applying it to a toy dataset of manually generated data. However, in that article we were not concerned with the application of IPA to real-world data. In this article, we study the applicability of IPA to pseudo-real MEG data. These data are not yet meant to allow inference about the human brain; however, they are generated in such a way that both the sources and the mixing process mimic what actually happens in the human brain. The advantage of using such pseudo-real data is that the true solution is known, thus allowing a quantitative assessment of the performance of the algorithm. We also study the robustness of IPA in the case where the sources are not perfectly phase-locked. It should be stressed, however, that the algorithm presented here makes no assumptions that are specific to brain signals, and should work in any situation where phase-locked sources are mixed approximately linearly and noise levels are low.
This article is organized as follows. In Section 2, we introduce the Hilbert transform. We also introduce the phase locking factor (PLF), a measure of synchrony which is central to the algorithm; finally, we show that synchrony is disrupted when the sources undergo a linear mixing. Section 3 describes the IPA algorithm in detail, including illustrations using a toy dataset. In Section 4, we explain how the pseudo-real MEG data are generated and show the results obtained by IPA on those data. These results are discussed in Section 5, and conclusions are drawn in Section 6.
2 Background
2.1 Hilbert transform: phase of a realvalued signal
Usually, the signals under study are real-valued discrete signals. To obtain the phase of a real signal, one can use a complex Morlet (or Gabor) wavelet transform, which can be seen as a bank of bandpass filters [17]. Alternatively, one can use the Hilbert transform, which should be applied to a locally narrowband signal or be preceded by appropriate filtering [18] for the meaning of the phase extracted by the Hilbert transform to be clear. The two transforms have been shown to be equivalent for the study of brain signals [19], but they may differ for other kinds of signals. In this article, we chose to use the Hilbert transform. To ensure that this transform yields meaningful results, we precede its use by bandpass filtering the pseudo-real MEG sources used in this article (see Section 4.1). Note that this is a very common preprocessing step in the analysis of real MEG signals (cf. [20–22]).
The discrete Hilbert transform x_h(t) of a band-limited discrete-time signal x(t), $t \in \mathbb{Z}$, is given by a convolution [18]:

$$ x_h(t) = \sum_{\tau=-\infty}^{+\infty} h(t-\tau)\, x(\tau), \qquad h(t) = \begin{cases} \frac{2}{\pi}\, \frac{\sin^2(\pi t/2)}{t}, & t \neq 0, \\ 0, & t = 0. \end{cases} $$
Note that the Hilbert transform is a linear operator. The Hilbert filter h(t) is not causal and has infinite duration, which makes direct implementation of the above formula impossible. In practice, the Hilbert transform is usually computed in the frequency domain, where the above convolution becomes a product of the discrete Fourier transforms of x(t) and h(t). A more thorough mathematical explanation of this transform is given in [18, 23]. We used the Hilbert transform as implemented in MATLAB.
The analytic signal of x(t), denoted by $\tilde{x}(t)$, is given by $\tilde{x}(t) \equiv x(t) + \mathrm{i}\, x_h(t)$, where $\mathrm{i} = \sqrt{-1}$ is the imaginary unit. The phase of x(t) is defined as the angle of its analytic signal. In the remainder of the article, we drop the tilde notation; it should be clear from the context whether the signals under consideration are the real signals or the corresponding analytic signals.
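As an illustration, the phase extraction just described can be sketched in Python with SciPy's `hilbert` (which returns the analytic signal directly); the function name `instantaneous_phase` and the test signal are ours, not part of the article:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_phase(x):
    """Phase of a real signal, via the angle of its analytic signal.

    x should be (locally) narrowband, e.g., bandpass filtered
    beforehand, for the extracted phase to be meaningful.
    """
    return np.angle(hilbert(x))   # hilbert() returns x(t) + i*x_h(t)

# A pure sinusoid: its unwrapped phase grows linearly at the angular frequency.
t = np.arange(1000)
f = 0.05                          # cycles per sample (illustrative value)
phi = instantaneous_phase(np.cos(2 * np.pi * f * t))
```

For a narrowband signal, the unwrapped phase increases at the signal's (instantaneous) angular frequency, which is what the PLF computations below rely on.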
2.2 Phaselocked sources
Throughout this article, we assume that the sought sources, N in number and denoted by s_j, j = 1,…,N, are phase-locked. In other words, s_j, j = 1,…,N, are complex-valued signals with nonnegative amplitudes and equal phases up to a constant plus small perturbations. Formally,

$$ s_j(t) = a_j(t)\, e^{\mathrm{i}\left(\phi(t) + \alpha_j + \delta_j(t)\right)}, \tag{1} $$
where a_j(t) are the amplitudes of the sources, which are by definition nonnegative and real-valued, α_j is the constant dephasing (or phase lag) between the sources (it does not depend on the time t), ϕ(t) represents an oscillation common to all the sources (it does not depend on the source j), and δ_j(t) is the phase jitter, which represents the deviation of the j th source from its nominal phase α_j + ϕ(t). Throughout this article, we will assume that the phase jitter is Gaussian with zero mean and a standard deviation σ.
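A minimal sketch of sources generated according to model (1); the amplitudes, lags, and the 5-degree jitter standard deviation are illustrative values of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 3, 2000
t = np.arange(T)

phi = 2 * np.pi * 0.02 * t                      # common oscillation phi(t)
alpha = np.array([0.0, np.pi / 6, np.pi / 3])   # constant phase lags alpha_j
sigma = np.deg2rad(5.0)                         # jitter std (5 degrees)
delta = rng.normal(0.0, sigma, size=(N, T))     # phase jitter delta_j(t)

# Nonnegative amplitudes: a small baseline plus a random per-source level.
a = 0.5 + rng.random((N, 1)) * np.ones((1, T))

# s_j(t) = a_j(t) * exp(i * (phi(t) + alpha_j + delta_j(t)))
s = a * np.exp(1j * (phi[None, :] + alpha[:, None] + delta))
```

By construction, |s_j(t)| = a_j(t) and the phases deviate from their nominal values only through the jitter.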
One situation where signals follow the model in (1) is the one described by the (time-dependent) Kuramoto model, under some circumstances. This simple model has been used extensively in the context of, e.g., modeling neuronal excitation and inhibition interactions, as well as large-scale experimental neuroscience data [20, 24]. Under this model, the interactions between oscillators are weak relative to the stability of their limit cycles, and thus affect the oscillators’ phases only, not their amplitudes. The phase of oscillator j is governed by [1, 25, 26]

$$ \frac{d\phi_j}{dt}(t) = \omega_j(t) + \sum_{k \neq j} \kappa_{jk} \sin\!\left(\phi_k(t) - \phi_j(t)\right), \tag{2} $$
where ϕ_j(t) is the phase of oscillator j (it is unrelated to ϕ(t) in Equation (1)), ω_j(t) is its natural frequency, and κ_jk measures the strength of the interaction between oscillators j and k. If the κ_jk coefficients are large enough and ω_j(t) = ω_k(t) for all j, k, then the solutions of the Kuramoto model are of the form (1) with small δ_j(t).
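To make the dynamics concrete, the following sketch integrates the Kuramoto phase equations with a simple forward-Euler scheme; the step size, coupling strengths, and frequencies are hypothetical choices of ours. With equal natural frequencies and sufficiently strong coupling, the phase differences indeed settle towards constants, as stated above:

```python
import numpy as np

def kuramoto(omega, K, phi0, T, dt=0.01):
    """Forward-Euler integration of the Kuramoto phase equations:
    dphi_j/dt = omega_j + sum_k K[j, k] * sin(phi_k - phi_j).
    Natural frequencies are taken as constant in time here."""
    N = len(phi0)
    phi = np.empty((N, T))
    phi[:, 0] = phi0
    for n in range(1, T):
        p = phi[:, n - 1]
        # coupling[j] = sum_k K[j, k] * sin(p[k] - p[j])
        coupling = (K * np.sin(p[None, :] - p[:, None])).sum(axis=1)
        phi[:, n] = p + dt * (omega + coupling)
    return phi

# Equal natural frequencies and strong coupling: the phase differences
# shrink towards constants (here, towards zero), matching Equation (1).
omega = np.full(3, 1.0)
K = np.full((3, 3), 2.0)
phi = kuramoto(omega, K, phi0=np.array([0.0, 0.5, 1.0]), T=3000)
```

With weaker or asymmetric coupling, the equilibrium phase differences are nonzero constants rather than zero, still yielding sources of the form (1).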
2.3 PLF
Given two oscillators with phases ϕ_j(t) and ϕ_k(t) for t = 1,…,T, the real-valued^{b} PLF, or phase locking value, between those two oscillators is defined as

$$ \varrho_{jk} \equiv \left| \left\langle e^{\mathrm{i}\left(\phi_j(t) - \phi_k(t)\right)} \right\rangle \right|, \tag{3} $$
where 〈·〉 is the time average operator. The PLF satisfies 0 ≤ ϱ_jk ≤ 1. The value ϱ_jk = 1 corresponds to two oscillators that are fully synchronized (i.e., their phase lag is constant). In terms of Equation (1), a PLF of 1 is obtained only if the phase jitter δ_j(t) is zero. The value ϱ_jk = 0 is attained, for example, if the phase difference ϕ_j(t) − ϕ_k(t) modulo 2π is uniformly distributed in [−π, π[. Values between 0 and 1 represent partial synchrony; in general, higher values of the standard deviation of the phase jitter δ_j(t) yield lower PLF values.
Note that a PLF of 1 is obtained if and only if ϕ_j(t) − ϕ_k(t) is constant^{c}. Thus, the study of the separation of sources with constant phase lags is equivalent to the study of the separation of sources with pairwise PLFs of 1.
Throughout this article, phase synchrony is measured using the PLF; two signals are perfectly synchronous if and only if they have a PLF of 1. Other approaches exist, e.g., for chaotic systems or specific types of oscillators [27]. Studying separation algorithms based on such other definitions is outside the scope of this article. The definition used here has the advantages of being tractable from an algorithmic point of view, and of being applicable to any situation where ϕ_j(t) − ϕ_k(t) is constant^{d}, regardless of the type of oscillator.
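The PLF is straightforward to compute. The following sketch (our own, with illustrative test signals) shows the two extreme regimes, a constant lag versus a frequency mismatch:

```python
import numpy as np

def plf(phi_j, phi_k):
    """Phase-locking factor between two phase time series."""
    return np.abs(np.mean(np.exp(1j * (phi_j - phi_k))))

t = np.linspace(0.0, 10.0, 1000)
phi1 = 2 * np.pi * 1.0 * t
phi2 = phi1 + np.pi / 6            # constant lag: fully synchronized
phi3 = 2 * np.pi * 1.37 * t        # different frequency: weak synchrony
```

A constant phase lag yields a PLF of 1 regardless of the lag's value, while a frequency mismatch makes the phase difference sweep the circle and drives the PLF towards 0.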
2.4 Effect of linear mixing on synchrony
Assume that we have N sources which have PLFs of 1 with each other. Let s(t), for t = 1,…,T, denote the vector of sources and x(t) = A s(t) denote the mixed signals, where A is the mixing matrix, which is assumed to be square and nonsingular^{e}. Our goal is to find a square unmixing matrix W such that the estimated sources y(t) = W^T x(t) = W^T A s(t) are as close to the true sources as possible, up to permutation, scaling, and sign changes.
The effect of linear mixing on the PLF matrix is illustrated in Figure 1 for a set of simulated sources. This set has three sources, with PLFs of 1 with each other. These sources are of the form (1) with negligible phase jitter, and the phase lags α_j are 0, $\frac{\pi}{6}$, and $\frac{\pi}{3}$ radians, respectively. The common oscillation is a time-dependent sinusoid. The amplitudes are generated by adding a small constant baseline to a random number of “bursts” with Gaussian shape. Each “burst” has a random center and a random width, and each source amplitude has 1 to 5 such “bursts”.
The first row of Figure 1 shows, on the left, the original sources and, on the right, their PLF matrix. The second row depicts the mixed signals x(t) on the left and their PLFs on the right; the mixing matrix has random entries uniformly distributed between −1 and 1. It is clear that the mixed signals have lower pairwise PLFs than the sources, although signals 2 and 3 still exhibit a rather high mutual PLF. This example suggests that linear mixing of synchronous sources reduces their synchrony, a fact that will be proved in Section 3.3 ahead; this fact will be used to extract the sources from the mixtures by trying to maximize the PLF of the estimated sources.
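This effect is easy to reproduce numerically. The sketch below (with a fixed, arbitrarily chosen mixing matrix and toy amplitudes, unrelated to the data in Figure 1) builds two fully locked sources and verifies that their mixtures have a strictly lower PLF:

```python
import numpy as np

def plf(u, v):
    """PLF between two complex (analytic) signals."""
    return np.abs(np.mean(np.exp(1j * (np.angle(u) - np.angle(v)))))

rng = np.random.default_rng(0)
T = 5000
phi = 2 * np.pi * 0.01 * np.arange(T)

# Two fully locked sources (lag pi/4) with independent, varying amplitudes.
a = rng.uniform(0.2, 2.0, size=(2, T))
s = a * np.exp(1j * (phi + np.array([[0.0], [np.pi / 4]])))

A = np.array([[1.0, 0.8],      # a fixed, clearly non-diagonal mixing matrix
              [0.6, -1.0]])
x = A @ s

plf_sources = plf(s[0], s[1])   # equals 1: constant lag
plf_mixtures = plf(x[0], x[1])  # strictly smaller: mixing disrupts synchrony
```

The varying amplitudes are essential here: they make the phases of the mixtures fluctuate relative to each other, which is precisely why the mixtures lose synchrony.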
3 Algorithm
In this section, we describe the IPA algorithm. As mentioned in Section 1, this algorithm first performs subspace separation, and then performs separation within each subspace. In this article, we only study the performance of IPA in the case where all the sources are phase-locked; in this situation, the inter-subspace separation can be skipped entirely, since there is only one subspace of locked sources. Therefore, we will not discuss here the part of IPA relating to subspace separation; the reader is referred to [14] for a discussion of that subject.
3.1 Preprocessing
3.1.1 Whitening
As happens in ICA and other source separation techniques, whitening is a useful preprocessing step for IPA. Whitening, or sphering, is a procedure that linearly transforms the data so that the transformed data have the identity as their covariance matrix; in particular, the whitened data are uncorrelated [7]. In ICA, there are clear reasons to pursue uncorrelatedness: independent data are also uncorrelated, and therefore whitening the data already fulfills one of the required conditions to find independent sources. If D denotes the diagonal matrix containing the eigenvalues of the covariance matrix of the data and V denotes an orthonormal matrix which has, in its columns, the corresponding eigenvectors, then whitening can be performed in a PCA-like manner by multiplying the data x(t) by a matrix B, where [7]

$$ B = D^{-1/2}\, V^{T}. \tag{4} $$
The whitened data are given by z(t) = B A s(t). Therefore, whitening merely transforms the original source separation problem with mixing matrix A into a new problem with mixing matrix B A. The advantage is that B A is an orthogonal mixing matrix, and its estimation becomes easier [7].
The above reasoning is not valid for the separation of phase-locked sources. However, under rather general assumptions, satisfied by the data studied here, it can be shown that whitening places a relatively low upper bound on the condition number of the equivalent mixing matrix (see [28] and references therein). Therefore, we always whiten the mixture data before applying the procedures described in Section 3.2.
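A minimal whitening sketch along the lines of Equation (4), using a toy random mixture; the eigendecomposition-based implementation is standard, and the data are illustrative:

```python
import numpy as np

def whiten(x):
    """Return (B, z): the whitening matrix B = D^{-1/2} V^T built from the
    eigendecomposition of the covariance of x, and the whitened data z = Bx.
    x has shape (channels, samples)."""
    xc = x - x.mean(axis=1, keepdims=True)
    d, V = np.linalg.eigh(np.cov(xc))      # eigenvalues D, eigenvectors V
    B = np.diag(d ** -0.5) @ V.T
    return B, B @ xc

rng = np.random.default_rng(0)
s = rng.standard_normal((3, 10000))        # toy unit-variance sources
A = rng.uniform(-1.0, 1.0, size=(3, 3))    # random mixing matrix
B, z = whiten(A @ s)
```

Even though B A is only approximately orthogonal in general, its condition number is kept low after whitening, which is the property exploited by IPA.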
3.1.2 Number of sources
As will be seen below, IPA assumes knowledge of the number of sources, and also assumes that the mixing matrix is square; if this is not the case, a simple procedure can be used to detect the number of sources and to transform the data to obey these constraints. If the mixing process is noiseless and is given by x(t) = A s(t), where A has more rows than columns and has maximum rank^{f}, the number of nonzero eigenvalues of the covariance matrix of x is N, where N is the number of sources (or, equivalently, the number of columns of A). If the mixture is noisy with a low level of i.i.d. Gaussian additive noise, the formerly zero-valued eigenvalues now have small nonzero values, but N can still easily be detected by counting how many eigenvalues are large relative to the plateau level of the small eigenvalues [7]^{g}. After N is known, the data need only be multiplied by a matrix $B' = D'^{-1/2} V'^{T}$, in a similar fashion to Equation (4), where D′ is a smaller N × N diagonal matrix containing only the N largest eigenvalues in D, and V′ is a rectangular matrix containing only the N columns of V corresponding to those eigenvalues. The mixture to be separated now becomes

$$ x'(t) = B' A\, s(t). \tag{5} $$
Since B′A is a square matrix and the number of sources is now given simply by the number of components of x′, the problem now has a known number of sources and a square mixing matrix.
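The eigenvalue-based detection of N can be sketched as follows; the threshold choice (relative to the largest eigenvalue) is a hypothetical one of ours, suitable for the noiseless case:

```python
import numpy as np

def estimate_num_sources(x, rel_tol=1e-8):
    """Count eigenvalues of the covariance of x that stand clearly above
    the near-zero plateau; rel_tol is an illustrative threshold for the
    noiseless (or nearly noiseless) case."""
    d = np.linalg.eigvalsh(np.cov(x))
    return int(np.sum(d > rel_tol * d.max()))

rng = np.random.default_rng(0)
s = rng.standard_normal((3, 2000))          # N = 3 sources
A = rng.uniform(-1.0, 1.0, size=(6, 3))     # 6 sensors, full column rank
x = A @ s                                   # noiseless rectangular mixture
```

With additive noise, the plateau sits at the noise variance instead of at numerical zero, and the threshold must be chosen relative to that plateau, as described in the text.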
A remark should be made about complex-valued data. The above procedure is appropriate when both the mixing matrix and the sources are real-valued. If both the mixing matrix and the sources are complex-valued, Equation (4) still applies (V will now have complex values). However, in our case the sources and measurements are complex-valued (due to the Hilbert transform), but the mixing matrix is real. When this is the case, Equation (4) is not directly applicable. The above procedure must instead be applied not to the original data x(t), but to new data x_0 with twice as many time samples, given by $x_0(t) = \mathcal{R}(\mathbf{x}(t))$ for t = 1,…,T and $x_0(t) = \mathcal{I}(\mathbf{x}(t-T))$ for t = T + 1,…,2T, where $\mathcal{R}$ and $\mathcal{I}$ denote the real and imaginary parts of a complex number, respectively. The matrix B which results from applying Equation (4) to x_0 (or B′, if appropriate) is then applied to the original data x as before, and the remainder of the procedure is unchanged [28].
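The stacking of real and imaginary parts can be sketched as follows (toy complex data; only the stacking step is shown, not the subsequent whitening):

```python
import numpy as np

def stack_real_imag(x):
    """Concatenate real and imaginary parts along time, giving a real-valued
    array with twice as many samples: x0(t) = Re{x(t)} for t = 1..T and
    x0(t) = Im{x(t-T)} for t = T+1..2T."""
    return np.concatenate([x.real, x.imag], axis=1)

rng = np.random.default_rng(0)
T = 1000
x = rng.standard_normal((4, T)) + 1j * rng.standard_normal((4, T))
x0 = stack_real_imag(x)
```

A real whitening matrix estimated from x0 can then be applied directly to the complex data x, since the mixing matrix itself is real.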
3.2 Separation of phaselocked sources
The goal of the IPA algorithm is to separate a set of N fully phase-locked sources which have been mixed linearly. Since these sources have a maximal PLF with each other and the mixture components do not (as motivated in Section 2.4 above and proved in Section 3.3 below), we can unmix them by searching for projections that maximize the resulting PLFs. Specifically, this corresponds to finding an N × N matrix W such that the estimated sources, y(t) = W^T x(t) = W^T A s(t), have the highest possible PLFs.
The optimization problem that we shall solve is

$$ \max_{W} \; \sum_{j=1}^{N} \sum_{k=j+1}^{N} \varrho_{jk}^{2} \; + \; \lambda \log \left| \det(W) \right| \qquad \text{subject to} \; \|w_j\| = 1, \; j = 1,\ldots,N, \tag{6} $$
where w_j is the j th column of W. In the first term, we sum the squared PLFs between all pairs of sources. The second term penalizes unmixing matrices that are close to singular, and λ is a parameter controlling the relative weights of the two terms. This second term serves the purpose of preventing the algorithm from finding, e.g., solutions where two columns j and k of W are collinear, which trivially yields ϱ_jk = 1 (a similar term is used in some ICA algorithms [7]). Each column of W is constrained to have unit norm to prevent trivial decreases of that term.
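The objective in Equation (6) can be sketched as follows; the `log|det W|` barrier is one plausible reading of the penalty term (see [14] for the exact form used), and the toy sources are ours:

```python
import numpy as np

def plf_matrix(y):
    """Pairwise PLFs between the rows of a complex signal matrix y."""
    ph = np.exp(1j * np.angle(y))
    return np.abs(ph @ ph.conj().T) / y.shape[1]

def ipa_objective(W, x, lam):
    """Sum of squared pairwise PLFs of y = W^T x, plus a log|det W| barrier
    against singular unmixing matrices (a sketch of one plausible penalty;
    columns of W are assumed unit-norm)."""
    P = plf_matrix(W.T @ x)
    iu = np.triu_indices(P.shape[0], k=1)
    return np.sum(P[iu] ** 2) + lam * np.log(np.abs(np.linalg.det(W)))

# Two perfectly locked toy sources: with W = I, the PLF term attains its
# maximum (one pair, PLF 1) and the penalty term vanishes (det I = 1).
t = np.arange(1000)
phi = 2 * np.pi * 0.03 * t
s = np.vstack([1.5 * np.exp(1j * phi),
               np.exp(1j * (phi + np.pi / 6))])
```

Note that a W with two collinear unit-norm columns would also drive the PLF term to its maximum, but the barrier term then diverges to minus infinity, which is exactly the role of λ described above.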
The optimization problem in Equation (6) is highly nonconvex: the objective function is a sum of two terms, each of which is nonconvex in the variable W. Furthermore, the unit norm constraint is also nonconvex. Despite this, as we show below in Section 3.3, it is possible to characterize all the global maxima of this problem for the case λ=0 and to devise an optimization strategy taking advantage of that result.
The above optimization problem can be tackled through various maximization algorithms. Our choice was to use a gradient ascent algorithm with momentum and adaptive step sizes; after this gradient algorithm has run for 200 iterations, we use the BFGS algorithm implemented in MATLAB to improve the solution. The result of this optimization for the sources shown in Figure 1 is shown in Figure 2 for λ = 0.1, illustrating that IPA successfully recovers the original sources for this dataset.
3.3 Unicity of solution
In [14], we proved that a few mild assumptions on the sources, which are satisfied in the vast majority of real-world situations, suffice for a useful characterization of the global maxima of Problem (6): it turns out that there are infinitely many such maxima, and that they correspond either to correct solutions (i.e., the original sources up to permutation, scaling, and sign changes) or to singular matrices W. More specifically, we proved the following: assume that we have a set of complex-valued and linearly independent sources denoted by s(t), which have a PLF of 1 with one another. Consider also linear combinations of the sources of the form y(t) = Cs(t), where C is a square matrix of appropriate dimensions. Further assume that the following conditions hold:

1. Neither s_j(t) nor y_j(t) is identically zero, for any j.

2. C is nonsingular.

3. The phase lag between any two sources is different from 0 or π.

4. The amplitudes of the sources, a_j(t) = |s_j(t)|, are linearly independent.

Then, the only linear combination y(t) = Cs(t) of the sources s(t) in which the PLF between any two components of y is 1 is y(t) = s(t), up to permutation, scaling, and sign changes [14].
3.4 Comparison to ICA
The above result is simple, but some relevant remarks should be made. If the optimum is found using λ = 0 and the second assumption is not violated (or, equivalently, det(C) = det(W)det(A) ≠ 0, which is equivalent to det(W) ≠ 0 if A is nonsingular), then we can be certain that the correct solution has been found. However, if the optimization is made using λ = 0, there is a possibility that the algorithm will estimate a bad solution where, for example, some of the estimated sources are all equal to one another (in which case the PLFs between those estimated sources are trivially equal to 1). On the other hand, if we use λ ≠ 0 to guarantee that W is nonsingular, the unicity result stated above cannot be applied to the complete objective function. We call “nonsingular solutions” and “singular solutions” those in which det(W) ≠ 0 and det(W) = 0, respectively. The result expressed in Section 3.3 is thus equivalent to stating that “all nonsingular global optima of Equation (6) with λ = 0 correspond to correct solutions”.
This contrasts strongly with ICA, where singular solutions are not an issue, because ICA algorithms attempt to find independent sources and one signal is never independent from itself [7]. In other words, singular solutions always yield poor values of the objective function of ICA algorithms. Here, we are attempting to estimate phase-locked sources, and any signal is perfectly phase-locked with itself. Thus, one must always use λ ≠ 0 in the objective function of Equation (6) when attempting to separate phase-locked sources.
We use a simple strategy to deal with this problem. We start by optimizing Equation (6) for a relatively large value of λ (λ = 0.4), and once convergence has been obtained, we use the result as the starting point for a new optimization, this time with λ = 0.2. The same process is repeated with the value of λ halved each time, until five such epochs have been run. The early optimization steps move the algorithm away from the singular solutions discussed above, whereas the final steps are done with a very low value of λ, for which the above unicity conditions are approximately valid. As the following experimental results show, this strategy successfully prevents singular solutions from being found, while making the influence of the second term of Equation (6) on the final result negligible.
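The halving schedule can be sketched as follows; `optimize` stands for one full run of the inner gradient/BFGS optimizer, whose implementation is not shown here:

```python
def ipa_lambda_schedule(optimize, W0, lam0=0.4, epochs=5):
    """Warm-started optimization epochs with lambda halved each time
    (0.4, 0.2, 0.1, 0.05, 0.025 for the defaults used in the text).
    optimize(W, lam) stands for one full run of the inner optimizer."""
    W, lam = W0, lam0
    for _ in range(epochs):
        W = optimize(W, lam)   # warm start: reuse the previous solution
        lam /= 2.0
    return W
```

Each epoch is warm-started from the previous one, so the early, strongly penalized epochs steer the iterate away from singular solutions before the penalty is effectively switched off.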
4 Experimental results
4.1 Data generation
As mentioned earlier, the main goal of this study is to assess the applicability of IPA to real-world electrophysiological data from human brain EEG and MEG. The choice of the data for this study was not trivial, since we need to know the true sources in order to quantitatively measure the quality of the results. On the one hand, to know the actual sources in the brain would require simultaneous data from outside the scalp (EEG or MEG, which would be the mixed signals) and from inside the scalp (intracranial recordings, corresponding to the sources). If intracranial recordings are not available, results cannot be assessed quantitatively; they can only be assessed qualitatively by experts who can tell whether the extracted sources are meaningful or not. On the other hand, due to their extreme simplicity, synthetic data such as those used so far to illustrate IPA, shown in Figure 1, cannot be used to assess the usefulness of the method in real-world situations.
In an attempt to obtain “the best of both worlds”, we generated a pseudo-real dataset from actual MEG recordings. By doing this, we know the true sources and the true mixing matrix, while still using sources that are of a nature similar to what one observes in real-world MEG. We begin by describing the process that we used to generate a perfectly phase-locked dataset; we then explain how we modified these data to analyze non-perfect cases as well. It is important to stress that the generation process described below has no relation to the one used to generate the data of Figure 1, even though both processes generate sources with maximum PLF.
Our first step was to obtain a realistic mixing matrix. To do so, we used the well-known EEGIFT software package [29]. This package includes a real-world sample EEG dataset with 64 channels. Using all the default options of the software package, we extracted 20 independent components from the data of Subject 1 in that dataset. The result of this process that was important for us was not the independent components themselves (which were discarded), but rather the 64 × 20 mixing matrix. As discussed in Section 3.1, we opted to use a square mixing matrix, with little loss of generality. Therefore, we selected N random rows and N random columns of that mixing matrix (without repetition), and formed an N × N mixing matrix from the corresponding values of the original 64 × 20 matrix. We will later show results for datasets ranging from N = 2 to N = 5 sources; in the following, assume, for the sake of concreteness, that N = 4.
Having generated a physiologically plausible mixing matrix, the next step was to generate a set of four sources. For this, we used the MEG dataset studied previously in [30]^{h}, which has 122 channels with 17,730 samples per channel. The sampling frequency is 297 Hz, and the data have already been subjected to lowpass filtering with cutoff at 90 Hz. Since bandpass filtering is a very common preprocessing step in the analysis of MEG data [20–22] and is useful for the use of the Hilbert transform, we performed a further bandpass filtering with no phase distortion, keeping only the 18–24 Hz band^{i}. The resulting filtered data were used to generate a complex signal through the Hilbert transform; these data were whitened as described in Section 3.1, and from the whitened data we extracted the time-dependent amplitudes and phases.
We then selected four random channels of these filtered MEG data. Since none of these MEG recordings were actually phase-locked (recall that they were themselves the result of a mixing process) and we wanted to study the performance on fully phase-locked sources (possibly corrupted by jitter, as explained below), we replaced the phase of the second of these channels with the phase of the first channel plus a constant phase lag of $\frac{\pi}{6}$ radians. The phase of the third channel was replaced with the phase of the first channel plus a constant phase lag of $\frac{\pi}{3}$ radians, and that of the fourth channel with the phase of the first channel plus a lag of $\frac{\pi}{2}$ radians. The amplitudes of the four sources were kept as the original amplitudes of the four random channels themselves. The process is illustrated in Figure 3. The above process, including the choice of the 4 × 4 submatrix, was repeated 100 times, with different initializations of the random number generator. This way of constructing the data ensured that the sources were fully phase-locked.
We also constructed datasets in which the sources were not perfectly phase-locked. For this, we used the same 100 sets of sources, but with those sources now corrupted by phase jitter: each sample t of each source j was multiplied by $e^{\mathrm{i} \delta_j(t)}$, where the phase jitter δ_j(t) was drawn from a Gaussian distribution with zero mean and standard deviation σ. We tested IPA for σ from 0 to 20 degrees, in steps of 5 degrees. One example with σ = 5 degrees is shown in Figure 4, and one with σ = 20 degrees is shown in Figure 5.
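The jitter-corruption step can be sketched as follows (toy sources of our own; the 20-degree value matches the largest jitter level tested):

```python
import numpy as np

def add_phase_jitter(s, sigma_deg, rng):
    """Multiply each sample of each complex source by exp(i*delta_j(t)),
    with Gaussian phase jitter of standard deviation sigma_deg (degrees)."""
    delta = rng.normal(0.0, np.deg2rad(sigma_deg), size=s.shape)
    return s * np.exp(1j * delta)

rng = np.random.default_rng(0)
phi = 2 * np.pi * 0.01 * np.arange(5000)
s = np.vstack([np.exp(1j * phi),
               np.exp(1j * (phi + np.pi / 6))])   # PLF of 1 before jitter
s_jit = add_phase_jitter(s, 20.0, rng)            # sigma = 20 degrees
```

The jitter leaves the amplitudes untouched and only perturbs the phases, lowering the pairwise PLF below 1, which is exactly the degradation studied in the experiments.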
Finally, we studied the effect of N on the results of the proposed algorithm. We created 100 datasets similar to the jitterless datasets mentioned earlier, using N = 2, 3, and 5. In all of these, and similarly to the data with N = 4, we used sources with phase lags that are multiples of $\frac{\pi}{6}$.
4.2 Results
We measured the separation quality using two measures: the Amari performance index (API) [31] and the well-known signal-to-noise ratio (SNR). The API measures how far the gain matrix W^T A is from a permuted diagonal matrix; the SNR measures how far the estimated sources are from the true sources. In summary, the API measures the quality of the estimation of the mixing matrix, while the SNR measures the quality of the estimation of the sources themselves.
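For reference, a common normalized variant of the Amari index can be sketched as follows (we state it for a square gain matrix; the normalization giving values in [0, 1] is one standard choice, and may differ from the exact variant used in [31]):

```python
import numpy as np

def amari_index(G):
    """Normalized Amari performance index of a gain matrix G = W^T A:
    0 iff G is a scaled permutation (perfect separation); larger is worse."""
    G = np.abs(np.asarray(G, dtype=float))
    n = G.shape[0]
    rows = (G / G.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    cols = (G / G.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return float(rows.sum() + cols.sum()) / (2.0 * n * (n - 1))
```

Under this normalization, the index is 0 for a perfect separation and 1 for the worst case of a constant gain matrix, so values such as the 0.1 threshold quoted below indicate near-perfect recovery of the mixing matrix.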
Figure 6 presents the means and standard deviations of these measures for the 100 runs mentioned in Section 4.1, for each of the jitter levels. The results indicate that IPA performs very well in the jitterless case, on data of this kind, and that this level of performance is approximately maintained even in the presence of low levels of phase jitter, up to 5 degrees of standard deviation. Some deterioration in performance occurs from 5 to 10 degrees of phase jitter standard deviation, but with an SNR of 27 dB and an API below 0.1, the sources can still be considered well estimated.
The results for high jitter levels (σ equal to 15 or 20 degrees) show that there is a limit to IPA’s robustness; this limit lies somewhere between 10 and 15 degrees. Equivalently, in terms of the PLF, the algorithm shows good robustness to PLF values smaller than 1 as long as they are above 0.95, but below that value its performance deteriorates progressively down to a PLF of approximately 0.9, at which point only partial separations are obtained.
Figure 7 shows the effect of varying the number of sources N. The figure shows that IPA can handle values of N up to N = 5 with only a slight decrease in performance.
Figure 7 also shows something rather peculiar: for N = 2, the results are mediocre (with an average API around 0.4)^{j}. This is not an effect of lowering the number of sources N, but rather an indirect effect of the phase lag between the sources. To verify this, we generated datasets of jitterless data with N = 2, using phase lags of $\frac{\pi}{12}$, $\frac{2\pi}{12}$ (the value used in Figure 7), $\frac{3\pi}{12}$, and $\frac{4\pi}{12}$ (100 datasets for each of these values). Figure 8 shows that a phase lag of $\frac{2\pi}{12}$ yields poor API values, as we already knew, but $\frac{3\pi}{12}$ yields very good values. Naively, one could conclude that when the sources have a phase lag of $\frac{2\pi}{12}$ or less, the separation cannot be performed accurately.
The effect is, however, not so clear-cut. The results for N = 3, 4, 5 also involve sources with phase lags of $\frac{\pi}{6}$, but the API values for those experiments are very good. We do not have a solid explanation for this fact; we conjecture that the presence of some pairs of sources with larger phase lags (e.g., for N = 4, the first and third sources have a phase lag of $\frac{\pi}{3}$, and the first and fourth sources have a phase lag of $\frac{\pi}{2}$) aids in the separation of all the sources.
5 Discussion
IPA has a parameter, λ, which controls the relative weights given to the optimization of the PLF matrix and to the penalization of close-to-singular solutions. Our optimization procedure starts with a high value of λ, which is lowered as the optimization progresses. We confirmed that this variation of the parameter’s value is necessary: the quality of the results is noticeably degraded if λ is kept at a constant value, no matter how high or low it is. Table 1 confirms this: while λ = 0.1, the best fixed value, yields decent results, the results with a varying value of λ are considerably better. Furthermore, although the final epoch of the optimization is not done with λ = 0, we have verified that the results are virtually the same as if we had used λ = 0 in the last epoch.
The above paragraph illustrates something already mentioned in Section 3.4: separation of phase-locked sources is a nontrivial departure from ICA, because there are wrong, singular solutions that yield exactly the same values of the PLF matrix as the correct nonsingular solutions. Our approach to distinguishing these two types of solutions consists of adding a term that depends on the determinant of the matrix W. This approach works correctly, as our results show. However, it is perhaps inelegant to impose the penalty through the matrix W rather than directly through the estimated sources; it would be preferable to replace this term with one depending directly on those sources.
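The idea behind such a determinant-based term can be sketched as follows; the specific form −log|det W| is an assumption for illustration (the exact term used by IPA may differ), chosen because it diverges as W approaches singularity.

```python
import numpy as np

# Sketch of a determinant-based penalty on the unmixing matrix W:
# -log|det W| is 0 for the identity and grows without bound as W
# approaches singularity, steering the optimizer away from degenerate
# solutions that nevertheless reproduce the PLF matrix.
def singularity_penalty(W):
    return -np.log(np.abs(np.linalg.det(W)))

W_ok = np.eye(3)
W_bad = np.eye(3)
W_bad[2] = W_bad[1] + 1e-8   # third row almost a copy of the second

print(singularity_penalty(W_ok))   # ≈ 0
print(singularity_penalty(W_bad))  # large positive value
```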
The optimization variable, W, has N^{2} entries; there are N constraints on this variable, yielding N(N − 1) independent parameters. The number of free parameters in IPA is thus quadratic in the number of sources N, which is the main reason why we do not present results for N > 5: while running IPA on 100 datasets with N = 2 takes a few hours, doing so for N = 5 takes several days.
The results that we obtained show that IPA can separate perfectly locked MEG-like sources. However, while the phase locking in the jitterless pseudo-real MEG data is perfect, in real MEG data it will probably be less than perfect. This is the reason why we also studied data with phase jitter, which have pairwise PLFs smaller than 1. The results indicate that IPA has some robustness to PLFs smaller than 1, but the sources still need to exhibit considerable phase locking for the separation to be accurate; weaker synchrony results only in partial separation. Note, however, that the partially separated data are usually still closer to the true sources than the original mixtures.
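The effect of phase jitter on the pairwise PLF can be sketched numerically (the phase model and jitter levels below are illustrative assumptions): a constant lag gives a PLF of exactly 1, and zero-mean phase jitter pulls it below 1.

```python
import numpy as np

def plf(phi1, phi2):
    """Phase-locking factor: |mean of exp(i*(phi1 - phi2))|, in [0, 1]."""
    return np.abs(np.mean(np.exp(1j * (phi1 - phi2))))

rng = np.random.default_rng(0)
T = 10_000
phi = np.cumsum(rng.normal(0.06, 0.01, T))   # shared phase trajectory (illustrative)

# Jitterless pair with a constant lag of pi/6: PLF is exactly 1.
print(plf(phi, phi + np.pi / 6))

# Zero-mean Gaussian phase jitter of growing strength pulls the PLF below 1.
for sigma in (0.1, 0.5, 1.0):
    print(sigma, plf(phi, phi + np.pi / 6 + rng.normal(0.0, sigma, T)))
```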
The comments made in the previous paragraph raise an additional optimization challenge: if the true sources have PLFs smaller than 1, optimization of the objective function in Equation (6) can lead to overfitting. The results presented here show that IPA has some robustness to sources whose PLF is smaller than 1 but stationary (since the phase jitter is stationary, the distribution of the PLF does not vary with time). In real-world cases, the PLF is likely nonstationary: for example, some sources may be phase-locked at the start of the observation period and not phase-locked at its end. While simple techniques such as windowing can be devised to tackle smaller time intervals where stationarity is (almost) verified, one would still need to find a way to integrate the information from the different intervals. Such integration is out of the scope of this article.
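The windowing idea can be sketched as follows (window length and phase model are illustrative assumptions): computing the PLF over short windows reveals a pair that is locked at the start of the observation period and unlocked at its end.

```python
import numpy as np

def plf(phi1, phi2):
    return np.abs(np.mean(np.exp(1j * (phi1 - phi2))))

# Windowed PLF: compute the PLF over short windows, inside which
# stationarity approximately holds, to track nonstationary synchrony.
def windowed_plf(phi1, phi2, win=500):
    n = len(phi1) // win
    return np.array([plf(phi1[k * win:(k + 1) * win],
                         phi2[k * win:(k + 1) * win]) for k in range(n)])

rng = np.random.default_rng(1)
T = 4000
phi = np.cumsum(rng.normal(0.06, 0.01, T))
# Locked (constant lag) in the first half; afterwards the second phase
# drifts independently, destroying the locking.
phi2 = phi + np.pi / 6
phi2[T // 2:] = phi2[T // 2] + np.cumsum(rng.normal(0.06, 0.2, T - T // 2))

print(windowed_plf(phi, phi2))  # high PLF in early windows, low later on
```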
One interesting extension of this article would be the separation of specific types of systems, such as van der Pol oscillators[27]. For those, fully entrained oscillators may even present a PLF < 1, and a different measure of synchrony, tailored to those oscillators, may need to be used. Such a study would fall out of the scope of this article. Nevertheless, it is expected that additional knowledge of the oscillator type can be exploited to improve the algorithm’s performance or its robustness to deviations from the ideal case.
One can derive a relationship between additive Gaussian noise (e.g., from the sensors) and the phase jitter used throughout this article. Figure 5 depicts, in the complex plane, a sample of a noiseless signal x(t) ≡ a(t)e^{iϕ(t)}, to which complex noise n(t) is added to form the noisy signal x_{n}(t) ≡ a(t)e^{iϕ(t)} + n(t)^{k}. That figure also shows n_{⊥}(t), the projection of n(t) on the direction orthogonal to x(t), and x_{n⊥}(t) ≡ x(t) + n_{⊥}(t). Also depicted are ϕ(t), ϕ_{n}(t), and ϕ_{n⊥}(t), defined as the phases of x(t), x_{n}(t), and x_{n⊥}(t), respectively.
It can easily be shown that, if |n(t)| ≪ |x(t)| = a(t), then ${\varphi}_{n}\left(t\right)\approx {\varphi}_{n\perp}\left(t\right)\approx \varphi \left(t\right)+\frac{{n}_{\perp}\left(t\right)}{a\left(t\right)}$[32]. This is an important relationship, because it shows that, under additive noise, portions of the signal with a large amplitude will have a better phase estimate than portions with a small amplitude, in which even small amounts of additive noise can severely disrupt the phase estimation. We thus believe that the PLF, while attractive and elegant in theory, and despite working well with low amounts of additive noise[14], will probably need to be modified to factor in the amplitude appropriately in applications where considerable amounts of additive noise are present.
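This approximation is easy to check numerically; the phase, amplitudes, and noise sample below are assumptions chosen for illustration. The same noise produces a roughly tenfold larger phase error when the amplitude is ten times smaller, and in both cases the error closely tracks n_{⊥}(t)/a(t).

```python
import numpy as np

phi = 0.7                                # true phase (radians), illustrative
n = 0.01 * (0.6 + 0.8j)                  # small complex noise sample, illustrative
n_perp = np.imag(n * np.exp(-1j * phi))  # component of n orthogonal to e^{i phi}

errors = {}
for a in (1.0, 0.1):                     # large vs small amplitude
    x_n = a * np.exp(1j * phi) + n       # noisy complex sample
    errors[a] = np.angle(x_n) - phi      # true phase error
    print(a, errors[a], n_perp / a)      # true error vs approximation n_perp / a
```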
6 Conclusion
We have shown that IPA can successfully separate phase-locked sources from linear mixtures in pseudo-real MEG data. We showed that IPA tolerates deviations from the ideal case, yielding excellent results for low amounts of phase jitter and exhibiting some robustness to moderate amounts of jitter. We also showed that it can handle up to N = 5 sources. We believe that these results bring us closer to the goal of successfully separating phase-locked sources in real-world signals.
Endnotes
^{a} In EEG and MEG, the sources are not individual neurons, whose oscillations are too weak to be detected from outside the scalp. In these cases, the sources are populations of closely located neurons oscillating together.
^{b} The term “real-valued” is used here to distinguish this algorithm from other phase-based algorithms where a complex quantity is used[14].
^{c} Technically, this condition could be violated in a set with zero measure. Since we will deal with a discrete and finite number of time points, no such sets exist and this technicality is not important.
^{d} We will also show results where this phase difference is not exactly constant; see Figure 6.
^{e} These assumptions are not as restrictive as they may sound; see Section 3.1.
^{f} This is usually called the overdetermined case. The underdetermined case, where A has fewer rows than columns, is more difficult and is not addressed here.
^{g} There are more rigorous criteria that can be used to choose N. Two very popular methods are the Akaike information criterion and the minimum description length. It is out of the scope of this article to discuss these two criteria; the reader is referred to[7] and references therein for more information.
^{h} Freely available from http://research.ics.tkk.fi/ica/eegmeg/MEG_data.html.
^{i} The choice of this specific band is rather arbitrary. The band is narrow enough that the Hilbert transform will allow correct estimation of the instantaneous amplitude and phase, but wide enough that the instantaneous frequency of the signals retains some variability. The passband also has a width similar to those used in typical MEG studies[20].
^{j} It might appear contradictory that the average SNR has a good value, 40 dB, when the average API has a mediocre score. In reality, a very high standard deviation of the SNR is usually an indication that the separation is poor. As an example, consider a case where one source is very well estimated, with an SNR of 80 dB, and another is poorly estimated, with an SNR of 0 dB. The average SNR would be 40 dB, but with a very high standard deviation. Good values of the average SNR indicate a good separation only when the standard deviation of the SNR is small.
^{k} In most real applications, one will be dealing with models consisting of real signals to which real-valued noise is added. However, the linearity of the Hilbert transform allows the same type of analysis for that case as for the case of complex signals with complex additive noise considered here.
References
 1.
Pikovsky A, Rosenblum M, Kurths J: Synchronization: A Universal Concept in Nonlinear Sciences. Cambridge, UK: Cambridge University Press (Cambridge Nonlinear Science Series); 2001.
 2.
Palva JM, Palva S, Kaila K: Phase synchrony among neuronal oscillations in the human cortex. J. Neurosci 2005, 25(15):3962-3972. 10.1523/JNEUROSCI.4250-04.2005
 3.
Schoffelen JM, Oostenveld R, Fries P: Imaging the human motor system’s beta-band synchronization during isometric contraction. NeuroImage 2008, 41: 437-447. 10.1016/j.neuroimage.2008.01.045
 4.
Uhlhaas PJ, Singer W: Neural synchrony in brain disorders: relevance for cognitive dysfunctions and pathophysiology. Neuron 2006, 52: 155-168. 10.1016/j.neuron.2006.09.020
 5.
Nunez PL, Srinivasan R, Westdorp AF, Wijesinghe RS, Tucker DM, Silberstein RB, Cadusch PJ: EEG coherency I: statistics, reference electrode, volume conduction, Laplacians, cortical imaging, and interpretation at multiple scales. Electroencephalogr. Clin. Neurophysiol 1997, 103: 499-515. 10.1016/S0013-4694(97)00066-7
 6.
Vigário R, Särelä J, Jousmäki V, Hämäläinen M, Oja E: Independent component approach to the analysis of EEG and MEG recordings. IEEE Trans. Biomed. Eng 2000, 47(5):589-593. 10.1109/10.841330
 7.
Hyvärinen A, Karhunen J, Oja E: Independent Component Analysis. New York: Wiley; 2001.
 8.
Akhtar M, Mitsuhashi W, James C: Employing spatially constrained ICA and wavelet denoising for automatic removal of artifacts from multichannel EEG data. Signal Process 2012, 92: 401-416. 10.1016/j.sigpro.2011.08.005
 9.
de Vos M, de Lathauwer L, van Huffel S: Spatially constrained ICA algorithm with an application in EEG processing. Signal Process 2011, 91: 1963-1972. 10.1016/j.sigpro.2011.02.019
 10.
Lee D, Seung H: Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process. Syst 2001, 13: 556-562.
 11.
Chan TH, Ma WK, Chi CY, Wang Y: A convex analysis framework for blind separation of non-negative sources. IEEE Trans. Signal Process 2008, 56: 5120-5134.
 12.
de Frein R, Rickard S: The synchronized short-time-Fourier-transform: properties and definitions for multichannel source separation. IEEE Trans. Signal Process 2011, 59: 91-103.
 13.
Hosseini S, Deville Y, Saylani H: Blind separation of linear instantaneous mixtures of non-stationary signals in the frequency domain. Signal Process 2009, 89: 819-830. 10.1016/j.sigpro.2008.10.024
 14.
Almeida M, Schleimer JH, Bioucas-Dias J, Vigário R: Source separation and clustering of phase-locked subspaces. IEEE Trans. Neural Netw 2011, 22(9):1419-1434.
 15.
Almeida M, Bioucas-Dias J, Vigário R: Independent phase analysis: separating phase-locked subspaces. Proceedings of the International Conference on Independent Component Analysis and Signal Separation 2010, 189-196.
 16.
Ziehe A, Müller KR: TDSEP—an efficient algorithm for blind separation using time structure. International Conference on Artificial Neural Networks 1998, 675-680.
 17.
Torrence C, Compo GP: A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc 1998, 79: 61-78. 10.1175/1520-0477(1998)079<0061:APGTWA>2.0.CO;2
 18.
Oppenheim AV, Schafer RW, Buck JR: Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall International Editions; 1999.
 19.
Quyen MLV, Foucher J, Lachaux JP, Rodriguez E, Lutz A, Martinerie J, Varela FJ: Comparison of Hilbert transform and wavelet methods for the analysis of neuronal synchrony. J. Neurosci. Methods 2001, 111: 83-98. 10.1016/S0165-0270(01)00372-7
 20.
Varela F, Lachaux JP, Rodriguez E, Martinerie J: The Brainweb: phase synchronization and large-scale integration. Nat. Rev. Neurosci 2001, 2: 229-239.
 21.
Niedermeyer E, da Silva FHL: Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Philadelphia: Lippincott Williams and Wilkins; 2005.
 22.
Nunez P, Srinivasan R: Electric Fields of the Brain: the Neurophysics of EEG. New York: Oxford University Press; 2006.
 23.
Gold B, Oppenheim AV, Rader CM: Theory and implementation of the discrete Hilbert transform. Symposium on Computer Processing in Communications 1973.
 24.
Breakspear M, Heitmann S, Daffertshofer A: Generative models of cortical oscillations: neurobiological implications of the Kuramoto model. Front. Human Neurosci 2010, 4: 190-202.
 25.
Kuramoto Y: Chemical Oscillations, Waves and Turbulences. Berlin: Springer; 1984.
 26.
Strogatz S: Nonlinear Dynamics and Chaos. Boulder: Westview Press; 2000.
 27.
Izhikevich E: Dynamic Systems in Neuroscience. Cambridge, MA: MIT Press; 2007.
 28.
Almeida M, Vigário R, Bioucas-Dias J: The role of whitening for separation of synchronous sources. Proceedings of the International Conference on Latent Variable Analysis and Signal Separation 2012, 139-146.
 29.
Eichele T, Rachakonda S, Brakedal B, Eikeland R, Calhoun VD: EEGIFT: group independent component analysis for event-related EEG data. Comput. Intell. Neurosci 2011, 2011: 19.
 30.
Vigário R, Jousmäki V, Hämäläinen M, Hari R, Oja E: Independent component analysis for identification of artifacts in magnetoencephalographic recordings. Advances in NIPS 1997.
 31.
Amari S, Cichocki A, Yang HH: A new learning algorithm for blind signal separation. Advances in NIPS 1996, 757-763.
 32.
Carlson A, Crilly P, Rutledge J: Communication Systems: An Introduction to Signals and Noise in Electrical Communication. New York: McGraw-Hill; 2001.
Acknowledgements
This work was partially supported by the project DECA-Bio of Instituto de Telecomunicações, PEst-OE/EEI/LA0008/2011.
Competing interests
The authors declare that they have no competing interests.
Keywords
 Independent Component Analysis
 Singular Solution
 Blind Source Separation
 True Source