Separation of phaselocked sources in pseudoreal MEG data
EURASIP Journal on Advances in Signal Processing, volume 2013, Article number: 32 (2013)
Abstract
This article addresses the blind separation of linear mixtures of synchronous signals (i.e., signals with locked phases), which is a relevant problem, e.g., in the analysis of electrophysiological signals of the brain such as the electroencephalogram and the magnetoencephalogram (MEG). Popular separation techniques such as independent component analysis are not adequate for phase-locked signals, because such signals have strong mutual dependency. Aiming at unmixing this class of signals, we have recently introduced the independent phase analysis (IPA) algorithm, which can be used to separate synchronous sources. Here, we apply IPA to pseudo-real MEG data. The results show that this algorithm is able to separate phase-locked MEG sources in situations where the phase jitter (i.e., the deviation from the perfectly synchronized case) is moderate. This represents a significant step towards performing phase-based source separation on real data.
1 Introduction
In recent years, the interest of the scientific community in synchrony has risen. This interest lies both in its physical manifestations and in the development of a theory unifying and describing those manifestations in various systems such as laser beams, astrophysical objects, and brain neurons [1].
It is believed that synchrony plays a relevant role in the way different parts of the human brain interact. For example, when humans engage in a motor task, several brain regions oscillate coherently [2, 3]. Also, several pathologies such as autism, Alzheimer's, and Parkinson's are associated with a disruption in the synchronization profile of the brain, whereas epilepsy is associated with an anomalous increase in synchrony (see [4] for a review).
To perform inference on the synchrony of networks present in the brain or in other real-world systems, one must have access to the phase dynamics of the individual oscillators (which we will call “sources”). Unfortunately, in brain electrophysiological signals such as electroencephalograms (EEG) and magnetoencephalograms (MEG), and in other real-world situations, individual oscillator signals are not directly measurable, and one only has access to a superposition of the sources^{a}. In fact, EEG and MEG signals measured in one sensor contain components coming from several brain regions [5]. In this case, spurious synchrony may occur, as we will illustrate later.
The problem of undoing this superposition is called blind source separation (BSS). Typically, one assumes that the mixing is linear and instantaneous, which is a valid approximation in brain signals [6]. One must also make some assumptions on the sources, as in independent component analysis (ICA), where the assumption is mutual statistical independence of the sources [7]. ICA has seen multiple applications in EEG and MEG processing (for recent applications see, e.g., [8, 9]). Different BSS approaches use criteria other than statistical independence, such as nonnegativity of the sources [10, 11] or time-dependent frequency spectrum criteria [12, 13]. In our case, independence of the sources is not a valid assumption, because phase-locked sources are highly mutually dependent. Also, phase-locking is not equivalent to frequency coherence: in fact, two signals may have a severe overlap between their frequency spectra but still exhibit low or no phase synchrony at all [14]. In this article, we address the problem of how to separate such phase-locked sources using a phase-specific criterion.
Recently, we presented a two-stage algorithm called independent phase analysis (IPA), which performed very well on noiseless simulated data [15] and with moderate levels of added Gaussian white noise [14]. The separation algorithm we proposed there uses temporal decorrelation separation [16] as a first step, followed by the maximization of an objective function involving the phases of the estimated sources. In [14], we presented a “proof of concept” of IPA, laying down the theoretical foundations of the algorithm and applying it to a toy dataset of manually generated data. However, in that article we were not concerned with the application of IPA to real-world data. In this article, we study the applicability of IPA to pseudo-real MEG data. These data are not yet meant to allow inference about the human brain; however, they are generated in such a way that both the sources and the mixing process mimic what actually happens in the human brain. The advantage of using such pseudo-real data is that the true solution is known, thus allowing a quantitative assessment of the performance of the algorithm. We also study the robustness of IPA in the case where the sources are not perfectly phase-locked. It should be stressed, however, that the algorithm presented here makes no assumptions that are specific to brain signals, and should work in any situation where phase-locked sources are mixed approximately linearly and noise levels are low.
This article is organized as follows. In Section 2, we introduce the Hilbert transform. We also introduce the phase locking factor (PLF), a measure of synchrony which is central to the algorithm; finally, we show that synchrony is disrupted when the sources undergo a linear mixing. Section 3 describes the IPA algorithm in detail, including illustrations using a toy dataset. In Section 4, we explain how the pseudo-real MEG data are generated and show the results obtained by IPA on those data. These results are discussed in Section 5, and conclusions are drawn in Section 6.
2 Background
2.1 Hilbert transform: phase of a realvalued signal
Usually, the signals under study are real-valued discrete signals. To obtain the phase of a real signal, one can use a complex Morlet (or Gabor) wavelet transform, which can be seen as a bank of bandpass filters [17]. Alternatively, one can use the Hilbert transform, which should be applied to a locally narrowband signal or be preceded by appropriate filtering [18] for the meaning of the phase extracted by the Hilbert transform to be clear. The two transforms have been shown to be equivalent for the study of brain signals [19], but they may differ for other kinds of signals. In this article, we chose to use the Hilbert transform. To ensure that this transform yields meaningful results, we precede its use by bandpass filtering the pseudo-real MEG sources used in this article (see Section 4.1). Note that this is a very common preprocessing step in the analysis of real MEG signals (cf. [20–22]).
The discrete Hilbert transform x_h(t) of a band-limited discrete-time signal x(t), $t \in \mathbb{Z}$, is given by a convolution [18]:

$$ x_h(t) = \sum_{\tau=-\infty}^{+\infty} h(t-\tau)\, x(\tau), \qquad h(t) = \begin{cases} \frac{2}{\pi}\, \frac{\sin^2(\pi t/2)}{t}, & t \neq 0, \\ 0, & t = 0. \end{cases} $$
Note that the Hilbert transform is a linear operator. The Hilbert filter h(t) is not causal and has infinite duration, which makes direct implementation of the above formula impossible. In practice, the Hilbert transform is usually computed in the frequency domain, where the above convolution becomes a product of the discrete Fourier transforms of x(t) and h(t). A more thorough mathematical explanation of this transform is given in [18, 23]. We used the Hilbert transform as implemented in MATLAB.
The analytic signal of x(t), denoted by $\tilde{x}(t)$, is given by $\tilde{x}(t) \equiv x(t) + \mathrm{i}\, x_h(t)$, where $\mathrm{i} = \sqrt{-1}$ is the imaginary unit. The phase of x(t) is defined as the angle of its analytic signal. In the remainder of the article, we drop the tilde notation; it should be clear from the context whether the signals under consideration are the real signals or the corresponding analytic signals.
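As an illustration, the phase extraction just described can be sketched in Python with SciPy's `hilbert` (which returns the analytic signal directly); the function name `instantaneous_phase` and the test signal are ours, not part of the article:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_phase(x):
    """Phase of a real signal, via the angle of its analytic signal.

    x should be (locally) narrowband, e.g., bandpass filtered
    beforehand, for the extracted phase to be meaningful.
    """
    return np.angle(hilbert(x))   # hilbert() returns x(t) + i*x_h(t)

# A pure sinusoid: its unwrapped phase grows linearly at the angular frequency.
t = np.arange(1000)
f = 0.05                          # cycles per sample (illustrative value)
phi = instantaneous_phase(np.cos(2 * np.pi * f * t))
```

For a narrowband signal, the unwrapped phase increases at the signal's (instantaneous) angular frequency, which is what the PLF computations below rely on.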
2.2 Phaselocked sources
Throughout this article, we assume that the sought sources, N in number and denoted by s_j, j = 1,…,N, are phase-locked. In other words, s_j, j = 1,…,N, are complex-valued signals with nonnegative amplitudes and equal phases up to a constant plus small perturbations. Formally,

$$ s_j(t) = a_j(t)\, e^{\mathrm{i}\left(\phi(t) + \alpha_j + \delta_j(t)\right)}, \tag{1} $$
where a_j(t) are the amplitudes of the sources, which are by definition nonnegative and real-valued, α_j is the constant dephasing (or phase lag) between the sources (it does not depend on the time t), ϕ(t) represents an oscillation common to all the sources (it does not depend on the source j), and δ_j(t) is the phase jitter, which represents the deviation of the j th source from its nominal phase α_j + ϕ(t). Throughout this article, we will assume that the phase jitter is Gaussian with zero mean and a standard deviation σ.
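A minimal sketch of sources generated according to model (1); the amplitudes, lags, and the 5-degree jitter standard deviation are illustrative values of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 3, 2000
t = np.arange(T)

phi = 2 * np.pi * 0.02 * t                      # common oscillation phi(t)
alpha = np.array([0.0, np.pi / 6, np.pi / 3])   # constant phase lags alpha_j
sigma = np.deg2rad(5.0)                         # jitter std (5 degrees)
delta = rng.normal(0.0, sigma, size=(N, T))     # phase jitter delta_j(t)

# Nonnegative amplitudes: a small baseline plus a random per-source level.
a = 0.5 + rng.random((N, 1)) * np.ones((1, T))

# s_j(t) = a_j(t) * exp(i * (phi(t) + alpha_j + delta_j(t)))
s = a * np.exp(1j * (phi[None, :] + alpha[:, None] + delta))
```

By construction, |s_j(t)| = a_j(t) and the phases deviate from their nominal values only through the jitter.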
One situation where signals follow the model in (1) is the one described by the (time-dependent) Kuramoto model, under some circumstances. This simple model has been used extensively in the context of, e.g., modeling neuronal excitation and inhibition interactions, as well as large-scale experimental neuroscience data [20, 24]. Under this model, the interactions between oscillators are weak relative to the stability of their limit cycles, and thus affect the oscillators’ phases only, not their amplitudes. The phase of oscillator j is governed by [1, 25, 26]

$$ \frac{d\phi_j}{dt}(t) = \omega_j(t) + \sum_{k \neq j} \kappa_{jk} \sin\!\left(\phi_k(t) - \phi_j(t)\right), \tag{2} $$
where ϕ_j(t) is the phase of oscillator j (it is unrelated to ϕ(t) in Equation (1)), ω_j(t) is its natural frequency, and κ_jk measures the strength of the interaction between oscillators j and k. If the κ_jk coefficients are large enough and ω_j(t) = ω_k(t) for all j, k, then the solutions of the Kuramoto model are of the form (1) with small δ_j(t).
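To make the dynamics concrete, the following sketch integrates the Kuramoto phase equations with a simple forward-Euler scheme; the step size, coupling strengths, and frequencies are hypothetical choices of ours. With equal natural frequencies and sufficiently strong coupling, the phase differences indeed settle towards constants, as stated above:

```python
import numpy as np

def kuramoto(omega, K, phi0, T, dt=0.01):
    """Forward-Euler integration of the Kuramoto phase equations:
    dphi_j/dt = omega_j + sum_k K[j, k] * sin(phi_k - phi_j).
    Natural frequencies are taken as constant in time here."""
    N = len(phi0)
    phi = np.empty((N, T))
    phi[:, 0] = phi0
    for n in range(1, T):
        p = phi[:, n - 1]
        # coupling[j] = sum_k K[j, k] * sin(p[k] - p[j])
        coupling = (K * np.sin(p[None, :] - p[:, None])).sum(axis=1)
        phi[:, n] = p + dt * (omega + coupling)
    return phi

# Equal natural frequencies and strong coupling: the phase differences
# shrink towards constants (here, towards zero), matching Equation (1).
omega = np.full(3, 1.0)
K = np.full((3, 3), 2.0)
phi = kuramoto(omega, K, phi0=np.array([0.0, 0.5, 1.0]), T=3000)
```

With weaker or asymmetric coupling, the equilibrium phase differences are nonzero constants rather than zero, still yielding sources of the form (1).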
2.3 PLF
Given two oscillators with phases ϕ_j(t) and ϕ_k(t) for t = 1,…,T, the real-valued^{b} PLF, or phase locking value, between those two oscillators is defined as

$$ \varrho_{jk} \equiv \left| \left\langle e^{\mathrm{i}\left(\phi_j(t) - \phi_k(t)\right)} \right\rangle \right|, \tag{3} $$
where 〈·〉 is the time average operator. The PLF satisfies 0 ≤ ϱ_jk ≤ 1. The value ϱ_jk = 1 corresponds to two oscillators that are fully synchronized (i.e., their phase lag is constant). In terms of Equation (1), a PLF of 1 is obtained only if the phase jitter δ_j(t) is zero. The value ϱ_jk = 0 is attained, for example, if the phase difference ϕ_j(t) − ϕ_k(t) modulo 2π is uniformly distributed in [−π, π[. Values between 0 and 1 represent partial synchrony; in general, higher values of the standard deviation of the phase jitter δ_j(t) yield lower PLF values.
Note that a PLF of 1 is obtained if and only if ϕ_j(t) − ϕ_k(t) is constant^{c}. Thus, the study of the separation of sources with constant phase lags is equivalent to the study of the separation of sources with pairwise PLFs of 1.
Throughout this article, phase synchrony is measured using the PLF; two signals are perfectly synchronous if and only if they have a PLF of 1. Other approaches exist, e.g., for chaotic systems or specific types of oscillators [27]. Studying separation algorithms based on such other definitions is outside the scope of this article. The definition used here has the advantages of being tractable from an algorithmic point of view, and of being applicable to any situation where ϕ_j(t) − ϕ_k(t) is constant^{d}, regardless of the type of oscillator.
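The PLF is straightforward to compute. The following sketch (our own, with illustrative test signals) shows the two extreme regimes, a constant lag versus a frequency mismatch:

```python
import numpy as np

def plf(phi_j, phi_k):
    """Phase-locking factor between two phase time series."""
    return np.abs(np.mean(np.exp(1j * (phi_j - phi_k))))

t = np.linspace(0.0, 10.0, 1000)
phi1 = 2 * np.pi * 1.0 * t
phi2 = phi1 + np.pi / 6            # constant lag: fully synchronized
phi3 = 2 * np.pi * 1.37 * t        # different frequency: weak synchrony
```

A constant phase lag yields a PLF of 1 regardless of the lag's value, while a frequency mismatch makes the phase difference sweep the circle and drives the PLF towards 0.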
2.4 Effect of linear mixing on synchrony
Assume that we have N sources which have PLFs of 1 with each other. Let s(t), for t = 1,…,T, denote the vector of sources and x(t) = A s(t) denote the mixed signals, where A is the mixing matrix, which is assumed to be square and nonsingular^{e}. Our goal is to find a square unmixing matrix W such that the estimated sources y(t) = W^T x(t) = W^T A s(t) are as close to the true sources as possible, up to permutation, scaling, and sign changes.
The effect of linear mixing on the PLF matrix is illustrated in Figure 1 for a set of simulated sources. This set has three sources, with PLFs of 1 with each other. These sources are of the form (1) with negligible phase jitter, and the phase lags α_j are 0, $\frac{\pi}{6}$, and $\frac{\pi}{3}$ radians, respectively. The common oscillation is a time-dependent sinusoid. The amplitudes are generated by adding a small constant baseline to a random number of “bursts” with Gaussian shape. Each “burst” has a random center and a random width, and each source amplitude has 1 to 5 such “bursts”.
The first row of Figure 1 shows, on the left, the original sources and, on the right, their PLF matrix. The second row depicts the mixed signals x(t) on the left and their PLFs on the right; the mixing matrix has random entries uniformly distributed between −1 and 1. It is clear that the mixed signals have lower pairwise PLFs than the sources, although signals 2 and 3 still exhibit a rather high mutual PLF. This example suggests that linear mixing of synchronous sources reduces their synchrony, a fact that will be proved in Section 3.3 ahead; this fact will be used to extract the sources from the mixtures by trying to maximize the PLF of the estimated sources.
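This effect is easy to reproduce numerically. The sketch below (with a fixed, arbitrarily chosen mixing matrix and toy amplitudes, unrelated to the data in Figure 1) builds two fully locked sources and verifies that their mixtures have a strictly lower PLF:

```python
import numpy as np

def plf(u, v):
    """PLF between two complex (analytic) signals."""
    return np.abs(np.mean(np.exp(1j * (np.angle(u) - np.angle(v)))))

rng = np.random.default_rng(0)
T = 5000
phi = 2 * np.pi * 0.01 * np.arange(T)

# Two fully locked sources (lag pi/4) with independent, varying amplitudes.
a = rng.uniform(0.2, 2.0, size=(2, T))
s = a * np.exp(1j * (phi + np.array([[0.0], [np.pi / 4]])))

A = np.array([[1.0, 0.8],      # a fixed, clearly non-diagonal mixing matrix
              [0.6, -1.0]])
x = A @ s

plf_sources = plf(s[0], s[1])   # equals 1: constant lag
plf_mixtures = plf(x[0], x[1])  # strictly smaller: mixing disrupts synchrony
```

The varying amplitudes are essential here: they make the phases of the mixtures fluctuate relative to each other, which is precisely why the mixtures lose synchrony.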
3 Algorithm
In this section, we describe the IPA algorithm. As mentioned in Section 1, this algorithm first performs subspace separation, and then performs separation within each subspace. In this article, we only study the performance of IPA in the case where all the sources are phase-locked; in this situation, the inter-subspace separation can be skipped entirely, since there is only one subspace of locked sources. Therefore, we will not discuss here the part of IPA relating to subspace separation; the reader is referred to [14] for a discussion of that subject.
3.1 Preprocessing
3.1.1 Whitening
As happens in ICA and other source separation techniques, whitening is a useful preprocessing step for IPA. Whitening, or sphering, is a procedure that linearly transforms the data so that the transformed data have the identity as their covariance matrix; in particular, the whitened data are uncorrelated [7]. In ICA, there are clear reasons to pursue uncorrelatedness: independent data are also uncorrelated, and therefore whitening the data already fulfills one of the required conditions to find independent sources. If D denotes the diagonal matrix containing the eigenvalues of the covariance matrix of the data and V denotes an orthonormal matrix which has, in its columns, the corresponding eigenvectors, then whitening can be performed in a PCA-like manner by multiplying the data x(t) by a matrix B, where [7]

$$ B = D^{-1/2}\, V^{T}. \tag{4} $$
The whitened data are given by z(t) = B A s(t). Therefore, whitening merely transforms the original source separation problem with mixing matrix A into a new problem with mixing matrix B A. The advantage is that B A is an orthogonal mixing matrix, and its estimation becomes easier [7].
The above reasoning is not valid for the separation of phase-locked sources. However, under rather general assumptions, satisfied by the data studied here, it can be shown that whitening places a relatively low upper bound on the condition number of the equivalent mixing matrix (see [28] and references therein). Therefore, we always whiten the mixture data before applying the procedures described in Section 3.2.
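A minimal whitening sketch along the lines of Equation (4), using a toy random mixture; the eigendecomposition-based implementation is standard, and the data are illustrative:

```python
import numpy as np

def whiten(x):
    """Return (B, z): the whitening matrix B = D^{-1/2} V^T built from the
    eigendecomposition of the covariance of x, and the whitened data z = Bx.
    x has shape (channels, samples)."""
    xc = x - x.mean(axis=1, keepdims=True)
    d, V = np.linalg.eigh(np.cov(xc))      # eigenvalues D, eigenvectors V
    B = np.diag(d ** -0.5) @ V.T
    return B, B @ xc

rng = np.random.default_rng(0)
s = rng.standard_normal((3, 10000))        # toy unit-variance sources
A = rng.uniform(-1.0, 1.0, size=(3, 3))    # random mixing matrix
B, z = whiten(A @ s)
```

Even though B A is only approximately orthogonal in general, its condition number is kept low after whitening, which is the property exploited by IPA.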
3.1.2 Number of sources
As will be seen below, IPA assumes knowledge of the number of sources, and also assumes that the mixing matrix is square; if this is not the case, a simple procedure can be used to detect the number of sources and to transform the data to obey these constraints. If the mixing process is noiseless and is given by x(t) = A s(t), where A has more rows than columns and has maximum rank^{f}, the number of nonzero eigenvalues of the covariance matrix of x is N, where N is the number of sources (or, equivalently, the number of columns of A). If the mixture is noisy with a low level of i.i.d. Gaussian additive noise, the formerly zero-valued eigenvalues now have small nonzero values, but N can still easily be detected by counting how many eigenvalues are large relative to the plateau level of the small eigenvalues [7]^{g}. After N is known, the data need only be multiplied by a matrix $B' = D'^{-1/2} V'^{T}$, in a similar fashion to Equation (4), where D′ is a smaller N × N diagonal matrix containing only the N largest eigenvalues in D, and V′ is a rectangular matrix containing only the N columns of V corresponding to those eigenvalues. The mixture to be separated now becomes

$$ x'(t) = B' A\, s(t). \tag{5} $$
Since B′A is a square matrix and the number of sources is now given simply by the number of components of x′, the problem now has a known number of sources and a square mixing matrix.
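The eigenvalue-based detection of N can be sketched as follows; the threshold choice (relative to the largest eigenvalue) is a hypothetical one of ours, suitable for the noiseless case:

```python
import numpy as np

def estimate_num_sources(x, rel_tol=1e-8):
    """Count eigenvalues of the covariance of x that stand clearly above
    the near-zero plateau; rel_tol is an illustrative threshold for the
    noiseless (or nearly noiseless) case."""
    d = np.linalg.eigvalsh(np.cov(x))
    return int(np.sum(d > rel_tol * d.max()))

rng = np.random.default_rng(0)
s = rng.standard_normal((3, 2000))          # N = 3 sources
A = rng.uniform(-1.0, 1.0, size=(6, 3))     # 6 sensors, full column rank
x = A @ s                                   # noiseless rectangular mixture
```

With additive noise, the plateau sits at the noise variance instead of at numerical zero, and the threshold must be chosen relative to that plateau, as described in the text.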
A remark should be made about complex-valued data. The above procedure is appropriate when both the mixing matrix and the sources are real-valued. If both the mixing matrix and the sources are complex-valued, Equation (4) still applies (V will now have complex values). However, in our case the sources and measurements are complex-valued (due to the Hilbert transform), but the mixing matrix is real. When this is the case, Equation (4) is not directly applicable. The above procedure must instead be applied not to the original data x(t), but to new data x_0 with twice as many time samples, given by $x_0(t) = \mathcal{R}(\mathbf{x}(t))$ for t = 1,…,T and $x_0(t) = \mathcal{I}(\mathbf{x}(t-T))$ for t = T + 1,…,2T, where $\mathcal{R}$ and $\mathcal{I}$ denote the real and imaginary parts of a complex number, respectively. The matrix B which results from applying Equation (4) to x_0 (or B′, if appropriate) is then applied to the original data x as before, and the remainder of the procedure is unchanged [28].
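The stacking of real and imaginary parts can be sketched as follows (toy complex data; only the stacking step is shown, not the subsequent whitening):

```python
import numpy as np

def stack_real_imag(x):
    """Concatenate real and imaginary parts along time, giving a real-valued
    array with twice as many samples: x0(t) = Re{x(t)} for t = 1..T and
    x0(t) = Im{x(t-T)} for t = T+1..2T."""
    return np.concatenate([x.real, x.imag], axis=1)

rng = np.random.default_rng(0)
T = 1000
x = rng.standard_normal((4, T)) + 1j * rng.standard_normal((4, T))
x0 = stack_real_imag(x)
```

A real whitening matrix estimated from x0 can then be applied directly to the complex data x, since the mixing matrix itself is real.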
3.2 Separation of phaselocked sources
The goal of the IPA algorithm is to separate a set of N fully phase-locked sources which have been mixed linearly. Since these sources have a maximal PLF with each other and the mixture components do not (as motivated in Section 2.4 above and proved in Section 3.3 below), we can unmix them by searching for projections that maximize the resulting PLFs. Specifically, this corresponds to finding an N × N matrix W such that the estimated sources, y(t) = W^T x(t) = W^T A s(t), have the highest possible PLFs.
The optimization problem that we shall solve is

$$ \max_{W} \; \sum_{j=1}^{N} \sum_{k=j+1}^{N} \varrho_{jk}^{2} \; + \; \lambda \log \left| \det(W) \right| \qquad \text{subject to} \; \|w_j\| = 1, \; j = 1,\ldots,N, \tag{6} $$
where w_j is the j th column of W. In the first term, we sum the squared PLFs between all pairs of sources. The second term penalizes unmixing matrices that are close to singular, and λ is a parameter controlling the relative weights of the two terms. This second term serves the purpose of preventing the algorithm from finding, e.g., solutions where two columns j and k of W are collinear, which trivially yields ϱ_jk = 1 (a similar term is used in some ICA algorithms [7]). Each column of W is constrained to have unit norm to prevent trivial decreases of that term.
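The objective in Equation (6) can be sketched as follows; the `log|det W|` barrier is one plausible reading of the penalty term (see [14] for the exact form used), and the toy sources are ours:

```python
import numpy as np

def plf_matrix(y):
    """Pairwise PLFs between the rows of a complex signal matrix y."""
    ph = np.exp(1j * np.angle(y))
    return np.abs(ph @ ph.conj().T) / y.shape[1]

def ipa_objective(W, x, lam):
    """Sum of squared pairwise PLFs of y = W^T x, plus a log|det W| barrier
    against singular unmixing matrices (a sketch of one plausible penalty;
    columns of W are assumed unit-norm)."""
    P = plf_matrix(W.T @ x)
    iu = np.triu_indices(P.shape[0], k=1)
    return np.sum(P[iu] ** 2) + lam * np.log(np.abs(np.linalg.det(W)))

# Two perfectly locked toy sources: with W = I, the PLF term attains its
# maximum (one pair, PLF 1) and the penalty term vanishes (det I = 1).
t = np.arange(1000)
phi = 2 * np.pi * 0.03 * t
s = np.vstack([1.5 * np.exp(1j * phi),
               np.exp(1j * (phi + np.pi / 6))])
```

Note that a W with two collinear unit-norm columns would also drive the PLF term to its maximum, but the barrier term then diverges to minus infinity, which is exactly the role of λ described above.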
The optimization problem in Equation (6) is highly nonconvex: the objective function is a sum of two terms, each of which is nonconvex in the variable W. Furthermore, the unit norm constraint is also nonconvex. Despite this, as we show below in Section 3.3, it is possible to characterize all the global maxima of this problem for the case λ=0 and to devise an optimization strategy taking advantage of that result.
The above optimization problem can be tackled through various maximization algorithms. Our choice was to use a gradient ascent algorithm with momentum and adaptive step sizes; after this gradient algorithm has run for 200 iterations, we use the BFGS algorithm implemented in MATLAB to improve the solution. The result of this optimization for the sources shown in Figure 1 is shown in Figure 2 for λ = 0.1, illustrating that IPA successfully recovers the original sources for this dataset.
3.3 Unicity of solution
In [14], we proved that a few mild assumptions on the sources, which are satisfied in the vast majority of real-world situations, suffice for a useful characterization of the global maxima of Problem (6): it turns out that there are infinitely many such maxima, and that they correspond either to correct solutions (i.e., the original sources up to permutation, scaling, and sign changes) or to singular matrices W. More specifically, we proved the following: assume that we have a set of complex-valued and linearly independent sources denoted by s(t), which have a PLF of 1 with one another. Consider also linear combinations of the sources of the form y(t) = Cs(t), where C is a square matrix of appropriate dimensions. Further assume that the following conditions hold:

1. Neither s_j(t) nor y_j(t) is identically zero, for any j.

2. C is nonsingular.

3. The phase lag between any two sources is different from 0 or π.

4. The amplitudes of the sources, a_j(t) = |s_j(t)|, are linearly independent.

Then, the only linear combination y(t) = Cs(t) of the sources s(t) in which the PLF between any two components of y is 1 is y(t) = s(t), up to permutation, scaling, and sign changes [14].
3.4 Comparison to ICA
The above result is simple, but some relevant remarks should be made. If the optimum is found using λ = 0 and the second assumption is not violated (or, equivalently, det(C) = det(W)det(A) ≠ 0, which is equivalent to det(W) ≠ 0 if A is nonsingular), then we can be certain that the correct solution has been found. However, if the optimization is made using λ = 0, there is a possibility that the algorithm will estimate a bad solution where, for example, some of the estimated sources are all equal to one another (in which case the PLFs between those estimated sources are trivially equal to 1). On the other hand, if we use λ ≠ 0 to guarantee that W is nonsingular, the unicity result stated above cannot be applied to the complete objective function. We call “nonsingular solutions” and “singular solutions” those in which det(W) ≠ 0 and det(W) = 0, respectively. The result expressed in Section 3.3 is thus equivalent to stating that “all nonsingular global optima of Equation (6) with λ = 0 correspond to correct solutions”.
This contrasts strongly with ICA, where singular solutions are not an issue, because ICA algorithms attempt to find independent sources and one signal is never independent from itself [7]. In other words, singular solutions always yield poor values of the objective function of ICA algorithms. Here, we are attempting to estimate phase-locked sources, and any signal is perfectly phase-locked with itself. Thus, one must always use λ ≠ 0 in the objective function of Equation (6) when attempting to separate phase-locked sources.
We use a simple strategy to deal with this problem. We start by optimizing Equation (6) for a relatively large value of λ (λ = 0.4), and once convergence has been obtained, we use the result as the starting point for a new optimization, this time with λ = 0.2. The same process is repeated with the value of λ halved each time, until five such epochs have been run. The early optimization steps move the algorithm away from the singular solutions discussed above, whereas the final steps are done with a very low value of λ, for which the above unicity conditions are approximately valid. As the following experimental results show, this strategy successfully prevents singular solutions from being found, while making the influence of the second term of Equation (6) on the final result negligible.
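The halving schedule can be sketched as follows; `optimize` stands for one full run of the inner gradient/BFGS optimizer, whose implementation is not shown here:

```python
def ipa_lambda_schedule(optimize, W0, lam0=0.4, epochs=5):
    """Warm-started optimization epochs with lambda halved each time
    (0.4, 0.2, 0.1, 0.05, 0.025 for the defaults used in the text).
    optimize(W, lam) stands for one full run of the inner optimizer."""
    W, lam = W0, lam0
    for _ in range(epochs):
        W = optimize(W, lam)   # warm start: reuse the previous solution
        lam /= 2.0
    return W
```

Each epoch is warm-started from the previous one, so the early, strongly penalized epochs steer the iterate away from singular solutions before the penalty is effectively switched off.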
4 Experimental results
4.1 Data generation
As mentioned earlier, the main goal of this study is to assess the applicability of IPA to real-world electrophysiological data from human brain EEG and MEG. The choice of the data for this study was not trivial, since we need to know the true sources in order to quantitatively measure the quality of the results. On the one hand, to know the actual sources in the brain would require simultaneous data from outside the scalp (EEG or MEG, which would be the mixed signals) and from inside the scalp (intracranial recordings, corresponding to the sources). If intracranial recordings are not available, results cannot be assessed quantitatively; they can only be assessed qualitatively by experts who can tell whether the extracted sources are meaningful or not. On the other hand, due to their extreme simplicity, synthetic data such as those used so far to illustrate IPA, shown in Figure 1, cannot be used to assess the usefulness of the method in real-world situations.
In an attempt to obtain “the best of both worlds”, we generated a pseudo-real dataset from actual MEG recordings. By doing this, we know the true sources and the true mixing matrix, while still using sources that are of a nature similar to what one observes in real-world MEG. We begin by describing the process that we used to generate a perfectly phase-locked dataset; we then explain how we modified these data to analyze non-perfect cases as well. It is important to stress that the generation process described below has no relation to the one used to generate the data of Figure 1, even though both processes generate sources with maximum PLF.
Our first step was to obtain a realistic mixing matrix. To do so, we used the well-known EEGIFT software package [29]. This package includes a real-world sample EEG dataset with 64 channels. Using all the default options of the software package, we extracted 20 independent components from the data of Subject 1 in that dataset. The result of this process that was important for us was not the independent components themselves (which were discarded), but rather the 64 × 20 mixing matrix. As discussed in Section 3.1, we opted to use a square mixing matrix, with little loss of generality. Therefore, we selected N random rows and N random columns of that mixing matrix (without repetition), and formed an N × N mixing matrix from the corresponding values of the original 64 × 20 matrix. We will later show results for datasets ranging from N = 2 to N = 5 sources; in the following, assume, for the sake of concreteness, that N = 4.
Having generated a physiologically plausible mixing matrix, the next step was to generate a set of four sources. For this, we used the MEG dataset studied previously in [30]^{h}, which has 122 channels with 17,730 samples per channel. The sampling frequency is 297 Hz, and the data have already been subjected to lowpass filtering with cutoff at 90 Hz. Since bandpass filtering is a very common preprocessing step in the analysis of MEG data [20–22] and is useful for the use of the Hilbert transform, we performed a further bandpass filtering with no phase distortion, keeping only the 18–24 Hz band^{i}. The resulting filtered data were used to generate a complex signal through the Hilbert transform; these data were whitened as described in Section 3.1, and from the whitened data we extracted the time-dependent amplitudes and phases.
We then selected four random channels of these filtered MEG data. Since none of these MEG recordings were actually phase-locked (recall that they were themselves the result of a mixing process) and we wanted to study the performance on fully phase-locked sources (possibly corrupted by jitter, as explained below), we replaced the phase of the second of these channels with the phase of the first channel plus a constant phase lag of $\frac{\pi}{6}$ radians. The phase of the third channel was replaced with the phase of the first channel plus a constant phase lag of $\frac{\pi}{3}$ radians, and that of the fourth channel with the phase of the first channel plus a lag of $\frac{\pi}{2}$ radians. The amplitudes of the four sources were kept as the original amplitudes of the four random channels themselves. The process is illustrated in Figure 3. The above process, including the choice of the 4 × 4 submatrix, was repeated 100 times, with different initializations of the random number generator. This way of constructing the data ensured that the sources were fully phase-locked.
We also constructed datasets in which the sources were not perfectly phase-locked. For this, we used the same 100 sets of sources, but with those sources now corrupted by phase jitter: each sample t of each source j was multiplied by $e^{\mathrm{i} \delta_j(t)}$, where the phase jitter δ_j(t) was drawn from a Gaussian distribution with zero mean and standard deviation σ. We tested IPA for σ from 0 to 20 degrees, in steps of 5 degrees. One example with σ = 5 degrees is shown in Figure 4, and one with σ = 20 degrees is shown in Figure 5.
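The jitter-corruption step can be sketched as follows (toy sources of our own; the 20-degree value matches the largest jitter level tested):

```python
import numpy as np

def add_phase_jitter(s, sigma_deg, rng):
    """Multiply each sample of each complex source by exp(i*delta_j(t)),
    with Gaussian phase jitter of standard deviation sigma_deg (degrees)."""
    delta = rng.normal(0.0, np.deg2rad(sigma_deg), size=s.shape)
    return s * np.exp(1j * delta)

rng = np.random.default_rng(0)
phi = 2 * np.pi * 0.01 * np.arange(5000)
s = np.vstack([np.exp(1j * phi),
               np.exp(1j * (phi + np.pi / 6))])   # PLF of 1 before jitter
s_jit = add_phase_jitter(s, 20.0, rng)            # sigma = 20 degrees
```

The jitter leaves the amplitudes untouched and only perturbs the phases, lowering the pairwise PLF below 1, which is exactly the degradation studied in the experiments.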
Finally, we studied the effect of N on the results of the proposed algorithm. We created 100 datasets similar to the jitterless datasets mentioned earlier, using N = 2, 3, and 5. In all of these, and similarly to the data with N = 4, we used sources with phase lags that are multiples of $\frac{\pi}{6}$.
4.2 Results
We measured the separation quality using two measures: the Amari performance index (API) [31] and the well-known signal-to-noise ratio (SNR). The API measures how far the gain matrix W^T A is from a permuted diagonal matrix; the SNR measures how far the estimated sources are from the true sources. In summary, the API measures the quality of the estimation of the mixing matrix, while the SNR measures the quality of the estimation of the sources themselves.
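For reference, a common normalized variant of the Amari index can be sketched as follows (we state it for a square gain matrix; the normalization giving values in [0, 1] is one standard choice, and may differ from the exact variant used in [31]):

```python
import numpy as np

def amari_index(G):
    """Normalized Amari performance index of a gain matrix G = W^T A:
    0 iff G is a scaled permutation (perfect separation); larger is worse."""
    G = np.abs(np.asarray(G, dtype=float))
    n = G.shape[0]
    rows = (G / G.max(axis=1, keepdims=True)).sum(axis=1) - 1.0
    cols = (G / G.max(axis=0, keepdims=True)).sum(axis=0) - 1.0
    return float(rows.sum() + cols.sum()) / (2.0 * n * (n - 1))
```

Under this normalization, the index is 0 for a perfect separation and 1 for the worst case of a constant gain matrix, so values such as the 0.1 threshold quoted below indicate near-perfect recovery of the mixing matrix.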
Figure 6 presents the means and standard deviations of these measures for the 100 runs mentioned in Section 4.1, for each of the jitter levels. The results indicate that IPA performs very well in the jitterless case, on data of this kind, and that this level of performance is approximately maintained even in the presence of low levels of phase jitter, up to 5 degrees of standard deviation. Some deterioration in performance occurs from 5 to 10 degrees of phase jitter standard deviation, but with an SNR of 27 dB and an API below 0.1, the sources can still be considered well estimated.
The results for high jitter levels (σ equal to 15 or 20 degrees) show that there is a limit to IPA’s robustness; this limit lies somewhere between 10 and 15 degrees. Equivalently, in terms of the PLF, the algorithm shows good robustness to PLF values smaller than 1 as long as they are above 0.95, but below that value its performance deteriorates progressively down to a PLF of approximately 0.9, at which point only partial separations are obtained.
Figure 7 shows the effect of varying the number of sources N. The figure shows that IPA can handle values of N up to N = 5 with only a slight decrease in performance.
Figure 7 also shows something rather peculiar: for N = 2, the results are mediocre (with an average API around 0.4)^{j}. This is not an effect of lowering the number of sources N, but rather an indirect effect of the phase lag between the sources. To verify this, we generated datasets of jitterless data with N = 2, using phase lags of $\frac{\pi}{12}$, $\frac{2\pi}{12}$ (the value used in Figure 7), $\frac{3\pi}{12}$, and $\frac{4\pi}{12}$ (100 datasets for each of these values). Figure 8 shows that a phase lag of $\frac{2\pi}{12}$ yields poor API values, as we already knew, but $\frac{3\pi}{12}$ yields very good values. Naively, one could conclude that when the sources have a phase lag of $\frac{2\pi}{12}$ or less, the separation cannot be performed accurately.
The effect is, however, not so clear-cut. The results for N = 3, 4, 5 also involve sources with phase lags of $\frac{\pi}{6}$, but the API values for those experiments are very good. We do not have a solid explanation for this fact; we conjecture that the presence of some pairs of sources with larger phase lags (e.g., for N = 4, the first and third sources have a phase lag of $\frac{\pi}{3}$, and the first and fourth sources have a phase lag of $\frac{\pi}{2}$) aids in the separation of all the sources.
5 Discussion
IPA has a parameter, λ, which controls the relative weights given to the optimization of the PLF matrix and to the penalization of close-to-singular solutions. Our optimization procedure starts with a high value of λ, which is lowered as the optimization progresses. We confirmed that this variation of the parameter’s value is necessary: the quality of the results is noticeably degraded if λ is kept at a constant value, no matter how high or low it is. Table 1 confirms this: while λ = 0.1, the best fixed value, yields decent results, the results with a varying value of λ are considerably better. Furthermore, although the final epoch of the optimization is not done with λ = 0, we have verified that the results are virtually the same as if we had used λ = 0 in the last epoch.
The above paragraph illustrates something already mentioned in Section 3.4: separation of phase-locked sources is a nontrivial departure from ICA, because there are wrong, singular solutions that yield exactly the same values of the PLF matrix as the correct nonsingular solutions. Our approach to distinguishing these two types of solutions consists of adding a term that depends on the determinant of the matrix W. This approach works correctly, as our results show. However, it is perhaps inelegant to impose the penalty through the matrix W rather than directly through the estimated sources; it would be preferable to replace this term with one depending directly on those sources.
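The idea behind such a determinant-based term can be sketched as follows; the specific form −log|det W| is an assumption for illustration (the exact term used by IPA may differ), chosen because it diverges as W approaches singularity.

```python
import numpy as np

# Sketch of a determinant-based penalty on the unmixing matrix W:
# -log|det W| is 0 for the identity and grows without bound as W
# approaches singularity, steering the optimizer away from degenerate
# solutions that nevertheless reproduce the PLF matrix.
def singularity_penalty(W):
    return -np.log(np.abs(np.linalg.det(W)))

W_ok = np.eye(3)
W_bad = np.eye(3)
W_bad[2] = W_bad[1] + 1e-8   # third row almost a copy of the second

print(singularity_penalty(W_ok))   # ≈ 0
print(singularity_penalty(W_bad))  # large positive value
```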
The optimization variable, W, has N^{2} entries; there are N constraints on this variable, yielding N(N − 1) independent parameters. The number of free parameters in IPA is thus quadratic in the number of sources N, which is the main reason why we do not present results for N > 5: while running IPA on 100 datasets with N = 2 takes a few hours, doing so for N = 5 takes several days.
The results that we obtained show that IPA can separate perfectly locked MEG-like sources. However, while the phase locking in the jitterless pseudo-real MEG data is perfect, in real MEG data it will probably be less than perfect. This is the reason why we also studied data with phase jitter, which have pairwise PLFs smaller than 1. The results indicate that IPA has some robustness to PLFs smaller than 1, but the sources still need to exhibit considerable phase locking for the separation to be accurate; weaker synchrony results only in partial separation. Note, however, that the partially separated data are usually still closer to the true sources than the original mixtures.
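The effect of phase jitter on the pairwise PLF can be sketched numerically (the phase model and jitter levels below are illustrative assumptions): a constant lag gives a PLF of exactly 1, and zero-mean phase jitter pulls it below 1.

```python
import numpy as np

def plf(phi1, phi2):
    """Phase-locking factor: |mean of exp(i*(phi1 - phi2))|, in [0, 1]."""
    return np.abs(np.mean(np.exp(1j * (phi1 - phi2))))

rng = np.random.default_rng(0)
T = 10_000
phi = np.cumsum(rng.normal(0.06, 0.01, T))   # shared phase trajectory (illustrative)

# Jitterless pair with a constant lag of pi/6: PLF is exactly 1.
print(plf(phi, phi + np.pi / 6))

# Zero-mean Gaussian phase jitter of growing strength pulls the PLF below 1.
for sigma in (0.1, 0.5, 1.0):
    print(sigma, plf(phi, phi + np.pi / 6 + rng.normal(0.0, sigma, T)))
```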
The comments made in the previous paragraph raise an additional optimization challenge: if the true sources have PLFs smaller than 1, optimization of the objective function in Equation (6) can lead to overfitting. The results presented here show that IPA has some robustness to sources whose PLF is smaller than 1 but stationary (since the phase jitter is stationary, the distribution of the PLF does not vary with time). In real-world cases, the PLF is likely nonstationary: for example, some sources may be phase-locked at the start of the observation period and not phase-locked at its end. While simple techniques such as windowing can be devised to tackle smaller time intervals where stationarity is (almost) verified, one would still need to find a way to integrate the information from the different intervals. Such integration is out of the scope of this article.
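The windowing idea can be sketched as follows (window length and phase model are illustrative assumptions): computing the PLF over short windows reveals a pair that is locked at the start of the observation period and unlocked at its end.

```python
import numpy as np

def plf(phi1, phi2):
    return np.abs(np.mean(np.exp(1j * (phi1 - phi2))))

# Windowed PLF: compute the PLF over short windows, inside which
# stationarity approximately holds, to track nonstationary synchrony.
def windowed_plf(phi1, phi2, win=500):
    n = len(phi1) // win
    return np.array([plf(phi1[k * win:(k + 1) * win],
                         phi2[k * win:(k + 1) * win]) for k in range(n)])

rng = np.random.default_rng(1)
T = 4000
phi = np.cumsum(rng.normal(0.06, 0.01, T))
# Locked (constant lag) in the first half; afterwards the second phase
# drifts independently, destroying the locking.
phi2 = phi + np.pi / 6
phi2[T // 2:] = phi2[T // 2] + np.cumsum(rng.normal(0.06, 0.2, T - T // 2))

print(windowed_plf(phi, phi2))  # high PLF in early windows, low later on
```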
One interesting extension of this article would be the separation of specific types of systems, such as van der Pol oscillators[27]. For those, fully entrained oscillators may even present a PLF < 1, and a different measure of synchrony, tailored to those oscillators, may need to be used. Such a study would fall out of the scope of this article. Nevertheless, it is expected that additional knowledge of the oscillator type can be exploited to improve the algorithm’s performance or its robustness to deviations from the ideal case.
One can derive a relationship between additive Gaussian noise (e.g., from the sensors) and the phase jitter used throughout this article. Figure 5 depicts, in the complex plane, a sample of a noiseless signal x(t) ≡ a(t)e^{iϕ(t)}, to which complex noise n(t) is added to form the noisy signal x_{n}(t) ≡ a(t)e^{iϕ(t)} + n(t)^{k}. That figure also shows n_{⊥}(t), the projection of n(t) on the direction orthogonal to x(t), and x_{n⊥}(t) ≡ x(t) + n_{⊥}(t). Also depicted are ϕ(t), ϕ_{n}(t), and ϕ_{n⊥}(t), defined as the phases of x(t), x_{n}(t), and x_{n⊥}(t), respectively.
It can easily be shown that, if |n(t)| ≪ |x(t)| = a(t), then ${\varphi}_{n}\left(t\right)\approx {\varphi}_{n\perp}\left(t\right)\approx \varphi \left(t\right)+\frac{{n}_{\perp}\left(t\right)}{a\left(t\right)}$[32]. This is an important relationship, because it shows that, under additive noise, portions of the signal with a large amplitude will have a better phase estimate than portions with a small amplitude, in which even small amounts of additive noise can severely disrupt the phase estimation. We thus believe that the PLF, while attractive and elegant in theory, and despite working well with low amounts of additive noise[14], will probably need to be modified to factor in the amplitude appropriately in applications where considerable amounts of additive noise are present.
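This approximation is easy to check numerically; the phase, amplitudes, and noise sample below are assumptions chosen for illustration. The same noise produces a roughly tenfold larger phase error when the amplitude is ten times smaller, and in both cases the error closely tracks n_{⊥}(t)/a(t).

```python
import numpy as np

phi = 0.7                                # true phase (radians), illustrative
n = 0.01 * (0.6 + 0.8j)                  # small complex noise sample, illustrative
n_perp = np.imag(n * np.exp(-1j * phi))  # component of n orthogonal to e^{i phi}

errors = {}
for a in (1.0, 0.1):                     # large vs small amplitude
    x_n = a * np.exp(1j * phi) + n       # noisy complex sample
    errors[a] = np.angle(x_n) - phi      # true phase error
    print(a, errors[a], n_perp / a)      # true error vs approximation n_perp / a
```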
6 Conclusion
We have shown that IPA can successfully separate phase-locked sources from linear mixtures in pseudo-real MEG data. We showed that IPA tolerates deviations from the ideal case, yielding excellent results for low amounts of phase jitter and exhibiting some robustness to moderate amounts of jitter. We also showed that it can handle up to N = 5 sources. We believe that these results bring us closer to the goal of successfully separating phase-locked sources in real-world signals.
Endnotes
^{a} In EEG and MEG, the sources are not individual neurons, whose oscillations are too weak to be detected from outside the scalp. In these cases, the sources are populations of closely located neurons oscillating together.
^{b} The term “real-valued” is used here to distinguish this algorithm from other phase-based algorithms where a complex quantity is used[14].
^{c} Technically, this condition could be violated in a set with zero measure. Since we will deal with a discrete and finite number of time points, no such sets exist and this technicality is not important.
^{d} We will also show results where this phase difference is not exactly constant; see Figure 6.
^{e} These assumptions are not as restrictive as they may sound; see Section 3.1.
^{f} This is usually called the overdetermined case. The underdetermined case, where A has fewer rows than columns, is more difficult and is not addressed here.
^{g} There are more rigorous criteria that can be used to choose N. Two very popular methods are the Akaike information criterion and the minimum description length. It is out of the scope of this article to discuss these two criteria; the reader is referred to[7] and references therein for more information.
^{h} Freely available from http://research.ics.tkk.fi/ica/eegmeg/MEG_data.html.
^{i} The choice of this specific band is rather arbitrary. The band is narrow enough that the Hilbert transform will allow correct estimation of the instantaneous amplitude and phase, but wide enough that the instantaneous frequency of the signals retains some variability. The passband also has a width similar to those used in typical MEG studies[20].
^{j} It might appear contradictory that the average SNR has a good value, 40 dB, when the average API has a mediocre score. In reality, a very high standard deviation of the SNR is usually an indication that the separation is poor. As an example, consider a case where one source is very well estimated, with an SNR of 80 dB, and another is poorly estimated, with an SNR of 0 dB. The average SNR would be 40 dB, but with a very high standard deviation. Good values of the average SNR indicate a good separation only when the standard deviation of the SNR is small.
^{k} In most real applications, one will be dealing with models consisting of real signals to which real-valued noise is added. However, the linearity of the Hilbert transform allows the same type of analysis for that case as for the case of complex signals with complex additive noise considered here.
References
 1.
Pikovsky A, Rosenblum M, Kurths J: Synchronization: A Universal Concept in Nonlinear Sciences. Cambridge, UK: Cambridge University Press (Cambridge Nonlinear Science Series); 2001.
 2.
Palva JM, Palva S, Kaila K: Phase synchrony among neuronal oscillations in the human cortex. J. Neurosci 2005, 25(15):3962-3972. 10.1523/JNEUROSCI.4250-04.2005
 3.
Schoffelen JM, Oostenveld R, Fries P: Imaging the human motor system’s beta-band synchronization during isometric contraction. NeuroImage 2008, 41: 437-447. 10.1016/j.neuroimage.2008.01.045
 4.
Uhlhaas PJ, Singer W: Neural synchrony in brain disorders: relevance for cognitive dysfunctions and pathophysiology. Neuron 2006, 52: 155-168. 10.1016/j.neuron.2006.09.020
 5.
Nunez PL, Srinivasan R, Westdorp AF, Wijesinghe RS, Tucker DM, Silberstein RB, Cadusch PJ: EEG coherency I: statistics, reference electrode, volume conduction, Laplacians, cortical imaging, and interpretation at multiple scales. Electroencephalogr. Clin. Neurophysiol 1997, 103: 499-515. 10.1016/S0013-4694(97)00066-7
 6.
Vigário R, Särelä J, Jousmäki V, Hämäläinen M, Oja E: Independent component approach to the analysis of EEG and MEG recordings. IEEE Trans. Biomed. Eng 2000, 47(5):589-593. 10.1109/10.841330
 7.
Hyvärinen A, Karhunen J, Oja E: Independent Component Analysis. New York: Wiley; 2001.
 8.
Akhtar M, Mitsuhashi W, James C: Employing spatially constrained ICA and wavelet denoising for automatic removal of artifacts from multichannel EEG data. Signal Process 2012, 92: 401-416. 10.1016/j.sigpro.2011.08.005
 9.
de Vos M, de Lathauwer L, van Huffel S: Spatially constrained ICA algorithm with an application in EEG processing. Signal Process 2011, 91: 1963-1972. 10.1016/j.sigpro.2011.02.019
 10.
Lee D, Seung H: Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process. Syst 2001, 13: 556-562.
 11.
Chan TH, Ma WK, Chi CY, Wang Y: A convex analysis framework for blind separation of non-negative sources. IEEE Trans. Signal Process 2008, 56: 5120-5134.
 12.
de Frein R, Rickard S: The synchronized short-time-Fourier-transform: properties and definitions for multichannel source separation. IEEE Trans. Signal Process 2011, 59: 91-103.
 13.
Hosseini S, Deville Y, Saylani H: Blind separation of linear instantaneous mixtures of non-stationary signals in the frequency domain. Signal Process 2009, 89: 819-830. 10.1016/j.sigpro.2008.10.024
 14.
Almeida M, Schleimer JH, Bioucas-Dias J, Vigário R: Source separation and clustering of phase-locked subspaces. IEEE Trans. Neural Netw 2011, 22(9):1419-1434.
 15.
Almeida M, Bioucas-Dias J, Vigário R: Independent phase analysis: separating phase-locked subspaces. Proceedings of the International Conference on Independent Component Analysis and Signal Separation 2010, 189-196.
 16.
Ziehe A, Müller KR: TDSEP—an efficient algorithm for blind separation using time structure. International Conference on Artificial Neural Networks 1998, 675-680.
 17.
Torrence C, Compo GP: A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc 1998, 79: 61-78. 10.1175/1520-0477(1998)079<0061:APGTWA>2.0.CO;2
 18.
Oppenheim AV, Schafer RW, Buck JR: Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall International Editions; 1999.
 19.
Quyen MLV, Foucher J, Lachaux JP, Rodriguez E, Lutz A, Martinerie J, Varela FJ: Comparison of Hilbert transform and wavelet methods for the analysis of neuronal synchrony. J. Neurosci. Methods 2001, 111: 83-98. 10.1016/S0165-0270(01)00372-7
 20.
Varela F, Lachaux JP, Rodriguez E, Martinerie J: The Brainweb: phase synchronization and large-scale integration. Nat. Rev. Neurosci 2001, 2: 229-239.
 21.
Niedermeyer E, da Silva FHL: Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Philadelphia: Lippincott Williams and Wilkins; 2005.
 22.
Nunez P, Srinivasan R: Electric Fields of the Brain: the Neurophysics of EEG. New York: Oxford University Press; 2006.
 23.
Gold B, Oppenheim AV, Rader CM: Theory and implementation of the discrete Hilbert transform. Symposium on Computer Processing in Communications 1973.
 24.
Breakspear M, Heitmann S, Daffertshofer A: Generative models of cortical oscillations: neurobiological implications of the Kuramoto model. Front. Human Neurosci 2010, 4: 190-202.
 25.
Kuramoto Y: Chemical Oscillations, Waves and Turbulences. Berlin: Springer; 1984.
 26.
Strogatz S: Nonlinear Dynamics and Chaos. Boulder: Westview Press; 2000.
 27.
Izhikevich E: Dynamic Systems in Neuroscience. Cambridge, MA: MIT Press; 2007.
 28.
Almeida M, Vigário R, Bioucas-Dias J: The role of whitening for separation of synchronous sources. Proceedings of the International Conference on Latent Variable Analysis and Signal Separation 2012, 139-146.
 29.
Eichele T, Rachakonda S, Brakedal B, Eikeland R, Calhoun VD: EEGIFT: group independent component analysis for event-related EEG data. Comput. Intell. Neurosci 2011, 2011: 19.
 30.
Vigário R, Jousmäki V, Hämäläinen M, Hari R, Oja E: Independent component analysis for identification of artifacts in magnetoencephalographic recordings. Advances in NIPS 1997.
 31.
Amari S, Cichocki A, Yang HH: A new learning algorithm for blind signal separation. Advances in NIPS 1996, 757-763.
 32.
Carlson A, Crilly P, Rutledge J: Communication Systems: An Introduction to Signals and Noise in Electrical Communication. New York: McGraw-Hill; 2001.
Acknowledgements
This work was partially supported by the project DECA-Bio of Instituto de Telecomunicações, PEst-OE/EEI/LA0008/2011.
Competing interests
The authors declare that they have no competing interests.
Keywords
 Independent Component Analysis
 Singular Solution
 Blind Source Separation
 True Source