Open Access

Separation of phase-locked sources in pseudo-real MEG data

EURASIP Journal on Advances in Signal Processing 2013, 2013:32

https://doi.org/10.1186/1687-6180-2013-32

Received: 1 April 2012

Accepted: 2 February 2013

Published: 22 February 2013

Abstract

This article addresses the blind separation of linear mixtures of synchronous signals (i.e., signals with locked phases), which is a relevant problem, e.g., in the analysis of electrophysiological signals of the brain such as the electroencephalogram and the magnetoencephalogram (MEG). Popular separation techniques such as independent component analysis are not adequate for phase-locked signals, because such signals have strong mutual dependency. Aiming at unmixing this class of signals, we have recently introduced the independent phase analysis (IPA) algorithm, which can be used to separate synchronous sources. Here, we apply IPA to pseudo-real MEG data. The results show that this algorithm is able to separate phase-locked MEG sources in situations where the phase jitter (i.e., the deviation from the perfectly synchronized case) is moderate. This represents a significant step towards performing phase-based source separation on real data.

1 Introduction

In recent years, the scientific community has shown growing interest in synchrony, both in its physical manifestations and in the development of a theory unifying and describing those manifestations in systems as diverse as laser beams, astrophysical objects, and brain neurons [1].

It is believed that synchrony plays a relevant role in the way different parts of the human brain interact. For example, when humans engage in a motor task, several brain regions oscillate coherently [2, 3]. Also, several pathologies such as autism, Alzheimer's disease, and Parkinson's disease are associated with a disruption in the synchronization profile of the brain, whereas epilepsy is associated with an anomalous increase in synchrony (see [4] for a review).

To perform inference on the synchrony of networks present in the brain or in other real-world systems, one must have access to the phase dynamics of the individual oscillators (which we will call “sources”). Unfortunately, in brain electrophysiological signals such as electroencephalograms (EEG) and magnetoencephalograms (MEG), and in other real-world situations, individual oscillator signals are not directly measurable, and one only has access to a superposition of the sourcesᵃ. In fact, the EEG and MEG signals measured at one sensor contain components coming from several brain regions [5]. In this case, spurious synchrony may occur, as we will illustrate later.

The problem of undoing this superposition is called blind source separation (BSS). Typically, one assumes that the mixing is linear and instantaneous, which is a valid approximation for brain signals [6]. One must also make some assumptions about the sources; in independent component analysis (ICA), for instance, the assumption is mutual statistical independence of the sources [7]. ICA has seen multiple applications in EEG and MEG processing (for recent applications see, e.g., [8, 9]). Other BSS approaches use criteria other than statistical independence, such as non-negativity of the sources [10, 11] or time-dependent frequency spectrum criteria [12, 13]. In our case, independence of the sources is not a valid assumption, because phase-locked sources are highly mutually dependent. Also, phase locking is not equivalent to frequency coherence: two signals may have strongly overlapping frequency spectra and still exhibit little or no phase synchrony [14]. In this article, we address the problem of how to separate such phase-locked sources using a phase-specific criterion.

Recently, we presented a two-stage algorithm called independent phase analysis (IPA), which performed very well on noiseless simulated data [15] and with moderate levels of added Gaussian white noise [14]. The separation algorithm proposed there uses temporal decorrelation separation [16] as a first step, followed by the maximization of an objective function involving the phases of the estimated sources. In [14], we presented a “proof of concept” of IPA, laying down the theoretical foundations of the algorithm and applying it to a toy dataset of manually generated data. However, that article was not concerned with the application of IPA to real-world data. In this article, we study the applicability of IPA to pseudo-real MEG data. These data are not yet meant to allow inference about the human brain; however, they are generated in such a way that both the sources and the mixing process mimic what actually happens in the human brain. The advantage of using such pseudo-real data is that the true solution is known, which allows a quantitative assessment of the algorithm's performance. We also study the robustness of IPA to the case where the sources are not perfectly phase-locked. It should be stressed, however, that the algorithm presented here makes no assumptions that are specific to brain signals, and should work in any situation where phase-locked sources are mixed approximately linearly and noise levels are low.

This article is organized as follows. Section 2 introduces the Hilbert transform and the phase locking factor (PLF), a measure of synchrony that is central to the algorithm, and shows that synchrony is disrupted when the sources undergo linear mixing. Section 3 describes the IPA algorithm in detail, including illustrations on a toy dataset. Section 4 explains how the pseudo-real MEG data are generated and shows the results obtained by IPA on those data. These results are discussed in Section 5, and conclusions are drawn in Section 6.

2 Background

2.1 Hilbert transform: phase of a real-valued signal

Usually, the signals under study are real-valued discrete signals. To obtain the phase of a real signal, one can use a complex Morlet (or Gabor) wavelet transform, which can be seen as a bank of bandpass filters [17]. Alternatively, one can use the Hilbert transform; for the phase it extracts to have a clear meaning, it should be applied to a locally narrowband signal or be preceded by appropriate filtering [18]. The two transforms have been shown to be equivalent for the study of brain signals [19], but they may differ for other kinds of signals. In this article, we chose to use the Hilbert transform. To ensure that it yields meaningful results, we band-pass filter the pseudo-real MEG sources before applying it (see Section 4.1). Note that this is a very common preprocessing step in the analysis of real MEG signals (cf. [20–22]).

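The discrete Hilbert transform $x_h(t)$ of a band-limited discrete-time signal $x(t)$, $t \in \mathbb{Z}$, is given by a convolution [18]:

$$x_h(t) \equiv x(t) * h(t), \quad \text{where} \quad h(t) \equiv \begin{cases} 0, & \text{for even } t, \\ \dfrac{2}{\pi t}, & \text{for odd } t. \end{cases}$$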

Note that the Hilbert transform is a linear operator. The Hilbert filter $h(t)$ is non-causal and has infinite duration, which makes direct implementation of the above formula impossible. In practice, the Hilbert transform is usually computed in the frequency domain, where the above convolution becomes a product of the discrete Fourier transforms of $x(t)$ and $h(t)$. A more thorough mathematical explanation of this transform is given in [18, 23]. We used the Hilbert transform as implemented in MATLAB.
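To make the convolution formula concrete, the following sketch (Python with NumPy and SciPy, our choice; the article itself used MATLAB) truncates the infinite kernel $h(t)$ to $|t| \le M$ and compares the resulting time-domain Hilbert transform with the frequency-domain result of `scipy.signal.hilbert`, whose imaginary part is the Hilbert transform. The truncation length M = 1000, the 21 Hz test tone, and the helper name `truncated_hilbert_kernel` are illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert

def truncated_hilbert_kernel(M):
    """Discrete Hilbert kernel h(t) = 2/(pi t) for odd t, 0 for even t, truncated to |t| <= M."""
    t = np.arange(-M, M + 1)
    h = np.zeros(t.shape)
    odd = (t % 2) != 0
    h[odd] = 2.0 / (np.pi * t[odd])
    return h

fs = 297.0                                   # sampling rate of the MEG data of Section 4.1
n = np.arange(2970)
x = np.cos(2 * np.pi * 21 * n / fs + 0.3)    # narrowband test signal (21 Hz)

M = 1000
xh_fir = np.convolve(x, truncated_hilbert_kernel(M), mode="same")  # time-domain convolution
xh_fft = np.imag(hilbert(x))                                       # frequency-domain computation

interior = slice(M, len(x) - M)              # samples where the truncated kernel fits entirely
max_err = np.max(np.abs(xh_fir[interior] - xh_fft[interior]))
print(f"max |FIR - FFT| in the interior: {max_err:.4f}")
```

Away from the edges the two computations agree closely; the residual error comes from truncating the slowly decaying $1/t$ kernel, which is one reason the frequency-domain computation is preferred in practice.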

The analytic signal of $x(t)$, denoted by $\tilde{x}(t)$, is given by $\tilde{x}(t) \equiv x(t) + i\,x_h(t)$, where $i = \sqrt{-1}$ is the imaginary unit. The phase of $x(t)$ is defined as the angle of its analytic signal. In the remainder of the article, we drop the tilde notation; it should be clear from the context whether the signals under consideration are the real signals or the corresponding analytic signals.
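A minimal sketch of extracting the phase as the angle of the analytic signal, again assuming Python with SciPy: `scipy.signal.hilbert` returns the analytic signal $x(t) + i\,x_h(t)$ directly, computed via the FFT, similarly to MATLAB's `hilbert`. The test tone is chosen to contain an integer number of cycles, so the FFT-based result is essentially exact here; for general signals, edge effects appear near the ends of the record. The function name is hypothetical.

```python
import numpy as np
from scipy.signal import hilbert

def analytic_signal_and_phase(x):
    """Analytic signal x + i*x_h (FFT-based) and its instantaneous phase in [-pi, pi]."""
    xa = hilbert(x)
    return xa, np.angle(xa)

fs = 297.0
n = np.arange(2970)                          # 10 s of data: exactly 210 cycles of a 21 Hz tone
true_phase = 2 * np.pi * 21 * n / fs + 0.3
x = 2.0 * np.cos(true_phase)
xa, phase = analytic_signal_and_phase(x)

# Wrapped difference between the estimated and the true phase.
phase_err = np.angle(np.exp(1j * (phase - true_phase)))
print(np.max(np.abs(phase_err)), np.allclose(np.abs(xa), 2.0))
```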

2.2 Phase-locked sources

Throughout this article, we assume that the $N$ sought sources, denoted by $s_j$, $j = 1, \dots, N$, are phase-locked. In other words, the $s_j$ are complex-valued signals with non-negative amplitudes and with equal phases up to a constant plus small perturbations. Formally,
$$s_j(t) = a_j(t)\, e^{i(\alpha_j + \phi(t) + \delta_j(t))}, \qquad (1)$$

where $a_j(t)$ are the amplitudes of the sources, which are by definition non-negative and real-valued; $\alpha_j$ is the constant dephasing (or phase lag) between the sources (it does not depend on the time $t$); $\phi(t)$ represents an oscillation common to all the sources (it does not depend on the source $j$); and $\delta_j(t)$ is the phase jitter, which represents the deviation of the $j$th source from its nominal phase $\alpha_j + \phi(t)$. Throughout this article, we assume that the phase jitter is Gaussian with zero mean and standard deviation σ.
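Synthetic sources following Equation (1) can be generated directly. The sketch below is a minimal illustration in Python/NumPy, not the generator used for Figure 1: the function name `phase_locked_sources`, the uniform random amplitudes, and the 20 Hz common oscillation are illustrative assumptions.

```python
import numpy as np

def phase_locked_sources(amplitudes, alphas, common_phase, sigma, rng):
    """Sources s_j(t) = a_j(t) exp(i(alpha_j + phi(t) + delta_j(t))), as in Equation (1).

    amplitudes: (N, T) non-negative real array a_j(t)
    alphas: (N,) constant phase lags alpha_j in radians
    common_phase: (T,) common oscillation phi(t) in radians
    sigma: standard deviation (radians) of the zero-mean Gaussian jitter delta_j(t)
    """
    N, T = amplitudes.shape
    delta = rng.normal(0.0, sigma, size=(N, T))
    return amplitudes * np.exp(1j * (alphas[:, None] + common_phase[None, :] + delta))

rng = np.random.default_rng(0)
T = 1000
t = np.arange(T) / 297.0
amplitudes = 0.1 + rng.random((3, T))             # arbitrary positive amplitudes
alphas = np.array([0.0, np.pi / 6, np.pi / 3])    # the lags used in Figure 1
phi = 2 * np.pi * 20 * t                          # hypothetical 20 Hz common oscillation
s = phase_locked_sources(amplitudes, alphas, phi, sigma=0.0, rng=rng)

lag_21 = np.angle(s[1] * np.conj(s[0]))           # phase of source 2 relative to source 1
print(np.allclose(lag_21, np.pi / 6), np.allclose(np.abs(s), amplitudes))
```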

One situation in which signals follow the model in (1), under some circumstances, is the one described by the (time-dependent) Kuramoto model. This simple model has been used extensively in the context of, e.g., modeling neuronal excitation and inhibition interactions, as well as large-scale experimental neuroscience data [20, 24]. In this model, the interactions between oscillators are weak relative to the stability of their limit cycles, and thus affect only the oscillators' phases, not their amplitudes. The phase of oscillator $j$ is governed by [1, 25, 26]
$$\dot{\phi}_j(t) = \omega_j(t) + \frac{1}{N} \sum_{k=1}^{N} \kappa_{jk} \sin\big(\phi_k(t) - \phi_j(t)\big), \qquad (2)$$

where $\phi_j(t)$ is the phase of oscillator $j$ (it is unrelated to $\phi(t)$ in Equation (1)), $\omega_j(t)$ is its natural frequency, and $\kappa_{jk}$ measures the strength of the interaction between oscillators $j$ and $k$. If the $\kappa_{jk}$ coefficients are large enough and $\omega_j(t) = \omega_k(t)$ for all $j, k$, then the solutions of the Kuramoto model are of the form (1) with small $\delta_j(t)$.
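The following forward-Euler sketch of Equation (2) (Python/NumPy; the step size, coupling strength, initial phases, and function name are arbitrary assumptions) illustrates the statement above. With identical, constant natural frequencies and uniform positive coupling, the oscillators converge to a common phase, i.e., the special case of Equation (1) in which all $\alpha_j$ are equal and the jitter vanishes.

```python
import numpy as np

def simulate_kuramoto(omega, kappa, phi0, dt, steps):
    """Forward-Euler integration of Equation (2) with constant natural frequencies.

    omega: (N,) natural frequencies (rad/s); kappa: (N, N) coupling strengths kappa_jk;
    phi0: (N,) initial phases. Returns an array of shape (steps + 1, N).
    """
    N = len(omega)
    phi = np.empty((steps + 1, N))
    phi[0] = phi0
    for n in range(steps):
        diff = phi[n][None, :] - phi[n][:, None]          # diff[j, k] = phi_k - phi_j
        dphi = omega + (kappa * np.sin(diff)).sum(axis=1) / N
        phi[n + 1] = phi[n] + dt * dphi
    return phi

N = 3
omega = np.full(N, 2 * np.pi * 10.0)      # identical natural frequencies (10 Hz)
kappa = np.full((N, N), 5.0)              # uniform, fairly strong coupling (assumed)
phi = simulate_kuramoto(omega, kappa, phi0=np.array([0.0, 0.5, 1.0]), dt=1e-3, steps=5000)

# After the transient, the phase differences have (numerically) converged to a constant.
final_diffs = phi[-1] - phi[-1, 0]
print(np.round(final_diffs, 6))
```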

2.3 PLF

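Given two oscillators with phases $\phi_j(t)$ and $\phi_k(t)$ for $t = 1, \dots, T$, the real-valuedᵇ PLF, or phase locking value, between those two oscillators is defined as

$$\varrho_{jk} \equiv \left| \frac{1}{T} \sum_{t=1}^{T} e^{i(\phi_j(t) - \phi_k(t))} \right| = \left| \left\langle e^{i(\phi_j - \phi_k)} \right\rangle \right|, \qquad (3)$$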

where $\langle\cdot\rangle$ is the time average operator. The PLF satisfies $0 \le \varrho_{jk} \le 1$. The value $\varrho_{jk} = 1$ corresponds to two oscillators that are fully synchronized (i.e., their phase lag is constant). In terms of Equation (1), a PLF of 1 is obtained only if the phase jitter $\delta_j(t)$ is zero. The value $\varrho_{jk} = 0$ is attained, for example, if the phase difference $\phi_j(t) - \phi_k(t)$ modulo 2π is uniformly distributed in $[-\pi, \pi)$. Values between 0 and 1 represent partial synchrony; in general, higher values of the standard deviation of the phase jitter $\delta_j(t)$ yield lower PLF values.

Note that a PLF of 1 is obtained if and only if $\phi_j(t) - \phi_k(t)$ is constantᶜ. Thus, studying the separation of sources with constant phase lags is equivalent to studying the separation of sources with pairwise PLFs of 1.
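Equation (3) translates into a few lines of code. In this sketch (Python/NumPy; the name `plf_matrix` is ours), all pairwise PLFs are computed at once from the unit phasors $e^{i\phi_j(t)}$, so amplitudes play no role, as in the definition. The two test cases are a pair with a constant lag (PLF 1) and a pair with independent uniform phases (PLF close to 0).

```python
import numpy as np

def plf_matrix(y):
    """Pairwise PLFs (Equation (3)) of the rows of y.

    y: (N, T) complex analytic signals. Entry [j, k] is |mean_t exp(i(phi_j(t) - phi_k(t)))|.
    """
    u = np.exp(1j * np.angle(y))            # unit phasors exp(i phi_j(t)); amplitudes are discarded
    return np.abs(u @ u.conj().T) / y.shape[1]

rng = np.random.default_rng(1)
T = 100_000
phi = rng.uniform(-np.pi, np.pi, T)
locked = np.vstack([np.exp(1j * phi), 3.0 * np.exp(1j * (phi + np.pi / 3))])  # constant lag
unlocked = np.exp(1j * rng.uniform(-np.pi, np.pi, (2, T)))                     # independent phases

print(plf_matrix(locked).round(3))
print(plf_matrix(unlocked).round(3))
```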

Throughout this article, phase synchrony is measured using the PLF; two signals are perfectly synchronous if and only if they have a PLF of 1. Other approaches exist, e.g., for chaotic systems or specific types of oscillators [27]. Studying separation algorithms based on such other definitions is outside the scope of this article. The definition used here has the advantages of being tractable from an algorithmic point of view, and of being applicable to any situation where $\phi_j(t) - \phi_k(t)$ is constantᵈ, regardless of the type of oscillator.

2.4 Effect of linear mixing on synchrony

Assume that we have $N$ sources that have PLFs of 1 with each other. Let $s(t)$, for $t = 1, \dots, T$, denote the vector of sources and $x(t) = A s(t)$ denote the mixed signals, where $A$ is the mixing matrix, which is assumed to be square and non-singularᵉ. Our goal is to find a square unmixing matrix $W$ such that the estimated sources $y(t) = W^T x(t) = W^T A s(t)$ are as close to the true sources as possible, up to permutation, scaling, and sign change.

The effect of linear mixing on the PLF matrix is illustrated in Figure 1 for a set of simulated sources. This set has three sources, with PLFs of 1 with each other. These sources are of the form (1) with negligible phase jitter, and the phase lags $\alpha_j$ are 0, π/6, and π/3 radians, respectively. The common oscillation is a time-dependent sinusoid. The amplitudes are generated by adding a small constant baseline to a random number of Gaussian-shaped “bursts”. Each “burst” has a random center and a random width, and each source amplitude has 1 to 5 such “bursts”.
Figure 1

Top row: The three original sources (left) and PLFs between them (right). Bottom row: The three mixed signals (left) and PLFs between them (right). On the right column, the area of the square in position (i, j) is proportional to the PLF between the signals i and j. Therefore, large squares represent PLFs close to 1, while small squares represent values close to zero. In this example, the second and third sources have phase lags of π/6 and π/3 radians relative to the first source, respectively.

The first row of Figure 1 shows the original sources (left) and their PLF matrix (right). The second row depicts the mixed signals $x(t)$ (left) and their PLFs (right); the mixing matrix has random entries uniformly distributed between −1 and 1. Clearly, the mixed signals have lower pairwise PLFs than the sources, although signals 2 and 3 still exhibit a rather high mutual PLF. This example suggests that linear mixing of synchronous sources reduces their synchrony, a fact that is proved in Section 3.3; we use it to extract the sources from the mixtures by maximizing the PLFs of the estimated sources.
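The reduction of synchrony by mixing can be reproduced with a small variant of the Figure 1 setup. In this sketch (Python/NumPy), the smooth sinusoidal amplitudes and the fixed mixing matrix are illustrative assumptions standing in for the Gaussian bursts and the random uniform matrix used in the article. The amplitudes must be linearly independent: mixtures of sources that share the same amplitude profile up to a constant factor would remain phase-locked.

```python
import numpy as np

def plf_matrix(y):
    """Pairwise PLFs (Equation (3)) of the rows of the complex array y."""
    u = np.exp(1j * np.angle(y))
    return np.abs(u @ u.conj().T) / y.shape[1]

fs = 297.0
t = np.arange(int(20 * fs)) / fs
# Linearly independent, strictly positive amplitudes (slow modulations, chosen arbitrarily).
a = np.vstack([1.1 + np.sin(2 * np.pi * f * t + p) for f, p in [(0.3, 0.0), (0.7, 1.0), (1.1, 2.0)]])
alphas = np.array([0.0, np.pi / 6, np.pi / 3])
s = a * np.exp(1j * (alphas[:, None] + 2 * np.pi * 20 * t[None, :]))

A = np.array([[1.0, -0.8, 0.3],           # a fixed, non-singular real mixing matrix (illustrative)
              [0.5, 0.9, -0.7],
              [-0.6, 0.4, 1.0]])
x = A @ s

iu = np.triu_indices(3, 1)                # the three distinct pairs
print("sources :", plf_matrix(s)[iu].round(3))
print("mixtures:", plf_matrix(x)[iu].round(3))
```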

3 Algorithm

In this section, we describe the IPA algorithm. As mentioned in Section 1, this algorithm works in two stages: it first performs subspace separation, and then performs separation within each subspace. In this article, we only study the performance of IPA when all the sources are phase-locked; in this situation, the inter-subspace separation can be skipped entirely, since there is only one subspace of locked sources. Therefore, we do not discuss here the part of IPA relating to subspace separation; the reader is referred to [14] for a discussion of that subject.

3.1 Preprocessing

3.1.1 Whitening

As in ICA and other source separation techniques, whitening is a useful preprocessing step for IPA. Whitening, or sphering, is a procedure that linearly transforms the data so that the transformed data have the identity as their covariance matrix; in particular, the whitened data are uncorrelated [7]. In ICA, there are clear reasons to pursue uncorrelatedness: independent data are also uncorrelated, and therefore whitening the data already fulfills one of the required conditions to find independent sources. If $D$ denotes the diagonal matrix containing the eigenvalues of the covariance matrix of the data and $V$ denotes an orthonormal matrix whose columns are the corresponding eigenvectors, then whitening can be performed in a PCA-like manner by multiplying the data $x(t)$ by a matrix $B$, where [7]
$$B = D^{-1/2} V^T. \qquad (4)$$

The whitened data are given by $z(t) = B A s(t)$. Therefore, whitening merely transforms the original source separation problem with mixing matrix $A$ into a new problem with mixing matrix $BA$. The advantage is that, when the sources are uncorrelated and have unit variance, $BA$ is an orthogonal matrix, which makes its estimation easier [7].

This reasoning does not carry over to the separation of phase-locked sources, which are in general correlated. However, under rather general assumptions, which are satisfied by the data studied here, it can be shown that whitening places a relatively low upper bound on the condition number of the equivalent mixing matrix (see [28] and references therein). Therefore, we always whiten the mixture data before applying the procedures described in Section 3.2.
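A sketch of the whitening step of Equation (4) in Python/NumPy (the function name `whitening_matrix` is ours). It uses uncorrelated, unit-variance sources, the ICA setting, to show both that the whitened data have identity covariance and that $BA$ is then close to orthogonal; for phase-locked sources, only the first property holds in general.

```python
import numpy as np

def whitening_matrix(x):
    """B = D^{-1/2} V^T (Equation (4)) from the sample covariance of real data x (channels x samples)."""
    d, V = np.linalg.eigh(np.cov(x))       # eigenvalues d (ascending), orthonormal eigenvectors in V
    return np.diag(d ** -0.5) @ V.T

rng = np.random.default_rng(2)
s = rng.standard_normal((3, 100_000))      # uncorrelated, unit-variance sources (the ICA setting)
A = rng.uniform(-1, 1, (3, 3))
x = A @ s
B = whitening_matrix(x)
z = B @ (x - x.mean(axis=1, keepdims=True))

print(np.round(np.cov(z), 6))              # identity covariance
print(np.round((B @ A) @ (B @ A).T, 2))    # approximately identity: B A is nearly orthogonal
```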

3.1.2 Number of sources

As will be seen below, IPA assumes knowledge of the number of sources, and also assumes that the mixing matrix is square. If this is not the case, a simple procedure can be used to detect the number of sources and to transform the data so that they obey these constraints. If the mixing process is noiseless and is given by $x(t) = A s(t)$, where $A$ has more rows than columns and has full column rankᶠ, the covariance matrix of $x$ has exactly $N$ non-zero eigenvalues, where $N$ is the number of sources (equivalently, the number of columns of $A$). If the mixture is corrupted by a low level of i.i.d. additive Gaussian noise, the formerly zero eigenvalues take small non-zero values, but $N$ is still easy to detect by counting how many eigenvalues are large relative to the plateau formed by the small ones [7]ᵍ. Once $N$ is known, the data need only be multiplied by a matrix $\bar{B} = \bar{D}^{-1/2} \bar{V}^T$, analogous to Equation (4), where $\bar{D}$ is an $N \times N$ diagonal matrix containing only the $N$ largest eigenvalues in $D$, and $\bar{V}$ is a rectangular matrix containing only the $N$ columns of $V$ corresponding to those eigenvalues. The mixture to be separated then becomes
$$\bar{x}(t) = \bar{B} x(t) = \bar{B} A s(t). \qquad (5)$$

Since $\bar{B} A$ is a square matrix and the number of sources is now simply the number of components of $\bar{x}$, the problem now has a known number of sources and a square mixing matrix.

A remark should be made about complex-valued data. The above procedure is appropriate when both the mixing matrix and the sources are real-valued. If both are complex-valued, Equation (4) still applies ($V$ will then have complex entries). In our case, however, the sources and measurements are complex-valued (due to the Hilbert transform), but the mixing matrix is real. In this situation, Equation (4) is not directly applicable. The above procedure must instead be applied not to the original data $x(t)$, but to new data $x_0$ with twice as many time samples, given by $x_0(t) = \Re(x(t))$ for $t = 1, \dots, T$ and $x_0(t) = \Im(x(t - T))$ for $t = T + 1, \dots, 2T$, where $\Re$ and $\Im$ denote the real and imaginary parts of a complex number, respectively. The matrix $B$ that results from applying Equation (4) to $x_0$ (or $\bar{B}$, if appropriate) is then applied to the original data $x$ as before, and the remainder of the procedure is unchanged [28].
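The dimension reduction of Section 3.1.2, including the stacking of real and imaginary parts used for complex data with a real mixing matrix, might be sketched as follows (Python/NumPy). The rule used to pick $N$, taking the largest ratio between consecutive eigenvalues, is our own simple stand-in for detecting how many eigenvalues are large relative to the noise plateau; the function name and the simulated 6 × 3 noisy mixture are hypothetical.

```python
import numpy as np

def reduce_and_whiten_complex(x, n_sources=None):
    """Estimate N and the N x M matrix B_bar for complex data x (M x T) with a real mixing matrix.

    Real and imaginary parts are stacked along time (M x 2T), the covariance is eigendecomposed,
    and the N largest eigenvalues are kept. If n_sources is None, N is placed at the largest
    ratio between consecutive eigenvalues (a simple heuristic, assumed here).
    """
    x0 = np.concatenate([x.real, x.imag], axis=1)          # M x 2T real data
    d, V = np.linalg.eigh(np.cov(x0))
    d, V = d[::-1], V[:, ::-1]                             # eigenvalues in descending order
    if n_sources is None:
        n_sources = int(np.argmax(d[:-1] / d[1:])) + 1     # position of the largest drop
    B_bar = np.diag(d[:n_sources] ** -0.5) @ V[:, :n_sources].T
    return n_sources, B_bar

rng = np.random.default_rng(3)
T, M = 5000, 6
phi = 2 * np.pi * 20 * np.arange(T) / 297.0
lags = np.array([0.0, np.pi / 6, np.pi / 3])
s = (0.5 + rng.random((3, T))) * np.exp(1j * (lags[:, None] + phi))   # 3 phase-locked sources
A = rng.standard_normal((M, 3))                                        # real 6 x 3 mixing matrix
noise = 1e-3 * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))
x = A @ s + noise

N, B_bar = reduce_and_whiten_complex(x)
x_bar = B_bar @ x                                          # Equation (5): N x T complex data
stacked = np.concatenate([x_bar.real, x_bar.imag], axis=1)
print(N, x_bar.shape)
print(np.round(np.cov(stacked), 6))                        # identity of size N
```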

3.2 Separation of phase-locked sources

The goal of the IPA algorithm is to separate a set of $N$ fully phase-locked sources that have been linearly mixed. Since these sources have maximal PLFs with one another and the mixture components do not (as motivated in Section 2.4 above and proved in Section 3.3 below), we can unmix them by searching for projections that maximize the resulting PLFs. Specifically, this corresponds to finding an $N \times N$ matrix $W$ such that the estimated sources, $y(t) = W^T x(t) = W^T A s(t)$, have the highest possible PLFs.

The optimization problem that we shall solve is
$$\max_{W} \; (1 - \lambda) \sum_{\substack{j,k=1 \\ j > k}}^{N} \varrho_{jk}^2 + \lambda \log\left|\det W\right| \quad \text{s.t.} \quad \|w_j\| = 1, \; j = 1, \dots, N, \qquad (6)$$

where $w_j$ is the $j$th column of $W$. In the first term, we sum the squared PLFs between all pairs of estimated sources. The second term penalizes unmixing matrices that are close to singular, and λ is a parameter controlling the relative weights of the two terms. This second term prevents the algorithm from finding, e.g., solutions where two columns $j$ and $k$ of $W$ are collinear, which trivially yield $\varrho_{jk} = 1$ (a similar term is used in some ICA algorithms [7]). Each column of $W$ is constrained to have unit norm because the PLFs do not depend on the scale of the columns, whereas $\log|\det W|$ could otherwise be increased without bound simply by rescaling $W$.
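For concreteness, the objective of Equation (6) can be evaluated as below (Python/NumPy; `ipa_objective` is a hypothetical name). In this sketch, the unit-norm constraint is imposed by normalizing the columns of $W$ inside the function, which is one possible way to handle it, not necessarily the one used in the authors' implementation. The example uses three phase-locked sources with lags 0, π/6, and π/3, independent slowly varying amplitudes, and a fixed mixing matrix, all chosen for illustration.

```python
import numpy as np

def plf_matrix(y):
    """Pairwise PLFs (Equation (3)) of the rows of the complex array y."""
    u = np.exp(1j * np.angle(y))
    return np.abs(u @ u.conj().T) / y.shape[1]

def ipa_objective(W, x, lam):
    """Value of the objective in Equation (6) for an N x N matrix W and N x T complex data x.

    The constraint ||w_j|| = 1 is enforced by normalizing the columns of W.
    """
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    j, k = np.triu_indices(W.shape[0], 1)             # every unordered pair of sources once
    value = (1 - lam) * np.sum(plf_matrix(Wn.T @ x)[j, k] ** 2)
    if lam > 0:                                        # skip the log term when lam = 0 (avoids 0 * log 0)
        value += lam * np.log(np.abs(np.linalg.det(Wn)))
    return value

fs = 297.0
t = np.arange(int(20 * fs)) / fs
a = np.vstack([1.1 + np.sin(2 * np.pi * f * t + p) for f, p in [(0.3, 0.0), (0.7, 1.0), (1.1, 2.0)]])
s = a * np.exp(1j * (np.array([0.0, np.pi / 6, np.pi / 3])[:, None] + 2 * np.pi * 20 * t))
A = np.array([[1.0, -0.8, 0.3], [0.5, 0.9, -0.7], [-0.6, 0.4, 1.0]])
x = A @ s

W_true = np.linalg.inv(A).T                           # W^T A = I: a correct, non-singular solution
print(ipa_objective(W_true, x, lam=0.0))              # N(N-1)/2 = 3, the largest possible value
print(ipa_objective(np.eye(3), x, lam=0.0))           # the unseparated mixtures score lower
```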

The optimization problem in Equation (6) is highly non-convex: the objective function is a sum of two terms, each of which is non-convex in the variable $W$, and the unit-norm constraint is also non-convex. Despite this, as discussed in Section 3.3, it is possible to characterize all the global maxima of this problem for the case λ = 0, and to devise an optimization strategy that takes advantage of that result.

The above optimization problem can be tackled with various maximization algorithms. We chose a gradient ascent algorithm with momentum and adaptive step sizes; after this gradient algorithm has run for 200 iterations, we use the BFGS algorithm implemented in MATLAB to refine the solution. The result of this optimization for the sources shown in Figure 1 is shown in Figure 2 for λ = 0.1, illustrating that IPA successfully recovers the original sources for this dataset.
Figure 2

The three sources estimated by IPA (left), the PLFs between them (middle), and the gain matrix $W^T A$ (right). Black squares represent negative values of the gain matrix, while white squares represent positive values. Since the gain matrix is very close to a permuted diagonal matrix, we can conclude that IPA successfully recovered the sources, up to permutation, scaling, and sign change.

3.3 Unicity of solution

In [14], we proved that a few mild assumptions on the sources, which are satisfied in the vast majority of real-world situations, suffice for a useful characterization of the global maxima of Problem (6) with λ = 0: there are infinitely many such maxima, and they correspond either to correct solutions (i.e., the original sources up to permutation, scaling, and sign changes) or to singular matrices $W$. More specifically, we proved the following. Assume that we have a set of complex-valued, linearly independent sources, denoted by $s(t)$, which have a PLF of 1 with one another. Consider also linear combinations of the sources of the form $y(t) = C s(t)$, where $C$ is a square matrix of appropriate dimensions. Further assume that the following conditions hold:

  1. Neither $s_j(t)$ nor $y_j(t)$ is identically zero, for any $j$.
  2. $C$ is non-singular.
  3. The phase lag between any two sources is different from 0 and from π.
  4. The amplitudes of the sources, $a_j(t) = |s_j(t)|$, are linearly independent.

Then, the only linear combination $y(t) = C s(t)$ of the sources $s(t)$ in which the PLF between any two components of $y$ is 1 is $y(t) = s(t)$, up to permutation, scaling, and sign changes [14].

3.4 Comparison to ICA

The above result is simple, but some relevant remarks should be made. If the optimum is found using λ = 0 and the second assumption is not violated (i.e., $\det(C) = \det(W)\det(A) \neq 0$, which is equivalent to $\det(W) \neq 0$ since $A$ is non-singular), then we can be certain that the correct solution has been found. However, if the optimization is performed with λ = 0, the algorithm may estimate a bad solution where, for example, some of the estimated sources are all equal to one another (in which case the PLFs between those estimated sources are trivially equal to 1). On the other hand, if we use λ ≠ 0 to guarantee that $W$ is non-singular, the unicity result stated above cannot be applied to the complete objective function. We call solutions with $\det(W) \neq 0$ and $\det(W) = 0$ “non-singular solutions” and “singular solutions”, respectively. The result of Section 3.3 is thus equivalent to stating that “all non-singular global optima of Equation (6) with λ = 0 correspond to correct solutions”.

This contrasts strongly with ICA, where singular solutions are not an issue, because ICA algorithms attempt to find independent sources and a (non-degenerate) signal is never independent of itself [7]. In other words, singular solutions always yield poor values of the objective function of ICA algorithms. Here, we are attempting to estimate phase-locked sources, and any signal is perfectly phase-locked with itself. Thus, one must always use λ ≠ 0 in the objective function of Equation (6) when attempting to separate phase-locked sources.
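The pitfall described above is easy to reproduce. In this sketch (Python/NumPy, with an arbitrary random complex mixture and a hypothetical helper `plf`), an unmixing matrix with two identical unit-norm columns yields two identical “estimated sources”, whose PLF is exactly 1, while $\det W = 0$; the term $\lambda \log|\det W|$ is therefore the only part of Equation (6) that penalizes this solution.

```python
import numpy as np

def plf(y1, y2):
    """PLF (Equation (3)) between two complex signals."""
    return np.abs(np.mean(np.exp(1j * (np.angle(y1) - np.angle(y2)))))

rng = np.random.default_rng(4)
T = 5000
x = rng.standard_normal((2, T)) + 1j * rng.standard_normal((2, T))   # any mixture will do

w = np.array([0.6, 0.8])                 # a unit-norm column
W_singular = np.column_stack([w, w])     # two collinear (here identical) columns
y = W_singular.T @ x                     # both "estimated sources" are the same signal

print(plf(y[0], y[1]))                   # 1.0: any signal is perfectly phase-locked with itself
print(np.linalg.det(W_singular))         # zero up to rounding, so log|det W| is (close to) -inf
```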

We use a simple strategy to deal with this problem. We start by optimizing Equation (6) with a relatively large value of λ (λ = 0.4); once convergence has been reached, we use the result as the starting point for a new optimization, this time with λ = 0.2. The same process is repeated, halving λ each time, until five such epochs have been run. The early optimization steps move the algorithm away from the singular solutions discussed above, whereas the final steps are performed with a very low value of λ, for which the above unicity conditions are approximately valid. As the following experimental results show, this strategy successfully prevents singular solutions from being found, while making the influence of the second term of Equation (6) on the final result negligible.
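The λ schedule described above (0.4, 0.2, 0.1, 0.05, 0.025) can be combined with an off-the-shelf optimizer. The sketch below is not the authors' implementation: it replaces their gradient ascent with momentum followed by MATLAB's BFGS by `scipy.optimize.minimize(method="BFGS")` with finite-difference gradients, handles the unit-norm constraint by normalizing the columns inside the objective, omits whitening for brevity, and uses a toy example with two sources locked at a lag of π/2 and a fixed mixing matrix. All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def plf_matrix(y):
    """Pairwise PLFs (Equation (3)) of the rows of the complex array y."""
    u = np.exp(1j * np.angle(y))
    return np.abs(u @ u.conj().T) / y.shape[1]

def neg_objective(w_flat, x, lam):
    """Negative of Equation (6); columns are normalized here, so no explicit constraint is needed."""
    N = x.shape[0]
    W = w_flat.reshape(N, N)
    Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
    j, k = np.triu_indices(N, 1)
    plf_term = np.sum(plf_matrix(Wn.T @ x)[j, k] ** 2)
    return -((1 - lam) * plf_term + lam * np.log(np.abs(np.linalg.det(Wn))))

def ipa_separate(x, lam0=0.4, epochs=5):
    """Warm-started epochs with lambda = lam0, lam0/2, ...; returns W with unit-norm columns."""
    N = x.shape[0]
    W = np.eye(N)
    lam = lam0
    for _ in range(epochs):
        res = minimize(neg_objective, W.ravel(), args=(x, lam), method="BFGS")
        W = res.x.reshape(N, N)              # warm start for the next, less regularized epoch
        lam /= 2
    return W / np.linalg.norm(W, axis=0, keepdims=True)

# Toy example: two sources locked with a lag of pi/2 and independent, slowly varying amplitudes.
fs = 297.0
t = np.arange(int(10 * fs)) / fs
a = np.vstack([1.1 + np.sin(2 * np.pi * 0.3 * t), 1.1 + np.sin(2 * np.pi * 0.7 * t + 1.0)])
s = a * np.exp(1j * (np.array([0.0, np.pi / 2])[:, None] + 2 * np.pi * 20 * t))
A = np.array([[1.0, 0.6], [-0.4, 0.9]])
x = A @ s

W = ipa_separate(x)
print("PLF of mixtures:         ", plf_matrix(x)[0, 1].round(3))
print("PLF of estimated sources:", plf_matrix(W.T @ x)[0, 1].round(3))
print("gain matrix W^T A:\n", (W.T @ A).round(3))
```

Warm-starting each epoch from the previous solution is what lets the early, strongly regularized epochs steer the search away from singular matrices before λ becomes small.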

4 Experimental results

4.1 Data generation

As mentioned earlier, the main goal of this article is to assess the applicability of IPA to real-world electrophysiological data from the human brain, such as EEG and MEG. The choice of data for this study was not trivial, since we need to know the true sources in order to quantitatively measure the quality of the results. On the one hand, knowing the actual sources in the brain would require simultaneous recordings from outside the scalp (EEG or MEG, which would be the mixed signals) and from inside the skull (intracranial recordings, corresponding to the sources). If intracranial recordings are not available, results cannot be assessed quantitatively; they can only be assessed qualitatively by experts who can tell whether the extracted sources are meaningful. On the other hand, due to their extreme simplicity, synthetic data such as those used so far to illustrate IPA (Figure 1) cannot be used to assess the usefulness of the method in real-world situations.

In an attempt to obtain “the best of both worlds”, we generated a pseudo-real dataset from actual MEG recordings. By doing this, we know the true sources and the true mixing matrix, while still using sources of a nature similar to what one observes in real-world MEG. We begin by describing the process that we used to generate a perfectly phase-locked dataset; we then explain how we modified these data to analyze non-perfect cases as well. It is important to stress that the generation process described below has no relation to the one used to generate the data of Figure 1, even though both processes generate sources with maximum PLF.

Our first step was to obtain a realistic mixing matrix. To do so, we used the well-known EEGIFT software package [29]. This package includes a real-world sample EEG dataset with 64 channels. Using all the default options of the software package, we extracted 20 independent components from the data of Subject 1 in that dataset. What mattered to us in this process was not the independent components themselves (which were discarded), but rather the 64 × 20 mixing matrix. As discussed in Section 3.1, we opted for a square mixing matrix, with little loss of generality. Therefore, we selected N random rows and N random columns of that mixing matrix (without repetition), and formed an N × N mixing matrix from the corresponding entries of the original 64 × 20 matrix. We will later show results for datasets ranging from N = 2 to N = 5 sources; in the following, we assume, for concreteness, that N = 4.
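Selecting the random square submatrix is a one-liner with NumPy's `np.ix_`. In this sketch, a random Gaussian matrix stands in for the 64 × 20 EEGIFT mixing matrix, which is not reproduced here, and the helper name `random_square_submatrix` is hypothetical.

```python
import numpy as np

def random_square_submatrix(A_full, N, rng):
    """Pick N distinct rows and N distinct columns of A_full and return the N x N submatrix."""
    rows = rng.choice(A_full.shape[0], size=N, replace=False)
    cols = rng.choice(A_full.shape[1], size=N, replace=False)
    return A_full[np.ix_(rows, cols)], rows, cols

rng = np.random.default_rng(5)
A_full = rng.standard_normal((64, 20))   # placeholder for the 64 x 20 EEGIFT mixing matrix
A, rows, cols = random_square_submatrix(A_full, N=4, rng=rng)
print(A.shape, np.linalg.cond(A))
```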

Having generated a physiologically plausible mixing matrix, the next step was to generate a set of four sources. For this, we used the MEG dataset previously studied in [30]ʰ, which has 122 channels with 17,730 samples per channel. The sampling frequency is 297 Hz, and the data had already been low-pass filtered with a cutoff at 90 Hz. Since band-pass filtering is a very common preprocessing step in the analysis of MEG data [20–22] and is useful before applying the Hilbert transform, we performed a further band-pass filtering with no phase distortion, keeping only the 18–24 Hz bandⁱ. The filtered data were used to generate a complex signal through the Hilbert transform; these data were whitened as described in Section 3.1, and from the whitened data we extracted the time-dependent amplitudes and phases.
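The band-pass-then-Hilbert preprocessing might look as follows in Python/SciPy. The article does not state which filter was used; the fourth-order Butterworth design, applied forward and backward with `sosfiltfilt` to obtain zero phase distortion, is our assumption, as is the helper name. The test signal combines an out-of-band 10 Hz tone with an in-band 21 Hz tone, and edge samples are discarded to avoid filter transients.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def bandpass_analytic(x, fs, low=18.0, high=24.0, order=4):
    """Zero-phase Butterworth band-pass filtering followed by the Hilbert transform.

    Returns the analytic signal; its modulus and angle are the amplitude and the phase.
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return hilbert(sosfiltfilt(sos, x, axis=-1), axis=-1)

fs = 297.0
t = np.arange(5940) / fs                                     # 20 s of data
x = np.cos(2 * np.pi * 10 * t) + np.cos(2 * np.pi * 21 * t)  # out-of-band + in-band component
xa = bandpass_analytic(x, fs)

inner = slice(1000, -1000)                                   # discard edge transients
amplitude, phase = np.abs(xa), np.angle(xa)
print(np.max(np.abs(xa.real[inner] - np.cos(2 * np.pi * 21 * t[inner]))))
```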

We then selected four random channels of these filtered MEG data. Since none of these MEG recordings was actually phase-locked (recall that they were themselves the result of a mixing process) and we wanted to study the performance on fully phase-locked sources (possibly corrupted by jitter, as explained below), we replaced the phase of the second of these channels with the phase of the first channel plus a constant phase lag of π/6 radians. The phase of the third channel was replaced with the phase of the first channel with a constant phase lag of π/3 radians, and that of the fourth channel with the phase of the first channel with a lag of π/2 radians. The amplitudes of the four sources were kept equal to the original amplitudes of the four random channels. The process is illustrated in Figure 3. The above process, including the choice of the 4 × 4 submatrix, was repeated 100 times with different initializations of the random number generator. This construction ensures that the sources are fully phase-locked.
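The phase-replacement step can be sketched as below (Python/NumPy). Random complex data stand in for the filtered, whitened MEG channels, and the function name `impose_phase_locking` is hypothetical; the point is that each source keeps its own amplitude while inheriting the phase of the first channel plus a fixed lag.

```python
import numpy as np

def impose_phase_locking(z, lags):
    """Keep each channel's amplitude |z_j(t)| but replace its phase by that of channel 0 plus lags[j].

    z: (N, T) complex analytic signals of the selected channels; lags: (N,) with lags[0] = 0.
    """
    ref_phase = np.angle(z[0])
    return np.abs(z) * np.exp(1j * (ref_phase[None, :] + np.asarray(lags)[:, None]))

rng = np.random.default_rng(6)
z = rng.standard_normal((4, 2000)) + 1j * rng.standard_normal((4, 2000))  # stand-in for MEG channels
lags = np.array([0.0, np.pi / 6, np.pi / 3, np.pi / 2])
s = impose_phase_locking(z, lags)

rel = np.angle(s * np.conj(s[0]))        # phase of each source relative to the first one
print(np.round(rel.mean(axis=1), 4))     # the lags 0, pi/6, pi/3, pi/2
```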
Figure 3

The process used to generate the pseudo-real MEG sources.

We also constructed datasets in which the sources were not perfectly phase-locked. For this, we used the same 100 sets of sources, but now corrupted by phase jitter: each sample $t$ of each source $j$ was multiplied by $e^{i\delta_j(t)}$, where the phase jitter $\delta_j(t)$ was drawn from a Gaussian distribution with zero mean and standard deviation σ. We tested IPA for σ from 0 to 20 degrees, in steps of 5 degrees. One example with σ = 5 degrees is shown in Figure 4, and one with σ = 20 degrees is shown in Figure 5.
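The jitter model can be simulated directly. In this sketch (Python/NumPy; helper names are hypothetical), two perfectly locked unit-amplitude sources are corrupted by independent Gaussian jitter, and the measured PLF is compared with $e^{-\sigma^2}$ (σ in radians), the large-sample value of Equation (3) for two independently jittered sources, since $\delta_j - \delta_k$ is then Gaussian with variance $2\sigma^2$.

```python
import numpy as np

def add_phase_jitter(s, sigma_deg, rng):
    """Multiply each sample of each source by exp(i delta_j(t)), with delta_j(t) ~ N(0, sigma^2)."""
    delta = rng.normal(0.0, np.deg2rad(sigma_deg), size=s.shape)
    return s * np.exp(1j * delta)

def plf(y1, y2):
    """PLF (Equation (3)) between two complex signals."""
    return np.abs(np.mean(np.exp(1j * (np.angle(y1) - np.angle(y2)))))

rng = np.random.default_rng(7)
T = 200_000
phi = 2 * np.pi * 20 * np.arange(T) / 297.0
s = np.exp(1j * (np.array([0.0, np.pi / 6])[:, None] + phi))   # two perfectly locked sources

for sigma_deg in (0, 5, 10, 15, 20):
    sj = add_phase_jitter(s, sigma_deg, rng)
    predicted = np.exp(-np.deg2rad(sigma_deg) ** 2)            # e^{-sigma^2}, independent jitters
    print(sigma_deg, round(plf(sj[0], sj[1]), 3), round(predicted, 3))
```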
Figure 4

Example of a dataset where σ = 5 degrees. Only a short segment of the signals is shown, for clarity. Top row: original sources (left) and PLFs between them (right). Middle row: mixed signals (left) and PLFs between them (right). Bottom row: estimated sources, after manual compensation of permutation, scaling, and sign (left); PLFs between them (middle); and the gain matrix $W^T A$ (right). The gain matrix is virtually equal to the identity matrix, indicating a correct separation.

Figure 5

Example of a dataset where σ = 20 degrees. Only a short segment of the signals is shown, for clarity. Top row: original sources (left) and PLFs between them (right). Middle row: mixed signals (left) and PLFs between them (right). Bottom row: estimated sources, after manual compensation of permutation, scaling, and sign (left); PLFs between them (middle); and the gain matrix $W^T A$ (right). The gain matrix has significant values outside the diagonal, indicating that a complete separation was not achieved. Nevertheless, the largest values are on the diagonal, corresponding to a partial separation.

Finally, we studied the effect of N on the results of the proposed algorithm. We created 100 datasets similar to the jitterless datasets mentioned earlier, using N = 2, 3, and 5. In all of these, as in the data with N = 4, we used sources with phase lags that are multiples of π/6.

4.2 Results

We measured the separation quality using two measures: the Amari performance index (API) [31] and the well-known signal-to-noise ratio (SNR). The API measures how far the gain matrix $W^T A$ is from a permuted diagonal matrix; the SNR measures how far the estimated sources are from the true sources. In summary, the API measures the quality of the estimation of the mixing matrix, while the SNR measures the quality of the estimation of the sources themselves.
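The article does not spell out the exact formulas for the API and the SNR, so the following Python/NumPy sketch uses common conventions as assumptions: the Amari index normalized to [0, 1] (0 for a scaled permutation matrix), and an SNR computed after compensating the scale and sign of each estimated source with a least-squares real gain. The SNR function assumes the permutation has already been resolved, i.e., each estimate is paired with its true source; both function names are hypothetical.

```python
import numpy as np

def amari_index(G):
    """Normalized Amari performance index of a gain matrix G = W^T A (0 = scaled permutation, max 1)."""
    P = np.abs(G)
    N = P.shape[0]
    rows = (P / P.max(axis=1, keepdims=True)).sum(axis=1) - 1
    cols = (P / P.max(axis=0, keepdims=True)).sum(axis=0) - 1
    return (rows.sum() + cols.sum()) / (2 * N * (N - 1))

def snr_db(s, y):
    """SNR (dB) of the estimate y of source s, after least-squares compensation of a real gain c."""
    c = np.real(np.vdot(y, s)) / np.real(np.vdot(y, y))     # argmin_c ||s - c y||^2 over real c
    return 10 * np.log10(np.sum(np.abs(s) ** 2) / np.sum(np.abs(s - c * y) ** 2))

perm_diag = np.array([[0.0, 2.0, 0.0], [-0.5, 0.0, 0.0], [0.0, 0.0, 3.0]])
print(amari_index(perm_diag), amari_index(np.ones((3, 3))))   # 0.0 and 1.0

rng = np.random.default_rng(8)
s = np.exp(1j * rng.uniform(-np.pi, np.pi, 10_000))
y = -3.0 * (s + 0.01 * rng.standard_normal(10_000))            # sign flip, scaling, small error
print(round(snr_db(s, y), 1))
```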

Figure 6 presents the means and standard deviations of these measures over the 100 runs mentioned in Section 4.1, for each jitter level. The results indicate that IPA performs very well in the jitterless case on data of this kind, and that this level of performance is approximately maintained in the presence of low levels of phase jitter, up to a standard deviation of 5 degrees. Some deterioration occurs from 5 to 10 degrees of jitter standard deviation, but with an SNR of 27 dB and an API below 0.1 the sources can still be considered well estimated.
Figure 6

Result of applying IPA to pseudo-real MEG data with N = 4 and varying phase jitter: SNR (left) and API (right).

The results for high jitter levels (σ = 15 or 20 degrees) show that IPA's robustness has a limit, which lies somewhere between 10 and 15 degrees. Equivalently, in terms of the PLF, the algorithm is robust to PLF values smaller than 1 as long as they remain above about 0.95; below that value, its performance deteriorates progressively down to a PLF of approximately 0.9, at which point only partial separations are obtained.
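The correspondence between jitter levels and PLF values can be checked numerically. The sketch below assumes, hypothetically, that the jitter is zero-mean Gaussian noise with standard deviation σ added independently to each source's phase; under that assumption the phase difference has variance $2\sigma^2$, and the expected PLF $|E[e^{i(\phi_j - \phi_k)}]|$ equals $e^{-\sigma^2}$ with σ in radians. The exact jitter model of Section 4.1 may differ.

```python
import numpy as np

def plf(phi_j, phi_k):
    """Phase-locking factor |mean_t exp(i(phi_j - phi_k))| of two phase time series."""
    return np.abs(np.mean(np.exp(1j * (phi_j - phi_k))))

rng = np.random.default_rng(0)
T = 200_000
# Common phase of a ~10 Hz oscillation sampled at 250 Hz (illustrative values).
base = np.cumsum(rng.normal(2 * np.pi * 10 / 250, 0.05, T))
lag = np.pi / 6

results = {}
for sigma_deg in (0, 5, 10, 15, 20):
    sigma = np.deg2rad(sigma_deg)
    phi1 = base + rng.normal(0.0, sigma, T)         # assumed independent jitter per source
    phi2 = base + lag + rng.normal(0.0, sigma, T)
    results[sigma_deg] = (plf(phi1, phi2), np.exp(-sigma ** 2))
    print(f"sigma = {sigma_deg:2d} deg: empirical PLF = {results[sigma_deg][0]:.3f}, "
          f"exp(-sigma^2) = {results[sigma_deg][1]:.3f}")
```

Under this assumption, jitter of 5, 10, 15, and 20 degrees corresponds to PLFs of roughly 0.99, 0.97, 0.93, and 0.89, so a PLF of about 0.95 falls between 10 and 15 degrees.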

Figure 7 shows the effect of varying the number of sources N: IPA can handle values of N up to N = 5 with only a slight decrease in performance.
Figure 7

Effect of applying IPA to pseudo-real MEG data with varying values of N: SNR (left) and API (right).

Figure 7 also shows something rather peculiar: for N = 2, the results are mediocre, with an average API of around 0.4 (see endnote j). This is not an effect of lowering the number of sources N, but rather an indirect effect of the phase lag between the sources. To verify this, we generated jitterless datasets with N = 2, using phase lags of $\pi/12$, $2\pi/12$ (the value used in Figure 7), $3\pi/12$, and $4\pi/12$ (100 datasets for each value). Figure 8 shows that a phase lag of $2\pi/12$ yields poor API values, as we already knew, whereas $3\pi/12$ yields very good values. Naively, one could conclude that when the sources have a phase lag of $2\pi/12$ or less, the separation cannot be performed accurately.
Figure 8

Effect of applying IPA to pseudo-real MEG data with varying phase lags between the sources, with N = 2: SNR (left) and API (right).

The effect is, however, not so clear-cut. The datasets for N = 3, 4, and 5 also contain pairs of sources with phase lags of $\pi/6$, yet the API values for those experiments are very good. We do not have a solid explanation for this fact; we conjecture that the presence of some pairs of sources with larger phase lags (e.g., for N = 4, the first and third sources have a phase lag of $\pi/3$, and the first and fourth sources have a phase lag of $\pi/2$) aids in the separation of all the sources.

5 Discussion

IPA has a parameter, λ, which controls the relative weights given to the optimization of the PLF matrix and to the penalization of close-to-singular solutions. Our optimization procedure starts with a high value of λ, which is lowered as the optimization progresses. We confirmed that this variation of λ is necessary: the quality of the results degrades noticeably if λ is kept constant, regardless of whether that constant is high or low. Table 1 confirms this: while λ = 0.1, the best fixed value, yields decent results, the results with a varying value of λ are considerably better. Furthermore, although the final epoch of the optimization is not run with λ = 0, we have verified that the results are virtually the same as if λ = 0 had been used in the last epoch.
Table 1

Values of SNR (dB) and API (mean ± standard deviation) for jitterless data with N = 3, for various fixed values of λ, as well as for the varying-λ strategy detailed in the text

λ (fixed): 0.025 | 0.05 | 0.1 | 0.2 | 0.4
SNR, fixed λ: 17.5 ± 21.2 | 27.5 ± 18.0 | 34.4 ± 4.3 | 27.2 ± 3.6 | 13.5 ± 5.5
SNR, varying λ: 48.9 ± 8.7
API, fixed λ: 0.795 ± 0.570 | 0.369 ± 0.465 | 0.048 ± 0.057 | 0.079 ± 0.027 | 0.327 ± 0.097
API, varying λ: 0.013 ± 0.015

While the best fixed value, λ = 0.1, yields decent results, the results using a varying value of λ are consistently better, by a large margin.
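The exact form of the IPA objective (Equation (6)) and of the λ schedule are given elsewhere in the article and are not reproduced in this section. As a hedged illustration only, the sketch below assumes an objective of the form $(1-\lambda)\sum_{j,k} \mathrm{PLF}_{jk}^2 + \lambda \log|\det W|$, evaluated on the estimated sources $y = W^T x$, together with a hypothetical geometric schedule that starts at λ = 0.4 and halves λ at every epoch without ever reaching zero. The weighting, the starting value, the decay factor, and the helper names are assumptions, and the per-epoch optimizer is omitted.

```python
import numpy as np

def plf_matrix(Y):
    """Pairwise PLFs between the rows of a complex (analytic) signal matrix Y."""
    U = np.exp(1j * np.angle(Y))                  # unit phasors exp(i * phi_j(t))
    return np.abs(U @ U.conj().T) / Y.shape[1]    # |mean_t exp(i(phi_j - phi_k))|

def objective(W, X, lam):
    """Assumed IPA-like objective (to be maximized) for the estimates Y = W^T X."""
    plf_term = np.sum(plf_matrix(W.T @ X) ** 2)
    _, logabsdet = np.linalg.slogdet(W)           # -inf when W is singular
    # Skip the determinant term when lam == 0 to avoid 0 * (-inf) = nan.
    det_term = lam * logabsdet if lam > 0 else 0.0
    return (1.0 - lam) * plf_term + det_term

def lambda_schedule(lam0=0.4, factor=0.5, n_epochs=6):
    """Hypothetical decreasing schedule lam0, lam0*factor, ...; never exactly zero."""
    return [lam0 * factor ** k for k in range(n_epochs)]

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 1000)) + 1j * rng.standard_normal((3, 1000))  # toy observations
W = np.eye(3)
for epoch, lam in enumerate(lambda_schedule()):
    # A real implementation would update W here by (approximately) maximizing objective(W, X, lam).
    print(f"epoch {epoch}: lambda = {lam:.4f}, objective(W) = {objective(W, X, lam):.3f}")
```

Large early values of λ keep the estimate away from singular matrices, whose log-determinant is $-\infty$; small late values let the PLF term dominate.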

The role of λ discussed above illustrates something already mentioned in Section 3.4: separating phase-locked sources is not a trivial extension of ICA, because there are wrong, singular solutions that yield exactly the same PLF matrix as the correct, non-singular solutions. Our approach to distinguishing these two types of solutions consists of adding a term that depends on the determinant of the matrix W. This approach works correctly, as our results show. However, it is perhaps inelegant to do this through the matrix W rather than directly through the estimated sources; it would be preferable to replace this term with one that depends directly on the estimated sources.
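The ambiguity described above can be reproduced in a few lines. In the hypothetical example below, three perfectly phase-locked analytic sources (lags 0, $\pi/6$, and $\pi/3$, chosen for illustration) are mixed by a random matrix A. The correct unmixing matrix $W = A^{-T}$, for which $y = W^T x$ recovers the sources, and a singular matrix whose columns are all equal both produce a PLF matrix of all ones; only the determinant of W tells them apart.

```python
import numpy as np

def plf_matrix(Y):
    """Pairwise PLFs between the rows of a complex (analytic) signal matrix Y."""
    U = np.exp(1j * np.angle(Y))
    return np.abs(U @ U.conj().T) / Y.shape[1]

rng = np.random.default_rng(0)
fs, T = 250.0, 5000
t = np.arange(T) / fs
lags = np.array([0.0, np.pi / 6, np.pi / 3])
amps = 1.0 + 0.5 * np.sin(2 * np.pi * np.array([[0.3], [0.7], [1.1]]) * t)
S = amps * np.exp(1j * (2 * np.pi * 10.0 * t + lags[:, None]))   # perfectly locked sources
A = rng.standard_normal((3, 3))
X = A @ S

W_good = np.linalg.inv(A).T            # W^T A = I: exact separation
w = rng.standard_normal(3)
W_bad = np.column_stack([w, w, w])     # singular: every output is the same signal

for name, W in (("correct", W_good), ("singular", W_bad)):
    print(f"{name:8s} PLF matrix all ones: {np.allclose(plf_matrix(W.T @ X), 1.0)}, "
          f"det W = {np.linalg.det(W):.3g}")
```

Any objective built only from the PLF matrix cannot prefer `W_good` over `W_bad` here, which is why a term such as $\log|\det W|$ is needed.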

The optimization variable, W, has $N^2$ entries; there are N constraints on this variable, yielding $N(N-1)$ independent parameters. The number of parameters IPA must optimize therefore grows quadratically with the number of sources N, which is the main reason why we do not present results for N > 5: while running IPA on 100 datasets with N = 2 takes a few hours, doing so for N = 5 takes several days.

The results that we obtained show that IPA can separate perfectly locked MEG-like sources. However, while the phase locking in the jitterless pseudo-real MEG data is perfect, in real MEG data it will probably be less than perfect. This is the reason why we also studied data with phase jitter, which have pairwise PLFs smaller than 1. The results indicate that IPA has some robustness to PLFs smaller than 1, but the sources still need to exhibit considerable phase locking for the separation to be accurate; weaker synchrony results only in partial separation. Note, however, that the partially separated data are usually still closer to the true sources than the original mixtures.

The comments made in the previous paragraph raise an additional optimization challenge: if the true sources have PLFs smaller than 1, optimizing the objective function in Equation (6) can lead to overfitting. The results presented here show that IPA has some robustness to sources whose PLF is smaller than 1 but stationary (since the phase jitter is stationary, the statistics of the phase differences, and hence the expected PLF, do not vary with time). In real-world cases, the PLF is likely to be non-stationary: for example, some sources may be phase-locked at the start of the observation period and not at its end. While simple techniques such as windowing can be used to analyze shorter time intervals in which stationarity approximately holds, one would still need a way to integrate the information from different intervals. Such integration is beyond the scope of this article.
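A minimal sketch of the windowing idea, under stated assumptions: instantaneous phases are taken from the analytic signal (the Hilbert-transform approach referred to in endnote i), computed here with the same FFT construction used by `scipy.signal.hilbert` so that only NumPy is needed, and the PLF is then estimated in consecutive non-overlapping windows. The toy signals, the 2 s window, and the helper names are hypothetical; the second signal is locked to the first for 10 s and then drifts at a different frequency. The sketch only tracks the PLF over time and does not address how to integrate the information across windows.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal x + i*H{x} via the FFT (same construction as scipy.signal.hilbert)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x, axis=-1) * h, axis=-1)

def windowed_plf(x1, x2, win):
    """PLF between two real signals in consecutive non-overlapping windows of `win` samples.

    Phases are extracted from the full-length signals first, to limit edge effects.
    """
    dphi = np.angle(analytic_signal(x1)) - np.angle(analytic_signal(x2))
    n_win = dphi.size // win
    d = dphi[: n_win * win].reshape(n_win, win)
    return np.abs(np.mean(np.exp(1j * d), axis=1))

# Hypothetical toy signals: locked (constant lag pi/6) for 10 s, then drifting (13 Hz vs 10 Hz).
fs = 250.0
t = np.arange(int(20 * fs)) / fs
x1 = np.cos(2 * np.pi * 10 * t)
x2 = np.where(t < 10, np.cos(2 * np.pi * 10 * t + np.pi / 6), np.cos(2 * np.pi * 13 * t))
print(np.round(windowed_plf(x1, x2, win=int(2 * fs)), 3))
```

Windows in the first half give PLFs close to 1 and windows in the second half give values close to 0, apart from edge effects near the transition; a single PLF over the whole record would blur the two regimes together.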

One interesting extension of this work would be the separation of specific types of systems, such as van der Pol oscillators [27]. For those, even fully entrained oscillators may present a PLF < 1, and a different measure of synchrony, tailored to those oscillators, may need to be used. Such a study is beyond the scope of this article. Nevertheless, additional knowledge of the oscillator type can be expected to help improve the algorithm's performance or its robustness to deviations from the ideal case.

One can derive a relationship between additive Gaussian noise (e.g., from the sensors) and the phase jitter used throughout this article. Figure 9 depicts, in the complex plane, a sample of a noiseless signal $x(t) \equiv a(t)e^{i\phi(t)}$, to which complex noise $n(t)$ is added to form the noisy signal $x_n(t) \equiv a(t)e^{i\phi(t)} + n(t)$ (see endnote k). That figure also shows $n_\perp(t)$, the projection of $n(t)$ onto the direction orthogonal to $x(t)$, and $x_n^\perp(t) \equiv x(t) + n_\perp(t)$. Also depicted are $\phi(t)$, $\phi_n(t)$, and $\phi_n^\perp(t)$, defined as the phases of $x(t)$, $x_n(t)$, and $x_n^\perp(t)$, respectively.

It can easily be shown that, if $|n(t)| \ll |x(t)| = a(t)$, then $\phi_n(t) \approx \phi_n^\perp(t) \approx \phi(t) + \frac{n_\perp(t)}{a(t)}$ [32], where $n_\perp(t)$ here denotes the signed magnitude of the orthogonal noise component. This relationship is important because it shows that, under additive noise, portions of the signal with a large amplitude will have a better phase estimate than portions with a small amplitude, in which even small amounts of additive noise can severely disrupt the phase estimate. We therefore believe that the PLF, while attractive and elegant in theory, and despite working well with low amounts of additive noise [14], will probably need to be modified to take the amplitude into account appropriately in applications where considerable additive noise is present.
Figure 9

Diagram illustrating the relationship between phase jitter and additive noise. A single time sample is shown, and the time argument has been dropped for simplicity.
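The first-order relationship above can be checked numerically. The sketch below, a hypothetical illustration, draws circular complex Gaussian noise with total standard deviation $\sigma_n = 0.01$ (an arbitrary choice), adds it to samples of $a e^{i\phi}$ with random phases, and compares the exact phase change with $n_\perp/a$, where $n_\perp$ is taken as the signed component of the noise orthogonal to $x$, i.e. $\mathrm{Im}(n e^{-i\phi})$. The helper name `phase_perturbation` is an assumption.

```python
import numpy as np

def phase_perturbation(a, phi, noise):
    """Exact phase change caused by additive complex noise, and its first-order approximation.

    a     : amplitude of the noiseless sample(s) x = a * exp(i * phi)
    phi   : phase(s) of the noiseless sample(s)
    noise : complex additive noise n
    """
    x = a * np.exp(1j * phi)
    # phi_n - phi, computed by rotating x + n back by -phi (wrapped to (-pi, pi]).
    exact = np.angle((x + noise) * np.exp(-1j * phi))
    # Signed component of n orthogonal to x (the n_perp of the text, as a scalar).
    n_perp = np.imag(noise * np.exp(-1j * phi))
    return exact, n_perp / a

rng = np.random.default_rng(0)
T = 100_000
phi = rng.uniform(-np.pi, np.pi, T)
sigma_n = 0.01  # total noise standard deviation (arbitrary, illustrative)
noise = sigma_n / np.sqrt(2) * (rng.standard_normal(T) + 1j * rng.standard_normal(T))

for a in (1.0, 0.1):
    exact, approx = phase_perturbation(a, phi, noise)
    print(f"a = {a}: phase jitter std = {exact.std():.4f} rad, "
          f"max |exact - approx| = {np.abs(exact - approx).max():.1e} rad")
```

With a = 1, the phase jitter standard deviation is close to $\sigma_n/\sqrt{2} \approx 0.0071$ rad and the approximation error is negligible; with a = 0.1, the jitter grows about tenfold and the approximation error grows markedly, since $|n|$ is no longer much smaller than $a$.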

6 Conclusion

We have shown that IPA can successfully separate phase-locked sources from linear mixtures in pseudo-real MEG data. We showed that IPA tolerates deviations from the ideal case, yielding excellent results for low amounts of phase jitter, and that it exhibits some robustness to moderate amounts of phase jitter. We also showed that it can handle numbers of sources up to N = 5. We believe that these results bring us closer to the goal of successfully separating phase-locked sources in real-world signals.

Endnotes

a. In EEG and MEG, the sources are not individual neurons, whose oscillations are too weak to be detected from outside the scalp. Instead, the sources are populations of closely located neurons oscillating together.
b. The term “real-valued” is used here to draw a distinction with other phase-based algorithms, in which a complex quantity is used [14].
c. Technically, this condition could be violated on a set of zero measure. Since we deal with a discrete and finite number of time points, no such sets exist and this technicality is not important.
d. We will also show results where this phase difference is not exactly constant; see Figure 6.
e. These assumptions are not as restrictive as they may sound; see Section 3.1.
f. This is usually called the over-determined case. The under-determined case, where A has fewer rows than columns, is more difficult and is not addressed here.
g. There are more rigorous criteria that can be used to choose N. Two very popular methods are the Akaike information criterion and the minimum description length. Discussing these two criteria is beyond the scope of this article; the reader is referred to [7] and references therein for more information.
h. Freely available from http://research.ics.tkk.fi/ica/eegmeg/MEG_data.html.
i. The choice of this specific band is rather arbitrary. The band is narrow enough for the Hilbert transform to allow correct estimation of the instantaneous amplitude and phase, but wide enough for the instantaneous frequency of the signals to retain some variability. The passband also has a width similar to that used in typical MEG studies [20].
j. It might appear contradictory that the average SNR has a good value, 40 dB, when the average API has a mediocre score. In reality, a very high standard deviation of the SNR usually indicates that the separation is poor. As an example, consider a case where one source is very well estimated, with an SNR of 80 dB, and another is poorly estimated, with an SNR of 0 dB. The average SNR would be 40 dB, but with a very high standard deviation. Good values of the average SNR indicate a good separation only when the standard deviation of the SNR is small.
k. In most real applications, one deals with models consisting of real signals to which real-valued noise is added. However, the linearity of the Hilbert transform allows the same type of analysis for that case as for the case of complex signals with complex additive noise considered here.

Declarations

Acknowledgements

This work was partially supported by project DECA-Bio of Instituto de Telecomunicacoes, PEst-OE/EEI/LA0008/2011.

Authors’ Affiliations

(1)
Instituto de Telecomunicacoes
(2)
Aalto University School of Science

References

  1. Pikovsky A, Rosenblum M, Kurths J: Synchronization: A Universal Concept in Nonlinear Sciences. Cambridge: Cambridge University Press (Cambridge Nonlinear Science Series); 2001.
  2. Palva JM, Palva S, Kaila K: Phase synchrony among neuronal oscillations in the human cortex. J. Neurosci 2005, 25(15):3962-3972. 10.1523/JNEUROSCI.4250-04.2005
  3. Schoffelen JM, Oostenveld R, Fries P: Imaging the human motor system’s beta-band synchronization during isometric contraction. NeuroImage 2008, 41: 437-447. 10.1016/j.neuroimage.2008.01.045
  4. Uhlhaas PJ, Singer W: Neural synchrony in brain disorders: relevance for cognitive dysfunctions and pathophysiology. Neuron 2006, 52: 155-168. 10.1016/j.neuron.2006.09.020
  5. Nunez PL, Srinivasan R, Westdorp AF, Wijesinghe RS, Tucker DM, Silberstein RB, Cadusch PJ: EEG coherency I: statistics, reference electrode, volume conduction, Laplacians, cortical imaging, and interpretation at multiple scales. Electroencephalogr. Clin. Neurophysiol 1997, 103: 499-515. 10.1016/S0013-4694(97)00066-7
  6. Vigário R, Särelä J, Jousmäki V, Hämäläinen M, Oja E: Independent component approach to the analysis of EEG and MEG recordings. IEEE Trans. Biomed. Eng 2000, 47(5):589-593. 10.1109/10.841330
  7. Hyvärinen A, Karhunen J, Oja E: Independent Component Analysis. New York: Wiley; 2001.
  8. Akhtar M, Mitsuhashi W, James C: Employing spatially constrained ICA and wavelet denoising for automatic removal of artifacts from multichannel EEG data. Signal Process 2012, 92: 401-416. 10.1016/j.sigpro.2011.08.005
  9. de Vos M, de Lathauwer L, van Huffel S: Spatially constrained ICA algorithm with an application in EEG processing. Signal Process 2011, 91: 1963-1972. 10.1016/j.sigpro.2011.02.019
  10. Lee D, Seung H: Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process. Syst 2001, 13: 556-562.
  11. Chan TH, Ma WK, Chi CY, Wang Y: A convex analysis framework for blind separation of non-negative sources. IEEE Trans. Signal Process 2008, 56: 5120-5134.
  12. de Frein R, Rickard S: The synchronized short-time-Fourier-transform: properties and definitions for multichannel source separation. IEEE Trans. Signal Process 2011, 59: 91-103.
  13. Hosseini S, Deville Y, Saylani H: Blind separation of linear instantaneous mixtures of non-stationary signals in the frequency domain. Signal Process 2009, 89: 819-830. 10.1016/j.sigpro.2008.10.024
  14. Almeida M, Schleimer JH, Bioucas-Dias J, Vigário R: Source separation and clustering of phase-locked subspaces. IEEE Trans. Neural Netw 2011, 22(9):1419-1434.
  15. Almeida M, Bioucas-Dias J, Vigário R: Independent phase analysis: separating phase-locked subspaces. Proceedings of the International Conference on Independent Component Analysis and Signal Separation 2010, 189-196.
  16. Ziehe A, Müller KR: TDSEP—an efficient algorithm for blind separation using time structure. International Conference on Artificial Neural Networks 1998, 675-680.
  17. Torrence C, Compo GP: A practical guide to wavelet analysis. Bull. Am. Meteorol. Soc 1998, 79: 61-78. 10.1175/1520-0477(1998)079<0061:APGTWA>2.0.CO;2
  18. Oppenheim AV, Schafer RW, Buck JR: Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall International Editions; 1999.
  19. Le Van Quyen M, Foucher J, Lachaux JP, Rodriguez E, Lutz A, Martinerie J, Varela FJ: Comparison of Hilbert transform and wavelet methods for the analysis of neuronal synchrony. J. Neurosci. Methods 2001, 111: 83-98. 10.1016/S0165-0270(01)00372-7
  20. Varela F, Lachaux JP, Rodriguez E, Martinerie J: The Brainweb: phase synchronization and large-scale integration. Nat. Rev. Neurosci 2001, 2: 229-239.
  21. Niedermeyer E, da Silva FHL: Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. Philadelphia: Lippincott Williams and Wilkins; 2005.
  22. Nunez P, Srinivasan R: Electric Fields of the Brain: the Neurophysics of EEG. New York: Oxford University Press; 2006.
  23. Gold B, Oppenheim AV, Rader CM: Theory and implementation of the discrete Hilbert transform. Symposium on Computer Processing in Communications 1973.
  24. Breakspear M, Heitmann S, Daffertshofer A: Generative models of cortical oscillations: neurobiological implications of the Kuramoto model. Front. Human Neurosci 2010, 4: 190-202.
  25. Kuramoto Y: Chemical Oscillations, Waves, and Turbulence. Berlin: Springer; 1984.
  26. Strogatz S: Nonlinear Dynamics and Chaos. Boulder: Westview Press; 2000.
  27. Izhikevich E: Dynamical Systems in Neuroscience. Cambridge, MA: MIT Press; 2007.
  28. Almeida M, Vigário R, Bioucas-Dias J: The role of whitening for separation of synchronous sources. Proceedings of the International Conference on Latent Variable Analysis and Signal Separation 2012, 139-146.
  29. Eichele T, Rachakonda S, Brakedal B, Eikeland R, Calhoun VD: EEGIFT: group independent component analysis for event-related EEG data. Comput. Intell. Neurosci 2011, 2011: 1-9.
  30. Vigário R, Jousmäki V, Hämäläinen M, Hari R, Oja E: Independent component analysis for identification of artifacts in magnetoencephalographic recordings. Advances in NIPS 1997.
  31. Amari S, Cichocki A, Yang HH: A new learning algorithm for blind signal separation. Advances in NIPS 1996, 757-763.
  32. Carlson A, Crilly P, Rutledge J: Communication Systems: An Introduction to Signals and Noise in Electrical Communication. New York: McGraw-Hill; 2001.

Copyright

© Almeida et al.; licensee Springer. 2013

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.