2.1 Notation and definitions
Vectors are denoted by boldface lower case letters, e.g. $\mathbf{a}$. Matrices are denoted by boldface capital letters, e.g. $\mathbf{A}$, while tensors are denoted by calligraphic letters, e.g. $\mathcal{A}$. An entry of a vector $\mathbf{a}$, a matrix $\mathbf{A}$, or a tensor $\mathcal{A}$ is denoted by $a_i$, $a_{i,j}$, $a_{i,j,k}$, etc., depending on the number of modes. Mode-$n$ vectors are the generalisation of matrix rows and columns to tensors: a mode-$n$ vector is a vector in which all indices but the $n$th are fixed. The Kronecker product of two matrices $\mathbf{A}$ and $\mathbf{B}$ is denoted by $\mathbf{A} \otimes \mathbf{B}$.
Definition 1.
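The mode-$n$ product of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with a matrix $\mathbf{B} \in \mathbb{R}^{J \times I_n}$ is denoted as $\mathcal{A} \times_n \mathbf{B}$ and is of size $I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N$. The entries of the mode-$n$ product are defined as
$$(\mathcal{A} \times_n \mathbf{B})_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} a_{i_1 i_2 \cdots i_N}\, b_{j i_n}. \tag{1}$$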
Definition 2.
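The outer product of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_P}$ and a tensor $\mathcal{B} \in \mathbb{R}^{J_1 \times \cdots \times J_Q}$ is the tensor $\mathcal{A} \circ \mathcal{B} \in \mathbb{R}^{I_1 \times \cdots \times I_P \times J_1 \times \cdots \times J_Q}$ defined by
$$(\mathcal{A} \circ \mathcal{B})_{i_1 \cdots i_P\, j_1 \cdots j_Q} = a_{i_1 \cdots i_P}\, b_{j_1 \cdots j_Q} \tag{2}$$
for all values of the indices.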
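Definition 3.
The Khatri-Rao product of two matrices $\mathbf{A} \in \mathbb{R}^{I \times R}$ and $\mathbf{B} \in \mathbb{R}^{J \times R}$ is defined as $\mathbf{A} \odot \mathbf{B} = [\mathbf{a}_1 \otimes \mathbf{b}_1 \; \cdots \; \mathbf{a}_R \otimes \mathbf{b}_R]$.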
Definition 4.
The k-rank of a matrix $\mathbf{A}$, denoted as $k_{\mathbf{A}}$, is defined as the maximum value $k$ such that any $k$ columns of $\mathbf{A}$ are linearly independent.
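Definition 5.
The mode-$n$ matricisation $\mathbf{A}_{(n)}$ of an $N$th-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ maps the tensor element with indices $(i_1, \ldots, i_N)$ to the matrix element $(i_n, j)$ such that
$$j = 1 + \sum_{\substack{k=1 \\ k \neq n}}^{N} (i_k - 1) J_k \tag{3}$$
with
$$J_k = \prod_{\substack{m=1 \\ m \neq n}}^{k-1} I_m. \tag{4}$$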
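To make Definitions 1, 2, 3 and 5 concrete, the following minimal NumPy sketch implements them. Indices are 0-based in the code, and the helper names (`unfold`, `fold`, `mode_n_product`, `outer`, `khatri_rao`) are hypothetical illustrations rather than functions from any toolbox used in this paper. The matricisation in (3)-(4) enumerates the remaining indices with the lowest mode varying fastest, which corresponds to NumPy's Fortran ordering.

```python
import numpy as np


def unfold(T, n):
    """Mode-n matricisation T_(n) as in (3)-(4); n is 0-based here.

    The remaining indices are enumerated with the lowest mode varying
    fastest, i.e. NumPy's Fortran ('F') ordering.
    """
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")


def fold(M, n, shape):
    """Inverse of unfold: rebuild a tensor of the given shape from T_(n)."""
    rest = [s for k, s in enumerate(shape) if k != n]
    return np.moveaxis(np.reshape(M, [shape[n]] + rest, order="F"), 0, n)


def mode_n_product(T, B, n):
    """Mode-n product (1): sums index i_n of T against the columns of B (J x I_n)."""
    return np.moveaxis(np.tensordot(B, T, axes=([1], [n])), 0, n)


def outer(A, B):
    """Outer product (2) of two tensors of arbitrary order."""
    return np.multiply.outer(A, B)


def khatri_rao(A, B):
    """Column-wise Khatri-Rao product [a_1 kron b_1, ..., a_R kron b_R]."""
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])
```

A useful sanity check is the identity $(\mathcal{T} \times_n \mathbf{B})_{(n)} = \mathbf{B}\,\mathbf{T}_{(n)}$, which holds for this matricisation.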
2.2 Tensor decompositions
2.2.1 Canonical polyadic decomposition
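CPD approximates a third-order tensor $\mathcal{T} \in \mathbb{R}^{I \times J \times K}$ by a sum of $R$ rank-1 tensors:
$$\mathcal{T} \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r. \tag{5}$$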
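CPD is visualised in Figure 1. Note that the definition is formulated for third-order tensors; however, the model can be extended to higher-order tensors in a straightforward manner. The rank of the tensor is defined as the smallest $R$ for which (5) is exact. Let $\mathbf{A} = [\mathbf{a}_1 \cdots \mathbf{a}_R]$, $\mathbf{B} = [\mathbf{b}_1 \cdots \mathbf{b}_R]$ and $\mathbf{C} = [\mathbf{c}_1 \cdots \mathbf{c}_R]$ be the factor matrices corresponding to each mode. Then, CPD can alternatively be written in matricised form as
$$\mathbf{T}_{(1)} \approx \mathbf{A}\, (\mathbf{C} \odot \mathbf{B})^T. \tag{6}$$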
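The matricised form (6) and the permutation and scaling indeterminacies can be checked numerically. The sketch below is a minimal NumPy illustration under the matricisation convention of (3)-(4); the helper names are hypothetical. It builds a tensor from (5) and verifies (6) together with its mode-2 and mode-3 counterparts.

```python
import numpy as np


def khatri_rao(A, B):
    """Column-wise Khatri-Rao product [a_1 kron b_1, ..., a_R kron b_R]."""
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])


def unfold(T, n):
    """Mode-n matricisation following (3)-(4) (0-based n, Fortran ordering)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")


def cpd_tensor(A, B, C):
    """Sum of R rank-1 terms a_r o b_r o c_r, as in (5)."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


rng = np.random.default_rng(1)
I, J, K, R = 5, 4, 3, 2
A, B, C = (rng.standard_normal((dim, R)) for dim in (I, J, K))
T = cpd_tensor(A, B, C)
```

With this convention the three unfoldings are $\mathbf{T}_{(1)} = \mathbf{A}(\mathbf{C} \odot \mathbf{B})^T$, $\mathbf{T}_{(2)} = \mathbf{B}(\mathbf{C} \odot \mathbf{A})^T$ and $\mathbf{T}_{(3)} = \mathbf{C}(\mathbf{B} \odot \mathbf{A})^T$; other references order the Khatri-Rao factors differently because they use a different matricisation.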
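The main advantage of the CPD model is its uniqueness up to permutation and scaling of the rank-1 terms under mild conditions [28], in particular when
$$k_{\mathbf{A}} + k_{\mathbf{B}} + k_{\mathbf{C}} \geq 2R + 2. \tag{7}$$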
A more general framework for uniqueness has been recently presented in [29, 30].
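For small $R$, the k-rank of Definition 4 and the sufficient condition (7) can be checked by brute force. The following is a minimal sketch (the function names are hypothetical); it tests every subset of columns, so its cost grows combinatorially with $R$ and it is only meant for illustration.

```python
import numpy as np
from itertools import combinations


def k_rank(M):
    """Largest k such that every set of k columns of M is linearly independent."""
    R = M.shape[1]
    k = 0
    for size in range(1, R + 1):
        if all(np.linalg.matrix_rank(M[:, list(cols)]) == size
               for cols in combinations(range(R), size)):
            k = size
        else:
            break  # a dependent subset stays dependent in every larger set
    return k


def kruskal_condition(A, B, C):
    """Sufficient condition (7) for essential uniqueness of a rank-R CPD."""
    R = A.shape[1]
    return k_rank(A) + k_rank(B) + k_rank(C) >= 2 * R + 2
```

Two identical columns in one factor matrix drop its k-rank to 1, which is typically enough to violate (7).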
2.2.2 Block term decomposition
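The rank-$(L_r,L_r,1)$ block term decomposition [21, 22] of a third-order tensor $\mathcal{T} \in \mathbb{R}^{I \times J \times K}$ into a sum of rank-$(L_r,L_r,1)$ terms ($1 \leq r \leq R$) is given as
$$\mathcal{T} \approx \sum_{r=1}^{R} (\mathbf{A}_r \mathbf{B}_r^T) \circ \mathbf{c}_r, \tag{8}$$
in which the matrix $\mathbf{A}_r \mathbf{B}_r^T$, with $\mathbf{A}_r \in \mathbb{R}^{I \times L_r}$ and $\mathbf{B}_r \in \mathbb{R}^{J \times L_r}$, has rank $L_r$ and the vector $\mathbf{c}_r \in \mathbb{R}^{K}$ is nonzero. In addition to the permutation and scaling indeterminacies inherited from the CPD, the factors $\mathbf{A}_r$ may be postmultiplied by any nonsingular matrix $\mathbf{F}_r \in \mathbb{R}^{L_r \times L_r}$, provided that $\mathbf{B}_r^T$ is premultiplied by the inverse of $\mathbf{F}_r$. When the matrices $[\mathbf{A}_1 \cdots \mathbf{A}_R]$ and $[\mathbf{B}_1 \cdots \mathbf{B}_R]$ have full column rank and the matrix $[\mathbf{c}_1 \cdots \mathbf{c}_R]$ does not contain collinear columns, the decomposition is guaranteed to be unique up to the above indeterminacies. Figure 2 visualises the decomposition of a tensor in rank-$(L_r,L_r,1)$ terms. Note that BTD-$(L_r,L_r,1)$ is a generalisation of CPD for third-order tensors: for $L_1 = \cdots = L_R = 1$, (8) reduces to (5).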
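The structure of (8) and its extra indeterminacy can be illustrated with a few lines of NumPy. This is a minimal sketch with a hypothetical helper `btd_tensor` and randomly generated factors: it builds a rank-$(L_r,L_r,1)$ tensor, confirms that each $\mathbf{A}_r\mathbf{B}_r^T$ has rank $L_r$, and shows that replacing $\mathbf{A}_r$ by $\mathbf{A}_r\mathbf{F}_r$ and $\mathbf{B}_r^T$ by $\mathbf{F}_r^{-1}\mathbf{B}_r^T$ leaves the tensor unchanged.

```python
import numpy as np


def btd_tensor(As, Bs, cs):
    """Sum of rank-(L_r, L_r, 1) terms (A_r B_r^T) o c_r, as in (8)."""
    return sum(np.einsum("ij,k->ijk", A @ B.T, c) for A, B, c in zip(As, Bs, cs))


rng = np.random.default_rng(2)
I, J, K, L = 8, 7, 4, [2, 3]
As = [rng.standard_normal((I, Lr)) for Lr in L]
Bs = [rng.standard_normal((J, Lr)) for Lr in L]
cs = [rng.standard_normal(K) for _ in L]
T = btd_tensor(As, Bs, cs)

# Each matrix A_r B_r^T has rank L_r
ranks = [int(np.linalg.matrix_rank(A @ B.T)) for A, B in zip(As, Bs)]

# Indeterminacy: A_r -> A_r F_r together with B_r^T -> F_r^{-1} B_r^T
Fs = [rng.standard_normal((Lr, Lr)) + Lr * np.eye(Lr) for Lr in L]  # nonsingular in practice
As2 = [A @ F for A, F in zip(As, Fs)]
Bs2 = [B @ np.linalg.inv(F).T for B, F in zip(Bs, Fs)]  # (B F^{-T})^T = F^{-1} B^T
```

Because only the products $\mathbf{A}_r\mathbf{B}_r^T$ are identifiable, interpretation of BTD terms is usually based on these matrices (or on quantities derived from them) rather than on the individual factors.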
2.2.3 Algorithms
Different types of algorithms have been derived and discussed in the literature for tensor decompositions. The Alternating Least Squares (ALS) algorithm [31] was proposed for calculating CPD by updating the factor matrices in an alternating manner. Other computational schemes, such as Nonlinear Least Squares (NLS) [32], offer better robustness for difficult decompositions (notably, when the terms in the decomposition are somewhat collinear) and can improve the linear convergence rate of ALS to a quadratic rate. Each NLS step can be interpreted as starting from an ALS update that updates all factor matrices simultaneously, which is then iteratively refined with a preconditioned conjugate gradient algorithm so that it approximates the Newton step. Here, we used the NLS implementation of CPD and BTD-$(L_r,L_r,1)$ available in Tensorlab [33].
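For illustration, the basic ALS scheme for CPD can be sketched in NumPy as follows. This is plain ALS with random initialisation, not the Tensorlab NLS solver used in this paper, and the function names are hypothetical. Each update solves the least-squares problem for one factor exactly, using the matricised form of the model, e.g. $\mathbf{A} = \mathbf{T}_{(1)}(\mathbf{C} \odot \mathbf{B})\,[(\mathbf{C}^T\mathbf{C}) * (\mathbf{B}^T\mathbf{B})]^{\dagger}$, where $*$ denotes the elementwise product.

```python
import numpy as np


def khatri_rao(A, B):
    """Column-wise Khatri-Rao product [a_1 kron b_1, ..., a_R kron b_R]."""
    return np.einsum("ir,jr->ijr", A, B).reshape(A.shape[0] * B.shape[0], A.shape[1])


def unfold(T, n):
    """Mode-n matricisation following (3)-(4) (0-based n, Fortran ordering)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")


def cpd_als(T, R, n_iter=300, seed=0):
    """Plain ALS for a rank-R CPD of a third-order tensor (random initialisation)."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((dim, R)) for dim in T.shape)
    T1, T2, T3 = unfold(T, 0), unfold(T, 1), unfold(T, 2)
    norm_T = np.linalg.norm(T)
    errors = []
    for _ in range(n_iter):
        # Exact least-squares update of one factor with the other two fixed, based on
        # T_(1) = A (C kr B)^T, T_(2) = B (C kr A)^T and T_(3) = C (B kr A)^T.
        A = T1 @ khatri_rao(C, B) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
        B = T2 @ khatri_rao(C, A) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
        C = T3 @ khatri_rao(B, A) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
        approx = np.einsum("ir,jr,kr->ijk", A, B, C)
        errors.append(np.linalg.norm(T - approx) / norm_T)
    return A, B, C, errors
```

Since every update minimises the residual over one factor, the relative error cannot increase from one sweep to the next; the price is the linear (and, for nearly collinear terms, very slow) convergence that motivates NLS.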
2.3 Tensor construction
Multichannel EEG data naturally take the form of a matrix $\mathbf{X} \in \mathbb{R}^{S \times Ch}$, where $S$ and $Ch$ correspond to the number of samples and channels, respectively. Below, we present two different approaches to extend this to a tensorial representation by expanding the time course into an extra dimension, with the aim of conveying additional information about the signal.
2.3.1 Wavelet expansion
As the frequency content of EEG signals carries crucial information, wavelet transformation is often used to expand the EEG matrix into a tensor $\mathcal{X} \in \mathbb{R}^{Ch \times F \times S}$, where $F$ is the number of wavelet scales or frequencies [11, 13, 34, 35]. Before wavelet transformation, the EEG data are normalised by subtracting the mean of each channel signal and dividing it by its standard deviation. Note that after decomposition, the scalp potentials are multiplied again by this standard deviation in order to preserve topographic information. The continuous wavelet transform (CWT) was performed using the Mexican hat wavelet at 30 scales, corresponding to a linear range of frequencies from 1 to 30 Hz. After tensor decomposition, the different modes describe the spatial, spectral and temporal signatures of the components. The source signals can be reconstructed by an inverse CWT (ICWT) of the retrieved time-frequency planes. We will refer to a BTD performed on tensors obtained by wavelet expansion as CWT-BTD.
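The text does not specify the CWT implementation, so the following is only a minimal NumPy sketch of the wavelet expansion under stated assumptions: an FFT-based (circular) transform, an L1-normalised Mexican hat, and scales chosen so that the wavelet's spectral peak, which lies at $\sqrt{2}/\sigma$ rad/sample for width $\sigma$, falls on each target frequency between 1 and 30 Hz. The names `mexican_hat_cwt` and `wavelet_tensor` are hypothetical.

```python
import numpy as np


def mexican_hat_cwt(x, fs, freqs):
    """CWT of a 1-D signal with a Mexican hat wavelet, one row per frequency in Hz.

    Computed in the Fourier domain (circular convolution). The width sigma is
    chosen so that the wavelet's spectral peak, at sqrt(2)/sigma rad/sample,
    falls at the requested frequency.
    """
    n = x.size
    omega = 2 * np.pi * np.fft.fftfreq(n)  # angular frequency in rad/sample
    X = np.fft.fft(x)
    out = np.empty((len(freqs), n))
    for row, f in enumerate(freqs):
        sigma = np.sqrt(2) * fs / (2 * np.pi * f)  # wavelet width in samples
        # Fourier transform of the L1-normalised Mexican hat, up to a constant factor
        psi_hat = (sigma * omega) ** 2 * np.exp(-((sigma * omega) ** 2) / 2)
        out[row] = np.real(np.fft.ifft(X * psi_hat))
    return out


def wavelet_tensor(eeg, fs, freqs):
    """Expand an S x Ch EEG matrix into a Ch x F x S tensor.

    Each channel is z-scored first; the per-channel standard deviations are
    returned so that spatial signatures can be rescaled after decomposition.
    """
    sd = eeg.std(axis=0)
    z = (eeg - eeg.mean(axis=0)) / sd
    tensor = np.stack([mexican_hat_cwt(z[:, ch], fs, freqs) for ch in range(z.shape[1])])
    return tensor, sd
```

For example, `wavelet_tensor(eeg, 250, np.arange(1, 31))` would expand a 2-s, 250-Hz segment into a $Ch \times 30 \times 500$ tensor. With L1 normalisation, the response to a pure sinusoid peaks at the scale matched to its frequency, which keeps the frequency axis easy to read.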
2.3.2 Hankel expansion
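EEG signals can be modelled as a sum of exponentially damped sinusoids [23–25]. Such a signal model allows unique blind source separation into rank-$(L_r,L_r,1)$ terms. A detailed proof of this concept is presented in [27]. Below, we give a brief overview of the main considerations. We assume that the underlying EEG sources can be expressed as sums of exponentials:
$$s_r(n) = \sum_{l=1}^{L_r} c_{l,r}\, z_{l,r}^{\,n-1}, \qquad n = 1, \ldots, S. \tag{9}$$
This model also covers sources that are exponentially damped sinusoids, since each damped sinusoid is the sum of two complex exponentials with conjugate poles $z = e^{-d_{l,r} \pm \mathrm{j}\omega_{l,r}}$:
$$s_r(n) = \sum_{l=1}^{L_r/2} \alpha_{l,r}\, e^{-d_{l,r}(n-1)} \cos\big(\omega_{l,r}(n-1) + \phi_{l,r}\big). \tag{10}$$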
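To exploit this structure, each EEG channel signal $\mathbf{a}_{ch} = [a_{ch}(1)\; a_{ch}(2)\; \cdots\; a_{ch}(S)]$, $ch = 1, \ldots, Ch$, is mapped to a Hankel matrix $\mathbf{H}_{ch} \in \mathbb{R}^{I \times J}$ with $I = J = (S+1)/2$ if $S$ is odd, or $I = S/2$ and $J = S/2 + 1$ if $S$ is even. The Hankel matrix is structured as follows:
$$\mathbf{H}_{ch} = \begin{bmatrix} a_{ch}(1) & a_{ch}(2) & \cdots & a_{ch}(J) \\ a_{ch}(2) & a_{ch}(3) & \cdots & a_{ch}(J+1) \\ \vdots & \vdots & \ddots & \vdots \\ a_{ch}(I) & a_{ch}(I+1) & \cdots & a_{ch}(S) \end{bmatrix}.$$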
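Since this mapping is linear and the channel signals are assumed to be linear combinations of the underlying sources, $\mathbf{H}_{ch}$ is the same linear combination of the Hankel matrices associated with the sources. If the source $s_r(n)$ can be written as (9), its associated Hankel matrix $\mathbf{H}_r$ admits the Vandermonde decomposition
$$\mathbf{H}_r = \mathbf{V}_r\, \mathrm{diag}(c_{1,r}, \ldots, c_{L_r,r})\, \mathbf{W}_r^T, \tag{11}$$
where $\mathbf{V}_r \in \mathbb{C}^{I \times L_r}$ and $\mathbf{W}_r \in \mathbb{C}^{J \times L_r}$ are, respectively,
$$\mathbf{V}_r = \begin{bmatrix} 1 & \cdots & 1 \\ z_{1,r} & \cdots & z_{L_r,r} \\ \vdots & & \vdots \\ z_{1,r}^{I-1} & \cdots & z_{L_r,r}^{I-1} \end{bmatrix}, \qquad \mathbf{W}_r = \begin{bmatrix} 1 & \cdots & 1 \\ z_{1,r} & \cdots & z_{L_r,r} \\ \vdots & & \vdots \\ z_{1,r}^{J-1} & \cdots & z_{L_r,r}^{J-1} \end{bmatrix}. \tag{12}$$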
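Assuming that $I, J \geq \max(L_1, L_2, \ldots, L_R)$, and since a Vandermonde matrix generated by distinct poles has full column rank, $\mathbf{H}_r$ has rank $L_r$ (provided that all coefficients $c_{l,r}$ are nonzero). Stacking the channel Hankel matrices $\mathbf{H}_1, \ldots, \mathbf{H}_{Ch}$ along a third mode therefore yields the tensor $\sum_{r=1}^{R} \mathbf{H}_r \circ \mathbf{m}_r$, where $\mathbf{m}_r \in \mathbb{R}^{Ch}$ contains the mixing coefficients of source $r$; this is exactly of the form (8). Therefore, (8) solves the blind source separation problem if the underlying sources follow the structure described in (9).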
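The Vandermonde decomposition (11)-(12) can be verified numerically for a single exponentially damped sinusoid, written as two exponentials with conjugate poles as in (9)-(10). The sketch below uses hypothetical helpers (`hankel`, `vandermonde`) and arbitrary illustrative values for the damping, frequency and phase; `n` runs from 0 in the code, which corresponds to $n-1$ in (9).

```python
import numpy as np


def hankel(x):
    """Hankel matrix of a length-S signal: I = J = (S+1)/2 (S odd), I = S/2, J = S/2+1 (S even)."""
    S = len(x)
    I = (S + 1) // 2
    J = S - I + 1
    return np.array([x[i:i + J] for i in range(I)])


def vandermonde(z, rows):
    """rows x L Vandermonde matrix with columns [1, z_l, ..., z_l**(rows-1)]."""
    return z[np.newaxis, :] ** np.arange(rows)[:, np.newaxis]


# One damped sinusoid as two exponentials with conjugate poles (illustrative values)
fs, S = 250, 501
d, omega, phi = 0.005, 2 * np.pi * 5.7 / fs, 0.3
z = np.array([np.exp(-d + 1j * omega), np.exp(-d - 1j * omega)])
c = np.array([0.5 * np.exp(1j * phi), 0.5 * np.exp(-1j * phi)])
s = (c * vandermonde(z, S)).sum(axis=1)  # s(n) = sum_l c_l z_l**n, n = 0..S-1

H = hankel(s)
V, W = vandermonde(z, H.shape[0]), vandermonde(z, H.shape[1])
```

The resulting signal is real (up to rounding), $\mathbf{H}$ equals $\mathbf{V}\,\mathrm{diag}(\mathbf{c})\,\mathbf{W}^T$, and its numerical rank is 2, as predicted for $L_r = 2$.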
For example, the Hankel matrix of a pure exponential has rank 1, while that of a pure sinusoid or an exponentially damped sinusoid has rank 2. Noisy or nonstationary signals such as chirps give rise to Hankel matrices of higher rank. Before the Hankel matrices are created, the EEG channel signals are divided by their standard deviation. Note that the mean is not subtracted here, as this could introduce an additional pole (a constant corresponds to a pole at $z = 1$). There are two ways to interpret the sources retrieved by this decomposition. First, one can reconstruct the source time course by averaging along the anti-diagonals of the retrieved matrix. Alternatively, one can retrieve the poles generating each source from the reconstructed Hankel matrices; the consecutive algorithmic steps are given, e.g., in [36]. In this paper, however, we restricted ourselves to the first method. We will refer to a BTD performed on tensors obtained by Hankel expansion as H-BTD.
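The rank statements above, the effect of subtracting the mean, and the anti-diagonal averaging used for reconstruction can all be reproduced with a short NumPy sketch. The helper names are hypothetical, and the numerical rank is computed with an arbitrary relative threshold of $10^{-8}$ on the singular values.

```python
import numpy as np


def hankel(x):
    """Hankel matrix of a length-S signal with I + J - 1 = S."""
    S = len(x)
    I = (S + 1) // 2
    J = S - I + 1
    return np.array([x[i:i + J] for i in range(I)])


def numerical_rank(M, rtol=1e-8):
    """Number of singular values above rtol times the largest one."""
    sv = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(sv > rtol * sv[0]))


def antidiagonal_mean(H):
    """Recover a time course by averaging H along its anti-diagonals."""
    I, J = H.shape
    total = np.zeros(I + J - 1)
    count = np.zeros(I + J - 1)
    for i in range(I):
        total[i:i + J] += H[i]
        count[i:i + J] += 1
    return total / count


fs = 250
t = np.arange(500) / fs  # 2 s at 250 Hz
signals = {
    "exponential": 0.99 ** np.arange(500),
    "sinusoid": np.sin(2 * np.pi * 5.7 * t),
    "damped sinusoid": np.exp(-2 * t) * np.sin(2 * np.pi * 5.7 * t),
    "chirp 8 -> 4 Hz": np.sin(2 * np.pi * (8 * t - t ** 2)),  # instantaneous frequency 8 - 2t
}
ranks = {name: numerical_rank(hankel(x)) for name, x in signals.items()}
```

Subtracting the mean of the 5.7-Hz sinusoid (which is not exactly zero over 2 s) raises the rank from 2 to 3, since the constant offset adds a pole at $z = 1$. Applied to a low-rank approximation of $\mathbf{H}$, `antidiagonal_mean` returns the corresponding denoised time course.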
2.4 Model selection
Certain model parameters have to be determined prior to performing blind source separation. The number of extracted components or terms $R$ has to be chosen for both CPD and BTD. Additionally, the rank of each mode needs to be set for BTD. In the case of BTD-$(L_r,L_r,1)$, this means determining which mode should be rank-1 and choosing the rank $L_r$ of the other two modes. Unless stated otherwise, we set $L_1 = L_2 = \cdots = L_R$.
Several procedures have been proposed for automatic model selection in tensor decompositions. For CPD type models, the core consistency diagnostic [37] seems to be the most powerful approach [38] and has been successfully used to guide the blind source separation of epilepsy tensors [11, 13].
The core consistency diagnostic is based on the following principle. The CPD model can be formulated as a restricted Tucker model in which the core tensor has nonzero values only on its superdiagonal. Considering the Tucker model as a regression of a tensor onto the subspaces defined by the factor matrices, a CPD model is appropriate if the least-squares core tensor fitted on the CPD factors has off-diagonal elements close to zero. The optimal number of CPD components is the last one in a series of models with an increasing number of components for which the least-squares fitted core tensor is still similar to the ideal superdiagonal core tensor.
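The principle above translates into a few lines of NumPy. The sketch below is a minimal illustration (the function name is hypothetical): given CPD factors with full column rank, the least-squares Tucker core is $\mathcal{G} = \mathcal{T} \times_1 \mathbf{A}^{\dagger} \times_2 \mathbf{B}^{\dagger} \times_3 \mathbf{C}^{\dagger}$, and the diagnostic, as commonly defined, is $100\,(1 - \|\mathcal{G} - \mathcal{I}\|_F^2 / R)$ with $\mathcal{I}$ the superdiagonal core of ones.

```python
import numpy as np


def core_consistency(T, A, B, C):
    """Core consistency diagnostic (in %) of a third-order CPD with factors A, B, C."""
    R = A.shape[1]
    # Least-squares Tucker core for fixed factors: G = T x1 pinv(A) x2 pinv(B) x3 pinv(C)
    G = np.einsum("ijk,ai,bj,ck->abc", T,
                  np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C))
    # Ideal CPD core: ones on the superdiagonal, zeros elsewhere
    ideal = np.zeros((R, R, R))
    ideal[np.arange(R), np.arange(R), np.arange(R)] = 1.0
    return 100.0 * (1.0 - np.sum((G - ideal) ** 2) / R)
```

A tensor that exactly follows the CPD model gives 100%, whereas off-diagonal interactions between components lower the value; for instance, a single off-diagonal core entry of 0.5 in a rank-2 model gives 87.5%.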
The parameter selection for more flexible tensor models such as BTD-$(L_r,L_r,1)$ is still a topic of ongoing research (see Section 4 for an overview) and is beyond the scope of this paper. Rather, our aim is to give insight into the sensitivity of CPD and BTD to the different parameters and to illustrate what can be achieved with well-chosen model parameters.
Therefore, we simulated various ictal activity patterns superimposed on artefacts and background activity. The signals were subsequently decomposed with CPD and BTD using a wide range of values for each model parameter in order to investigate the impact of the chosen model parameters.
2.5 Simulation study
Two-second EEG segments were simulated in three different scenarios following [14].
Scenario i Stationary seizure: One dipole with a sinusoidally varying moment at 5.7 Hz, located at coordinates (x,y,z) = (-0.5,0,0.1) with orientation (1,0,0), where x, y and z indicate left ear to right ear, posterior to anterior and from down upwards through the Cz electrode, respectively. Throughout the text, we might refer to the ictal source in this scenario as ‘source with stationary frequency’ or as ‘sinusoidal source’.
Scenario ii Seizure with varying frequency: One dipole with a moment of linearly decreasing frequency from 8 to 4 Hz located at coordinates (x,y,z) = (-0.5,0,0.1) with orientation (1,0,0). Throughout the text, we might refer to the ictal source of this scenario as ‘source with evolving frequency’ or as ‘chirp source’.
Scenario iii Seizure with varying localisation: Two dipoles, each with a sinusoidally varying moment at 5.7 Hz, located at (x,y,z) = (-0.5,-0.2,0.1) and (x,y,z) = (-0.5,0.5,0.1), i.e. 6.4 cm apart. The orientation of both dipoles is (1,0,0). While the activity of the first dipole gradually decreased in amplitude, that of the second dipole increased.
The forward problem was solved for each scenario in a three-shell spherical head model consisting of a brain, a skull and a scalp compartment [39]. The ratio between the conductivities of the brain, skull and scalp compartments was 1:1/16:1, respectively [40], where the conductivity of the brain and scalp was $3.3 \cdot 10^{-4}\,(\Omega\,\mathrm{mm})^{-1}$ [41]. The radii of the outer boundaries of the brain, skull and scalp compartments were set to 8, 8.5 and 9.2 cm, respectively. The forward solution was computed for 21 electrodes placed according to the 10/20 system, with two additional electrodes over the temporal region. The time course of the scalp potentials was stored in a $500 \times 21$ matrix $\mathbf{A}$, representing 2 s of EEG at a sampling frequency of 250 Hz. Awake background EEG activity was recorded from a healthy subject with the same electrode configuration. Muscle artefacts were separated from a contaminated segment of background activity using BSS-CCA [7]. Subsequently, the muscle artefacts were superimposed on a clean background EEG segment, and the data were stored in a noise matrix $\mathbf{B}$. In the simulation study, the noise matrix $\mathbf{B}$ was superimposed on the signal matrix $\mathbf{A}$ containing the ictal activity: $\mathbf{X}(\lambda) = \mathbf{A} + \lambda \cdot \mathbf{B}$, with $\lambda > 0$. We varied the parameter $\lambda$ to obtain various signal-to-noise ratio (SNR) levels, quantified as
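$$\mathrm{SNR} = 20 \log_{10} \frac{\mathrm{RMS}(\mathbf{A})}{\mathrm{RMS}(\lambda \cdot \mathbf{B})}, \tag{13}$$
where the root mean square (RMS) value of a signal matrix $\mathbf{M}$ consisting of $Ch$ channels and $S$ samples is defined as
$$\mathrm{RMS}(\mathbf{M}) = \sqrt{\frac{1}{S \cdot Ch} \sum_{s=1}^{S} \sum_{ch=1}^{Ch} m_{s,ch}^2}. \tag{14}$$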
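Assuming (13) is the RMS ratio expressed in dB as written above, the noise weight $\lambda$ needed for a target SNR follows in closed form, $\lambda = \mathrm{RMS}(\mathbf{A}) / (\mathrm{RMS}(\mathbf{B}) \cdot 10^{\mathrm{SNR}/20})$. The sketch below (hypothetical function names) implements (13), (14) and this inversion.

```python
import numpy as np


def rms(M):
    """Root mean square over all S samples and Ch channels, as in (14)."""
    return np.sqrt(np.mean(np.asarray(M, dtype=float) ** 2))


def snr_db(A, B, lam):
    """SNR in dB of X(lam) = A + lam * B, as in (13)."""
    return 20.0 * np.log10(rms(A) / rms(lam * B))


def lam_for_snr(A, B, target_db):
    """Invert (13): the noise weight lam that yields the requested SNR."""
    return rms(A) / (rms(B) * 10.0 ** (target_db / 20.0))
```

In a simulation sweep, one would compute `lam_for_snr(A, B, snr)` for each SNR level of interest and form `A + lam * B`.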
The noisy ictal EEG segments were expanded with the wavelet or Hankel method and were subsequently decomposed with CPD and BTD in order to extract the ictal component. Note that CPD was not applied to tensors obtained by Hankel expansion, as the Hankel matrix of a sinusoidal or chirp signal never has rank 1. The component corresponding to the ictal source was selected automatically as the one whose spatial signature showed the lowest root mean square error (RMSE) with respect to the spatial distribution of the simulated ictal source. Subsequently, one dipole was fitted on the extracted ictal source signal to compute the localisation error. The goal of the simulation study was to assess the robustness of each method against noise. Furthermore, as explained above, it also serves to investigate the impact of different choices of model parameters and ultimately to determine the optimal model parameters.
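The automatic selection step can be sketched as follows. Because CPD and BTD factors are only determined up to scaling (and sign), the sketch assumes that spatial signatures are normalised to unit norm and compared up to sign before the RMSE is computed; this normalisation is our assumption, not a detail stated above, and the function names are hypothetical.

```python
import numpy as np


def spatial_rmse(estimated, reference):
    """RMSE between two spatial signatures after removing scale and sign (assumption)."""
    e = estimated / np.linalg.norm(estimated)
    r = reference / np.linalg.norm(reference)
    return min(np.sqrt(np.mean((e - r) ** 2)), np.sqrt(np.mean((e + r) ** 2)))


def select_ictal_component(spatial_factors, reference):
    """Index of the column of spatial_factors (Ch x R) closest to reference, plus all errors."""
    errors = [spatial_rmse(spatial_factors[:, r], reference)
              for r in range(spatial_factors.shape[1])]
    return int(np.argmin(errors)), errors
```

Here `spatial_factors` would hold the spatial signatures of all extracted components (one column per component) and `reference` the simulated scalp topography of the ictal source.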
2.6 Clinical examples
Ictal EEG recordings were selected from the database used in [13, 42]. The original database consisted of 37 refractory partial epilepsy patients who underwent full presurgical evaluation, including seizure semiology, structural MRI, interictal EEG, subtraction of ictal SPECT coregistered with MRI (SISCOM) and neuropsychological assessment. A patient was included in the database if all measurements were concordant and reliably defined the epileptogenic zone. In the majority of cases, the seizure onsets were correctly localised using CPD of wavelet-transformed EEG tensors [13]. In these cases, the trilinear signal model assumed by CPD is sufficient; therefore, we do not expect an improvement using BTD. However, in cases where no perfect separation was obtained by CPD due to severe artefacts, BTD might provide improved results. Although [13] focussed on localising the seizure onset zone, one might be interested in modelling other aspects of the seizures, such as their evolution in morphology or topography. As opposed to CPD, BTD can model such nonstationary sources. Here, we discuss the following patients, each representing a particular case (severe artefacts or presence of nonstationarities) in which we expect BTD to provide more appropriate signal models than CPD.
2.6.1 Patient 1
Patient 1 suffers from right temporal lobe epilepsy. The seizure consists of 5- to 6-Hz activity lateralised to the right, most prominently present over the right anterior and midtemporal region (F8, T4 and right sphenoidal channels). Severe eye blinks and muscle artefacts are superimposed on the low-voltage ictal activity at onset (Figure 3a). Our aim here is to separate the seizure activity from the artefacts and background using a 2-s EEG segment at onset and thereby localise the seizure onset zone as in [13]. The window length of 2 s was chosen such that the number of samples provides sufficient information about the signal, while the window is short enough to assume that the seizure has not yet spread from the onset region. As we are interested in the exact onset localisation of the seizure, the spatial mode of BTD is chosen to be rank-1, while the frequency and temporal modes are higher rank.
2.6.2 Patient 2
Patient 2 suffers from left temporal lobe epilepsy. The seizure starts with a 4-Hz delta rhythm, which is most prominent over the left anterior and midtemporal region (F7, T3 and left sphenoidal channel). Eleven seconds after onset, the seizure pattern evolves in amplitude and frequency into sharp theta activity of up to 8 Hz. Our aim here is to correctly model the frequency evolution of the seizure. Therefore, the frequency and temporal modes of the BTD are chosen to be higher rank, while we assume a stationary localisation, i.e. a rank-1 spatial mode. As the transition takes place over a longer period of time, we use a 10-s EEG segment here, shown in Figure 4a.
2.6.3 Patient 3
Patient 3 suffers from right temporal lobe epilepsy. The seizure starts with high-amplitude 4-Hz delta activity over the right anterior, mid- and posterior temporal regions (F8, T4, T6 and right sphenoidal channels). After 14 s, the seizure activity spreads to the bi-fronto-central region. Our aim here is to correctly model the spatial spread of the seizure using a 10-s EEG segment, shown in Figure 5a. Therefore, the spatial and temporal modes of the BTD are chosen to be higher rank, and we assume a stationary frequency, i.e. a rank-1 frequency mode.
2.7 Evaluation criteria
The goodness of the model fit is evaluated in terms of several measures. For scenarios i and ii, where the ictal pattern had a fixed topography, the RMSE between the spatial distribution of the simulated ictal pattern and the spatial signature of the extracted ictal source was computed. Moreover, the RMSE between the time×frequency matrices, computed as the product of the mode-2 and mode-3 factors, was also taken into account. Similarly, the RMSE between the Hankel matrices of the simulated and extracted ictal sources was assessed. Finally, the RMSE between the simulated and reconstructed EEG time courses was investigated. The source time courses were reconstructed using the inverse wavelet transform of the time×frequency matrices in the case of CWT-BTD and by averaging along the anti-diagonals of the Hankel matrices in the case of H-BTD.
For scenario iii, where the ictal pattern has varying topography, the spatial and temporal signatures cannot be interpreted independently. Therefore, the EEG was reconstructed from the ictal sources and dipoles were fitted on the reconstructed data. The goodness of the decomposition was evaluated in terms of the dipole localisation error.
In the clinical examples, the true underlying ictal source is not known quantitatively. The clinical description of the ictal patterns contains information on the channels where the seizure onset is observed, with additional information on the frequency and morphology of the seizure pattern. Therefore, the extracted ictal sources are visually inspected and compared with the written qualitative clinical description.