A regularized matrix factorization approach to induce structured sparse-low-rank solutions in the EEG inverse problem

Montoya-Martínez, Jair; Artés-Rodríguez, Antonio; Pontil, Massimiliano; Hansen, Lars Kai

doi:10.1186/1687-6180-2014-97

Research
Open access
Published: 24 June 2014

EURASIP Journal on Advances in Signal Processing volume 2014, Article number: 97 (2014)

Abstract

We consider the estimation of the Brain Electrical Sources (BES) matrix from noisy electroencephalographic (EEG) measurements, commonly known as the EEG inverse problem. We propose a new method to induce neurophysiologically meaningful solutions, which takes into account the smoothness, structured sparsity, and low rank of the BES matrix. The method is based on the factorization of the BES matrix as a product of a sparse coding matrix and a dense latent source matrix. The structured sparse-low-rank structure is enforced by minimizing a regularized functional that includes the ℓ₂₁-norm of the coding matrix and the squared Frobenius norm of the latent source matrix. We develop an alternating optimization algorithm to solve the resulting nonsmooth-nonconvex minimization problem. We analyze the convergence of the optimization procedure, and we compare, under different synthetic scenarios, the performance of our method with respect to the Group Lasso and Trace Norm regularizers when they are applied directly to the target matrix.

1 Introduction

The solution of the electroencephalographic (EEG) inverse problem to obtain functional brain images is of high value for neurological research and medical diagnosis. It involves the estimation of the Brain Electrical Sources (BES) distribution from noisy EEG measurements, whose relation is modeled according to the linear model

\begin{array}{lcr} Y = AS + E, \end{array}

(1)

where $Y \in R^{M \times T}$ and $A \in R^{M \times N}$ are known and represent, respectively, the EEG measurements matrix and the forward operator (a.k.a. the lead field matrix), $S \in R^{N \times T}$ denotes the BES matrix, and $E \in R^{M \times T}$ is a noise matrix. M denotes the number of EEG electrodes, N is the number of brain electrical sources, and T is the number of time instants.

This estimation problem is very challenging: N≫M, and the existence of silent BES (BES that produce nonmeasurable fields on the scalp surface) implies that the EEG inverse problem has infinitely many solutions, since a silent BES can always be added to a solution of the inverse problem without affecting the EEG measurements. For all these reasons, the EEG inverse problem is an underdetermined ill-posed problem [1–4].
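The non-uniqueness described above can be illustrated numerically. The following is a minimal sketch, assuming a random Gaussian matrix as a stand-in for the lead field matrix and toy sizes chosen only for illustration (none of these values come from the paper): any source pattern lying in the null space of A plays the role of a silent BES.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, T = 8, 20, 5                    # toy sizes with N > M (hypothetical)
A = rng.standard_normal((M, N))       # stand-in for a lead field matrix
S = rng.standard_normal((N, T))       # one candidate source matrix

# Orthonormal basis of the null space of A from the full SVD:
# the last N - M right singular vectors satisfy A @ v = 0
# (A has full row rank here).
_, _, Vt = np.linalg.svd(A, full_matrices=True)
null_basis = Vt[M:].T                 # shape (N, N - M)

# A "silent" source pattern: any combination of null-space vectors.
S_silent = null_basis @ rng.standard_normal((N - M, T))

# Both source matrices produce the same noiseless measurements.
Y1 = A @ S
Y2 = A @ (S + S_silent)
max_diff = np.max(np.abs(Y1 - Y2))
print(max_diff)
```

The two source matrices differ substantially, yet the printed difference between their measurements is at the level of floating-point rounding, which is why additional prior information is needed to select one solution.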

A classical approach to solve an ill-posed problem is to use regularization theory, which involves the replacement of the original ill-posed problem with a ‘nearby’ well-posed problem whose solution approximates the required solution [5]. Solutions developed by this theory are stated in terms of a regularization function, which helps us to select, among the infinitely many solutions, the one that best fulfills a prescribed constraint (e.g., smoothness, sparsity, and low rank). To define the constraint, we can use mathematical restrictions (minimum norm estimates) or anatomical, physiological, and functional prior information. Some examples of useful neurophysiological information are [1, 6]: the irrotational character of the brain current sources, the (smooth) dynamics of the neural signals, the clusters formed by neighboring or functionally related BES, and the smoothness and focality of the electromagnetic fields generated and propagated within the volume conductor media (brain cortex, skull, and scalp).

Several regularization functions have been proposed in the EEG community: Hämäläinen and Ilmoniemi in [7] proposed a squared Frobenius norm penalty ( $∥ S ∥_{F}^{2}$ ), which they named Minimum Norm Estimate (MNE). This regularization function usually induces solutions that spread over a considerable part of the brain. Uutela et al. in [8] proposed an ℓ₁-norm penalty (∥S∥₁). They named their approach Minimum Current Estimate (MCE). This penalty function promotes solutions that tend to be scattered around the true sources. Mixed ℓ₁ℓ₂-norm penalties have also been proposed in the framework of the time basis, time frequency dictionaries, and spatial basis decomposition. These mixed norm approaches induce structured sparse solutions and depend on decomposing the BES signals as linear combinations of multiple basis functions, e.g., Ou et al. in [9] proposed the use of temporal basis functions obtained with singular value decomposition (SVD), Gramfort et al. in [10, 11] proposed the use of time-frequency Gabor dictionaries, and Haufe et al. in [12] proposed the use of spatial basis Gaussian functions. For a more detailed overview on inverse methods for EEG, see [2, 3, 13] and references therein. For a more detailed overview on regularization functions applied to structured sparsity problems, see [14–16] and references therein.

All of these regularizers try to induce neurophysiologically meaningful solutions, which take into account the smoothness and structured sparsity of the BES matrix: during a particular cognitive task, only the BES related to the brain area involved in such a task will be active, and their corresponding time evolution will vary smoothly, that is, the BES matrix will have few nonzero rows, and in addition, the columns will vary smoothly. In this paper, we propose a regularizer that takes into account not only the smoothness and structured sparsity of the BES matrix but also its low rank, thereby capturing the linear relation between the active sources and their corresponding neighbors. In order to do so, we propose a new method based on matrix factorization and regularization, with the aim of recovering the latent structure of the BES matrix. In the factorization, the first matrix, which acts as a coding matrix, is penalized using the ℓ₂₁-norm, and the second one, which acts as a dense, full rank latent source matrix, is penalized using the squared Frobenius norm.

In our approach, the resulting optimization problem is nonsmooth and nonconvex. A standard approach to deal with the nonsmoothness introduced by the nonsmooth regularizers mentioned above is to reformulate the regularization problem as a second-order cone programming (SOCP) problem [12] and use interior point-based solvers. However, interior point-based methods cannot handle large-scale problems, which is the case of large EEG inverse problems involving thousands of brain sources. Another approach is to try to solve the nonsmooth problem directly, using general nonsmooth optimization methods, for instance, the subgradient method [17]. This method can be used if a subgradient of the objective function can be computed efficiently [14]. However, its convergence rate, $O (1 / \sqrt{k})$ with k the iteration counter, is slow in practice. In this paper, in order to tackle the nonsmoothness of the optimization problem, we depart from these optimization methods and use instead efficient first-order nonsmooth optimization methods [5, 18, 19]: forward-backward splitting methods. These methods are also called proximal splitting because the nonsmooth function is involved via its proximity operator. Forward-backward splitting methods were first introduced in the EEG inverse problem by Gramfort et al. [10, 11, 13], where they used them to solve nonsmooth optimization problems resulting from the use of mixed ℓ₁ℓ₂-norm penalty functions. These methods have drawn increasing attention in the EEG, machine learning, and signal processing communities, especially because of their convergence rates and their ability to deal with large problems [19–21].

On the other hand, in order to handle the nonconvexity of the optimization problem, we use an iterative alternating minimization approach: minimizing over the coding matrix while keeping the latent source matrix fixed and vice versa. Both of these optimization problems are convex: the first one can be solved using proximal splitting methods, while the second one can be solved directly in terms of a matrix inversion.

The rest of the paper is organized as follows. In Section 2, we give an overview of the EEG inverse problem. In Section 3, we present the mathematical background related to the proximal splitting methods. The resulting nonsmooth and nonconvex optimization problem is formally described in Section 4. In Section 5, we propose an alternating minimization algorithm, and its convergence analysis is presented in Section 6. Section 7 is devoted to the numerical evaluation of the algorithm and its comparison with the Group Lasso and Trace Norm regularizers, each of which considers only part of the characteristics of the matrix S: its structured sparsity by using the ℓ₂₁-norm and its low rank by using the ℓ_∗-norm, respectively. The advantages of considering both characteristics in a single method, as in the proposed one, become clear in comparison with the independent use of the Group Lasso and Trace Norm regularizers. Finally, conclusions are presented in Section 8.

2 EEG inverse problem background

The EEG signals represent the electrical activity of one or several assemblies of neurons [22]. The area of a neuron assembly is small compared to the distance to the observation point (the EEG sensors). Therefore, the electromagnetic fields produced by an active neuron assembly at the sensor level are very similar to the field produced by a current dipole [23]. This simplified model is known as the equivalent current dipole (ECD). These ECDs are also known by other names such as BES and current sources. Due to the uniform spatial organization of their dendrites (perpendicular to the brain cortex), the pyramidal neurons are the only neurons that can generate a net current dipole over a piece of cortical surface, whose field is detectable on the scalp [3]. According to [24], it is necessary to add the field of ∼10⁴ pyramidal neurons in order to produce a voltage that is detectable on the scalp. These voltages can be recorded by using different types of electrodes [22], such as disposable (gel-less, and pre-gelled types), reusable disc electrodes (gold, silver, stainless steel, or tin), headbands and electrode caps, saline-based electrodes, and needle electrodes.

Under the quasi-static approximation of Maxwell’s equations, we can express the general model for the observed EEG signals y(t) at time t as linear functions of the BES s(t) [9]:

y (t) = As (t) + e (t),

(2)

where $y (t) \in R^{M \times 1}$ is the EEG measurements vector, $s (t) \in R^{N \times 1}$ is the BES vector, $e (t) \in R^{M \times 1}$ is the noise vector, and $A \in R^{M \times N}$ is the lead field matrix. In a typical experimental setup, the number of electrodes (M) is ∼10², and the number of BES (N) is ∼10³ to 10⁴. We can express the former model for all time instants {t₁,t₂,…,t_T} (corresponding to some observation time window) by using the matrix formulation (1), where $Y = [y (t_{1}), y (t_{2}), \dots, y (t_{T})] \in R^{M \times T}$ , $S = [s (t_{1}), s (t_{2}), \dots, s (t_{T})] \in R^{N \times T}$ , and $E = [e (t_{1}), e (t_{2}), \dots, e (t_{T})] \in R^{M \times T}$ . The i th row of the matrix Y represents the electrical activity recorded by the i th EEG electrode during the observation time window. In the BES matrix S, each row represents the time evolution of one brain electrical source, and each column represents the activity of all the corresponding sources in a particular time instant. Finally, the forward operator A summarizes the geometric and electric properties of the conducting media (brain, skull, and scalp) and establishes the link between the current sources and EEG sensors ($A_{ij}$ tells us how the j th BES influences the measure obtained by the i th electrode). Following this notation, the EEG inverse problem can be stated as follows: Given a set of EEG signals (Y) and a forward model (A), estimate the current sources within the brain (S) that produce these signals.

3 Mathematical background

3.1 Proximity operator

The proximity operator [19, 25] corresponding to a convex function f is a mapping from $R^{n}$ to itself and is defined as follows:

\begin{array}{lcr} {prox}_{f} (z) = \underset{x \in R^{n}}{argmin} \{f (x) + \frac{1}{2} ∥ x - z ∥^{2}\}, \end{array}

(3)

where ∥·∥ denotes the Euclidean norm. Note that the proximity operator is well defined, because the above minimum exists and is unique (the objective function is strongly convex).
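As a concrete instance of definition (3), the proximity operator of f(x) = λ|x| is the well-known soft-thresholding function. The sketch below compares this closed form with a brute-force minimization of the objective in (3) on a fine grid; the function names, the value of λ, and the grid are illustrative choices, not part of the paper.

```python
import numpy as np

def prox_abs(z, lam):
    """Closed-form prox of f(x) = lam * |x| (soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def prox_numeric(f, z, grid):
    """Brute-force evaluation of definition (3) on a 1-D grid."""
    values = f(grid) + 0.5 * (grid - z) ** 2
    return grid[np.argmin(values)]

lam = 0.7
grid = np.linspace(-5.0, 5.0, 100001)        # grid spacing 1e-4
f = lambda x: lam * np.abs(x)

z_values = np.array([-2.0, -0.3, 0.0, 0.5, 1.9])
closed = prox_abs(z_values, lam)
numeric = np.array([prox_numeric(f, z, grid) for z in z_values])
print(closed)
print(numeric)
```

Inputs with magnitude below λ are mapped to zero, and larger inputs are shrunk toward zero by λ, which is the mechanism by which proximal methods produce exactly sparse iterates.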

3.2 Subdifferential-proximity operator relationship

If f is a convex function on $R^{n}$ and $y \in R^{n}$ , then [26]

\begin{array}{lcr} x \in ∂f (y) \Leftrightarrow y = {prox}_{f} (x + y), \end{array}

(4)

where ∂ f(y) denotes the subdifferential of f at y.

3.3 Principles of proximal splitting methods

Proximal splitting methods are specifically tailored to solve an optimization problem of the form

\begin{array}{lcr} \underset{S}{minimize} f (S) + r (S), \end{array}

(5)

where f(S) is a smooth convex function, and r(S) is also a convex function, but nonsmooth. From convex analysis [17], we know that S is a minimizer of (5) if and only if 0∈∂(f+r)(S). This implies the following [18]:

\begin{array}{rcl} 0 \in \partial (f + r) (S) & \Leftrightarrow & 0 \in ∂f (S) + ∂r (S) \\ & \Leftrightarrow & - \nabla f (S) \in ∂r (S) \\ & \Leftrightarrow & - γ \nabla f (S) \in γ ∂r (S) \\ & \Leftrightarrow & (S - γ \nabla f (S)) - S \in ∂ (γr) (S) \end{array}

Using (4) in the former expression, we get

\begin{array}{lcr} S = {prox}_{γr} (S - γ \nabla f (S)) \end{array}

(6)

Equation 6 suggests that we can solve (5) using a fixed point iteration:

\begin{array}{lcr} S_{k + 1} = {prox}_{γr} (S_{k} - γ \nabla f (S_{k})) \end{array}

(7)

In optimization, (7) is known as the forward-backward splitting process [19]. It consists of two steps: first, it performs a forward gradient descent step $S_{k}^{*} = S_{k} - γ \nabla f (S_{k})$ and then it performs a backward step $S_{k + 1} = {prox}_{γr} (S_{k}^{*})$ .

From (7), we can see the importance of the proximity operator (associated to γ r(S)) with respect to the forward-backward splitting methods, since their main step is to calculate it. If we have a closed-form expression for such a proximity operator or if we can approximate it efficiently (with the approximation errors decreasing at appropriate rates [27]), then we can efficiently carry out the iteration (7). Furthermore, when f has a Lipschitz continuous gradient, there are fast algorithms based on (7). For instance, the Iterative Soft Thresholding Algorithm (ISTA) has a convergence rate of O(1/k), and the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) has a convergence rate of O(1/k²) [5].
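The fixed point iteration (7) can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the implementation used in the paper: it applies the iteration to a small ℓ₁-regularized least squares problem (a vector-valued stand-in for (5)), with the step size γ = 1/L, where L is the Lipschitz constant of ∇f. All names and problem sizes are hypothetical.

```python
import numpy as np

def forward_backward(grad_f, prox_r, x0, gamma, n_iter=500):
    """Iteration (7): forward gradient step, then backward (prox) step."""
    x = x0.copy()
    for _ in range(n_iter):
        x = prox_r(x - gamma * grad_f(x), gamma)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 10))
y = rng.standard_normal(30)
lam = 0.5

# f(x) = 0.5 * ||A x - y||^2 (smooth), r(x) = lam * ||x||_1 (nonsmooth)
grad_f = lambda x: A.T @ (A @ x - y)
prox_r = lambda v, gamma: np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)

L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of grad_f
gamma = 1.0 / L
x_hat = forward_backward(grad_f, prox_r, np.zeros(10), gamma, n_iter=2000)

# At a minimizer, x_hat satisfies the fixed point equation (6).
residual = np.linalg.norm(x_hat - prox_r(x_hat - gamma * grad_f(x_hat), gamma))
print(residual)
```

The printed residual measures how far the final iterate is from satisfying (6); a value near machine precision indicates that the iteration has reached a fixed point.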

4 Problem formulation

The regularized EEG inverse problem can be stated as follows:

\begin{array}{lcr} \hat{S} = \underset{S}{argmin} \{\frac{1}{2} ∥ AS - Y ∥_{F}^{2} + λΩ (S), λ > 0\}, \end{array}

(8)

where $\frac{1}{2} ∥ AS - Y ∥_{F}^{2}$ is the square loss function ($∥ \cdot ∥_{F}$ denotes the Frobenius norm), and λ Ω(S) is a nonsmooth penalty term that is used to encode the prior knowledge about the structure of the target matrix S.

In order to induce structured sparse-low-rank solutions, we propose to reformulate (1) using a matrix factorization approach, which involves expressing the matrix S as the product of two matrices, S=B C, obtaining the following nonlinear estimation model:

\begin{array}{lcr} Y = ABC + E, \end{array}

(9)

where B and C are penalized using the ℓ₂₁-norm and the squared Frobenius norm, respectively. The resulting optimization problem can be stated as follows:

\begin{array}{rcl} \hat{B}, \hat{C} & = & \underset{B, C}{argmin} \{\frac{1}{2} ∥ A(BC) - Y ∥_{F}^{2} + λ (\sum_{i = 1}^{N} ∥ B (i, :) ∥_{2} + \frac{ρ}{2} \sum_{i = 1}^{K} ∥ C (i, :) ∥_{2}^{2})\} \\ & = & \underset{B, C}{argmin} \{\frac{1}{2} ∥ A(BC) - Y ∥_{F}^{2} + λ (∥ B ∥_{2, 1} + \frac{ρ}{2} ∥ C ∥_{F}^{2})\}, \end{array}

(10)

where λ>0, ρ>0, $B \in R^{N \times K}$ , $C \in R^{K \times T}$ , and B(i,:), C(i,:) denote the i th row of B and C, respectively. K≪min{N,T}, λ, and ρ are parameters of the model that must be adjusted.

In this formulation, which we refer to as the matrix factorization approach, the ℓ₂₁-norm and the squared Frobenius norm induce structured sparsity and smoothness in the rows of B and C, respectively, and therefore also in the rows of S. Finally, the parameter K bounds the rank of S:

\begin{array}{rcl} rank (B) & \leq & min \{N, K\} \Rightarrow rank (B) \leq K \\ rank (C) & \leq & min \{K, T\} \Rightarrow rank (C) \leq K \\ rank (BC) & \leq & min \{rank (B), rank (C)\} \leq K \\ \Rightarrow rank (S) & \leq & K \end{array}

Hence, the proposed regularization framework takes into account all the prior knowledge about the structure of the target matrix S.
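The two structural effects of the factorization S=BC can be checked on a toy example. In the sketch below (sizes and random entries are hypothetical), B has only a few nonzero rows, so S inherits the same row support, and the rank of S cannot exceed K.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, T = 50, 3, 40                     # toy sizes (hypothetical)

B = rng.standard_normal((N, K))
B[5:] = 0.0                             # row-sparse coding matrix: 5 active rows
C = rng.standard_normal((K, T))         # dense latent source matrix

S = B @ C
rank_S = np.linalg.matrix_rank(S)
nonzero_rows = np.flatnonzero(np.linalg.norm(S, axis=1) > 0)

print(rank_S)          # bounded by K
print(nonzero_rows)    # same support as the nonzero rows of B
```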

5 Optimization algorithm

5.1 Matrix factorization approach

In this section, we address the issue of implementing the learning method (10) numerically. We propose the following reparameterization of (10):

\begin{array}{rcl} B = \sqrt{λρ} \tilde{B}, C = \frac{1}{\sqrt{λρ}} \tilde{C} & \Rightarrow & BC = (\sqrt{λρ} \tilde{B}) (\frac{1}{\sqrt{λρ}} \tilde{C}) \\ & \Rightarrow & BC = \tilde{B} \tilde{C} \end{array}

(11)

Using (11) in the objective function of (10), we get

\begin{array}{l} \Rightarrow \frac{1}{2} ∥ A (\tilde{B} \tilde{C}) - Y ∥_{F}^{2} + λ (∥ \sqrt{λρ} \tilde{B} ∥_{2, 1} + \frac{ρ}{2} ∥ \frac{1}{\sqrt{λρ}} \tilde{C} ∥_{F}^{2}) \\ \Rightarrow \frac{1}{2} ∥ A (\tilde{B} \tilde{C}) - Y ∥_{F}^{2} + λ \sqrt{λρ} ∥ \tilde{B} ∥_{2, 1} + \frac{λρ}{2 {(\sqrt{λρ})}^{2}} ∥ \tilde{C} ∥_{F}^{2} \\ \Rightarrow \frac{1}{2} ∥ A (\tilde{B} \tilde{C}) - Y ∥_{F}^{2} + \tilde{λ} ∥ \tilde{B} ∥_{2, 1} + \frac{1}{2} ∥ \tilde{C} ∥_{F}^{2} \end{array}

where $\tilde{λ} = λ \sqrt{λρ}$ , and therefore, we get an optimization problem with only one regularization parameter:
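The reparameterization can be verified numerically: the objective of (10) evaluated at (B, C) with parameters (λ, ρ) should coincide with the single-parameter objective evaluated at the rescaled pair with the combined parameter. The following sketch uses random toy matrices and hypothetical function names.

```python
import numpy as np

def l21_norm(B):
    """Sum of the Euclidean norms of the rows of B."""
    return np.linalg.norm(B, axis=1).sum()

def objective_10(A, Y, B, C, lam, rho):
    fit = 0.5 * np.linalg.norm(A @ B @ C - Y, "fro") ** 2
    return fit + lam * (l21_norm(B) + 0.5 * rho * np.linalg.norm(C, "fro") ** 2)

def objective_12(A, Y, B, C, lam):
    fit = 0.5 * np.linalg.norm(A @ B @ C - Y, "fro") ** 2
    return fit + lam * l21_norm(B) + 0.5 * np.linalg.norm(C, "fro") ** 2

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 15))
B = rng.standard_normal((15, 2))
C = rng.standard_normal((2, 9))
Y = rng.standard_normal((6, 9))
lam, rho = 0.3, 2.0

scale = np.sqrt(lam * rho)
B_tilde = B / scale            # from B = sqrt(lam * rho) * B_tilde
C_tilde = scale * C            # from C = C_tilde / sqrt(lam * rho)
lam_tilde = lam * scale        # combined regularization parameter

value_10 = objective_10(A, Y, B, C, lam, rho)
value_12 = objective_12(A, Y, B_tilde, C_tilde, lam_tilde)
print(value_10, value_12)
```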

\begin{matrix} \hat{B}, \hat{C} & = \underset{B,C}{argmin} \{\frac{1}{2} ∥ A(BC) - Y ∥_{F}^{2} + λ ∥ B ∥_{2, 1} + \frac{1}{2} ∥ C ∥_{F}^{2}, λ > 0\} \end{matrix}

(12)

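The optimization problem (12) is a simultaneous minimization over matrices B and C. For a fixed C, the minimum over B can be obtained using FISTA. On the other hand, for a fixed B, the minimum over C can be solved directly in terms of a matrix inversion. These observations suggest an alternating minimization algorithm [15, 28] (Algorithm 1): given an initialization C₀, iterate for t=1,2,…

B_{t} = \underset{B}{argmin} \{\frac{1}{2} ∥ A (B C_{t-1}) - Y ∥_{F}^{2} + λ ∥ B ∥_{2, 1} + \frac{1}{2} ∥ C_{t-1} ∥_{F}^{2}\}

(13)

C_{t} = \underset{C}{argmin} \{\frac{1}{2} ∥ A (B_{t} C) - Y ∥_{F}^{2} + λ ∥ B_{t} ∥_{2, 1} + \frac{1}{2} ∥ C ∥_{F}^{2}\}

(14)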

In order to obtain the initialization matrix C₀, we use an approach based on the singular value decomposition of Y. Without loss of generality, let us work with (9) in the noiseless case:

\begin{align} Y = ABC \end{align}

(15)

From (15), we can see that {Y₁,Y₂,…,Y_M}⊂RowSpace(C), where Y_i denotes the i th row of Y.

Now, let us obtain a rank-K approximation of Y by using a truncated SVD (truncated at the singular value σ_K):

\begin{align} Y \approx U_{M \times K} Σ_{K \times K} V_{K \times T}^{⊤} \end{align}

(16)

From the SVD theory [29], we know that {Y₁,Y₂,…,Y_M}⊂Row Space(V^⊤); therefore, we can choose C₀=V^⊤. Then, given C₀, we can start iterating using (13) and (14).
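A minimal sketch of this initialization, assuming toy sizes and random matrices in place of real data: in the noiseless case with rank(Y) ≤ K, projecting the rows of Y onto the row space of C₀=V^⊤ should reproduce Y.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, K, T = 10, 25, 3, 40              # toy sizes (hypothetical)
A = rng.standard_normal((M, N))
B = rng.standard_normal((N, K))
C = rng.standard_normal((K, T))
Y = A @ B @ C                           # noiseless model (15)

# Truncated SVD of Y: keep the K leading right singular vectors.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
C0 = Vt[:K]                             # K x T, orthonormal rows

# Projection of the rows of Y onto the row space of C0.
Y_proj = Y @ C0.T @ C0
proj_error = np.linalg.norm(Y - Y_proj) / np.linalg.norm(Y)
print(proj_error)
```

With noisy measurements the projection is no longer exact, and the truncation keeps only the K dominant temporal components of Y.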

5.1.1 Minimization over B(fixed C)

The minimization over B can be stated as follows:

\begin{array}{lcr} B_{t} = \underset{B}{argmin} \{F_{B} (B) + λ ∥ B ∥_{2, 1}, λ > 0\}, \end{array}

(17)

where $F_{B} (B) = \frac{1}{2} ∥ A (B C_{t-1}) - Y ∥_{F}^{2} + \frac{1}{2} ∥ C_{t-1} ∥_{F}^{2}$ . This is a composite convex optimization problem involving the sum of a smooth function ($F_{B} (B)$) and a nonsmooth function ($λ ∥ B ∥_{2, 1}$). As we have seen in Section 3, this kind of problem can be efficiently handled using proximal splitting methods (e.g., FISTA). In order to apply FISTA to solve (17), we first need to compute the following:

1. The gradient of the smooth function F_B(B):
$\begin{array}{lcr} \nabla F_{B} (B) = \frac{\partial F_{B} (B)}{\partial B} = A^{⊤} (A (B C_{t - 1}) - Y) C_{t - 1}^{⊤} \end{array}$

where A^⊤ denotes the transpose of the matrix A.
2. An upper bound of the Lipschitz constant (L) of ∇F_B(B) (it can also be estimated using a backtracking search routine [5]). Here, ∥·∥₂ applied to a matrix denotes the entrywise ℓ₂ norm (that is, the Frobenius norm):
$\begin{array}{rcl} ∥ \nabla F_{B} (B_{1}) - \nabla F_{B} (B_{2}) ∥_{2}^{2} & = & ∥ A^{⊤} A B_{1} C_{t - 1} C_{t - 1}^{⊤} - A^{⊤} A B_{2} C_{t - 1} C_{t - 1}^{⊤} ∥_{2}^{2} \\ & = & ∥ A^{⊤} A (B_{1} - B_{2}) C_{t - 1} C_{t - 1}^{⊤} ∥_{2}^{2} \\ & = & \sum_{j = 1}^{K} ∥ (A^{⊤} A (B_{1} - B_{2})) {(C_{t - 1} C_{t - 1}^{⊤})}_{j} ∥_{2}^{2} \end{array}$

where ${(C_{t - 1} C_{t - 1}^{⊤})}_{j}$ denotes the j-th column of the matrix $C_{t - 1} C_{t - 1}^{⊤}$ . Taking into account that $∥ Q x ∥_{2} \leq ∥ | Q ∥ |_{2} ∥ x ∥_{2}, \forall x \in R^{N}, \forall Q \in R^{M \times N}$ [29], where ∥|·∥|₂ denotes the spectral norm, we get:
$\begin{array}{rcl} ∥ \nabla F_{B} (B_{1}) - \nabla F_{B} (B_{2}) ∥_{2}^{2} & = & \sum_{j = 1}^{K} ∥ (A^{⊤} A (B_{1} - B_{2})) {(C_{t - 1} C_{t - 1}^{⊤})}_{j} ∥_{2}^{2} \\ & \leq & \sum_{j = 1}^{K} ∥ | A^{⊤} A (B_{1} - B_{2}) ∥ |_{2}^{2} ∥ {(C_{t - 1} C_{t - 1}^{⊤})}_{j} ∥_{2}^{2} \\ & \leq & ∥ | A^{⊤} A (B_{1} - B_{2}) ∥ |_{2}^{2} \sum_{j = 1}^{K} ∥ {(C_{t - 1} C_{t - 1}^{⊤})}_{j} ∥_{2}^{2} \\ & \leq & ∥ | A^{⊤} A (B_{1} - B_{2}) ∥ |_{2}^{2} ∥ C_{t - 1} C_{t - 1}^{⊤} ∥_{2}^{2} \end{array}$
(18)

From (18), taking into account that the spectral norm is submultiplicative ( $∥ | PQ ∥ |_{2} \leq ∥ | P ∥ |_{2} ∥ | Q ∥ |_{2}, \forall P \in R^{M \times N}, \forall Q \in R^{N \times T}$ ), it follows that:
$\begin{array}{rcl} ∥ \nabla F_{B} (B_{1}) - \nabla F_{B} (B_{2}) ∥_{2}^{2} & \leq & ∥ | A^{⊤} A ∥ |_{2}^{2} ∥ | B_{1} - B_{2} ∥ |_{2}^{2} ∥ C_{t - 1} C_{t - 1}^{⊤} ∥_{2}^{2} \end{array}$

and, using the fact that $∥ | P ∥ |_{2}^{2} \leq ∥ P ∥_{2}^{2}, \forall P \in R^{M \times N}$ , we obtain:
$\begin{array}{rcl} ∥ \nabla F_{B} (B_{1}) - \nabla F_{B} (B_{2}) ∥_{2}^{2} & \leq & ∥ | A^{⊤} A ∥ |_{2}^{2} ∥ B_{1} - B_{2} ∥_{2}^{2} ∥ C_{t - 1} C_{t - 1}^{⊤} ∥_{2}^{2} \\ & = & L^{2} ∥ B_{1} - B_{2} ∥_{2}^{2} \end{array}$
(19)

where $L = ∥ | A^{⊤} A ∥ |_{2} ∥ C_{t - 1} C_{t - 1}^{⊤} ∥_{2}$ , that is, $∥ \nabla F_{B} (B_{1}) - \nabla F_{B} (B_{2}) ∥_{2} \leq L ∥ B_{1} - B_{2} ∥_{2}$ .
3. The proximal operator associated to the nonsmooth function $λ ∥ \cdot ∥_{2, 1}$:
$\begin{array}{rcl} {prox}_{λ ∥ \cdot ∥_{2, 1}} (B) & = & \underset{X}{argmin} \{λ ∥ X ∥_{2, 1} + \frac{1}{2} ∥ X - B ∥_{2}^{2}\} \\ & = & {[{({prox}_{λ ∥ \cdot ∥_{2, 1}} (B))}_{i, :}]}_{i = 1}^{i = N} \\ & = & {[\frac{B_{i, :}}{∥ B_{i, :} ∥_{2}} {(∥ B_{i, :} ∥_{2} - λ)}_{+}]}_{i = 1}^{i = N} \end{array}$
(20)

where (·)₊= max(·,0), and by convention $\frac{0}{0} = 0$ .
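The three ingredients listed above can be combined into a short FISTA routine for the minimization over B. The following is a minimal NumPy sketch under stated assumptions, not the implementation used in the paper: the function names, the fixed number of iterations, and the toy data are hypothetical, and the constant step size 1/L uses the upper bound L derived above instead of a backtracking search.

```python
import numpy as np

def smooth_FB(B, A, C, Y):
    """Smooth part F_B(B) of problem (17), with C held fixed."""
    R = A @ B @ C - Y
    return 0.5 * np.sum(R ** 2) + 0.5 * np.sum(C ** 2)

def grad_FB(B, A, C, Y):
    """Gradient of F_B with respect to B."""
    return A.T @ (A @ B @ C - Y) @ C.T

def lipschitz_bound(A, C):
    """Spectral norm of A^T A times Frobenius norm of C C^T."""
    return np.linalg.norm(A.T @ A, 2) * np.linalg.norm(C @ C.T, "fro")

def prox_l21(B, lam):
    """Row-wise shrinkage (20); rows with norm <= lam are set to zero."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    safe = np.where(norms > 0.0, norms, 1.0)      # implements 0/0 = 0
    return B * (np.maximum(norms - lam, 0.0) / safe)

def fista_B(A, C, Y, lam, B0, n_iter=300):
    """FISTA with constant step 1/L for problem (17)."""
    L = lipschitz_bound(A, C)
    B, Z, t = B0.copy(), B0.copy(), 1.0
    for _ in range(n_iter):
        B_next = prox_l21(Z - grad_FB(Z, A, C, Y) / L, lam / L)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        Z = B_next + ((t - 1.0) / t_next) * (B_next - B)
        B, t = B_next, t_next
    return B

# Toy problem (hypothetical sizes): 4 active rows out of 30.
rng = np.random.default_rng(3)
M, N, K, T = 12, 30, 2, 25
A = rng.standard_normal((M, N))
B_true = np.zeros((N, K))
B_true[:4] = rng.standard_normal((4, K))
C = rng.standard_normal((K, T))
Y = A @ B_true @ C
lam = 0.5

B_hat = fista_B(A, C, Y, lam, np.zeros((N, K)))
print(np.round(np.linalg.norm(B_hat, axis=1), 3))   # row norms of the estimate
```

Using the bound L keeps the sketch free of line searches at the cost of a conservative step; a backtracking routine, as mentioned in item 2, would typically allow larger steps.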

5.1.2 Minimization over C(fixed B)

The minimization over C can be stated as follows:

C_{t} = \underset{C}{argmin} \{F_{C} (C)\}

(21)

where $F_{C} (C) = \frac{1}{2} ∥ A (B_{t} C) - Y ∥_{F}^{2} + λ ∥ B_{t} ∥_{2, 1} + \frac{1}{2} ∥ C ∥_{F}^{2}$ is a smooth function of C. In what follows, we show how the minimum over C can be solved directly in terms of a matrix inversion:

\begin{array}{rcl} \nabla F_{C} (C) = \frac{\partial F_{C} (C)}{\partial C} & = & B_{t}^{⊤} A^{⊤} (A (B_{t} C) - Y) + C \\ \nabla F_{C} (C) = 0 & \Rightarrow & B_{t}^{⊤} A^{⊤} (A (B_{t} C_{t}) - Y) + C_{t} = 0 \\ & \Rightarrow & C_{t} = {[B_{t}^{⊤} A^{⊤} A B_{t} + I_{K}]}^{- 1} B_{t}^{⊤} A^{⊤} Y \end{array}

(22)

The matrix $[B_{t}^{⊤} A^{⊤} A B_{t} + I_{K}]$ belongs to $R^{K \times K}$ , and K is assumed to be small; therefore, calculating its inverse is quite cheap.
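The closed-form update (22) can be sketched as follows. As a design choice of this illustration, the K×K system is solved with a linear solver instead of forming the inverse explicitly, which gives the same result; the function name and the toy data are hypothetical.

```python
import numpy as np

def update_C(A, B, Y):
    """Closed-form minimizer (22): solve (B^T A^T A B + I_K) C = B^T A^T Y."""
    AB = A @ B                                   # M x K
    K = B.shape[1]
    return np.linalg.solve(AB.T @ AB + np.eye(K), AB.T @ Y)

rng = np.random.default_rng(6)
M, N, K, T = 10, 25, 3, 12                       # toy sizes (hypothetical)
A = rng.standard_normal((M, N))
B_t = rng.standard_normal((N, K))
Y = rng.standard_normal((M, T))

C_t = update_C(A, B_t, Y)

# The gradient of F_C should vanish at the returned point.
grad = (A @ B_t).T @ (A @ B_t @ C_t - Y) + C_t
grad_norm = np.linalg.norm(grad)
print(grad_norm)
```

Because of the identity term, the system matrix is positive definite for any B_t, so the update is always well defined.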

6 Convergence analysis

We are going to analyze the convergence behavior of Algorithm 1 by using the global convergence theory of iterative algorithms developed by Zangwill [30]. Note that in this theory, the term ‘global convergence’ does not imply convergence to a global optimum for all initial points. The property of global convergence expresses, in a sense, the certainty that the algorithm converges to the solution set. Formally, an iterative algorithm ξ, on the set X, is said to be globally convergent provided, for any starting point x₀∈X, the sequence {x_n} generated by ξ has a limit point [31].

In order to use the global convergence theory of iterative algorithms, we need a formal definition of an iterative algorithm, as well as the definition of a set-valued mapping (a.k.a. point-to-set mapping) [30]:

Definition 6.1

Set-valued mapping. Given two sets, X and Y, a set-valued mapping defined on X, with range in the power set of Y, $P (Y)$ , is a map, Φ, which assigns to each x∈X a subset $Φ (x) \in P (Y)$ ,

Φ : X \to P (Y)

Definition 6.2

Iterative algorithm. Let X be a set and x₀∈X a given point. Then, an iterative algorithm ξ, with initial point x₀, is a set-valued mapping

ξ : X \to P (X)

which generates a sequence ${\{x_{n}\}}_{n = 1}^{\infty}$ via the rule $x_{n+1} \in ξ (x_{n})$ , n=0,1,…

Now that we know the main building blocks of the global convergence theory of iterative algorithms, we are in a position to state the convergence theorem related to Algorithm 1:

Theorem 6.1

Let Φ denote the iterative Algorithm 1, and suppose that given $Y \in R^{M \times T}$ , $A \in R^{M \times N}$ , $B_{0} \in R^{N \times K}$ , $C_{0} \in R^{K \times T}$ , K, and λ, the sequence ${\{B_{t}, C_{t}\}}_{t = 1}^{\infty}$ is generated and satisfies $\{B_{t+1}, C_{t+1}\} \in Φ (B_{t}, C_{t})$ . Also, let Ω_B and Ω_C denote the solution sets of (13) and (14), respectively:

\begin{array}{rcl} Ω_{B} & = & \{B \in R^{N \times K} | 0 \in \partial (\frac{1}{2} ∥ A (B C_{t-1}) - Y ∥_{F}^{2} + λ ∥ B ∥_{2, 1} + \frac{1}{2} ∥ C_{t-1} ∥_{F}^{2})\} \\ Ω_{C} & = & \{C \in R^{K \times T} | \nabla (\frac{1}{2} ∥ A (B_{t} C) - Y ∥_{F}^{2} + λ ∥ B_{t} ∥_{2, 1} + \frac{1}{2} ∥ C ∥_{F}^{2}) = 0\} \end{array}

Then, the limit of any convergent subsequence of ${\{B_{t}, C_{t}\}}_{t = 1}^{\infty}$ is in Ω_B and Ω_C.

This convergence theorem is a direct application of Zangwill’s global convergence theorem [30]. Before proving this assertion, let us state some definitions and theorems used in the proof.

Definition 6.3

Compact set. A set X is said to be compact if any sequence (or subsequence) contains a convergent subsequence whose limit is in X. More explicitly, given a subsequence ${\{x_{n}\}}_{n \in \hat{N}}$ in X, there exists a $\hat{N_{1}} \subset \hat{N}$ such that

x_{n} \to x_{\infty}, n \in \hat{N_{1}}

with x_∞∈X (we write convergence of subsequences as x_n→x_∞, which is equivalent to $lim_{n \to \infty} x_{n} = x_{\infty}$ ).

Definition 6.4

Composite map. Let Π_A:X→Y and Π_B:Y→Z be two set-valued mappings. The composite map Π_C=Π_B∘Π_A which takes points x∈X to sets Π_C(x)⊂Z is defined by

Π_{C} (x) : = ⋃_{y \in Π_{A} (x)} Π_{B} (y)

Definition 6.5

Closed map. A set-valued mapping $Π : X \to P (Y)$ is closed at x₀∈X provided

1. x_n→x₀ as n→∞, x_n∈X
2. y_n→y₀ as n→∞, y_n, y₀∈Y
3. y_n∈Π(x_n)

implies y₀∈Π(x₀). The map Π is called closed on S⊂X provided it is closed at each x∈S.

Theorem 6.2

Composition of closed maps. Let Π_A:X→Y and Π_B:Y→Z be two set-valued mappings. Suppose

1. Π_A is closed at x₀
2. Π_B is closed on Π_A(x₀)
3. if x_n→x₀ and y_n∈Π_A(x_n), then there exists y₀∈Y, such that for some subsequence $\{y_{n_{j}}\}$ , $y_{n_{j}} \to y_{0}$ as j→∞.

Then, the composite map Π_C=Π_B∘Π_A is closed at x₀.

Lemma 6.1

[32] Given a real-valued function h defined on X×Y, define the set-valued mapping $Ψ : X \to P (Y)$ by

Ψ (x) = \underset{y \in Y}{argmin} h (x, y)

then, Ψ is closed at x if Ψ(x) is nonempty.

Theorem 6.3

Weierstrass theorem. If f is a real continuous function on a compact set $S \subset R^{n}$ , then the problem

\underset{x \in R^{n}}{argmin} \{f (x), x \in S\}

has an optimal solution x^∗∈S.

Theorem 6.4

[30] Zangwill’s global convergence theorem. Let the set-valued mapping $M_{x} (x) : X \to P (X)$ determine an algorithm that given a point x₀ generates a sequence ${\{x_{n}\}}_{n = 0}^{\infty}$ through the iteration $x_{n+1} \in M_{x} (x_{n})$ . Also, let a solution set Γ be given. Suppose

1. All points x_n are in a compact set S⊂X.
2. There is a continuous function $α : X \to R$ such that
  (a) if x∉Γ, then α(x′)<α(x) ∀x′∈M_x(x).
  (b) if x∈Γ, then α(x′)≤α(x) ∀x′∈M_x(x).
3. The map M_x(x) is closed at x if x∉Γ.

Then, the limit of any convergent subsequence of ${\{x_{n}\}}_{n = 0}^{\infty}$ is in Γ. That is, accumulation points x^∗ of the sequence x_n lie in Γ. Furthermore, α(x_n) converges to α^∗, and α(x^∗)= α^∗ for all accumulation points x^∗.

Proof

Theorem 6.1. The iterative algorithm Φ can be decomposed into two well-defined iterative algorithms Φ_B and Φ_C:

\begin{array}{rcl} Φ_{B} (C_{t-1}) = B_{t} & = & \underset{B}{argmin} \{\frac{1}{2} ∥ A (B C_{t-1}) - Y ∥_{F}^{2} + λ ∥ B ∥_{2, 1} + \frac{1}{2} ∥ C_{t-1} ∥_{F}^{2}\} \end{array}

(23)

\begin{array}{rcl} Φ_{C} (B_{t}) = C_{t} & = & \underset{C}{argmin} \{\frac{1}{2} ∥ A (B_{t} C) - Y ∥_{F}^{2} + λ ∥ B_{t} ∥_{2, 1} + \frac{1}{2} ∥ C ∥_{F}^{2}\} \end{array}

(24)

As we can see from (23) and (24), at iteration t, the result of Φ_B becomes the input of Φ_C, and at iteration t+1, the result of Φ_C becomes the input of Φ_B; therefore, we can express Φ as the composition of Φ_C and Φ_B, that is, $Φ (C_{t-1}) = Φ_{C} (Φ_{B} (C_{t-1}))$ :

\begin{array}{rcl} Φ_{C} (Φ_{B} (C_{t-1})) = C_{t} & = & \underset{C}{argmin} \{\frac{1}{2} ∥ A (B_{t} C) - Y ∥_{F}^{2} + λ ∥ B_{t} ∥_{2, 1} + \frac{1}{2} ∥ C ∥_{F}^{2}\} \\ \text{subject to} \quad B_{t} & = & \underset{B}{argmin} \{\frac{1}{2} ∥ A (B C_{t-1}) - Y ∥_{F}^{2} + λ ∥ B ∥_{2, 1} + \frac{1}{2} ∥ C_{t-1} ∥_{F}^{2}\} \end{array}

(25)

Let Γ be the solution set of Φ

Γ = \{C \in R^{K \times T} | \frac{\partial Z (C, t)}{\partial C} = 0\},

where $Z (C, t) = \frac{1}{2} ∥ A (B_{t} C) - Y ∥_{F}^{2} + λ ∥ B_{t} ∥_{2, 1} + \frac{1}{2} ∥ C ∥_{F}^{2}$ .

To prove this theorem by using Zangwill’s global convergence theorem, we need to prove that all its corresponding assumptions are fulfilled. In order to prove assumption 1, let us analyze the sequences ${\{B_{t}\}}_{t = 1}^{\infty}$ and ${\{C_{t}\}}_{t = 1}^{\infty}$ . The sequence ${\{B_{t}\}}_{t = 1}^{\infty}$ is generated by using FISTA, which is a convergent algorithm (B_t→B_∞) that guarantees that B_t∈Ω_B [5, 18]. Hence, using Definition 6.3, we can see that the sequence ${\{B_{t}\}}_{t = 1}^{\infty}$ generated by (23) lies in a compact set. On the other hand, the sequence ${\{C_{t}\}}_{t = 1}^{\infty}$ is generated by (22), which guarantees that C_t∈Ω_C. This sequence always converges to a point inside Ω_C, which implies that it also lies in a compact set. This concludes the proof of assumption 1.

To prove assumption 2, let us use Z(C,t) as the function α(·); thus, in order to verify the fulfillment of assumption 2, we need to prove that

(a) if $C_{t} \notin Γ$ , then $Z (C_{t+1}, t+1) < Z (C_{t}, t)$ $\forall C_{t+1} \in Φ (C_{t})$
(b) if $C_{t} \in Γ$ , then $Z (C_{t+1}, t+1) \leq Z (C_{t}, t)$ $\forall C_{t+1} \in Φ (C_{t})$

From (25), we can see that the sequence ${\{C_{t}\}}_{t = 1}^{\infty}$ will always lie in Γ (because C_t is generated by (22)); therefore, we only need to prove (b).

Let $C_{t+1}$ be the solution of (25) at iteration t+1; this implies

\begin{array}{rl} & \frac{1}{2} ∥ A (B_{t+1} C_{t+1}) - Y ∥_{F}^{2} + λ ∥ B_{t+1} ∥_{2, 1} + \frac{1}{2} ∥ C_{t+1} ∥_{F}^{2} \\ \leq & \frac{1}{2} ∥ A (B_{t+1} C) - Y ∥_{F}^{2} + λ ∥ B_{t+1} ∥_{2, 1} + \frac{1}{2} ∥ C ∥_{F}^{2}, \forall C \in R^{K \times T} \\ \leq & \frac{1}{2} ∥ A (B_{t+1} C_{t}) - Y ∥_{F}^{2} + λ ∥ B_{t+1} ∥_{2, 1} + \frac{1}{2} ∥ C_{t} ∥_{F}^{2} \end{array}

(26)

On the other hand, if $B_{t+1}$ is the solution of (23) at iteration t+1, this implies

\begin{array}{rl} & \frac{1}{2} ∥ A (B_{t+1} C_{t}) - Y ∥_{F}^{2} + λ ∥ B_{t+1} ∥_{2, 1} + \frac{1}{2} ∥ C_{t} ∥_{F}^{2} \\ \leq & \frac{1}{2} ∥ A (B C_{t}) - Y ∥_{F}^{2} + λ ∥ B ∥_{2, 1} + \frac{1}{2} ∥ C_{t} ∥_{F}^{2}, \forall B \in R^{N \times K} \\ \leq & \frac{1}{2} ∥ A (B_{t} C_{t}) - Y ∥_{F}^{2} + λ ∥ B_{t} ∥_{2, 1} + \frac{1}{2} ∥ C_{t} ∥_{F}^{2} \end{array}

(27)

and from (26) and (27), we can prove assumption 2(b):

\begin{array}{l} \frac{1}{2} ∥ A(B_{t+1} C_{t+1}) - Y ∥_{F}^{2} + λ ∥ B_{t+1} ∥_{2, 1} + \frac{1}{2} ∥ C_{t+1} ∥_{F}^{2} \\ \quad \leq \frac{1}{2} ∥ A(B_{t} C_{t}) - Y ∥_{F}^{2} + λ ∥ B_{t} ∥_{2, 1} + \frac{1}{2} ∥ C_{t} ∥_{F}^{2} \\ Z (C_{t+1}, t + 1) \leq Z (C_{t}, t) \end{array}
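The monotone decrease established by (26) and (27) can be checked numerically. The following is a minimal sketch, not the implementation used in this paper: it assumes the objective exactly as written in (26), replaces FISTA in the B-step with plain (non-accelerated) proximal-gradient steps so that each inner step is itself non-increasing, and assumes the C-step can be solved in closed form as a ridge-type problem. All function names and problem sizes are hypothetical.

```python
import numpy as np

def objective(A, B, C, Y, lam):
    """Z = 0.5*||A B C - Y||_F^2 + lam*||B||_{2,1} + 0.5*||C||_F^2, as in (26)."""
    fit = 0.5 * np.linalg.norm(A @ B @ C - Y, "fro") ** 2
    l21 = np.linalg.norm(B, axis=1).sum()
    return fit + lam * l21 + 0.5 * np.linalg.norm(C, "fro") ** 2

def prox_l21(B, tau):
    """Row-wise soft-thresholding: proximity operator of tau*||.||_{2,1}."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    return B * np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)

def update_B(A, B, C, Y, lam, n_steps=50):
    """B-step (fixed C): proximal-gradient steps with step size 1/L.

    Assumption: ISTA is used here instead of FISTA, because each ISTA
    step is guaranteed not to increase the objective.
    """
    L = (np.linalg.norm(A, 2) * np.linalg.norm(C, 2)) ** 2 + 1e-12
    for _ in range(n_steps):
        grad = A.T @ (A @ B @ C - Y) @ C.T
        B = prox_l21(B - grad / L, lam / L)
    return B

def update_C(A, B, Y):
    """C-step (fixed B): closed-form minimizer of the smooth subproblem in C."""
    G = A @ B
    return np.linalg.solve(G.T @ G + np.eye(G.shape[1]), G.T @ Y)

# Small random problem (hypothetical sizes, unrelated to the EEG scenarios).
rng = np.random.default_rng(0)
M, N, K, T = 8, 20, 3, 15
A = rng.standard_normal((M, N))
Y = rng.standard_normal((M, T))
B = rng.standard_normal((N, K))
C = rng.standard_normal((K, T))
lam = 0.1

history = [objective(A, B, C, Y, lam)]
for t in range(10):
    B = update_B(A, B, C, Y, lam)   # plays the role of (27)
    C = update_C(A, B, Y)           # plays the role of (26)
    history.append(objective(A, B, C, Y, lam))

print(history)
```

The recorded values of Z should form a non-increasing sequence, which is the content of assumption 2(b).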

To prove assumption 3, we need to show that Φ is closed at C if C∉Γ. To do so, we use Theorem 6.2, which requires Φ_B and Φ_C to both be closed maps. From (23) and (24), we can see that their corresponding objective functions are continuous $\forall B \in R^{N \times K}$ and $\forall C \in R^{K \times T}$ , respectively. Hence, by the Weierstrass theorem and Lemma 6.1, Φ_B and Φ_C are both closed maps for any C_t-1 and B_t, respectively, and by Theorem 6.2, Φ is closed at any C_t-1.

Finally, from all the previous proofs and Zangwill’s global convergence theorem, it follows that the limit of any convergent subsequence of ${\{B_{t}, C_{t}\}}_{t = 1}^{\infty}$ is in Ω_B and Ω_C.

7 Numerical experiments

In this section, we evaluate the performance of the matrix factorization approach and compare it with the Group Lasso regularizer:

\begin{matrix} \hat{S} = \underset{S}{\arg\min} \left\{ \frac{1}{2} ∥ AS - Y ∥_{F}^{2} + λ \sum_{i = 1}^{N} ∥ S (i, :) ∥_{2} \right\}, \quad λ > 0 \end{matrix}

(28)

and the Trace Norm regularizer:

\begin{matrix} \hat{S} = \underset{S}{\arg\min} \left\{ \frac{1}{2} ∥ AS - Y ∥_{F}^{2} + λ \sum_{i = 1}^{q} σ_{i} (S) \right\}, \quad λ > 0 \end{matrix}

(29)

where q=min{N,T} and σ_i(S) denotes the i-th singular value of S. Both problems (28) and (29) were solved using the FISTA implementation of the SPArse Modeling Software (SPAMS) [33, 34].
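To make the difference between the two penalties concrete, the sketch below implements the proximity operators that a proximal method such as FISTA applies for the penalties in (28) and (29): row-wise soft-thresholding for the Group Lasso and singular value soft-thresholding for the Trace Norm. This is plain NumPy written for illustration, not the SPAMS interface; the function names and the toy matrix are assumptions.

```python
import numpy as np

def prox_group_lasso(S, tau):
    """Prox of tau * sum_i ||S(i,:)||_2: shrinks each row norm by tau."""
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return S * scale

def prox_trace_norm(S, tau):
    """Prox of tau * sum_i sigma_i(S): shrinks each singular value by tau."""
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

# Toy rank-1 matrix: one strong row and one weak, proportional row.
S = np.array([[3.0, 4.0],
              [0.3, 0.4]])

P_gl = prox_group_lasso(S, tau=1.0)  # the weak row is set exactly to zero
P_tr = prox_trace_norm(S, tau=1.0)   # rank stays 1, but both rows stay nonzero

print(P_gl)
print(P_tr)
```

On this toy matrix, the Group Lasso operator removes the weak row entirely, whereas the Trace Norm operator rescales the whole matrix and keeps every row active. This is the row-sparse versus low-rank-but-dense behavior that the comparison in this section examines.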

In order to have a reproducible comparison of the different regularization approaches, we generated two synthetic scenarios:

(a) M=128 EEG electrodes, T=161 time instants, and N=413 current sources within the brain, of which only 12 are active: 4 main active sources, each with its 2 nearest neighbor sources. The other 401 sources are not active (zero electrical activity). Therefore, in this scenario, the synthetic matrix S is a structured sparse matrix with only 12 nonzero rows (the rows associated with the active sources).
(b) M=128 EEG electrodes, T=161 time instants, and N=2,052 current sources within the brain, of which only 40 are active: 4 main active sources, each with its 9 nearest neighbor sources. The other 2,012 sources are not active (zero electrical activity). Therefore, in this scenario, the synthetic matrix S is a structured sparse matrix with only 40 nonzero rows (the rows associated with the active sources).

In both scenarios, the simulated electrical activity (simulated waveforms) associated with the four Main Active Sources (MAS) was obtained from a face perception-evoked potential study [35, 36]. To obtain the simulated electrical activity associated with each of the active neighbor sources, we simply set it as a scaled version of the electrical activity of its nearest MAS (with a scaling factor equal to 0.5). Hence, there is a linear relation between the four MAS and their corresponding nearest neighbor sources; therefore, in both scenarios, the rank of the synthetic matrix S is equal to 4.
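The structure of the synthetic matrix in the first scenario can be sketched as follows. This is an illustration only: the waveforms here are sinusoids standing in for the evoked-potential waveforms of [35, 36], and the source indices of the MAS and their neighbors are hypothetical, since the real ones depend on the cortical mesh.

```python
import numpy as np

def build_synthetic_sources(n_sources, n_times, main_idx, neighbours,
                            waveforms, scale=0.5):
    """Build a row-sparse source matrix S.

    Each main active source gets its own waveform; each of its neighbors
    gets the same waveform multiplied by `scale` (0.5 in the text).
    """
    S = np.zeros((n_sources, n_times))
    for k, i in enumerate(main_idx):
        S[i] = waveforms[k]
        for j in neighbours[k]:
            S[j] = scale * waveforms[k]
    return S

N, T = 413, 161
# Hypothetical indices: 4 main sources, each with 2 neighbors.
main_idx = [10, 100, 200, 300]
neighbours = [[11, 12], [101, 102], [201, 202], [301, 302]]

# Stand-in waveforms: four linearly independent sinusoids.
time = np.linspace(0.0, 1.0, T)
waveforms = np.stack([np.sin(2 * np.pi * (k + 1) * time) for k in range(4)])

S = build_synthetic_sources(N, T, main_idx, neighbours, waveforms)

n_active = int(np.count_nonzero(np.linalg.norm(S, axis=1)))
rank = int(np.linalg.matrix_rank(S))
print(n_active, rank)
```

Because every neighbor row is a multiple of a main row, the number of nonzero rows is 12 while the rank equals the number of MAS.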

As the forward model (A), we used a three-shell concentric spherical head model. In this model, the inner sphere represents the brain, the intermediate layer represents the skull, and the outer layer represents the scalp [37]. To obtain the values of the components of the matrix A, we need to solve the EEG forward problem [38]: given the electrical activity of the current sources within the brain and a model for the geometry of the conducting media (brain, skull, and scalp, with their corresponding electric properties), compute the resulting EEG signals. This problem was solved by using the SPM software [39]. Taking into account the comments in Section 2, the N simulated current sources were positioned on a mesh located on the brain cortex, with an orientation fixed perpendicular to it.

Finally, the simulated EEG signals were generated according to (1), where E is a Gaussian noise G(0,σ²I) whose variance was set to satisfy an $SNR = 20 \log_{10} (\frac{∥ AS ∥_{F}}{∥ E ∥_{F}}) = 10 dB$ . Summarizing, our synthetic problems can be stated as follows: given matrices $Y \in R^{128 \times 161}$ and $A \in R^{128 \times N}$ , recover the synthetic BES matrix $S \in R^{N \times 161}$ . Accordingly, in both scenarios, we want to estimate a BES matrix which is structured sparse and low rank, with its rank equal to the number of simulated MAS. The activity of the four MAS, the synthetic EEG measurements, and the sparsity pattern of the synthetic BES matrix are shown in Figures 1 and 2 (Ground Truth).
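One way to obtain noise at a prescribed SNR is sketched below. It is an assumption about the procedure, not a description of the code used for the experiments: instead of solving for the variance σ², it draws a Gaussian realization and rescales it so that the SNR definition above is met exactly. The matrices A and S are random stand-ins with the dimensions of the first scenario.

```python
import numpy as np

def add_noise_at_snr(clean, target_snr_db, rng):
    """Return (noisy, noise) with 20*log10(||clean||_F / ||noise||_F) = target."""
    noise = rng.standard_normal(clean.shape)
    target_norm = np.linalg.norm(clean, "fro") / 10 ** (target_snr_db / 20.0)
    noise *= target_norm / np.linalg.norm(noise, "fro")
    return clean + noise, noise

def measured_snr_db(clean, noise):
    """SNR in dB, using the Frobenius-norm ratio from the text."""
    ratio = np.linalg.norm(clean, "fro") / np.linalg.norm(noise, "fro")
    return 20.0 * np.log10(ratio)

rng = np.random.default_rng(0)
M, N, T = 128, 413, 161
A = rng.standard_normal((M, N))               # stand-in lead field
S = np.zeros((N, T))
S[:12] = rng.standard_normal((12, T))         # stand-in active sources

clean = A @ S
Y, E = add_noise_at_snr(clean, 10.0, rng)
print(measured_snr_db(clean, E))
```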

We have used cross-validation to select the regularization parameter λ associated with the Group Lasso and Trace Norm regularizers, as well as the parameters λ and K in the case of the Matrix Factorization approach (K∈{1,2,3,…,10}, λ∈{10⁻³,10⁻²,10⁻¹,…,10³}). The rows of Y are randomly partitioned into three groups of approximately equal size. Each union of two groups forms a training set (TrS), while the remaining group forms a test set (TS). This procedure is carried out three times, each time selecting a different test group. Inverse reconstructions are carried out based on the training sets, yielding regression matrices ${\hat{S}}_{i}$ . We then evaluate the root mean square error (RMSE) using the test sets and the regression matrices ${\hat{S}}_{i}$ :

\begin{matrix} RMSE = \frac{1}{3} \sum_{i = 1}^{3} \left( \frac{1}{\sqrt{M_{{TS}_{i}} \times T}} ∥ A_{{TS}_{i}} {\hat{S}}_{i} - Y_{{TS}_{i}} ∥_{F} \right), \end{matrix}

where $Y_{{TS}_{i}} \in R^{M_{{TS}_{i}} \times T}$ , and $A_{{TS}_{i}} \in R^{M_{{TS}_{i}} \times N}$ (TS_i denotes the index set of the rows that belong to the i-th test set). Once the estimated matrix $\hat{S}$ has been found, we apply a threshold to remove spurious sources with almost zero activity. We have set this threshold equal to 1% of the mean energy of all the sources.
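The cross-validation loop and the final thresholding can be sketched as follows. This is a minimal illustration under stated assumptions: the inverse solver is a ridge (minimum-norm type) estimate used only as a stand-in for the three regularizers compared here, the energy of a source is taken to be the sum of squares of its row, and all names and problem sizes are hypothetical.

```python
import numpy as np

def ridge_inverse(A, Y, lam):
    """Stand-in solver (ridge estimate); NOT one of the compared regularizers."""
    M = A.shape[0]
    return A.T @ np.linalg.solve(A @ A.T + lam * np.eye(M), Y)

def cv_rmse(A, Y, solver, lam, n_folds=3, seed=0):
    """Three-fold cross-validation over the rows (electrodes) of Y and A."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(A.shape[0]), n_folds)
    errors = []
    for test_rows in folds:
        train_rows = np.setdiff1d(np.arange(A.shape[0]), test_rows)
        S_hat = solver(A[train_rows], Y[train_rows], lam)
        resid = A[test_rows] @ S_hat - Y[test_rows]
        # resid.size equals M_TS * T, as in the RMSE definition.
        errors.append(np.linalg.norm(resid, "fro") / np.sqrt(resid.size))
    return float(np.mean(errors))

def threshold_sources(S_hat, fraction=0.01):
    """Zero out rows whose energy is below `fraction` of the mean row energy."""
    energy = np.sum(S_hat ** 2, axis=1)
    keep = energy >= fraction * energy.mean()
    return S_hat * keep[:, None]

# Small random problem (hypothetical sizes).
rng = np.random.default_rng(1)
M, N, T = 30, 50, 20
A = rng.standard_normal((M, N))
S_true = np.zeros((N, T))
S_true[:5] = rng.standard_normal((5, T))
Y = A @ S_true + 0.1 * rng.standard_normal((M, T))

lambdas = 10.0 ** np.arange(-3, 4)            # 1e-3, 1e-2, ..., 1e3
scores = {lam: cv_rmse(A, Y, ridge_inverse, lam) for lam in lambdas}
best_lam = min(scores, key=scores.get)
S_hat = threshold_sources(ridge_inverse(A, Y, best_lam))
print(best_lam, scores[best_lam])
```

For the Matrix Factorization approach, the same loop would run over the pairs (λ, K) instead of λ alone.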

7.1 Performance evaluation

In order to evaluate the performance of the regularizers, we compare the waveform and localization of the four MAS present in the synthetic BES matrix against the four MAS estimated by each one of the regularizers. We also compare the sparsity pattern of the estimated BES matrix $\hat{S}$ against the sparsity pattern of the synthetic BES matrix S, as well as the synthetic and predicted EEG measurements.

As we can see from Figures 1 and 2, the Group Lasso and Trace Norm regularizers do not reveal the correct number of linearly independent sources, while the Matrix Factorization approach does: it finds four linearly independent sources in both scenarios. To select these four linearly independent MAS, we find a basis for the $Column Space ({\tilde{S}}^{⊤})$ (using a QR factorization), where $\tilde{S}$ is a matrix whose rows are a sorted version of the rows of S (sorted in descending order of their corresponding energy value). To get the four linearly independent MAS estimated by the Group Lasso and Trace Norm regularizers, we followed the same procedure and retained the first four components of the basis of the $Column Space ({\tilde{S}}^{⊤})$ .

According to Figures 1 and 2, the Matrix Factorization approach is able to estimate a BES matrix with the correct rank and whose sparsity pattern closely follows the sparsity pattern of the true BES matrix. That is, both matrices have a similar structure, which implies that the proposed approach is able to induce the desired solution: a row-structured sparse matrix, whose nonzero rows encode the linear relation between the active sources and their corresponding nearest neighbor sources. Using the estimated BES matrix, the Matrix Factorization approach is also able to predict a smooth version of the noisy EEG, and the waveforms of the estimated MAS closely follow the waveforms of the true MAS.

As we can see from Figures 1 and 2, Group Lasso is able to estimate a BES matrix with a row-sparsity pattern similar to that of the true BES matrix, but it does not take into account the linear relation between the nonzero rows, which can be seen from the rank of the estimated BES matrix. The waveforms of the estimated MAS are very similar to the true MAS, but they are not as smooth as the ones estimated by the Matrix Factorization approach.

As we can see from Figures 1 and 2, the Trace Norm regularizer takes into account the linear relation of the active sources by inducing solutions which are low rank, but it does not take into account the structured sparsity pattern of the BES matrix. This implies that the Trace Norm tends to induce low rank dense solutions, which are not biologically plausible.

According to Figures 3 and 4, the positions of the MAS obtained from the BES matrices estimated by the Matrix Factorization approach and by the Group Lasso and Trace Norm regularizers closely follow the positions of the true MAS. Nevertheless, it is worth highlighting that before selecting the MAS, we first need an accurate estimate of their number. The Group Lasso and Trace Norm regularizers were not able to provide a precise estimate of it; only the Matrix Factorization approach was able to.
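The selection of linearly independent MAS described at the start of this discussion (sort the rows by energy, then extract a basis of the row space) can be sketched as below. This is an assumption about one way to carry it out: instead of calling a library QR routine, it applies modified Gram-Schmidt to the energy-sorted rows, which is the orthogonalization underlying a QR factorization of ${\tilde{S}}^{⊤}$. The function name, the tolerance, and the toy data are hypothetical.

```python
import numpy as np

def independent_rows_by_energy(S_hat, tol=1e-8):
    """Indices of linearly independent rows, visited in descending energy order.

    A row is kept if, after removing its projection onto the rows kept so
    far, the residual is not negligible relative to the row's own norm.
    """
    energy = np.sum(S_hat ** 2, axis=1)
    order = np.argsort(-energy)
    basis = []      # orthonormal vectors spanning the rows kept so far
    selected = []
    for i in order:
        if energy[i] == 0.0:
            break   # all remaining rows are inactive
        r = S_hat[i].astype(float).copy()
        for q in basis:
            r -= (q @ r) * q
        if np.linalg.norm(r) > tol * np.sqrt(energy[i]):
            basis.append(r / np.linalg.norm(r))
            selected.append(int(i))
    return selected

# Toy estimate: 4 independent waveforms plus half-amplitude copies.
T = 50
time = np.linspace(0.0, 1.0, T)
waves = np.stack([np.sin(2 * np.pi * (k + 1) * time) for k in range(4)])
S_hat = np.zeros((12, T))
main_rows = [2, 5, 7, 9]
for k, i in enumerate(main_rows):
    S_hat[i] = waves[k]
    S_hat[i + 1] = 0.5 * waves[k]   # hypothetical neighbor row

selected = independent_rows_by_energy(S_hat)
print(sorted(selected))
```

The number of selected rows is an estimate of the number of linearly independent sources, and the selected indices identify the corresponding MAS.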

From these results, we can see that the proposed Matrix Factorization approach outperforms both the Group Lasso and Trace Norm regularizers. The main reason is that it combines their two main features, structured sparsity (from Group Lasso) and low rank (from Trace Norm), into one unified framework. It is therefore able to induce structured sparse-low-rank solutions which are biologically plausible: few active sources, with linear relations between them.

8 Conclusions

We have presented a novel approach to solve the EEG inverse problem, which is based on matrix factorization and regularization. Our method combines the ideas behind the Group Lasso (structured sparsity) and Trace Norm (low rank) regularizers into one unified framework. We have also developed and analyzed the convergence of an alternating minimization algorithm to solve the resulting nonsmooth-nonconvex regularization problem. Finally, using simulation studies, we have compared our method with the Group Lasso and Trace Norm regularizers when they are applied directly to the target matrix, and we have shown the gain in performance obtained by our method, hence proving the effectiveness and efficiency of the proposed algorithm.

References

1. Hämäläinen M, Hari R, Ilmoniemi RJ, Knuutila J, Lounasmaa OV: Magnetoencephalography—theory, instrumentation, and applications to noninvasive studies of the working human brain. Rev. Mod. Phys 1993, 65(2):413.
2. Pascual-Marqui RD: Review of methods for solving the EEG inverse problem. Int. J. Bioelectromagnetism 1999, 1(1):75-86.
3. Baillet S, Mosher JC, Leahy RM: Electromagnetic brain mapping. IEEE Signal Process. Mag 2001, 18(6):14-30.
4. Grech R, Cassar T, Muscat J, Camilleri KP, Fabri SG, Zervakis M, Xanthopoulos P, Sakkalis V, Vanrumste B: Review on solving the inverse problem in EEG source analysis. J. Neuroeng. Rehabil 2008, 5(1):25.
5. Beck A, Teboulle M: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci 2009, 2(1):183-202.
6. Menendez RGdP, Murray MM, Michel CM, Martuzzi R, Andino SLG: Electrical neuroimaging based on biophysical constraints. NeuroImage 2004, 21(2):527-539.
7. Hämäläinen MS, Ilmoniemi R: Interpreting magnetic fields of the brain: minimum norm estimates. Med. Biol. Eng. Comput. 1994, 32(1):35-42.
8. Uutela K, Hämäläinen M, Somersalo E: Visualization of magnetoencephalographic data using minimum current estimates. NeuroImage 1999, 10(2):173-180.
9. Ou W, Hämäläinen MS, Golland P: A distributed spatio-temporal EEG/MEG inverse solver. NeuroImage 2009, 44(3):932-946.
10. Gramfort A, Strohmeier D, Haueisen J, Hämäläinen M, Kowalski M: Functional brain imaging with M/EEG using structured sparsity in time-frequency dictionaries. In Information Processing in Medical Imaging, Lecture Notes in Computer Science. Edited by: Székely G, Hahn HK. Springer, Berlin; 2011:600-611.
11. Gramfort A, Strohmeier D, Haueisen J, Hämäläinen M, Kowalski M: Time-Frequency Mixed-Norm Estimates: Sparse M/EEG imaging with non-stationary source activations. NeuroImage 2013, 70:410-422.
12. Haufe S, Tomioka R, Dickhaus T, Sannelli C, Blankertz B, Nolte G, Müller KR: Large-scale EEG/MEG source localization with spatial flexibility. NeuroImage 2011, 54(2):851-859.
13. Gramfort A, Kowalski M, Hämäläinen M: Mixed-norm estimates for the M/EEG inverse problem using accelerated gradient methods. Phys. Med. Biol 2012, 57(7):1937.
14. Bach F, Jenatton R, Mairal J, Obozinski G: Optimization with Sparsity-Inducing Penalties. Found. Trends® Mach. Learn 2011, 4(1):1-106.
15. Micchelli CA, Morales JM, Pontil M: Regularizers for structured sparsity. Adv. Comput. Math. 2013, 38(3):455-489.
16. Sra S, Nowozin S, Wright SJ: Optimization for Machine Learning. MIT Press, Cambridge; 2012.
17. Bertsekas DP: Nonlinear Programming. Athena Scientific, Belmont; 1999.
18. Combettes PL, Wajs VR: Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul 2005, 4(4):1168-1200.
19. Combettes PL, Pesquet J-C: Proximal splitting methods in signal processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering, Springer Optimization and Its Applications. Edited by: Bauschke HH, Burachik RS, Combettes PL, Elser V, Luke DR, Wolkowicz H. Springer, New York; 2011:185-212.
20. Nesterov Y: Gradient methods for minimizing composite objective function. CORE Discussion Papers 2007076, Center for Operations Research and Econometrics (CORE), Université Catholique de Louvain. 2007.
21. Wright SJ, Nowak RD, Figueiredo MA: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 2009, 57(7):2479-2493.
22. Sanei S, Chambers JA: EEG Signal Processing. Wiley, West Sussex; 2008.
23. Malmivuo J, Plonsey R: Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields. Oxford University Press, Oxford; 1995.
24. Murakami S, Okada Y: Contributions of principal neocortical neurons to magnetoencephalography and electroencephalography signals. J. Physiol 2006, 575(3):925-936.
25. Moreau JJ: Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 1965, 93(2):273-299.
26. Micchelli CA, Shen L, Xu Y: Proximity algorithms for image models: denoising. Inverse Probl 2011, 27:045009.
27. Schmidt M, Roux NL, Bach F: Convergence rates of inexact proximal-gradient methods for convex optimization. Adv. Neural Inform. Process. Syst. 2011, 24:1458-1466.
28. Argyriou A, Evgeniou T, Pontil M: Convex multi-task feature learning. Mach. Learn 2008, 73(3):243-272.
29. Horn RA, Johnson CR: Matrix Analysis. Cambridge University Press, Cambridge; 1990.
30. Zangwill WI: Nonlinear Programming: a Unified Approach. Prentice-Hall, Englewood Cliffs; 1969.
31. Sriperumbudur B, Lanckriet G: On the convergence of the concave-convex procedure. Adv. Neural Inform. Process. Syst 2009, 22:1759-1767.
32. Gunawardana A, Byrne W: Convergence theorems for generalized alternating minimization procedures. J. Mach. Learn. Res 2005, 6:2049-2073.
33. Jenatton R, Mairal J, Obozinski G, Bach F: Proximal methods for sparse hierarchical dictionary learning. In Proceedings of the International Conference on Machine Learning (ICML). Haifa; 21–24 June 2010.
34. Mairal J, Jenatton R, Bach FR, Obozinski GR: Network flow algorithms for structured sparsity. Adv. Neural Inform. Process. Syst 2010, 23:1558-1566.
35. Friston K, Harrison L, Daunizeau J, Kiebel S, Phillips C, Trujillo-Barreto N, Henson R, Flandin G, Mattout J: Multiple sparse priors for the M/EEG inverse problem. NeuroImage 2008, 39(3):1104-1120.
36. Henson R, Goshen-Gottstein Y, Ganel T, Otten L, Quayle A, Rugg M: Electrophysiological and haemodynamic correlates of face perception, recognition and priming. Cereb. Cortex 2003, 13(7):793.
37. Hallez H, Vanrumste B, Grech R, Muscat J, De Clercq W, Vergult A, D’Asseler Y, Camilleri KP, Fabri SG, Van Huffel S, Lemahieu I: Review on solving the forward problem in EEG source analysis. J. Neuroeng. Rehabil 2007, 4(1):46.
38. Mosher JC, Leahy RM, Lewis PS: EEG and MEG: forward solutions for inverse methods. IEEE Trans. Biomed. Eng. 1999, 46(3):245-259.
39. Litvak V, Mattout J, Kiebel S, Phillips C, Henson R, Kilner J, Barnes G, Oostenveld R, Daunizeau J, Flandin G, Penny W, Friston K: EEG and MEG data analysis in SPM8. Comput. Intell. Neurosci 2011. doi:10.1155/2011/852961

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions to improve the quality of the paper. They are also grateful to Dr. Carsten Stahlhut, from DTU Informatics, for valuable discussions about EEG brain imaging and to Dr. Alexandre Gramfort, from Telecom ParisTech, for his advice on EEG signal processing and nonsmooth convex programming theory applied to the EEG inverse problem. This work has been partly supported by Ministerio de Economía of Spain (projects ‘DEIPRO’ (id. TEC2009-14504-C02-01), ‘COMONSENS’ (id. CSD2008-00010), ‘ALCIT’ (id. TEC2012-38800-C03-01), and ‘COMPREHENSION’ (id. TEC2012-38883-C02-01)). Authors LKH and MP were funded by Banco Santander and Universidad Carlos III de Madrid’s Excellence Chair programme.

Author information

Authors and Affiliations

Department of Signal Processing and Communications, Universidad Carlos III de Madrid, Avda. Universidad 30, Leganés, Madrid, 28911, Spain
Jair Montoya-Martínez & Antonio Artés-Rodríguez
Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, UK
Massimiliano Pontil
DTU Informatics, Cognitive Systems Section, Technical University of Denmark, Kongens, Lyngby, 2800, Denmark
Lars Kai Hansen

Corresponding author

Correspondence to Jair Montoya-Martínez.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Montoya-Martínez, J., Artés-Rodríguez, A., Pontil, M. et al. A regularized matrix factorization approach to induce structured sparse-low-rank solutions in the EEG inverse problem. EURASIP J. Adv. Signal Process. 2014, 97 (2014). https://doi.org/10.1186/1687-6180-2014-97

Received: 27 March 2014
Accepted: 16 June 2014
Published: 24 June 2014
DOI: https://doi.org/10.1186/1687-6180-2014-97
