EURASIP Journal on Applied Signal Processing 2005:19, 3089–3102 c ○ 2005 Hindawi Publishing Corporation Clustering of Dependent Components: A New Paradigm for fMRI Signal Detection

Abs.

which enables to recover underlying signals, or independent components (ICs) from linear data mixtures. Therefore, it is an excellent method to be applied for the spatial localization and temporal characterization of sources of BOLD activation. ICA can be applied to fMRl both temporal or spatial. Spatial ICA has dominated so far in fMRI applications because the spatial dimension is much larger than the temporal dimension in fMRI. However, recent literature results have suggested that temporal and spatial ICA yield similar results for experiments where two predictable task-related components are present.
A new methodology has attracted a lot of attention in the ICA community during the last two years: the idea of fnding "clusters" of independent components. Two leading papers implemented this new paradigm in a striking way. Clusters are defned as connected components of a graphical model (lattice in [HHOI] and treestructured in [BJO31). Both models attempt a decomposition of the source variables such that they are dependent within a cluster and independent between the clusters. This idea emerged from multidimensional ICA, where the sources are not assumed to be all mutually independent [Car98]. Instead, it is assumed that they can be grouped in n-tuples, such that within these tuples they are dependent on each other, but are independent outside.
The two paradigms differ in terms of topology and the knowledge of number and sizes of components. In [HHOI], the components are arranged on a twdimensional grid or lattice as is typical in topographic models. The goal is to defne a statistical model where the topographic organization re3lects the statistical dependencies between components. The components (simple cells) are placed on the grid such that any two cells that are close to each other model dependent components whereas cells that are far from each other model independent components. The measure of dependency is based on the correlation of energies. Energy means in this context the squaring operation. Nonlinear correlations are of importance

I. INTRODUCTION
Functional magnetic resonance imaging with high temporal and spatial resolution represents a powerful technique for visualizing rapid and f n e activation patterns of the human brain. As is known from both theoretical estimations and experimental results, an activated signal variation appears very low on a clinical scanner. This motivates the application of analysis methods to determine the response waveforms and associated activated regions. Generally, these techniques can be divided into two groups: Model-based techniques require prior knowledge about activation patterns, whereas model-free techniques do not. However, model-based analysis methods impose some limitations on data analysis under complicated experimental conditions. Therefore, analysis methods that do not rely on any assumed model of functional response are considered more powerful and relevant. We distinguish two groups of model-free methods: transformation-based and clustering-based. There are two kinds of model-free methods.
The frst method, principal component analysis (PCA) and also independent component analysis (ICA), transforms original data into high-dimensional vector space to separate functional response and various noise sources from each other. The since they cannot be easily set to zero by standard whitening procedures. Translated to our model, this means that energies are strongly positively correlated for neighboring components.
The topology of the model is fxed. This model also requires that the number and sizes of the components have to be fxed in advance. Learning is based on the maximization of the likelihood. A totally different concept is employed in [BJ03]. Here a non-spanning tree and its topology are not fxed in advance. The nodes of the graph forming clusters are connected while nodes belonging to different clusters have no edges. Thus, an intercluster independence is modeled by different clusters and the intracluster dependence is given by connected nodes. Learning related to this paradigm means determining the best possible non-spanning tree that fts a given distribution. It such leams the number and size of the clusters for that given distribution. It's interesting to point out that in traditional ICA the graphical model has no edges meaning that the random variables are mutually independent.
We have seen that both clustering methods as well as ICA techniques have their particular strengths in fMRI signal detection. Therefore, it is natural to look for an unifying technique that combines those two processing mechanisms and applies this to fMRI. The topographic and the treedependent ICA, as previously described, have the computational advantages associated with both techniques.
In this paper, we perform a detailed comparative study for fMRI among the treedependent and topographic ICA with standard ICA techniques such as FastICA [Hyv99], TDSEP [ZM98], and Jade [BAMCM97]. In a systematic manner, we will compare and evaluate the results obtained based on each technique and present the benefts associated with each paradigm.

TREE-DEPENDENT COMPONENT ANALYSIS MODEL
The paradigm of TCA is derived from the theory of treestructured graphical models. In [CL681 was shown a strategy to approximate optimally an ndimensional discrete probability distribution by a product of second-order distributions, or the distribution of the frst-order tree dependence. A tree is an undirected graph with at most a single edge between two nodes. This tree concept can be easily interpreted with respect to ICA. A graph with no edges means that the random variables are mutually independent and this pertains to ICA. On the other hand, if no assumptions are made about independence, then the corresponding family of probability distributions represents the set of all distributions.
A probability distribution can be approximated in several ways. Here, we look into approximations based on a product of n-1 second-order component distributions. In [CL681 was developed a strategy of the best approximation of an nth-xder distribution by a product of n -1 second-order component distributions: where P ( x ) is a joint probability distribution of n discrete variables w i t h x = q , . . . ,zn beingavector, ( m l , . . . ,mn) is an unknown permutation of integers 1.2, . . . , n and P(z;lz~) is by defnition equal to P ( z i ) . The above introduced probability distribution is named a probability distribution of frstorder tree dependence.
To determine the goodness of an approximation, it is necessary to defne a closeness as where P ( x ) and P,(x) are two probability distributions of the n random variables x. The quantity I ( P , Pa) has the property Translated to random variables, the above defnition is named mutual information and is always nonnegative: In the following, we will state the solution to the approximation of the probability distribution. We are searching for a distribution of tree dependence P T ( z l , . . . ,zn) such that I ( P , P,) 5 I ( P , Pt) for all t E Tn where Tn represents the set of all possible frst-order dependence trees. Thus, the solution 7 is defned as the optimal frst-order dependence tree.
In parlance of graph theory, every branch of the dependence tree is assigned a branch weight I ( z ; , z j ( ; ) ) , Thus being given a dependence tree t , the sum of all branch weights becomes a useful quantity.
In [CL681 was shown that a maximum-wejght dependence tree is a dependence tree t such that for all t in T, In other words, a probability distribution of tree dependence Pt(x) is an optimum approximation to P ( x ) if and only if its dependence tree t has maximum weight. Or, minimizing the closeness measure I(P, Pt) is equivalent to maximizing the total branch weight. The idea of approximating discrete probability distributions with dependence trees described before and adapted from

TOPOGRAPHICAL INDEPENDENT COMPONENT ANALYSIS
The topographic independent component analysis [HHOl] represents a unifying model which combines topographic mapping with ICA.
Achieved by a slight modifcation of the ICA model, it can at the same time be used to defne a topographic order between the components, and thus has the usual computational advantages associated with topographic maps. The paradigm of topographic ICA has its roots in [HHOO] where a combination of invariant feature subspaces [Koh96] and independent subspaces [ C a M ] is proposed.
In [HHOO] it is shown that a fusion between the concepts of invariant and independent subspaces can be achieved by considering probability distributions for the n-tuples of si being spherically symmetric, that is. depending on the norm.
In other words, the pdf pi(.) has to be expressed as a function of the sum of the squares of the si, i E Sj only. Additionally, it is assumed that the pdfs are equal for all subspaces.
The log-likelihood of this new data model for i = 1, . . . , n is given by

iCS, iCS,
Since it is known that the projection of visual data on any subspace has a super-Gaussian distribution, the pdf has to be chosen to be sparse. Thus. we will choose G(u) = a&+@ yielding a multidimensional version of an exponential distribution. a and 0 are constants and enforce that s; is of unit variance.
To introduce a topographic representation in the ICA model, it is necessary to relax the assumption of independence among neighboring components si. This makes it necessary to adopt an idea from self-organized neural networks, that of a lattice. It was shown in [HHOI] that a representation which models topographic correlation of energies is an adequate approach for introducing dependencies between neighboring components.
In other words, the variances corresponding to neighboring components are positively correlated while the other variances are in a broad sense independent. The architecture of this new approach is shown in Figure 1.
This idea leads to the following representation of the source signals: where t i is a random variable having the same distribution as si, and the variance a; is fxed to unity.
The variance a, is further modeled by a nonlinearity: where U; are the higher order independent components used to generate the variances, while 4 describes some nonlinearity. The neighborhood function h ( i , j ) can either be a twodimensional grid or have a ring-like stmcture. Further U; and z; are all mutually independent. The learning rule is based on the maximization of the likelihood. First, it is assumed that the data are preprocessed by whitening and that the estimates of the components are uncorrelated. The log-likelihood is for i = 1,. . . , n given by:

) d z h ( k , j ) ( w T x ) ' )
where n n k=l j=1 The function g is the derivative of G = -@I& + 0 1 . After every iteration, the vectors wi in eq. (1 1) are normalized to unit variance and orthogonalized. This equation represents a I949 modulated learning rule, where the learning term is modulated by the term T,.
The classic ICA results from the topographic ICA by setting h(i,j) = a, j .

Iv. RESULTS AND DISCUSSION
FMRI data were recorded from six subjects (3 female, 3 male, age 20-37) performing a visual task. In fve subjects, fve slices with 100 images (TR/TE53000/60msec) were acquired with Eve periods of rest and fve photic simulation periods with rest. Simulation and rest periods comprised IO repetitions each, i.e. 30s. Resolution was 3 x 3 x 4 mm. The slices were oriented parallel to the calcarine fssure. Photic stimulation was The clustering results were evaluated by ( I ) task-related activation maps, (2) associated time-courses and (3) ROC curves.

A. Estimation of the ICA Model
To decide to what extent spatial ICA of fMRI time-series depends on the employed algorithm, we have f n t to look at the optimal number of principal components selected by PCA and used in the ICA decomposition. ICA is a generalization of PCA. In case no ICA is performed, then the number of independent components equals zero, and this means there is no PCA decomposition performed. In the following we will give the set parameters. For PCA, no parameters had to be set. For FastICA we choose: ( I ) t = (2) lo5 as the maximal number of iterations, and (3) the nonlinearity g(u) = tanbu. And last, for topograph ic ICA we set: ( I ) stop criterium is fullflled if the synaptic weights difference between two consecutive iterations is less than x Number of IC, (2) the function g(u) = U, and (3) io4 is the maximal number of iterations.
It is signifcant to fnd a fxed number of ICs that can theoretically predict new observations in same conditions, assuming the basic ICA model actually holds. To do so, we compared the six proposed algorithms for 8, 9, and 16 components in terms of ROC analysis using correlation map with a chosen threshold of 0.4. The obtained results are plotted in Figure 2. It can be seen that topographic ICA outperforms all other ICA methods for 8 and 9 ICs. However. for 16 ICs topographic ICA is outperformed by both FastICA and t r d e p e n d e n t ICA (KGV) using as an approximation of the mutual information the kernel generalized variance.
An interesting aspect can be observed if we compare the computed reference functions at the maximum correlation for tree-dependent and topographic ICA. Figure 3 (a) and (c) illustrates the so-called assignment maps where all the pixels belonging to a specifc cluster are highlighted. The assignment between a pixel and a specifc cluster is given by the minimum distance between the pixel and a IC from the established codebook. On the other hand, each IC shown in Figure 3 (b) and (d) can be viewed as the cluster-specifc weighted average of all pixel time courses. The tree-dependent ICA provides a better approximation than the topographic ICA. Besides of a lower achieved maximum correlation cceffcient, the IC of the latter fails to provide a good approximation of the rectangular pulse.

B. Characterization of Task-Related Effects
For all subjects, and runs, unique task-related activation maps and associated time-courses were obtained by the treedependent and topographic ICA techniques. The correlation of the component time course most closely associated with the visual task for the these two techniques is shown in Table I for IC=8,9,and 16. From the Table, we see for the treedependnet ICA a continuous increase for the correlation coeffcient while for ttie topographic ICA this correlation coeffcient decreases for IC=16.

V. CONCLUSION
In the present paper, we have experimentally compared four standard ICA algorithms already adopted in the fMRI literature with two new algorithms, the tree-dependent and topographic ICA. The goal of the paper was to determine the robustness and reliability of extracting task-related activation maps and time-courses from fMRI data sets. It can be seen that topographic ICA outperforms all other ICA methods for 8 and 9 ICs. However, for 16 ICs topographic ICA is outperformed by both FastICA and tree-dependent ICA (KGV) using as an approximation of the mutual information the kemel generalized variance. The applicability of the new algorithms is demonstrated on experimental data.