Sparse and smooth canonical correlation analysis through rank-1 matrix approximation
EURASIP Journal on Advances in Signal Processing volume 2017, Article number: 25 (2017)
Abstract
Canonical correlation analysis (CCA) is a well-known technique used to characterize the relationship between two sets of multidimensional variables by finding linear combinations of variables with maximal correlation. Sparse CCA and smooth or regularized CCA are two widely used variants of CCA because of the improved interpretability of the former and the better performance of the latter. So far, the cross-matrix product of the two sets of multidimensional variables has been widely used for the derivation of these variants. In this paper, two new algorithms for sparse CCA and smooth CCA are proposed. These algorithms differ from the existing ones in their derivation, which is based on penalized rank-1 matrix approximation and the orthogonal projectors onto the space spanned by the two sets of multidimensional variables instead of the simple cross-matrix product. The performance and effectiveness of the proposed algorithms are tested on simulated experiments. In these experiments, they outperform the state-of-the-art sparse CCA algorithms.
Introduction
Canonical correlation analysis (CCA) [1] is a multivariate analysis method, the aim of which is to identify and quantify the association between two sets of variables. The two sets of variables can be associated with a pair of linear transforms (projectors) such that the correlation between the projections of the variables in a lower dimensional space through these linear transforms is maximized. The pair of canonical projectors is easily obtained by solving a simple generalized eigenvalue decomposition problem, which only involves the covariance and cross-covariance matrices of the considered random vectors. CCA has been widely applied in many important fields, for instance, facial expression recognition [2, 3], detection of neural activity in functional magnetic resonance imaging (fMRI) [4, 5], machine learning [6, 7] and blind source separation [8, 9].
In the context of high-dimensional data, there is usually a large portion of features that is not informative in data analysis. When the canonical variables involve all features in the original space, the canonical projectors are, in general, not sparse. Therefore, it is not easy to interpret canonical variables in such high-dimensional data analysis. These problems may be tackled by selecting sparse subsets of variables, i.e. obtaining sparse canonical projectors in the linear combinations of variables of each data set [7, 10–12]. For example, in [11], the authors propose a new criterion for sparse CCA and apply a penalized matrix decomposition approach to solve the sparse CCA problem, and in [10], the presented sparse CCA approach computes the canonical projectors from primal and dual representations.
In this paper, we adopt an alternative formulation of the CCA problem which is based on rank-1 matrix approximation of the orthogonal projectors of data sets [13]. Based on this new formulation of the CCA problem, we develop a new sparse CCA based on penalized rank-1 matrix approximation, which aims to overcome the drawback of CCA in the context of high-dimensional data and to improve interpretability. The proposed sparse CCA seeks to obtain iteratively a sparse pair of canonical projectors by solving a penalized rank-1 matrix approximation via a sparse coding method. We also present a smoothed version of the CCA problem based on rank-1 matrix approximation, where we impose some smoothness on the projections of the variables in order to avoid abrupt or sudden variations. These proposed algorithms differ from the existing ones in their derivation, which is based on penalized rank-1 matrix approximation and the orthogonal projectors onto the space spanned by the two sets of multidimensional variables instead of the simple cross-matrix product [7, 10–12].
The rest of the paper is organized as follows: In Section 2, we give a brief review of the CCA problem. In Section 3, we present a formulation of CCA using a rank-1 matrix approximation of the orthogonal projectors of data sets and derive the smoothed solution. In Section 4, we introduce our new sparse CCA algorithm. In Section 5, we present some simulation results to demonstrate the effectiveness of the proposed method compared to state-of-the-art CCA algorithms. Finally, Section 6 concludes the paper.
Henceforth, bold lower cases denote real-valued vectors and bold upper cases denote real-valued matrices. The transpose of a given matrix A is denoted by A ^{T}. All vectors will be column vectors unless transposed. Throughout the paper, I _{ n } stands for the n×n identity matrix, 0 stands for the null vector and 1 _{ n } is the (column) vector of \(\mathbb {R}^{n}\) whose entries are all equal to one. For a vector x, the notation x _{ i } will stand for the i ^{th} component of x. As usual, for any integer m, ⟦1,m ⟧ stands for {1,2,…,m}.
Canonical correlation analysis
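In this section, we briefly review CCA and its optimization problem. Let \(\boldsymbol {x}\in \mathbb {R}^{d_{x}}\) and \(\boldsymbol {y}\in \mathbb {R}^{d_{y}}\) be two random vectors, and assume, without loss of generality, that both x and y have zero mean, i.e. \(\mathbb {E}[\boldsymbol {x}]=\mathbf {0}\) and \(\mathbb {E}[\boldsymbol {y}]=\mathbf {0}\), where \(\mathbb {E}[\cdot ]\) is the expectation operator. CCA seeks a pair of linear transforms \(\boldsymbol {w}_{x}\in \mathbb {R}^{d_{x}}\) and \(\boldsymbol {w}_{y}\in \mathbb {R}^{d_{y}}\) such that the correlation between \(\boldsymbol {w}_{x}^{T}\boldsymbol {x}\) and \(\boldsymbol {w}_{y}^{T}\boldsymbol {y}\) is maximized. Mathematically, the objective function to be maximized is given by:
\[\rho = \frac {\mathbb {E}[\boldsymbol {w}_{x}^{T}\boldsymbol {x}\boldsymbol {y}^{T}\boldsymbol {w}_{y}]}{\sqrt {\mathbb {E}[\boldsymbol {w}_{x}^{T}\boldsymbol {x}\boldsymbol {x}^{T}\boldsymbol {w}_{x}]\,\mathbb {E}[\boldsymbol {w}_{y}^{T}\boldsymbol {y}\boldsymbol {y}^{T}\boldsymbol {w}_{y}]}} \tag{1}\]
Then, the objective function ρ can be rewritten as:
\[\rho (\boldsymbol {w}_{x},\boldsymbol {w}_{y}) = \frac {\boldsymbol {w}_{x}^{T}\boldsymbol {C}_{xy}\boldsymbol {w}_{y}}{\sqrt {\boldsymbol {w}_{x}^{T}\boldsymbol {C}_{xx}\boldsymbol {w}_{x}\,\boldsymbol {w}_{y}^{T}\boldsymbol {C}_{yy}\boldsymbol {w}_{y}}} \tag{2}\]
where \(\boldsymbol {C}_{xx} = \mathbb {E}[\boldsymbol {x}\boldsymbol {x}^{T}]\), \(\boldsymbol {C}_{yy} = \mathbb {E}[\boldsymbol {y}\boldsymbol {y}^{T}]\) and \(\boldsymbol {C}_{xy} = \mathbb {E}[\boldsymbol {x}\boldsymbol {y}^{T}]\) are the covariance and cross-covariance matrices, with \(\boldsymbol {C}_{yx} = \boldsymbol {C}_{xy}^{T}\). Since the value of ρ(w _{ x },w _{ y }) is invariant to the magnitude of the projection directions, we can instead solve the following optimization problem:
\[\max _{\boldsymbol {w}_{x},\boldsymbol {w}_{y}} \boldsymbol {w}_{x}^{T}\boldsymbol {C}_{xy}\boldsymbol {w}_{y} \quad \text {subject to} \quad \boldsymbol {w}_{x}^{T}\boldsymbol {C}_{xx}\boldsymbol {w}_{x} = 1, \; \boldsymbol {w}_{y}^{T}\boldsymbol {C}_{yy}\boldsymbol {w}_{y} = 1 \tag{3}\]
Incorporating these two constraints, the Lagrangian is given by:
\[\mathcal {L}(\lambda _{x},\lambda _{y},\boldsymbol {w}_{x},\boldsymbol {w}_{y}) = \boldsymbol {w}_{x}^{T}\boldsymbol {C}_{xy}\boldsymbol {w}_{y} - \lambda _{x}(\boldsymbol {w}_{x}^{T}\boldsymbol {C}_{xx}\boldsymbol {w}_{x} - 1) - \lambda _{y}(\boldsymbol {w}_{y}^{T}\boldsymbol {C}_{yy}\boldsymbol {w}_{y} - 1) \tag{4}\]
Taking derivatives with respect to w _{ x } and w _{ y } and setting them to zero, we obtain
\[\boldsymbol {C}_{xy}\boldsymbol {w}_{y} - 2\lambda _{x}\boldsymbol {C}_{xx}\boldsymbol {w}_{x} = \mathbf {0}, \qquad \boldsymbol {C}_{yx}\boldsymbol {w}_{x} - 2\lambda _{y}\boldsymbol {C}_{yy}\boldsymbol {w}_{y} = \mathbf {0} \tag{5}\]
Premultiplying the first equation by \(\boldsymbol {w}_{x}^{T}\) and the second by \(\boldsymbol {w}_{y}^{T}\) and using the two constraints shows that λ _{ x }=λ _{ y }. These equations therefore lead to the following generalized eigenvalue problem
\[\boldsymbol {C}_{xy}\boldsymbol {w}_{y} = \lambda \boldsymbol {C}_{xx}\boldsymbol {w}_{x} \tag{6}\]
\[\boldsymbol {C}_{yx}\boldsymbol {w}_{x} = \lambda \boldsymbol {C}_{yy}\boldsymbol {w}_{y} \tag{7}\]
where λ=2λ _{ x }=2λ _{ y }. One way to solve this problem, as proposed in [6], is to assume that C _{ yy } is invertible, so that Eq. (7) gives
\[\boldsymbol {w}_{y} = \frac {1}{\lambda }\boldsymbol {C}_{yy}^{-1}\boldsymbol {C}_{yx}\boldsymbol {w}_{x} \tag{8}\]
and so substituting in Eq. (6) and assuming C _{ xx } is invertible gives
\[\boldsymbol {C}_{xx}^{-1}\boldsymbol {C}_{xy}\boldsymbol {C}_{yy}^{-1}\boldsymbol {C}_{yx}\boldsymbol {w}_{x} = \lambda ^{2}\boldsymbol {w}_{x} \tag{9}\]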
As shown in [6], the canonical projectors w _{ x } can be chosen as the eigenvectors associated with the top eigenvalues of the eigenvalue problem in (9), and (8) can then be used to find the corresponding w _{ y }. A number of existing methods for sparse and smooth CCA have used the description of CCA provided above and focused on the use of the cross matrix C _{ xy } for the derivation of new CCA variant algorithms [7, 10–12]. For the derivation of the proposed CCA variants, we adopt an alternative description of CCA which is based on the orthogonal projectors onto the space spanned by the two sets of multidimensional variables [13].
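The eigenvalue route described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function name `cca_first_pair`, the optional ridge term `reg` and the use of sample covariances are choices made here, not part of the paper.

```python
import numpy as np

def cca_first_pair(X, Y, reg=0.0):
    """First canonical pair from samples X (d_x x N) and Y (d_y x N).

    Columns are observations. `reg` is a hypothetical ridge term added
    to the covariance estimates for numerical stability.
    """
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    N = X.shape[1]
    Cxx = X @ X.T / N + reg * np.eye(X.shape[0])
    Cyy = Y @ Y.T / N + reg * np.eye(Y.shape[0])
    Cxy = X @ Y.T / N
    # Eigenproblem: Cxx^{-1} Cxy Cyy^{-1} Cyx w_x = lambda^2 w_x
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    evals, evecs = np.linalg.eig(M)
    k = int(np.argmax(evals.real))
    lam = float(np.sqrt(max(evals[k].real, 0.0)))
    w_x = evecs[:, k].real
    # Enforce the constraint w_x^T Cxx w_x = 1
    w_x = w_x / np.sqrt(w_x @ Cxx @ w_x)
    # Back-substitution: w_y = Cyy^{-1} Cyx w_x / lambda
    w_y = np.linalg.solve(Cyy, Cxy.T @ w_x) / lam
    return w_x, w_y, lam
```

With this normalization, the sample correlation between `w_x @ X` and `w_y @ Y` equals the returned `lam`, which gives a quick sanity check on any data set.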
Canonical correlation analysis based on rank-1 matrix approximation
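In practice, the covariance matrices C _{ xx }, C _{ yy } and C _{ xy } are usually not available. Instead, estimated covariance matrices are constructed from a given sample data set. Let \(\boldsymbol {X} = [\boldsymbol {x}_{1},\ldots,\boldsymbol {x}_{N}]\in \mathbb {R}^{d_{x}\times N}\) and \(\boldsymbol {Y} = [\boldsymbol {y}_{1},\ldots,\boldsymbol {y}_{N}]\in \mathbb {R}^{d_{y}\times N}\) be the two sets of instances of x and y, respectively. Without loss of generality, we can assume both {x _{1},…,x _{ N }} and {y _{1},…,y _{ N }} have zero mean, i.e., \(\sum \limits _{i=1}^{N}\boldsymbol {x}_{i} =\mathbf { 0}\) and \(\sum \limits _{i=1}^{N}\boldsymbol {y}_{i} =\mathbf { 0}\). Otherwise, we can center the data sets such that x _{ i }←x _{ i }−μ _{ x } and y _{ i }←y _{ i }−μ _{ y } for all i∈⟦1,N ⟧, where \(\boldsymbol {\mu }_{x} = N^{-1}\sum \limits _{i=1}^{N}\boldsymbol {x}_{i}\) and \(\boldsymbol {\mu }_{y} = N^{-1}\sum \limits _{i=1}^{N}\boldsymbol {y}_{i}\). Then, the optimization problem for CCA based on estimated covariance matrices is given by
\[\max _{\boldsymbol {w}_{x},\boldsymbol {w}_{y}} \boldsymbol {w}_{x}^{T}\boldsymbol {X}\boldsymbol {Y}^{T}\boldsymbol {w}_{y} \quad \text {subject to} \quad \boldsymbol {w}_{x}^{T}\boldsymbol {X}\boldsymbol {X}^{T}\boldsymbol {w}_{x} = 1, \; \boldsymbol {w}_{y}^{T}\boldsymbol {Y}\boldsymbol {Y}^{T}\boldsymbol {w}_{y} = 1 \tag{10}\]
and the generalized eigenvalue problem given by Eqs. (6) and (7) can be rewritten as
\[\boldsymbol {X}\boldsymbol {Y}^{T}\boldsymbol {w}_{y} = \lambda \boldsymbol {X}\boldsymbol {X}^{T}\boldsymbol {w}_{x} \tag{11}\]
\[\boldsymbol {Y}\boldsymbol {X}^{T}\boldsymbol {w}_{x} = \lambda \boldsymbol {Y}\boldsymbol {Y}^{T}\boldsymbol {w}_{y} \tag{12}\]
Then, by left-multiplying both sides of Eqs. (11) and (12) by X ^{T}(X X ^{T})^{−1} and Y ^{T}(Y Y ^{T})^{−1}, respectively, we obtain:
\[\boldsymbol {P}_{x}\boldsymbol {Y}^{T}\boldsymbol {w}_{y} = \lambda \boldsymbol {X}^{T}\boldsymbol {w}_{x} \tag{13}\]
\[\boldsymbol {P}_{y}\boldsymbol {X}^{T}\boldsymbol {w}_{x} = \lambda \boldsymbol {Y}^{T}\boldsymbol {w}_{y} \tag{14}\]
where P _{ x }=X ^{T}(X X ^{T})^{−1} X and P _{ y }=Y ^{T}(Y Y ^{T})^{−1} Y are the orthogonal projectors onto the linear spans of the rows of X and Y, respectively. Substituting X ^{T} w _{ x } from Eq. (13) into Eq. (14), and Y ^{T} w _{ y } from Eq. (14) into Eq. (13), gives
\[\boldsymbol {P}_{x}\boldsymbol {P}_{y}\boldsymbol {X}^{T}\boldsymbol {w}_{x} = \lambda ^{2}\boldsymbol {X}^{T}\boldsymbol {w}_{x}, \qquad \boldsymbol {P}_{y}\boldsymbol {P}_{x}\boldsymbol {Y}^{T}\boldsymbol {w}_{y} = \lambda ^{2}\boldsymbol {Y}^{T}\boldsymbol {w}_{y} \tag{15}\]
Therefore, we can observe that X ^{T} w _{ x } and Y ^{T} w _{ y } are the left singular vectors associated with the largest singular values of the matrices K _{ xy }=P _{ x } P _{ y } and K _{ yx }=P _{ y } P _{ x }, respectively. By using the symmetry of the matrices P _{ x } and P _{ y }, we have:
\[\boldsymbol {K}_{yx} = \boldsymbol {P}_{y}\boldsymbol {P}_{x} = (\boldsymbol {P}_{x}\boldsymbol {P}_{y})^{T} = \boldsymbol {K}_{xy}^{T}\]
The singular value decompositions of the matrices K _{ xy } and K _{ yx } are given by:
\[\boldsymbol {K}_{xy} = \boldsymbol {U}\boldsymbol {D}\boldsymbol {V}^{T} = \sum \limits _{i=1}^{N}d_{i}\boldsymbol {u}_{i}\boldsymbol {v}_{i}^{T} \tag{16}\]
\[\boldsymbol {K}_{yx} = \boldsymbol {V}\boldsymbol {D}\boldsymbol {U}^{T} = \sum \limits _{i=1}^{N}d_{i}\boldsymbol {v}_{i}\boldsymbol {u}_{i}^{T} \tag{17}\]
where u _{ i } and v _{ i } are the i ^{th} column vectors of the matrices U and V, respectively, and D=diag(d _{1},…,d _{ N }) such that d _{1}≥d _{2}≥…≥d _{ N } represent the singular values of K _{ xy } and K _{ yx }. We can deduce from Eqs. (16), (17) and (15) that the left singular vectors of K _{ yx } correspond to the right singular vectors of K _{ xy }.
In order to estimate the canonical projectors, we define the nearest rank-1 matrix approximation of K _{ xy } by:
\[\boldsymbol {K}_{1} = d_{1}\boldsymbol {u}_{1}\boldsymbol {v}_{1}^{T}\]
where nearest means that the squared Frobenius norm of the difference between K _{ xy } and K _{1}, defined by \(\big \Vert \boldsymbol {K}_{xy} - \boldsymbol {K}_{1} \big \Vert _{F}^{2}\), is minimal. Therefore, the rank-1 matrix approximation of K _{ xy } can be formulated as the following optimization problem:
\[\min _{\boldsymbol {w}_{x},\boldsymbol {w}_{y}} \big \Vert \boldsymbol {K}_{xy} - \boldsymbol {X}^{T}\boldsymbol {w}_{x}\boldsymbol {w}_{y}^{T}\boldsymbol {Y} \big \Vert _{F}^{2} \tag{18}\]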
Consequently, the projected data X ^{T} w _{ x } and Y ^{T} w _{ y } consist of the left and right singular vectors, respectively, associated with the largest singular value of the matrix K _{ xy }. Therefore, after estimating the left and right singular vectors u _{1} and v _{1}, respectively, and the associated singular value d _{1} of the matrix K _{ xy }, we can obtain the projectors w _{ x } and w _{ y } by solving the following least-squares equations (see Step 5 in Algorithm 1):
Hence, for multiple projected data, the solution consists of the singular vectors associated with the top singular values of the matrix K _{ xy }.
From (18), we can observe that the optimization problem (10), which involves the two constraints \(\Vert \boldsymbol {w}^{T}_{x}\boldsymbol {X}\Vert _{2}=1\) and \(\Vert \boldsymbol {w}^{T}_{y}\boldsymbol {Y}\Vert _{2}=1\), has now been transformed into a constraint-free rank-1 matrix approximation problem which can be solved with an SVD. With this approach, the proposed algorithm avoids the need to use these constraints and hence also avoids their relaxation, as was proposed in [11].
One disadvantage of the above approach is the restriction that X X ^{T} and Y Y ^{T} must be non-singular. In order to prevent overfitting and avoid the singularity of X X ^{T} and Y Y ^{T} [6], two regularization terms, \(\gamma _{x}\boldsymbol {I}_{d_{x}}\) and \(\gamma _{y}\boldsymbol {I}_{d_{y}}\), with γ _{ x }>0, γ _{ y }>0, are added in (10). Therefore, the regularized version solves the generalized eigenvalue problem with \(\boldsymbol {P}_{x}=\boldsymbol {X}^{T}(\boldsymbol {X}\boldsymbol {X}^{T}+\gamma _{x}\boldsymbol {I}_{d_{x}})^{-1}\boldsymbol {X}\) and \(\boldsymbol {P}_{y}=\boldsymbol {Y}^{T}(\boldsymbol {Y}\boldsymbol {Y}^{T}+\gamma _{y}\boldsymbol {I}_{d_{y}})^{-1}\boldsymbol {Y}\). The complete rank-1 matrix approximation CCA method is summarized in Algorithm 1.
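The procedure described in this section (regularized projectors, SVD of their product, then a least-squares step) can be sketched as follows. This is a minimal illustration rather than a transcription of Algorithm 1: the function name `rank1_cca`, the default values of `gamma_x` and `gamma_y`, and the choice of fitting X ^{T} w _{ x } and Y ^{T} w _{ y } directly to u _{1} and v _{1} (without any scaling by d _{1}) are assumptions made here.

```python
import numpy as np

def rank1_cca(X, Y, gamma_x=1e-6, gamma_y=1e-6):
    """First canonical pair via rank-1 approximation of K_xy = P_x P_y.

    X is d_x x N and Y is d_y x N (columns are observations).
    gamma_x, gamma_y are the regularization weights (default values
    are illustrative assumptions).
    """
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    dx, dy = X.shape[0], Y.shape[0]
    # Regularized projectors onto the row spaces of X and Y (N x N)
    Px = X.T @ np.linalg.solve(X @ X.T + gamma_x * np.eye(dx), X)
    Py = Y.T @ np.linalg.solve(Y @ Y.T + gamma_y * np.eye(dy), Y)
    K = Px @ Py
    # Leading singular triplet gives the nearest rank-1 matrix
    U, d, Vt = np.linalg.svd(K)
    u1, v1 = U[:, 0], Vt[0]
    # Least-squares step: X^T w_x ~ u1 and Y^T w_y ~ v1 (assumed scaling)
    w_x, *_ = np.linalg.lstsq(X.T, u1, rcond=None)
    w_y, *_ = np.linalg.lstsq(Y.T, v1, rcond=None)
    return w_x, w_y, d[0]
```

For small regularization weights, the leading singular value of the product of the two projectors is the cosine of the smallest principal angle between the two row spaces, so the returned `d[0]` should match the sample correlation between `w_x @ X` and `w_y @ Y`.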
Smoothed rank-1 matrix approximation CCA algorithm
In order to give preference to a particular solution with desirable properties for the proposed CCA problem, a regularization term (Tikhonov regularization) can be included in Eq. (18) such that:
In many cases, the matrices Ω _{ x } and Ω _{ y } are chosen as a multiple of the identity matrix, giving preference to solutions with smaller norms. In our case, α _{ x }>0 and α _{ y }>0 are trade-off parameters, and the matrices Ω _{ x } and Ω _{ y } are non-negative definite roughness penalty matrices used to penalize the second differences [14, 15], such that:
The choice of such matrices may be used to enforce smoothness if the underlying vector is believed to be mostly continuous. The criterion of Eq. (19) can be rewritten as
The optimization problem (21) can be solved by alternately optimizing over w _{ x } and w _{ y }. Specifically, we first fix w _{ y } and solve for w _{ x } by minimizing (21). Then, we fix w _{ x } and minimize (21) to obtain w _{ y }. These two steps are repeated until convergence. Taking derivatives with respect to w _{ x } and w _{ y }, we obtain
Therefore, we obtain w _{ x } and w _{ y } by solving the above equations in the least-squares sense (see Steps 7 and 9 in Algorithm 2). For multiple canonical projectors, let us consider the singular value decomposition of \(\boldsymbol {K}_{xy} = \boldsymbol {U}\boldsymbol {D}\boldsymbol {V}^{T} = \sum \limits _{i=1}^{N}d_{i}\boldsymbol {u}_{i}\boldsymbol {v}_{i}^{T}\), where u _{ i } and v _{ i } are the i ^{th} column vectors of the matrices U and V, respectively, and D=diag(d _{1},…,d _{ N }) such that d _{1}≥d _{2}≥…≥d _{ N }. In order to estimate the second pair of canonical projectors, we must remove the contribution of the first pair of canonical projectors from the matrix K _{ xy }. To this end, we must remove the contribution of the singular vectors associated with the largest singular value d _{1} using:
As presented in Section 3, the singular vectors u _{1} and v _{1} represent the projected data X ^{T} w _{ x } and Y ^{T} w _{ y }, respectively. Then, by using the unitary property of the matrices U and V, we can compute the singular value associated with the singular vectors u _{1} and v _{1} by \(d_{1} = \boldsymbol {u}_{1}^{T}\boldsymbol {K}_{xy}\boldsymbol {v}_{1}\). Therefore, we propose to use a deflation procedure where the second pair of canonical projectors is defined by using the corresponding residual matrix \(\boldsymbol {K}_{xy} - (\boldsymbol {w}_{x}^{T}\boldsymbol {X} \boldsymbol {K}_{xy}\boldsymbol {Y}^{T}\boldsymbol {w}_{y})\, \boldsymbol {X}^{T}\boldsymbol {w}_{x} \boldsymbol {w}_{y}^{T}\boldsymbol {Y}\). The remaining pairs of projectors are defined in the same way. The method for solving the smoothed rank-1 matrix approximation CCA is summarized in Algorithm 2.
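Two building blocks of this subsection, the second-difference roughness penalty and the deflation of K _{ xy }, can be illustrated with a short NumPy sketch. The helper names `second_difference_penalty` and `deflate` are hypothetical, and the specific form Ω = D ^{T} D with D the second-difference operator is one common choice assumed here; it may differ from the exact matrix used in the paper. The deflation helper normalizes its inputs to unit norm, which is also an assumption of this sketch.

```python
import numpy as np

def second_difference_penalty(n):
    """Roughness penalty Omega = D^T D, with D the (n-2) x n
    second-difference operator (assumed form, see lead-in)."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    return D.T @ D

def deflate(K, u, v):
    """Remove from K the rank-1 component along the directions u and v.

    u plays the role of X^T w_x and v the role of Y^T w_y; both are
    normalized to unit norm before computing d = u^T K v.
    """
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    d = u @ K @ v
    return K - d * np.outer(u, v)
```

A penalty of this form is non-negative definite and vanishes on constant and linear sequences, so it only penalizes curvature. When `u` and `v` are the leading singular vectors of `K`, the deflated matrix has the second singular value of `K` as its largest one.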
To illustrate the advantage of the proposed smoothed CCA approach over standard CCA, we generated data for three distinct simulated activation cases: a spatially independent case S _{1}, a partial spatial overlap case S _{2}, and a complete spatial overlap case S _{3}, as done in [16]. Three temporal sources, with 120 s duration, were constructed to represent the brain hemodynamics, i.e. a block design activation (T _{1}) and two sinusoids (T _{2} and T _{3}) with frequencies of 1.5 and 9.5 Hz, respectively, and box signals were used as brain activation patterns [16]. Three distinct visual patterns of size 10×10 voxels were created with amplitudes of 1 at voxel indexes {2,…,6}×{2,…,6} for pattern A, {8,9}×{8,9} for pattern B, and {5,…,9}×{5,…,9} for pattern C, and 0 elsewhere. The three simulated cases are shown in Figs. 1, 2 and 3: spatially independent events in Fig. 1, partially spatially overlapping events in Fig. 2 and completely spatially overlapping events in Fig. 3.
We can observe from Figs. 1, 2 and 3 that the proposed smoothed CCA algorithm recovers both the temporal signals and the spatial maps with better accuracy than CCA for the three presented cases S _{1}, S _{2} and S _{3}. This demonstrates the effectiveness of the regularization in the proposed smoothed CCA approach when the estimated signals are believed to be continuous and smooth.
Sparse CCA algorithm based on rank-1 matrix approximation
In this section, we propose a sparse CCA method based on rank-1 matrix approximation by penalizing the optimization problem (18). Then, we propose an efficient iterative algorithm to compute the sparse solution of the proposed criterion.
In general, the canonical projectors w _{ x } and w _{ y } found as solutions of Eq. (18) are not sparse, i.e., the entries of both w _{ x } and w _{ y } are non-zero. To obtain a sparse solution, we adopt a trick similar to that used in [7, 11, 12, 17] by imposing penalty functions on the optimization problem (18). Therefore, we can write the new optimization problem as:
where \(\mathcal {F}_{x}(\cdot)\) and \(\mathcal {F}_{y}(\cdot)\) are penalty functions, which can take on a variety of forms. Useful examples are the ℓ _{0} quasi-norm \(\mathcal {F}(\boldsymbol {z}) = \Vert \boldsymbol {z} \Vert _{0}\), which counts the non-zero entries of a vector, the Lasso penalty with the ℓ _{1} norm \(\mathcal {F}(\boldsymbol {z}) = \Vert \boldsymbol {z} \Vert _{1}\), and so on.
The optimization problem (22) can be solved by alternately optimizing over w _{ x } and w _{ y }. Specifically, we first fix w _{ y } and solve for w _{ x } by minimizing (22). Then, we fix w _{ x } and minimize (22) to obtain w _{ y }. These two steps are repeated until convergence.
The straightforward approach to solving this problem is to formulate it as an ordinary sparse coding task. For a fixed w _{ y }, the problem (22) is equivalent to a much simpler sparse coding problem
which can be solved by using any sparse approximation method. In the same way, we can solve the problem (22) with respect to w _{ y } for a fixed w _{ x } by minimizing the following criterion:
Based on the above description, we can obtain the first pair of sparse projectors w _{ x } and w _{ y }. For multiple projection vectors, we propose to use a deflation procedure as presented in Section 3.1, where the second pair of sparse projectors is defined by using the corresponding residual matrix \(\boldsymbol {K}_{xy} - (\boldsymbol {w}_{x}^{T}\boldsymbol {X}\boldsymbol {K}_{xy}\boldsymbol {Y}^{T}\boldsymbol {w}_{y})\, \boldsymbol {X}^{T}\boldsymbol {w}_{x}\boldsymbol {w}_{y}^{T}\boldsymbol {Y}\). The remaining pairs of sparse projectors are defined in the same way.
In standard CCA, the entries of the projected vector are uncorrelated because the canonical components are orthogonal. This property is lost once penalties are added to the cost (18). Several other CCA procedures lose this property as well; this is the price to pay for imposing additional constraints (sparsity or smoothness).
The method for solving the sparse rank-1 matrix approximation CCA is summarized in Algorithm 3.
The proposed approach to sparse CCA differs from the method proposed in [11] as follows: the method proposed in [11] uses a penalized matrix decomposition of the cross-product matrix X Y ^{T}, whereas our proposed approach is based on a rank-1 matrix approximation of K _{ xy } as defined in (18). Furthermore, the method proposed in [11] makes the assumption that X X ^{T} and Y Y ^{T} are identities in order to replace the constraints \(\boldsymbol {w}^{T}_{x}\boldsymbol {X}\boldsymbol {X}^{T}\boldsymbol {w}_{x}\leq 1\) and \(\boldsymbol {w}^{T}_{y}\boldsymbol {Y}\boldsymbol {Y}^{T}\boldsymbol {w}_{y}\leq 1\) by \(\Vert \boldsymbol {w}_{x}\Vert _{2}^{2}\leq 1\) and \(\Vert \boldsymbol {w}_{y}\Vert _{2}^{2}\leq 1\) (Eqs. (4.2) and (4.3) of [11]). This assumption is relaxed in the proposed sparse CCA algorithm presented in Section 4. This is obtained by directly including the constraints \(\boldsymbol {w}^{T}_{x}\boldsymbol {X}\boldsymbol {X}^{T}\boldsymbol {w}_{x}= 1\) and \(\boldsymbol {w}^{T}_{y}\boldsymbol {Y}\boldsymbol {Y}^{T}\boldsymbol {w}_{y}= 1\) in the derivation of the matrix K _{ xy } used in the penalized rank-1 matrix approximation via Eq. (3).
The same argument is valid for [7] and [12], as both these papers are based on the cross-product matrix X Y ^{T}; furthermore, the approaches they use for regularization are similar to the one described in Algorithm 1 and therefore different from the regularization adopted in this paper, given in Algorithm 2.
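The alternating scheme of this section can be sketched as follows. The sketch rests on one algebraic fact: for a fixed vector v = Y ^{T} w _{ y }, minimizing \(\Vert \boldsymbol {K}_{xy} - \boldsymbol {X}^{T}\boldsymbol {w}_{x}\boldsymbol {v}^{T}\Vert _{F}^{2}\) over w _{ x } is equivalent, up to a constant, to approximating K _{ xy } v/(v ^{T} v) by X ^{T} w _{ x }, which is a standard sparse coding problem with dictionary X ^{T}. Everything else is an assumption made for illustration: the function names `omp` and `sparse_rank1_cca`, the small textbook OMP routine, the fixed sparsity levels `s_x` and `s_y` used in place of penalty weights, the fixed iteration count and the value of `gamma`. It is not a transcription of Algorithm 3.

```python
import numpy as np

def omp(A, b, n_nonzero):
    """Basic orthogonal matching pursuit: approximate b with at most
    n_nonzero (>= 1) columns of A. Returns the coefficient vector."""
    residual = b.copy()
    support = []
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0
    for _ in range(n_nonzero):
        scores = np.abs(A.T @ residual) / norms
        if support:
            scores[support] = -1.0  # never reselect a chosen column
        support.append(int(np.argmax(scores)))
        # Refit all selected coefficients by least squares
        sol, *_ = np.linalg.lstsq(A[:, support], b, rcond=None)
        residual = b - A[:, support] @ sol
    coef = np.zeros(A.shape[1])
    coef[support] = sol
    return coef

def sparse_rank1_cca(X, Y, s_x, s_y, n_iter=20, gamma=1e-6):
    """Alternating sparse coding on K = P_x P_y (illustrative sketch)."""
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    dx, dy = X.shape[0], Y.shape[0]
    Px = X.T @ np.linalg.solve(X @ X.T + gamma * np.eye(dx), X)
    Py = Y.T @ np.linalg.solve(Y @ Y.T + gamma * np.eye(dy), Y)
    K = Px @ Py
    # Initialize the projected data Y^T w_y with the top right singular vector
    v = np.linalg.svd(K)[2][0]
    for _ in range(n_iter):
        # Fix w_y: sparse-code K v / (v^T v) over the dictionary X^T
        w_x = omp(X.T, K @ v / (v @ v), s_x)
        u = X.T @ w_x
        # Fix w_x: sparse-code K^T u / (u^T u) over the dictionary Y^T
        w_y = omp(Y.T, K.T @ u / (u @ u), s_y)
        v = Y.T @ w_y
    return w_x, w_y
```

Using a fixed number of non-zero coefficients keeps the example short; a penalized variant would replace `omp` with any other sparse approximation routine without changing the alternating structure.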
Experiments
In this section, we present several computer simulations in the context of blind channel estimation in single-input multiple-output (SIMO) systems and blind source separation to demonstrate the effectiveness of the proposed algorithm. We compare the performance of the proposed algorithm with existing state-of-the-art sparse CCA methods:

The sparse CCA presented in [11], relying on a penalized matrix decomposition, denoted PMD. An R package implementing this algorithm, called PMA, is available at http://cran.r-project.org/web/packages/PMA/index.html. Sparsity parameters are selected using the permutation approach presented in [18], whose code is provided in the PMA package.

The sparse CCA presented in [7], where CCA is reformulated as a least-squares problem, denoted LS CCA. A Matlab package implementing this algorithm is available at http://www.public.asu.edu/~jye02/Software/CCA/.

The sparse CCA presented in [12], where the sparse canonical projectors are computed by solving two ℓ _{1}-minimization problems using the Linearized Bregman iterative method [19]. This algorithm is denoted CCA LB (Linearized Bregman). We reimplemented the sparse CCA algorithm proposed in [12] using Matlab.
For the proposed sparse CCA algorithm, we have used \(\mathcal {F}_{x}(\boldsymbol {z})=\mathcal {F}_{y}(\boldsymbol {z})=\Vert \boldsymbol {z}\Vert _{0}\) as penalty functions. We solve the sparse coding problem by using the orthogonal matching pursuit (OMP) algorithm [20, 21]. For the proposed smoothed CCA algorithm, we chose Ω _{ x }=Ω _{ y }, as given by Eq. (20).
Synthetic data
This simulation setup is inspired by [22]. The synthetic data X and Y were generated according to a multivariate normal distribution, with covariance matrices described in Table 1. The number of simulations with each configuration was N _{ k }=1000. We compare the performance of our algorithm to the state-of-the-art methods by estimating the accuracy of the space spanned by the r estimated canonical projectors. We compute for each simulation run k the angle \(\theta ^{k}(\hat {\boldsymbol {W}}^{k}_{x},\boldsymbol {W}_{x})\) between the subspace^{1} spanned by the estimated canonical projectors contained in the columns of \(\hat {\boldsymbol {W}}^{k}_{x}\) and the subspace spanned by the true canonical projectors contained in the columns of W _{ x }, solution of the eigenproblem (9). The same criterion is used for the canonical projectors W _{ y }. The average angles are estimated over N _{ k } Monte Carlo runs such that:
For each algorithm, we used the following parameters: LS CCA algorithm with λ _{ x }=λ _{ y }=0.5; CCA LB algorithm with μ _{ x }=μ _{ y }=2; Algorithm 2 with α _{ x }=α _{ y }=10^{−2}; and Algorithm 3 with β _{ x }=β _{ y }=3. The estimated angles between the subspace spanned by the true canonical projectors and the one estimated by the different methods are reported in Tables 2 and 3, and plotted in Fig. 4. Note that the true canonical projectors W _{ x } and W _{ y } are sparse due to the structure of the matrices C _{ xy } (see Eqs. (8) and (9)).
We can observe that the accuracy of the proposed sparse CCA method is significantly better than that of the other CCA methods. The proposed sparse CCA method already performs well for a low number of observations, and its performance gain increases with the number of observations. This demonstrates the robustness of our sparse CCA method with respect to the number of available observations and the benefit of using it in the context of a relatively low number of observations.
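The subspace angle used as the accuracy criterion in this experiment (see the endnote) can be computed as in the sketch below. It assumes that the minimum in the endnote formula is taken over the singular values of \(\boldsymbol {A}_{\perp }^{T}\boldsymbol {B}_{\perp }\), which gives the largest principal angle between the two subspaces; the function name `subspace_angle` and the use of a QR factorization to obtain the orthonormal bases are choices made here, and both inputs are assumed to have full column rank.

```python
import numpy as np

def subspace_angle(A, B):
    """Largest principal angle (radians) between the column spaces
    of A and B, both assumed to have full column rank."""
    Qa, _ = np.linalg.qr(A)  # orthonormal basis of range(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis of range(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    # Clip to guard against rounding slightly outside [-1, 1]
    return float(np.arccos(np.clip(s.min(), -1.0, 1.0)))
```

Two matrices spanning the same subspace give an angle close to zero (up to rounding), while two orthogonal directions give π/2.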
Blind channel identification for SIMO systems
Blind channel identification is a fundamental signal processing technology aimed at retrieving a system’s unknown information from its outputs only. Estimation of sparse long channels (i.e. channels with a small number of non-zero coefficients but a large span of delays) is considered in this simulation. Such sparse channels are encountered in many communication applications: high-definition television (HDTV) [23], underwater acoustic communications [24] and wireless communications [25, 26]. The problem addressed in this section is to determine the sparse impulse response of a SIMO system in a blind way, i.e. only the observed system outputs are available and used, without assuming knowledge of the specific input signal.
Let us consider a mathematical model where the input and the output are discrete, the system is driven by a single-input sequence s(t) and yields two output sequences x _{1}(t) and x _{2}(t), and the system has finite impulse responses (FIRs) h _{ i }(t), for t=0,…,L and i=1,2, with L the maximal channel length (which is assumed to be known). Such a system model can be described as follows:
where ∗ denotes linear convolution, η(t)=[η _{1}(t),η _{2}(t)]^{T} is an additive spatially white Gaussian noise, i.e. \(\mathbb {E}[\boldsymbol {\eta }(t)\boldsymbol {\eta }(t)^{T}]=\sigma ^{2} \boldsymbol {I}_{2}\), and \(\boldsymbol {h} = [\boldsymbol {h}_{1}^{T} \boldsymbol {h}_{2}^{T}]^{T}\) with h _{ i }=[h _{ i }(0),…,h _{ i }(L)]^{T} (i=1,2) denotes the impulse response vector of the i ^{th} channel. Given a finite set of observations of length T, the objective in this experiment is to estimate the channel coefficient vector h. The identification method presented by Xu et al. in [27], which is closely related to linear prediction, exploits the commutativity of the convolution. Based on this approach and inspired by [28], we present in the following an experiment to assess the performance of blind channel identification methods based on CCA.
For the noise-free outputs x _{ i }(t), i=1,2, of Eq. (23), the commutativity of convolution gives:
When the outputs x _{ i }(t) are corrupted by additive noise, this property motivates the identification scheme shown in Fig. 5, which yields estimates of the channel impulse responses, \(\widehat {\boldsymbol {h}}_{1}\) and \(\widehat {\boldsymbol {h}}_{2}\), by collecting T observation samples and minimizing the following cost function
where
This problem is a canonical correlation analysis (CCA) problem.
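The commutativity property that underlies this formulation is easy to check numerically: in the noise-free case, filtering the first output with the second channel gives the same sequence as filtering the second output with the first channel. The sketch below uses two arbitrary sparse channels and a BPSK input; all numerical values are illustrative assumptions and are unrelated to the ETU channel used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
L, T = 8, 64  # illustrative channel order and input length

# Two sparse FIR channels with L + 1 taps each (arbitrary values)
h1 = np.zeros(L + 1)
h1[[0, 5]] = [1.0, -0.4]
h2 = np.zeros(L + 1)
h2[[1, 7]] = [0.8, 0.3]

# BPSK input and noise-free outputs x_i = h_i * s
s = rng.choice([-1.0, 1.0], size=T)
x1 = np.convolve(s, h1)
x2 = np.convolve(s, h2)

# Cross relation: h2 * x1 equals h1 * x2 because convolution commutes
lhs = np.convolve(x1, h2)
rhs = np.convolve(x2, h1)
print(np.max(np.abs(lhs - rhs)))
```

With noisy outputs the two sides no longer match exactly, which is why the channels are estimated by minimizing a cost function rather than by solving an exact equality.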
We now present numerical simulations to assess the performance of the proposed algorithm. We consider a SIMO system with two outputs represented by polynomial transfer functions of degree L = 66. The channel impulse response is generated following the 3GPP ETU (Extended Typical Urban) channel model [29] with a sampling frequency of 15.36 MHz, which is used to model a channel impulse response for urban areas in the context of wireless communications. The multipath delay profile for this channel is shown in Table 4.
The input signal is a BPSK i.i.d. sequence of length T∈{256,1024}. The observation is corrupted by additive white Gaussian noise with a variance σ ^{2} chosen such that the signal-to-noise ratio SNR\(=\frac {\Vert \boldsymbol {h}\Vert ^{2}}{\sigma ^{2}}\) varies in the range [0,40] dB. Statistics are evaluated over N _{ k }=100 Monte Carlo runs, and the estimation performance is given by the normalized mean square error criterion:
where \(\widehat {\boldsymbol {h}}_{k}\) denotes the estimated channel coefficient vector at the k ^{th} Monte Carlo run. For each algorithm, we used the following parameters: LS CCA algorithm with λ _{ x }=λ _{ y }=10^{−2}, CCA LB algorithm with μ _{ x }=μ _{ y }=10^{−1}; Algorithm 2 with α _{ x }=α _{ y }=10^{−3} ; and Algorithm 3 with β _{ x }=β _{ y }=10.
In Figs. 6 and 7, the normalized mean square error is plotted versus the SNR for the proposed approaches and the state-of-the-art algorithms. Our sparse CCA based on rank-1 matrix approximation provides the best results over the whole SNR range and for both observation lengths. In particular, we can observe that the proposed method outperforms the PMD algorithm [11] by 9 dB for moderate and high SNR. These results show the robustness of the proposed method against additive noise and its fast convergence. Indeed, from Fig. 6, we can observe that the proposed sparse CCA method provides near-optimal performance for moderate and high SNR even for a small observation size.
Blind source separation for fMRI signals
In this section, we evaluate the performance of the proposed CCA variant algorithms on a resting-state functional magnetic resonance imaging (fMRI) experiment (see Fig. 8 and Table 5). In this case, we are interested in functional connectivity and in recovering a resting-state network, i.e. the default mode network, from a data matrix Y formed by vectorizing each time series observed in every voxel, creating an n×N matrix, where n is the number of time points and N the number of voxels (≈10,000−100,000) [30].
To use CCA, either a second data set obtained from a different subject is used or the second data set is obtained from the original data Y by time delay [31]. This last option is used in this application example. Instead of taking N as the total number of voxels, only the cortical, subcortical and cerebellum regions in the brain obtained by parcellating the whole brain into 116 ROIs using automated anatomical labelling [32] were considered. For each considered region, the average time series was generated and used.
The single subject (id 100307) rs-fMRI dataset used in this section was obtained from the Human Connectome Project Q1 release [33]. The acquisition parameters of the rs-fMRI data are 90 × 104 matrix, 220 mm FOV, 72 slices, TR = 0.72 s, TE = 33.1 ms, flip angle = 52°, BW = 2290 Hz/Px, in-plane FOV = 208 × 180 mm with 2.0 mm isotropic voxels. The obtained data was already preprocessed with a preprocessing pipeline consisting of motion correction, temporal pre-whitening, slice time correction and global drift removal, and the scans were spatially normalized to a standard MNI152 template and resampled to 2 mm × 2 mm × 2 mm voxels. The reader is referred to [33, 34] for more details regarding data acquisition and preprocessing.
The second data set, obtained by a single sample delay, was used for CCA. The different CCA algorithms were applied to Y and Y _{ t−1} of dimension n×N to generate canonical correlation components representing maximally correlated temporal profiles. The neural dynamics of interest can be obtained by correlating the modulation profile of the canonical correlation components with the time series representing average neural dynamics for regions of interest (ROIs). For functional connectivity analysis of the default mode network (DMN), the modulation profile that was most correlated with the posterior cingulate cortex (PCC) representative time series is used. Using the neural dynamics of interest, the sparsely distributed and clustered origins of the dynamics are obtained by converting the associated coefficient rows to z-scores.
Using the different CCA variant algorithms, the connected regions obtained for the DMN are mostly the PCC, medial prefrontal cortex (MFC) and right inferior parietal lobe (IPL). As no gold standard reference for DMN connectivity is available, we relied on the similarity of the temporal dynamics of the DMN-based modulation profile with the PCC representative time series. The similarity measure used was correlation, which was estimated as >0.9 for all the algorithms.
Conclusions
In this paper, we have developed two new variants of CCA; more specifically, we have introduced new algorithms for sparse and smooth CCA. The proposed algorithms are based on penalized rank-1 matrix approximation and differ from the existing ones in the matrices they use for their derivation. Indeed, instead of focusing on the cross-matrix product of the two sets of multidimensional variables, we have used the product of the orthogonal projectors onto the space spanned by the columns of the two sets of multidimensional variables. Using this approach, the proposed sparse and smooth CCA algorithms differ only in the penalty used in the penalized rank-1 matrix approximation. Simulation results illustrating the effectiveness of the proposed CCA variant algorithms are provided, where we can observe that the proposed sparse CCA outperforms state-of-the-art methods. As a continuation of the presented work and in order to fix the tuning parameters of the proposed approaches, the main idea of the permutation method presented in [18] will be studied and adapted.
Endnotes
^{1} Let A and B be two matrices. To compute the angle θ between the subspaces spanned by the columns of A and B, we first compute orthonormal bases A_{⊥} and B_{⊥} for the ranges of A and B, respectively. The angle θ is then computed as \(\theta = \arccos(\min(\boldsymbol{A}_{\perp}^{T}\boldsymbol{B}_{\perp}))\).
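One common way to make this expression precise is through principal angles, sketched below. The reading of "min" as the smallest singular value of \(\boldsymbol{A}_{\perp}^{T}\boldsymbol{B}_{\perp}\) is an assumption, as are the symbols \(\sigma_k\), \(\theta_k\), p, q and \(\varphi\); the endnote itself does not state them.

```latex
% Assumption: A_perp (n x p) and B_perp (n x q) have orthonormal columns,
% and "min" is read as the smallest singular value of A_perp^T B_perp.
\[
\boldsymbol{A}_{\perp}^{T}\boldsymbol{B}_{\perp}
  = \boldsymbol{U}\,\operatorname{diag}(\sigma_{1},\ldots,\sigma_{r})\,\boldsymbol{V}^{T},
\qquad 1 \ge \sigma_{1} \ge \cdots \ge \sigma_{r} \ge 0,
\qquad r = \min(p,q),
\]
\[
\theta_{k} = \arccos(\sigma_{k}),
\qquad
\theta = \max_{k}\theta_{k} = \arccos(\sigma_{r}).
\]
% Worked example in R^2: A_perp = (1, 0)^T, B_perp = (cos(phi), sin(phi))^T
% with 0 <= phi <= pi/2 gives A_perp^T B_perp = cos(phi), hence theta = phi.
```

Under this reading, θ = 0 when the two subspaces coincide and θ = π/2 when some direction of one subspace is orthogonal to the other.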
References
 1
H Hotelling, Relations between two sets of variables. Biometrika. 28(3–4), 321–377 (1936).
 2
W Zheng, X Zhou, C Zou, L Zhao, Facial expression recognition using kernel canonical correlation analysis (KCCA). IEEE Trans. Neural Netw. 17(1), 233–238 (2006).
 3
XY Jing, S Li, C Lan, D Zhang, J Yang, Q Liu, Color image canonical correlation analysis for face feature extraction and recognition. Signal Process. 91(8), 2132–2140 (2011).
 4
O Friman, J Carlsson, P Lundberg, M Borga, H Knutsson, Detection of neural activity in functional MRI using canonical correlation analysis. Magn. Reson. Med. 45(2), 323–330 (2001).
 5
DR Hardoon, J Mourao-Miranda, M Brammer, J Shawe-Taylor, Unsupervised analysis of fMRI data using kernel canonical correlation. NeuroImage. 37(4), 1250–1259 (2007).
 6
DR Hardoon, S Szedmak, J Shawe-Taylor, Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 16(12), 2639–2664 (2004).
 7
L Sun, S Ji, J Ye, Canonical correlation analysis for multi-label classification: a least-squares formulation, extensions, and analysis. IEEE Trans. Pattern Anal. Mach. Intell. 33(1), 194–200 (2011).
 8
W Liu, DP Mandic, A Cichocki, Analysis and online realization of the CCA approach for blind source separation. IEEE Trans. Neural Netw. 18(5), 1505–1510 (2007).
 9
YO Li, T Adali, W Wang, VD Calhoun, Joint blind source separation by multiset canonical correlation analysis. IEEE Trans. Signal Process. 57(10), 3918–3929 (2009).
 10
DR Hardoon, J Shawe-Taylor, Sparse canonical correlation analysis. Mach. Learn. 83(3), 331–353 (2011). doi:10.1007/s10994-010-5222-7.
 11
DM Witten, R Tibshirani, T Hastie, A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics. 10(3), 515–534 (2009). doi:10.1093/biostatistics/kxp008.
 12
D Chu, LZ Liao, MK Ng, X Zhang, Sparse canonical correlation analysis: new formulation and algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 3050–3065 (2013).
 13
KV Mardia, JT Kent, JM Bibby, Multivariate Analysis. Probability and mathematical statistics, 1st edn. (Academic Press, University of Leeds, Leeds, 1979).
 14
AN Tikhonov, On the stability of inverse problems. Doklady Akademii nauk SSSR. 39(5), 195–198 (1943).
 15
JO Ramsay, BW Silverman, Functional Data Analysis, 2nd edn. (Springer-Verlag, New York, 2005).
 16
K Lee, SK Tak, JC Yee, A data driven sparse GLM for fMRI analysis using sparse dictionary learning and MDL criterion. IEEE Trans. Med. Imaging. 30, 1176–1089 (2011).
 17
A Aïssa-El-Bey, AK Seghouane, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Sparse canonical correlation analysis based on rank-1 matrix approximation and its application for FMRI signals, (2016), pp. 4678–4682. doi:10.1109/ICASSP.2016.7472564.
 18
S Gross, B Narasimhan, R Tibshirani, D Witten, Correlate: sparse canonical correlation analysis for the integrative analysis of genomic data. Technical Report User guide and technical document, Stanford University (2011).
 19
JF Cai, S Osher, Z Shen, Convergence of the linearized Bregman iteration for ℓ_{1}-norm minimization. Technical Report CAM Report 08–52, University of California Los Angeles (2008).
 20
YC Pati, R Rezaiifar, PS Krishnaprasad, in Proceedings of 27th Asilomar Conference on Signals, Systems and Computers, vol. 1. Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition (Pacific Grove, 1993), pp. 40–44. doi:10.1109/ACSSC.1993.342465.
 21
G Davis, S Mallat, M Avellaneda, Adaptive greedy approximations. Constr. Approximation. 13(1), 57–98 (1997). doi:10.1007/BF02678430.
 22
JA Branco, C Croux, P Filzmoser, MR Oliveira, Robust canonical correlations: a comparative study. Comput. Stat. 20(2), 203–229 (2005). doi:10.1007/BF02789700.
 23
W Schreiber, Advanced television systems for terrestrial broadcasting: some problems and some proposed solutions. Proc. IEEE. 83(6), 958–981 (1995).
 24
M Kocic, D Brady, M Stojanovic, in Proc. OCEANS, vol. 3. Sparse equalization for real-time digital underwater acoustic communications (San Diego, 1995), pp. 1417–1422.
 25
L Perros-Meilhac, E Moulines, K Abed-Meraim, P Chevalier, P Duhamel, Blind identification of multipath channels: a parametric subspace approach. IEEE Trans. Signal Process. 49(7), 1468–1480 (2001).
 26
S Ariyavisitakul, N Sollenberger, L Greenstein, Tap selectable decision-feedback equalization. IEEE Trans. Commun. 45(12), 1497–1500 (1997).
 27
G Xu, H Liu, L Tong, T Kailath, A least-squares approach to blind channel identification. IEEE Trans. Signal Process. 43(12), 2982–2993 (1995).
 28
S Van Vaerenbergh, J Via, I Santamaria, Blind identification of SIMO Wiener systems based on kernel canonical correlation analysis. IEEE Trans. Signal Process. 61(9), 2219–2230 (2013).
 29
3GPP TS 36.104, Evolved Universal Terrestrial Radio Access (E-UTRA); Base Station (BS) Radio Transmission and Reception (2015). www.3gpp.org/dynareport/36104.htm.
 30
NA Lazar, Statistics for Biology and Health. The Statistical Analysis of Functional MRI Data, 1st edn. (Springer, New York, 2008).
 31
MU Khaled, AK Seghouane, in 10th IEEE International Symposium on Biomedical Imaging. Improving functional connectivity detection in FMRI by combining sparse dictionary learning and canonical correlation analysis (San Francisco, 2013), pp. 286–289. doi:10.1109/ISBI.2013.6556468.
 32
N Tzourio-Mazoyer, B Landeau, D Papathanassiou, F Crivello, O Etard, N Delcroix, B Mazoyer, M Joliot, Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage. 15, 273–289 (2002).
 33
DM Barch, GC Burgess, MP Harms, SE Petersen, BL Schlaggar, M Corbetta, MF Glasser, S Curtiss, S Dixit, C Feldt, D Nolan, E Bryant, T Hartley, O Footer, JM Bjork, R Poldrack, S Smith, H Johansen-Berg, AZ Snyder, DCV Essen, Function in the human connectome: task-fMRI and individual differences in behavior. NeuroImage. 80, 169–189 (2013).
 34
MF Glasser, SN Sotiropoulos, JA Wilson, TS Coalson, B Fischl, JL Andersson, J Xu, S Jbabdi, M Webster, JR Polimeni, DCV Essen, M Jenkinson, The minimal preprocessing pipelines for the human connectome project. NeuroImage. 80, 105–124 (2013).
Funding
No funding was received or used to prepare this manuscript.
Authors’ contributions
All authors contributed equally to this work. All authors discussed the results and implications and commented on the manuscript at all stages. Both authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Aïssa-El-Bey, A., Seghouane, A. Sparse and smooth canonical correlation analysis through rank-1 matrix approximation. EURASIP J. Adv. Signal Process. 2017, 25 (2017). https://doi.org/10.1186/s13634-017-0459-y
Keywords
 Canonical correlation analysis
 Sparse representation
 Rank-1 matrix approximation