Nonunitary matrix joint diagonalization for complex independent vector analysis
EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 241 (2012)
Abstract
Independent vector analysis (IVA) is a special form of independent component analysis (ICA) that has proven effective in solving convolutive blind source separation (BSS) problems in the frequency domain. Most IVA algorithms are based on optimizing certain contrast functions, and the main difficulty of these approaches lies in finding a reliable and fast estimate of the unknown distribution of the sources. Despite the rich availability of efficient tensorial approaches to the standard ICA problem, these methods have not been explored much for IVA. In this article, we propose a matrix joint diagonalization approach to solve the complex IVA problem. The new factorization neither relies on a whitening process, nor does it require an estimate of the joint probability distribution of the dependent signal groups. The latter is in contrast to most IVA approaches to date. The underlying geometry of the problem is investigated together with a critical point analysis of the resulting cost function. A conjugate gradient algorithm on the appropriate manifold setting is developed.
1 Introduction
Independent component analysis (ICA) is a standard statistical tool for solving the blind source separation (BSS) problem. BSS aims to recover source signals from the observed mixtures, without knowing either the distribution of the sources or the mixing process. Application of the standard ICA model is often limited, since it requires mutual statistical independence between all individual components. However, in many applications, there exist groups of signals of interest, where components from different groups are mutually statistically independent indeed, but where mutual statistical dependence occurs between components in the same group. Such problems can be tackled by a technique now referred to as multidimensional independent component analysis (MICA) [1], or independent subspace analysis (ISA) [2].
A special form of ISA arises in solving the BSS problem with convolutive mixtures [3]. After transferring the convolutive observations into the frequency domain via short-time Fourier transforms, the convolutive BSS problem results in a collection of instantaneous complex BSS problems, one in each frequency bin. After solving the subproblems individually, the final stage faces the challenge of aligning all statistically dependent components from different groups, which is referred to as the permutation problem. To avoid this problem, a new approach named independent vector analysis (IVA) has been proposed in [4]. Besides its application to convolutive BSS problems, IVA has also recently been applied to analyze multivariate Gaussian models, cf. [5, 6]. In the current literature, the majority of IVA algorithms are based on optimizing certain contrast functions, cf. [5, 7–9]. The main difficulty of these contrast function based approaches lies in estimating the unknown distribution of the sources, which usually requires a large number of observations [10].
On the other hand, tensorial approaches are efficient and richly available to solve both the ICA and ISA problems. In particular, joint block diagonalization approaches are shown to be effective methods for solving the ISA problem, cf. [11, 12], and are inherently applicable to IVA. However, such general joint block diagonalization approaches do not take the intrinsic structure of the IVA problem into account. A recent study [13] proposes a joint diagonalization approach of cross cumulant matrices to solve the complex IVA problem. More recently, the present authors have developed a similar approach of jointly diagonalizing both cross covariance and cross pseudo covariance matrices, cf. [14]. In this article, we extend the previous study in [14], and adapt the so-called complex oblique projective (COP) manifold, which has proven to be an appropriate setting for the standard instantaneous complex ICA problem [15], to the current scenario. Finally, an efficient conjugate gradient (CG) based IVA algorithm is proposed, and numerical experiments are provided to demonstrate the convergence properties of the proposed CG algorithm, and to compare its performance with two recently developed IVA algorithms in terms of separation quality.
2 Notations
Throughout the article, $(\cdot)^{\top}$ denotes the matrix transpose, $(\cdot)^{\mathsf{H}}$ the Hermitian transpose, $\overline{(\cdot)}$ the entrywise complex conjugate of a matrix, and $Gl(m)$ the set of all $m\times m$ invertible complex matrices. The Frobenius norm of a matrix $A\in\mathbb{C}^{m\times n}$ is denoted by $\|A\|_{F}:=\sqrt{\operatorname{tr}(AA^{\mathsf{H}})}$, where $\operatorname{tr}(\cdot)$ is the trace of a square matrix. Given a square matrix $Z\in\mathbb{C}^{m\times m}$, $\operatorname{ddiag}(Z)$ forms a diagonal matrix whose diagonal entries are those of $Z$, and $\operatorname{off}(Z)$ generates a matrix by setting all diagonal entries of $Z$ to zero, i.e. $\operatorname{off}(Z):=Z-\operatorname{ddiag}(Z)$.
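The operators just introduced translate directly into a few lines of NumPy. The following is a minimal sketch for illustration only; the function names and the small test matrix are assumptions made here and do not come from the article.

```python
import numpy as np

def ddiag(Z):
    """Diagonal matrix holding the diagonal entries of the square matrix Z."""
    return np.diag(np.diag(Z))

def off(Z):
    """Z with its diagonal entries set to zero, i.e. Z - ddiag(Z)."""
    return Z - ddiag(Z)

def frobenius_norm(A):
    """sqrt(tr(A A^H)), computed directly from the definition."""
    return np.sqrt(np.trace(A @ A.conj().T).real)

# Small illustrative matrix (hypothetical values).
Z = np.array([[1 + 1j, 2.0], [3j, 4.0]])
print(off(Z))
print(frobenius_norm(Z))
```

The value returned by `frobenius_norm` should agree with `np.linalg.norm(Z)`, whose default for two-dimensional input is the Frobenius norm.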
In this study, we consider an $m$-dimensional complex signal $s(t)=[s_{1}(t),\dots,s_{m}(t)]^{\top}\in\mathbb{C}^{m}$ as an $m$-dimensional complex stochastic process indexed by the variable $t$. The empirical expectation of a signal $s(t)$ is denoted by $\mathbb{E}[s(t)]=\frac{1}{T}\sum_{t=1}^{T}s(t)$, where $T$ is the number of samples. As usual for the standard ICA model, we assume without loss of generality that $\mathbb{E}[s(t)]=0$. The empirical covariance and pseudo-covariance matrices of the complex signal $s(t)$ are defined as $\operatorname{cov}(s(t)):=\mathbb{E}[s(t)s(t)^{\mathsf{H}}]$ and $\operatorname{pcov}(s(t)):=\mathbb{E}[s(t)s(t)^{\top}]$, respectively.
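As a small illustration of these two second-order statistics, the sketch below computes the empirical covariance and pseudo-covariance of a sample matrix whose columns are the samples $s(1),\dots,s(T)$. The random test signal and the function names are assumptions chosen for the example.

```python
import numpy as np

def cov(S):
    """Empirical covariance E[s(t) s(t)^H] of an (m, T) sample matrix."""
    return S @ S.conj().T / S.shape[1]

def pcov(S):
    """Empirical pseudo-covariance E[s(t) s(t)^T] (transpose without conjugation)."""
    return S @ S.T / S.shape[1]

rng = np.random.default_rng(0)
S = rng.standard_normal((3, 1000)) + 1j * rng.standard_normal((3, 1000))
S = S - S.mean(axis=1, keepdims=True)   # enforce the zero-mean assumption

print(cov(S).shape, pcov(S).shape)
```

By construction the covariance is Hermitian and the pseudo-covariance is complex symmetric, which is the structural difference the article later exploits.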
3 Problem description
It is known that convolutive BSS problems can be transformed into the frequency domain and solved as instantaneous complex BSS problems for every frequency simultaneously, provided the demixing filter is sufficiently longer than the mixing filter, cf. [16, 17]. In this study, we consider the spectral time-frequency representation of a signal in terms of a short-time Fourier transformation that is centered at time $t$. Let $w_{i}(t,f)\in\mathbb{C}$ and $s_{i}(t,f)\in\mathbb{C}$ denote the coefficient at the center frequency $f$ of the $i$th observation $w_{i}(t)$ and the $i$th source signal $s_{i}(t)$, respectively. Then, for a given pair $(t,f)$, the Fourier coefficients of the observations and the sources obey the equality
$$w(t,f)=A_{f}\,s(t,f),\tag{1}$$
where $w(t,f):=[w_{1}(t,f),\dots,w_{m}(t,f)]^{\top}\in\mathbb{C}^{m}$, $s(t,f):=[s_{1}(t,f),\dots,s_{m}(t,f)]^{\top}\in\mathbb{C}^{m}$, and $A_{f}\in\mathbb{C}^{m\times m}$ serves as a complex mixing matrix. More compactly, for a fixed frequency $f$, we get a standard instantaneous complex BSS problem as
$$W(f)=A_{f}\,S(f),\tag{2}$$
where $W(f):=[w(1,f),\dots,w(T,f)]\in\mathbb{C}^{m\times T}$ and $S(f):=[s(1,f),\dots,s(T,f)]\in\mathbb{C}^{m\times T}$, with $T$ being the number of chosen time frames. One popular approach to solve the convolutive BSS problem is to solve the individual instantaneous BSS problem (2) at each frequency, and then assemble the results from each frequency to reconstruct the estimated signal in the time domain [18].
Let us denote the rows of $S(f)$ by $s_{i}(f)=[s_{i}(1,f),\dots,s_{i}(T,f)]\in\mathbb{C}^{1\times T}$ for $i=1,\dots,m$. Following the assumption of statistical independence between the sources, the complex valued signals $s_{i}(f)$ and $s_{j}(f)$ are statistically independent for $i\neq j$. In contrast, we assume that for a pair of frequencies $(f_{p},f_{q})$ with $f_{p}\neq f_{q}$, the complex signals $s_{i}(f_{p})$ and $s_{i}(f_{q})$ are statistically dependent for a given source. The development of IVA is inspired by this cross frequency structure. It aims to find a set of demixing matrices $\{X_{f}\}\subset Gl(m)$ via
$$Y(f)=X_{f}^{\mathsf{H}}\,W(f),\tag{3}$$
such that
(1) all sub-ICA problems are solved, and
(2) the statistical alignment between groups is restored, i.e. the estimated $i$th signals $\{y_{i}(f)\}$ are mutually statistically dependent.
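The main idea for our approach is to exploit the cross covariance matrices between groups of observations defined as
$$\operatorname{cov}\bigl(W(f_{i}),W(f_{j})\bigr):=\mathbb{E}\bigl[w(t,f_{i})\,w(t,f_{j})^{\mathsf{H}}\bigr].\tag{4}$$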
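Similarly, the so-called pseudo cross covariance, defined as
$$\operatorname{pcov}\bigl(W(f_{i}),W(f_{j})\bigr):=\mathbb{E}\bigl[w(t,f_{i})\,w(t,f_{j})^{\top}\bigr],\tag{5}$$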
also provides additional information about the second-order statistics of the involved signals. In this study, we assume that the cross covariances between sources in all groups do not vanish. The assumption of statistical independence between the source signals implies that the cross covariance matrix $\operatorname{cov}(S(f_{i}),S(f_{j}))$ and the pseudo cross covariance matrix $\operatorname{pcov}(S(f_{i}),S(f_{j}))$ are diagonal for all pairs $(i,j)$. With a further assumption that the sources are nonstationary, which has been exploited in [19], we arrive at a problem of jointly diagonalizing two sets of cross covariance and pseudo cross covariance matrices at different time instances.
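The reason why diagonal source statistics lead to a joint diagonalization problem for the observations can be spelled out in one line. The derivation below is a sketch that assumes the mixing model (2), writes $A_{i}$ as a shorthand for $A_{f_{i}}$ (a notation introduced here for brevity), and takes the cross covariance of two groups to be the two-argument analogue of the covariance from Section 2.

```latex
\begin{align*}
\operatorname{cov}\bigl(W(f_i),W(f_j)\bigr)
  &= \mathbb{E}\bigl[w(t,f_i)\,w(t,f_j)^{\mathsf{H}}\bigr]
   = A_i\,\mathbb{E}\bigl[s(t,f_i)\,s(t,f_j)^{\mathsf{H}}\bigr]\,A_j^{\mathsf{H}}
   = A_i\,\operatorname{cov}\bigl(S(f_i),S(f_j)\bigr)\,A_j^{\mathsf{H}},\\
\operatorname{pcov}\bigl(W(f_i),W(f_j)\bigr)
  &= \mathbb{E}\bigl[w(t,f_i)\,w(t,f_j)^{\top}\bigr]
   = A_i\,\operatorname{pcov}\bigl(S(f_i),S(f_j)\bigr)\,A_j^{\top}.
\end{align*}
```

Since the middle factors are diagonal under the independence assumption, any pair $X_{i}$, $X_{j}$ for which $X_{i}^{\mathsf{H}}A_{i}$ and $X_{j}^{\mathsf{H}}A_{j}$ are diagonal turns $X_{i}^{\mathsf{H}}\operatorname{cov}(W(f_{i}),W(f_{j}))X_{j}$ into a diagonal matrix.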
To summarize, we are interested in solving the following problem. For a complex IVA problem with $k$ subproblems, we consider the cross covariance and pseudo cross covariance matrices at $n$ time instances, i.e. for all $i,j=1,\dots,k$ and $t=1,\dots,n$, a set of matrices $\{C_{ij}^{(t)}\}_{ij}$ and a set of complex symmetric matrices $\{R_{ij}^{(t)}\}_{ij}$, which are constructed by
$$C_{ij}^{(t)}=A_{i}\,\Omega_{ij}^{(t)}A_{j}^{\mathsf{H}},\qquad R_{ij}^{(t)}=A_{i}\,\tilde{\Omega}_{ij}^{(t)}A_{j}^{\top},\tag{6}$$
where $\Omega_{ij}^{(t)},\tilde{\Omega}_{ij}^{(t)}\in\mathbb{C}^{m\times m}$ are diagonal. The task is to find a set of matrices $\{X_{i}\}_{i=1}^{k}\subset\mathrm{Gl}(m)$ such that
$$X_{i}^{\mathsf{H}}C_{ij}^{(t)}X_{j}\quad\text{and}\quad X_{i}^{\mathsf{H}}R_{ij}^{(t)}\overline{X_{j}},\tag{7}$$
for all $i<j$ and $t=1,\dots,n$, are simultaneously, or approximately simultaneously, diagonalized. In this study, we consider the noise-free IVA problem as defined in (2), and neglect the cross covariance matrix estimation errors due to the finite sample size effect. In other words, we assume that both sets of $C_{ij}^{(t)}$'s and $R_{ij}^{(t)}$'s are jointly diagonalizable.
Note that the above problem is similar to the simultaneous SVD formulation proposed in [20], where only the situation with two transform matrices is studied, i.e. $k=2$. In contrast, our current setting deals with multiple transform matrices $\{X_{i}\}_{i=1,\dots,k}$, which are not restricted to be unitary. Finally, instead of considering second-order cross covariance matrices, our approach can be generalized to higher-order cross cumulants. We refer to [17] for further details.
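The noise-free problem can be checked numerically. The sketch below assumes the model that follows from the mixing equation (2), namely $C_{ij}^{(t)}=A_{i}\Omega_{ij}^{(t)}A_{j}^{\mathsf{H}}$ and $R_{ij}^{(t)}=A_{i}\tilde{\Omega}_{ij}^{(t)}A_{j}^{\top}$ with diagonal middle factors, and verifies that the (hypothetical) ideal choice $X_{i}=A_{i}^{-\mathsf{H}}$ diagonalizes every matrix. The random test data and all names are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 3, 3, 3          # signal dimension, number of groups, time instances

def crandn(*shape):
    """Complex array with standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def off_norm_sq(Z):
    """Squared Frobenius norm of the off-diagonal part of Z."""
    return np.linalg.norm(Z - np.diag(np.diag(Z))) ** 2

A = [crandn(m, m) for _ in range(k)]            # hypothetical mixing matrices A_i
X = [np.linalg.inv(a).conj().T for a in A]      # ideal diagonalizers X_i = A_i^{-H}

worst = 0.0
for t in range(n):
    for i in range(k):
        for j in range(i + 1, k):
            C = A[i] @ np.diag(crandn(m)) @ A[j].conj().T   # A_i Omega A_j^H
            R = A[i] @ np.diag(crandn(m)) @ A[j].T          # A_i Omega~ A_j^T
            worst = max(worst,
                        off_norm_sq(X[i].conj().T @ C @ X[j]),
                        off_norm_sq(X[i].conj().T @ R @ X[j].conj()))

print(f"largest squared off-norm: {worst:.2e}")
```

Note that the pseudo cross covariance is diagonalized with the entrywise conjugate of $X_{j}$ on the right, because the transpose rather than the Hermitian transpose appears in its definition.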
4 Diagonality measure and the COP manifold
Our cost function to tackle problem (7) originates from the popular off-norm function that measures the squared Frobenius norm of the off-diagonal entries of the involved matrices. We develop an appropriate mathematical setting on the subsequently defined complex oblique projective (COP) manifold to provide its critical point analysis.
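The off-norm measure described above can be sketched in a few lines. In this illustration, the summation over all pairs $i<j$ and all time instances, the absence of a normalizing constant, the dictionary layout of the data, and all function names are assumptions made for the example; they are not taken from the article. The last lines show that shrinking every transform matrix also shrinks the cost, which is why a normalization constraint on the columns is needed before minimizing.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, n = 3, 2, 2

def crandn(*shape):
    """Complex array with standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Hypothetical data: one m x m matrix per time instance t and pair i < j.
C = {(t, i, j): crandn(m, m)
     for t in range(n) for i in range(k) for j in range(i + 1, k)}
Xs = [crandn(m, m) for _ in range(k)]

def off_cost(Xs, C):
    """Sum of squared off-diagonal Frobenius norms of X_i^H C_ij^(t) X_j."""
    total = 0.0
    for (t, i, j), Cij in C.items():
        Z = Xs[i].conj().T @ Cij @ Xs[j]
        total += np.linalg.norm(Z - np.diag(np.diag(Z))) ** 2
    return total

def normalize_columns(X):
    """Rescale every column of X to unit Euclidean norm."""
    return X / np.linalg.norm(X, axis=0, keepdims=True)

shrunk = [0.1 * X for X in Xs]                 # scaling towards the trivial solution
on_ob = [normalize_columns(X) for X in Xs]     # columns normalized to unit length

print(off_cost(Xs, C), off_cost(shrunk, C), off_cost(on_ob, C))
```

Scaling all matrices by a factor $c$ multiplies this cost by $c^{4}$, so without a constraint a minimizer can simply drive the matrices to zero.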
4.1 Derivation of the cost function
For the sake of legibility, from now on we only consider the problem of simultaneously diagonalizing the cross covariance matrices, i.e. the first condition in (7). The additional requirement that the pseudo cross covariance matrices are diagonalized as well can be incorporated into our setting in a straightforward way and is not discussed further here.
Let us define the off-norm function as
Due to the noise-free assumption and since we neglect finite sample size effects, the set of joint diagonalizers $(X_{1}^{\ast},\dots,X_{k}^{\ast})$ of the $C_{ij}^{(t)}$ in Equation (7) is a global minimum of $g$, that is $g(X_{1}^{\ast},\dots,X_{k}^{\ast})=0$. It is clear that a minimization approach without further constraints on the $X_{i}$ would drive all diagonalizers to zero. In order to avoid such trivial solutions and to regularize the minimization problem, the authors in [21] propose to restrict all columns of the transform matrices to have unit norm. This set is known as the oblique manifold, which has been shown to be an appropriate setting for matrix diagonalization, cf. [22]. Its complex counterpart is the so-called complex oblique manifold
$$Ob(m):=\bigl\{X\in Gl(m)\mid\operatorname{ddiag}(X^{\mathsf{H}}X)=I_{m}\bigr\},\tag{9}$$
and we denote by $Ob^{k}(m)$ the product manifold of $k$ copies of $Ob(m)$. The restriction of the off-norm cost function (8) is denoted by
$$g_{1}:=g|_{Ob^{k}(m)}\colon Ob^{k}(m)\to\mathbb{R}.\tag{10}$$
Now denote the $p$th column of $X_{i}$ by $x_{ip}$. It is obvious that the function $g_{1}$ is invariant with respect to the phase of each column $x_{ip}$, which reflects the well-known scaling ambiguity of complex ICA problems. By a further calculation, $g_{1}$ has the form
Instead of fixing a phase for each $x_{ip}$, in this study we employ an elegant mathematical setting for the problem. Recall the fact that each $x_{ip}x_{ip}^{\mathsf{H}}$ defines a Hermitian rank-one projector, the set of which identifies the $(m-1)$-dimensional complex projective space $\mathbb{CP}^{m-1}$, i.e.
$$\mathbb{CP}^{m-1}:=\bigl\{P\in\mathbb{C}^{m\times m}\mid P^{\mathsf{H}}=P,\ P^{2}=P,\ \operatorname{tr}P=1\bigr\}.\tag{12}$$
By doing so for each column and by maintaining the fact that the columns of X_{ i } form a complex basis (i.e. invertibility of X_{ i }), we naturally arrive at the following set, which we refer to as the complex oblique projective (COP) manifold,
The off-norm cost function $g_{1}$ now induces the following function $g_{2}$ on the COP manifold. Namely, if $\mathcal{Q}^{k}(m)$ denotes the $k$-fold product of $\mathcal{Q}(m)$, $g_{2}$ is given by
with P_{ i }:=(P_{i 1},…,P_{ im }).
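The passage from unit-norm columns to rank-one projectors can be illustrated numerically. The sketch below uses a randomly drawn unit vector (an assumption for the example) and shows that $xx^{\mathsf{H}}$ is Hermitian, idempotent, has unit trace, and does not change when $x$ is multiplied by a phase factor, which is the ambiguity the projector representation removes.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 4
x = rng.standard_normal(m) + 1j * rng.standard_normal(m)
x /= np.linalg.norm(x)                 # unit-norm column, as on the oblique manifold

def projector(x):
    """Hermitian rank-one projector x x^H associated with a unit vector x."""
    return np.outer(x, x.conj())

P = projector(x)
P_rot = projector(np.exp(1j * 0.7) * x)   # same column with a different phase

print(np.linalg.matrix_rank(P), np.trace(P).real)
print(np.abs(P - P_rot).max())
```

Working with $P$ instead of $x$ therefore parametrizes each column without having to fix a phase by convention.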
4.2 The geometry of the complex oblique projective manifold
In this section, we recall some basic facts and concepts that are necessary for developing a Riemannian CG algorithm on the COP manifold, cf. [23]. In particular, we require a formula for the parallel transport and the geodesics of the COP manifold. We endow $\mathcal{Q}\left(m\right)$ with the standard Riemannian metric
inherited from the Euclidean metric of the $m$-fold product of Hermitian matrices. With this, $\mathcal{Q}(m)$ is an open and dense Riemannian submanifold of the $m$-fold product of $\mathbb{CP}^{m-1}$ with the standard metric, i.e.
$$\overline{\mathcal{Q}(m)}=\bigl(\mathbb{CP}^{m-1}\bigr)^{m},\tag{16}$$
where $\overline{\mathcal{Q}(m)}$ denotes the closure of $\mathcal{Q}(m)$. Accordingly, the tangent spaces, the geodesics, and the parallel transport for $\mathcal{Q}(m)$ and $(\mathbb{CP}^{m-1})^{m}$ coincide locally and thus are easily derived from the geometry of $\mathbb{CP}^{m-1}$. We refer to [24] for further discussions and details about $\mathbb{CP}^{m-1}$.
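Let us denote by
$$\mathfrak{u}(m):=\bigl\{\Omega\in\mathbb{C}^{m\times m}\mid\Omega^{\mathsf{H}}=-\Omega\bigr\}\tag{17}$$
the set of skew-Hermitian matrices. The tangent space at $P$ in $\mathbb{CP}^{m-1}$ is given by
$$T_{P}\mathbb{CP}^{m-1}=\bigl\{[P,\Omega]\mid\Omega\in\mathfrak{u}(m)\bigr\},\tag{18}$$
where $[A,B]:=AB-BA$ is the matrix commutator. Then, the tangent space at $\mathbf{P}=(P_{1},\dots,P_{m})\in\mathcal{Q}(m)$ is simply the Cartesian product
$$T_{\mathbf{P}}\mathcal{Q}(m)=T_{P_{1}}\mathbb{CP}^{m-1}\times\dots\times T_{P_{m}}\mathbb{CP}^{m-1}.\tag{19}$$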
With the above metric, the geodesics through $P\in\mathbb{CP}^{m-1}$ in direction $Z\in T_{P}\mathbb{CP}^{m-1}$ are given by
where $e^{(\cdot)}$ denotes the matrix exponential. Thus, the (local^{a}) geodesic through $\mathbf{P}\in\mathcal{Q}(m)$ in direction $\mathbf{Z}:=(Z_{1},\dots,Z_{m})\in T_{\mathbf{P}}\mathcal{Q}(m)$ is given by
The parallel transport of $\mathbf{\Psi}:=(\Psi_{1},\dots,\Psi_{m})\in T_{\mathbf{P}}\mathcal{Q}(m)$ with respect to the Levi-Civita connection along the geodesic $\gamma_{\mathbf{P},\mathbf{Z}}(t)$ is
with $\tau_{P,Z}$ being the parallel transport of $\Psi\in T_{P}\mathbb{CP}^{m-1}$ with respect to the Levi-Civita connection along the geodesic $\gamma_{P,Z}(t)$, i.e.
The natural or Riemannian gradient of a function that is the restriction of some globally defined function to a submanifold is simply the orthogonal projection of the Euclidean gradient onto the corresponding tangent space. For the complex projective space, this projection is given by
It is easily seen that the operator $\pi_{P}$ is an orthogonal projector onto the tangent space $T_{P}\mathbb{CP}^{m-1}$, i.e. that $\pi_{P}\circ\pi_{P}(A)=\pi_{P}(A)$ and that the null space of $\pi_{P}$ is orthogonal to its image. Here, $\circ$ denotes the composition of functions. The formulas for the tangent spaces, the geodesics, the parallel transport, and the projection onto the tangent spaces of $\mathcal{Q}^{k}(m)$ follow directly from the product manifold structure.
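The geometric ingredients of this section can be tried out numerically. The sketch below does not reproduce the article's displayed formulas; it assumes the standard expressions for rank-one projectors under conjugation by unitary matrices, namely the double commutator $[P,[P,A]]$ as tangent projection of a Hermitian matrix $A$, the curve $e^{t[Z,P]}Pe^{-t[Z,P]}$ as geodesic, and conjugation by the same unitary factor as transport. These may differ from the authors' conventions in sign or parametrization, and all function names are illustrative.

```python
import numpy as np

def comm(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    return A @ B - B @ A

def proj_tangent(P, A):
    """Double commutator [P, [P, A_h]] with A_h the Hermitian part of A."""
    A_h = (A + A.conj().T) / 2
    return comm(P, comm(P, A_h))

def expm_skew(K):
    """exp(K) for skew-Hermitian K, via the eigendecomposition of the Hermitian matrix iK."""
    lam, V = np.linalg.eigh(1j * K)
    return (V * np.exp(-1j * lam)) @ V.conj().T

def geodesic(P, Z, t):
    """Curve t -> U(t) P U(t)^H with U(t) = exp(t [Z, P])."""
    U = expm_skew(t * comm(Z, P))
    return U @ P @ U.conj().T

def transport(P, Z, Psi, t):
    """Carry the tangent vector Psi along the same curve by conjugation with U(t)."""
    U = expm_skew(t * comm(Z, P))
    return U @ Psi @ U.conj().T

rng = np.random.default_rng(4)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
x /= np.linalg.norm(x)
P = np.outer(x, x.conj())                       # a point of CP^2
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Z = proj_tangent(P, A)                          # a tangent direction at P
Q = geodesic(P, Z, 0.3)                         # point reached at t = 0.3
Z_moved = transport(P, Z, Z, 0.3)               # direction carried along to Q
print(np.trace(Q).real, np.abs(Q @ Q - Q).max())
```

In this sketch the curve stays on the set of rank-one projectors for every $t$, its velocity at $t=0$ equals $Z$, and the transported direction is tangent at the new point, which are the properties a CG iteration relies on.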
5 Critical point analysis of the cost function
In this section, we conduct a critical point analysis of the cost function $g_{2}$ on the product COP manifold. We show that the joint diagonalizers are a nondegenerate global minimum of $g_{2}$. This is an important fact, since in many cases the speed of convergence relies on the nondegeneracy of the minima. First of all, we present a lemma which originates from the derivation of the cost from the off-norm function.
Lemma 1
Let us assume that all $C_{ij}^{(t)}$'s are jointly diagonalizable. If $(X_{1}^{\ast},\dots,X_{k}^{\ast})\in Ob^{k}(m)$ minimizes the cost function $g_{1}$ as defined in (10), i.e. $X_{i}^{\ast\mathsf{H}}C_{ij}^{(t)}X_{j}^{\ast}=D_{ij}^{(t)}=\operatorname{diag}(d_{ij1}^{(t)},\dots,d_{ijm}^{(t)})$ is diagonal for all $t=1,\dots,n$ and $i,j=1,\dots,k$, then the set of corresponding Hermitian projectors $\mathcal{P}^{\ast}=(\mathbf{P}_{1}^{\ast},\dots,\mathbf{P}_{k}^{\ast})\in\mathcal{Q}^{k}(m)$ with $P_{ip}^{\ast}:=x_{ip}^{\ast}x_{ip}^{\ast\mathsf{H}}\in\mathbb{CP}^{m-1}$ minimizes the cost function $g_{2}$ defined in (14), and
The above lemma follows directly from the condition of $X_{i}^{\ast\mathsf{H}}C_{ij}^{(t)}X_{j}^{\ast}$ being diagonal, i.e. its $(p,q)$th entry is computed as
Now, let $\mathcal{P}:=(\mathbf{P}_{1},\dots,\mathbf{P}_{k})\in\mathcal{Q}^{k}(m)$ be arbitrary. We compute the first derivative of $g_{2}$ at $\mathcal{P}\in\mathcal{Q}^{k}(m)$ in direction $\mathcal{Z}:=(\mathbf{Z}_{1},\dots,\mathbf{Z}_{k})\in T_{\mathcal{P}}\mathcal{Q}^{k}(m)$ as
By recalling the structure of the tangent space of ${\mathcal{Q}}^{k}\left(m\right)$ and the result in Lemma 1, it is trivial to see that the first derivative of g_{2} vanishes at ${\mathcal{P}}^{\ast}$, which corresponds to the correct joint diagonalizers.
The remainder of this section addresses the characterization of the Hessian of $g_{2}$ at the joint diagonalizers. To that end, we denote by $\mathfrak{off}(m):=\{Z\in\mathbb{C}^{m\times m}\mid z_{ii}=0,\ i=1,\dots,m\}$ the set of matrices with zero diagonal. Let $\Pi$ be the natural projection
and let μ_{ X }be defined as
where $e_{j}$ denotes the $j$th standard basis vector. Note that $\mu_{X}$ defines a locally injective but not bijective mapping. The composition of $\Pi$ and $\mu_{X}$, however, yields a local diffeomorphism. With the shorthand notation $\mathbf{P}:=\Pi(X)$, the mapping
is a local parametrization around P and thus permits a local parameterization of ${\mathcal{Q}}^{k}\left(m\right)$ via
with ${\Phi}_{\mathcal{P}}\left(0\right)=\mathcal{P}:=({\mathbf{P}}_{1},\dots ,{\mathbf{P}}_{k})$. The associated tangent map $T{\Phi}_{\mathcal{P}}$ is given as
where $\xi(x_{ip}):=I_{m}-x_{ip}x_{ip}^{\mathsf{H}}$ is the orthogonal projection operator onto the orthogonal complement of $x_{ip}$. Let $\mathcal{Z}:=T\Phi_{\mathcal{P}^{\ast}}(\Theta)\in T_{\mathcal{P}^{\ast}}\mathcal{Q}^{k}(m)$. Then, we can compute the Hessian of $g_{2}$ at the critical points $\mathcal{P}^{\ast}\in\mathcal{Q}^{k}(m)$, i.e. the symmetric bilinear form $\mathsf{H}_{g_{2}}(\mathcal{P}^{\ast})\colon T_{\mathcal{P}^{\ast}}\mathcal{Q}^{k}(m)\times T_{\mathcal{P}^{\ast}}\mathcal{Q}^{k}(m)\to\mathbb{R}$ via
The last equality holds by following the results in Lemma 1, i.e. $X_{i}^{\ast\mathsf{H}}C_{ij}^{(t)}X_{j}^{\ast}=D_{ij}^{(t)}$, which is equivalent to $C_{ij}^{(t)}=X_{i}^{\ast-\mathsf{H}}D_{ij}^{(t)}X_{j}^{\ast-1}$, and the fact that $Z_{ip}^{\ast}=(I_{m}-P_{i1})X_{i}^{\ast}\theta_{ip}$. It can easily be seen that the Hessian form (33) is positive definite if and only if all $(2\times 2)$ matrices
are positive definite. Since this is a generic assumption on the data, we have the following result.
Theorem 1
Generically, the global minimizer of the cost function $g_{2}$ is nondegenerate.
6 A CG IVA algorithm
In this section, we introduce a general form of CG algorithms on matrix manifolds. After computing the Riemannian gradient of the cost function on the COP manifold, we develop a CG based IVA algorithm. The CG methods on matrix manifolds are briefly reviewed here. They form the backbone of the algorithm for our optimization problem on the COP manifold and explain the use of the differential geometric concepts derived in the previous sections. For an in-depth introduction to optimization on matrix manifolds, we refer the interested reader to [23].
Let $M$ be a submanifold of some Euclidean space with inner product $\langle\cdot,\cdot\rangle$ and let $f\colon M\to\mathbb{R}$ be smooth. The CG method is initialized by some $x_{0}\in M$ and the descent direction $H_{0}:=-\operatorname{grad}f(x_{0})$ given by the Riemannian gradient. If $f$ is the restriction of a globally defined function $\hat{f}$ to $M$, the Riemannian gradient is just the orthogonal projection of the gradient of $\hat{f}$ onto the tangent space, i.e.
$$\operatorname{grad}f(x)=\pi_{x}\bigl(\nabla\hat{f}(x)\bigr),\tag{35}$$
where $\nabla \hat{f}\left(x\right)$ denotes the Euclidean gradient of $\hat{f}$, and π_{ x } is the orthogonal projection onto T_{ x }M. Subsequently, sweeps are iterated that consist of two steps, a line search in a given direction (i.e. along a geodesic in that direction) followed by an update of the search direction. Several different possibilities for these steps lead to different CG methods. Assume now that x_{ i }, H_{ i }, and G_{ i }:=grad f(x_{ i }) are given.
Given a geodesic $\gamma_{i}$ with $\gamma_{i}(0)=x_{i}$ and $\dot{\gamma}_{i}(0)=H_{i}$, the line search aims to find $\lambda_{i}\in\mathbb{R}$ that minimizes $f\circ\gamma_{i}\colon\mathbb{R}\to\mathbb{R}$. A generic approach for the step size selection is a Riemannian adaptation of the backtracking line search and several modifications, cf. [23, 25]. Here, we present a closed form solution for the step size selection that works particularly well for our problem due to the quadratic nature of our cost function, cf. [26]. It is based on the assumption that a one-dimensional Newton step along $f\circ\gamma_{i}$ yields a good approximation of its minimizer. Explicitly, we choose the step size as
$$\lambda_{i}:=-\frac{\frac{d}{dt}(f\circ\gamma_{i})(t)\big|_{t=0}}{\Bigl|\frac{d^{2}}{dt^{2}}(f\circ\gamma_{i})(t)\big|_{t=0}\Bigr|}.\tag{36}$$
The absolute value in the denominator is chosen for the following reason. In a neighborhood of a minimum, the step size is an unaltered one-dimensional Newton step, whereas it is the negative of a regular Newton step if $\frac{d^{2}}{dt^{2}}(f\circ\gamma)(\lambda)\big|_{\lambda=0}<0$, which makes critical points that are not minima non-attractive.
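The effect of the absolute value can be seen on a one-dimensional toy problem. The sketch below takes the step size to be the negative first derivative divided by the absolute value of the second derivative of the restricted function at zero, as described above, and approximates both derivatives by central finite differences. The finite differences, the helper name, and the two toy functions are assumptions made for the illustration; an actual implementation would presumably use analytic derivatives along the geodesic.

```python
def newton_step_size(phi, h=1e-3):
    """Step -phi'(0) / |phi''(0)| with derivatives from central finite differences."""
    d1 = (phi(h) - phi(-h)) / (2 * h)
    d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h ** 2
    return -d1 / abs(d2)

def convex(t):
    """Toy restriction of the cost along a geodesic with a minimum at t = 2."""
    return (t - 2.0) ** 2

def concave(t):
    """Toy restriction with a maximum at t = 2."""
    return -(t - 2.0) ** 2

lam_min = newton_step_size(convex)    # plain Newton step, lands on the minimizer
lam_max = newton_step_size(concave)   # sign flipped, moves away from the maximum
print(lam_min, lam_max)
```

For the convex function the step reaches the minimizer directly, while for the concave one a plain Newton step would jump to the maximum at $t=2$; the absolute value instead produces a step in the descent direction.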
In order to compute the new search direction $H_{i+1}\in T_{x_{i+1}}M$, we need to transport $H_{i}$ and $G_{i}$, which are tangent at $x_{i}$, to the tangent space $T_{x_{i+1}}M$. This is done via parallel transport along the geodesic $\gamma_{i}$, which we denote by
The updated search direction is now chosen according to a Riemannian adaptation of the Hestenes-Stiefel, the Polak-Ribière, or the Fletcher-Reeves update. Here, we choose a different formulation that performs slightly better in our situation than the aforementioned ones, namely
Despite their good performance in applications, the convergence analysis of CG methods on smooth manifolds is still an open problem. Partial convergence results for CG methods on manifolds can be found in [27, 28], and a recent result in [29].
As it is clear from the above, the first step towards formulating a CG algorithm for minimizing the cost function g_{2} is to compute its Riemannian gradient. Let us denote by $\hat{{g}_{2}}$ the continuation of g_{2} to the embedding space ${\mathbb{C}}^{m\times m\times m\times k}$. Following the computation in Equation (27), we have the Euclidean gradient of $\hat{{g}_{2}}$ at $\mathcal{P}\in {\mathcal{Q}}^{k}\left(m\right)$, i.e. $\nabla \hat{{g}_{2}}\left(\mathcal{P}\right):=({\mathbf{J}}_{1},\dots ,{\mathbf{J}}_{k})$, for each element ${J}_{ip}\in {\mathbb{C}}^{m\times m}$, as
By projecting it onto the tangent space $T_{\mathcal{P}}\mathcal{Q}^{k}(m)$, we get the Riemannian gradient of $g_{2}$ at $\mathcal{P}\in\mathcal{Q}^{k}(m)$, i.e. $\operatorname{grad}g_{2}(\mathcal{P}):=(\mathbf{G}_{1},\dots,\mathbf{G}_{k})\in T_{\mathcal{P}}\mathcal{Q}^{k}(m)$, for each element $G_{ip}\in T_{P_{ip}}\mathbb{CP}^{m-1}$, as
The above formula for the Riemannian gradient now allows us to implement the geometric CG algorithm for minimizing the function $g_{2}$ as defined in (14) in a straightforward way. Pseudocode is provided in Algorithm 1.
Algorithm 1 A CG IVA algorithm
Input: A set of matrices $\{C_{ij}^{(t)}\}\subset\mathbb{C}^{m\times m}$ for $i,j=1,\dots,k$ and $t=1,\dots,n$;
Step 1: Generate an initial guess $\mathcal{P}^{(0)}=[\mathbf{P}_{1}^{(0)},\dots,\mathbf{P}_{k}^{(0)}]\in\mathcal{Q}^{k}(m)$ and set $i=1$;
Step 2: Compute $\mathcal{G}^{(1)}=\mathcal{H}^{(1)}=[\mathbf{H}_{1},\dots,\mathbf{H}_{k}]\leftarrow\operatorname{grad}g_{2}(\mathcal{P}^{(0)})$ using Equation (40);
Step 3: Set $i=i+1$;
Step 4: Update $\mathcal{P}^{(i+1)}\leftarrow\bigl(\gamma_{\mathbf{P}_{1},\mathbf{H}_{1}}(\lambda_{i}),\dots,\gamma_{\mathbf{P}_{k},\mathbf{H}_{k}}(\lambda_{i})\bigr)$, where $\lambda_{i}$ is computed via (36);
Step 5: Update $\mathcal{H}^{(i+1)}\leftarrow\mathcal{G}^{(i+1)}+\gamma_{i}\,\tau_{\mathcal{P}^{(i)},\mathcal{H}^{(i)}}(\lambda_{i})$, where
and $\gamma_{i}$ is chosen according to Equation (38);
Step 6: If $i \bmod (2km(m-1)-1)=0$, set $\mathcal{H}^{(i+1)}\leftarrow\mathcal{G}^{(i+1)}$;
Step 7: If $\|\mathcal{G}^{(i+1)}\|$ is small enough, stop. Otherwise, go to Step 3;
7 Numerical experiments
In our experiment, we investigate the performance of our method in terms of both local convergence property and accuracy of estimating the joint diagonalizers.
7.1 Experiment one
The first task of our experiment is to jointly diagonalize two sets of complex matrices, $\{C_{ij}^{(t)}\}_{ij}$ and $\{R_{ij}^{(t)}\}_{ij}$, which are constructed by
where the matrices $A_{i}\in Gl(m)$ are randomly picked, both real and imaginary parts of the diagonal entries of $\Omega_{ij}^{(t)}$ and $\hat{\Omega}_{ij}^{(t)}$ are drawn from a uniform distribution on the interval $(0,10)$, the matrices $N_{ij}^{H}\in\mathbb{C}^{m\times m}$ and $N_{ij}^{S}\in\mathbb{C}^{m\times m}$ are a Hermitian and a complex symmetric matrix, respectively, whose real and imaginary parts are generated from a uniform distribution on the interval $(-0.5,0.5)$, representing additive stationary noise, and $\epsilon\in\mathbb{R}$ is the noise level.
In our experiments, we set m=3, k=3, n=3. First of all, we choose the noise level ϵ=0. A typical local convergence curve of our proposed algorithm is shown in Figure 1. A tendency of superlinear convergence can be observed.
In order to investigate the performance of the proposed algorithm in terms of estimation accuracy, we set $\epsilon\in\{0.1,0.5,1.0\}$ and run 50 tests for each noise level. The performance index is chosen to be the averaged Amari error, proposed in [30]. Generally, the smaller the Amari error, the better the separation. The quartile based boxplot of the averaged Amari errors of our proposed algorithm for the three noise levels is shown in Figure 2. The performance of our CG algorithm degrades correspondingly as the noise level increases.
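For readers who want to reproduce such a performance index, the sketch below implements one common normalization of the Amari error for a single group, applied to the global matrix $G=X^{\mathsf{H}}A$ formed from a demixing and a mixing matrix. The exact normalization used in [30] and in the experiments is not stated in the text, so the constant $1/(2m(m-1))$ and the function name are assumptions. An averaged value over the $k$ groups would then be the mean of the per-group errors, which is again an interpretation rather than a quoted definition.

```python
import numpy as np

def amari_error(G):
    """Amari-type index: 0 for a scaled permutation matrix, larger for worse separation."""
    G = np.abs(np.asarray(G))
    m = G.shape[0]
    # Row-wise and column-wise sums relative to the dominant entry.
    rows = (G / G.max(axis=1, keepdims=True)).sum(axis=1) - 1
    cols = (G / G.max(axis=0, keepdims=True)).sum(axis=0) - 1
    return (rows.sum() + cols.sum()) / (2 * m * (m - 1))

perfect = np.array([[0, 2j, 0], [0, 0, -3], [1, 0, 0]])   # scaled permutation
worst = np.ones((3, 3))                                   # no separation at all
print(amari_error(perfect), amari_error(worst))
```

The index is invariant to the scaling and permutation ambiguities of the separation problem, which is why it is suited to comparing estimated joint diagonalizers with the true mixing matrices.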
7.2 Experiment two
In this experiment, we compare our CG based IVA approach, referred to as IVACG, with two second-order statistics based IVA algorithms. We refer to one contrast optimization based IVA algorithm as IVACO, cf. [5, 6], and the other matrix joint diagonalization based approach as IVAJD, cf. [13]. The task of this experiment is to separate two groups of complex valued signals. We take three real audio source signals with 480,000 samples, and apply the short-time Fourier transform to the sources with the number of FFT points being 1,024. By doing so, we end up with a complex IVA problem with 513 groups of statistically dependent complex signals.
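The count of 513 groups follows from the fact that a real signal frame of 1,024 points has 1024/2 + 1 non-redundant Fourier coefficients. The sketch below checks this with NumPy on a single random frame; the frame content is an arbitrary assumption, and the window and hop size of the short-time Fourier transform are not specified in the text and are therefore not modeled.

```python
import numpy as np

n_fft = 1024
frame = np.random.default_rng(5).standard_normal(n_fft)   # one real-valued frame
bins = np.fft.rfft(frame)                                  # one-sided spectrum

print(bins.shape[0])   # number of frequency bins, i.e. groups in the IVA problem
```

Each of these bins yields one instantaneous complex BSS problem of the form (2).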
For a practical implementation of our method, note that computing and jointly diagonalizing all possible cross covariance and pseudo cross covariance matrices between the 513 groups is prohibitively expensive. We overcome this issue by only taking two neighboring frequency bins randomly at one time. The sources from each frequency bin are mixed independently by multiplication with a mixing matrix whose entries are drawn from a normal distribution. We run the experiment 100 times, and plot the boxplot of the averaged Amari errors of the three studied algorithms in Figure 3. It shows clearly that our proposed IVACG algorithm consistently outperforms the other two.
8 Conclusion
We propose a matrix joint diagonalization approach to solve the complex IVA problem which relies neither on a prewhitening step nor on the estimation of the unknown distribution of the sources. A mathematical setting is derived that allows a formulation without ambiguity on the set of unknown parameters, i.e. the dimension of the search space is maximally reduced. This leads in a natural way to a smooth manifold structure that we call the complex oblique projective manifold, due to its close relation to the oblique manifold, which consists of invertible matrices with normalized columns. We propose to solve the complex IVA problem via minimizing a cost function that is based on the well-known off-norm function for measuring joint diagonality. We show that our setting leads to a nondegenerate Hessian for the solution of the IVA problem. This is an important result for the design of minimization methods, since in many cases the speed of convergence relies on the nondegeneracy of the minima. We develop a geometric CG method for solving the IVA problem and conclude by providing some numerical experiments.
Endnote
^{a}Note that $\mathcal{Q}(m)$ is not a geodesically complete manifold.
References
 1.
Cardoso JF: Multidimensional independent component analysis. In Proceedings of the 23rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4. (Seattle, WA, USA; 1998).
 2.
Hyvärinen A, Hoyer PO: Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Comput 2000, 12(7):1705–1720. 10.1162/089976600300015312
 3.
Araki S, Mukai R, Makino S, Nishikawa T, Saruwatari H: The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech. IEEE Trans. Speech Audio Process 2003, 11(2):109–116. 10.1109/TSA.2003.809193
 4.
Lee I, Kim T, Lee TW: Independent vector analysis for convolutive blind speech separation. In Blind Speech Separation, Signals and Communication Technology. Edited by: Makino S, Lee TW, Sawada H. (Springer, Netherlands; 2007).
 5.
Anderson M, Li XL, Adalı T: Complex-valued independent vector analysis: application to multivariate Gaussian model. Signal Process 2012, 92(8):1821–1831. 10.1016/j.sigpro.2011.09.034
 6.
Anderson M, Adalı T, Li XL: Joint blind source separation with multivariate Gaussian model: algorithms and performance analysis. IEEE Trans. Signal Process 2012, 60(4):1672–1683.
 7.
Kim T: Real-time independent vector analysis for convolutive blind source separation. IEEE Trans. Circ. Syst. I: Regular Papers 2010, 57(7):1431–1438.
 8.
Hao J, Lee I, Lee TW, Sejnowski TJ: Independent vector analysis for source separation using a mixture of Gaussians prior. Neural Comput 2010, 22(6):1646–1673. 10.1162/neco.2010.11-08-906
 9.
Lee I, Jang GJ: Independent vector analysis based on overlapped cliques of variable width for frequency-domain blind signal separation. EURASIP J. Adv. Signal Process 2012, 113: 1–12.
 10.
Bermejo S: Finite sample effects in higher order statistics contrast functions for sequential blind source separation. IEEE Signal Process. Lett 2005, 12(6):481–484.
 11.
Ghennioui H, Fadaili EM, Thirion-Moreau N, Adib A, Moreau E: A nonunitary joint block diagonalization algorithm for blind separation of convolutive mixtures of sources. IEEE Signal Process. Lett 2007, 14(11):860–863.
 12.
Ghennioui H, Thirion-Moreau N, Moreau E, Aboutajdine D: Gradient-based joint block diagonalization algorithms: application to blind separation of FIR convolutive mixtures. Signal Process 2010, 90(6):1836–1849. 10.1016/j.sigpro.2009.12.002
 13.
Li XL, Adalı T, Anderson M: Joint blind source separation by generalized joint diagonalization of cumulant matrices. Signal Process 2011, 91(10):23142322. 10.1016/j.sigpro.2011.04.016
 14.
Shen H, Kleinsteuber M: A matrix joint diagonalization approach for complex independent vector analysis. In Proceedings of the 10th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), vol. 7191 Lecture Notes in Computer Science. Edited by: Theis F, Cichocki A, Yeredor A, Zibulevsky M. (SpringerVerlag, Berlin/Heidelberg; 2012).
 15.
Shen H, Kleinsteuber M: Complex blind source separation via simultaneous strong uncorrelating transform. In Lecture Notes in Computer Science, Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation, vol. 6365. (SpringerVerlag, Berlin/Heidelberg; 2010).
 16.
Smaragdis P: Blind separation of convolved mixtures in the frequency domain. Neurocomputing 1998, 22: 2134. 10.1016/S09252312(98)000472
 17.
Comon P, Jutten C: Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press Inc, San Diego, USA; 2010.
 18.
Makino S, Lee TW, Sawada H: Blind Speech Separation Signals and Communication Technology. Springer, Netherlands; 2007.
 19.
Hosseini S, Deville Y, Saylani H: Blind separation of linear instantaneous mixtures of nonstationarysignals in the frequency domain. Signal Process 2009, 89: 819830. 10.1016/j.sigpro.2008.10.024
 20.
Maehara T, Murota K: Simultaneous singular value decomposition. Linear Alg. Appl 2011, 435: 106116. 10.1016/j.laa.2011.01.007
 21.
Absil PA, Gallivan KA: Joint diagonalization on the oblique manifold for independent component analysis. In Proceedings of the 31st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 5. (Toulouse, France; 2006).
 22.
Afsari B: Sensitivity analysis for the problem of matrix joint diagonalization. SIAM J. Matrix Anal. Appl 2008, 30(3):11481171. 10.1137/060655997
 23.
Absil PA, Mahony R, Sepulchre R: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ; 2008.
 24.
Helmke U, Hüper K, Trumpf J: Newton’s method on Graßmann manifolds. 2007.
 25.
Nocedal J, Wright SJ: Numerical Optimization. Springer, New York; 2006.
 26.
Kleinsteuber M, Hüper K: An intrinsic CG algorithm for computing dominant subspaces. In Proceedings of the 32nd IEEE International Conference on, Acoustics, Speech, and Signal Processing (ICASSP), vol. 4. (Hawaii, USA; 2007).
 27.
Smith ST: Optimization techniques on Riemannian manifolds. In Hamiltonian and Gradient Flows, Algorithms and Control, Fields Institute Communications, vol. 3. Edited by: Bloch A. American Mathematical Society, Providence, RI; 1994).
 28.
Gabay D: Minimizing a differentiable function over a differential manifold. J. Optimiz. Theory Appl 1982, 37(2):177219. 10.1007/BF00934767
 29.
Ring W, Wirth B: Optimization methods on Riemannian manifolds and their application to shape space. SIAM J. Optimiz 2012, 22(2):596627. 10.1137/11082885X
 30.
Amari SI, Cichocki A, Yang HH: A new learning algorithm for blind signal separation. In Advances in Neural Information Processing Systems (NIPS), vol. 8. Edited by: Touretzky DS, Mozer MC, Hasselmo ME. (The MIT Press, Cambridge, MA, USA; 1996).
Acknowledgements
This study was supported by the Cluster of Excellence CoTeSys (Cognition for Technical Systems), funded by the Deutsche Forschungsgemeinschaft (DFG).
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Keywords
 Conjugate Gradient
 Independent Component Analysis
 Conjugate Gradient Method
 Parallel Transport