
Non-unitary matrix joint diagonalization for complex independent vector analysis


Independent vector analysis (IVA) is a special form of independent component analysis (ICA) that has demonstrated strong performance in solving convolutive blind source separation (BSS) problems in the frequency domain. Most IVA algorithms are based on optimizing certain contrast functions, and the main difficulty of these approaches lies in finding a reliable and fast estimate of the unknown distribution of the sources. Despite the rich availability of efficient tensorial approaches to the standard ICA problem, these methods have not been explored considerably for IVA. In this article, we propose a matrix joint diagonalization approach to solve the complex IVA problem. The new factorization neither relies on a whitening process, nor does it require an estimate of the joint probability distribution of the dependent signal groups. The latter is in contrast to most IVA approaches to date. The underlying geometry of the problem is investigated together with a critical point analysis of the resulting cost function. A conjugate gradient algorithm on the appropriate manifold setting is developed.

1 Introduction

Independent component analysis (ICA) is a standard statistical tool for solving the blind source separation (BSS) problem. BSS aims to recover source signals from the observed mixtures, without knowing either the distribution of the sources or the mixing process. Application of the standard ICA model is often limited, since it requires mutual statistical independence between all individual components. However, in many applications, there exist groups of signals of interest, where components from different groups are mutually statistically independent indeed, but where mutual statistical dependence occurs between components in the same group. Such problems can be tackled by a technique now referred to as multidimensional independent component analysis (MICA) [1], or independent subspace analysis (ISA) [2].

A special form of ISA arises in solving the BSS problem with convolutive mixtures [3]. After transferring the convolutive observations into the frequency domain via short-time Fourier transforms, the convolutive BSS problem results in a collection of instantaneous complex BSS problems, one in each frequency bin. After solving the subproblems individually, the final stage faces the challenge of aligning all statistically dependent components from different groups, which is referred to as the permutation problem. To avoid this problem, a new approach named independent vector analysis (IVA) has been proposed in [4]. Besides its application to convolutive BSS problems, IVA has also recently been applied to analyze multivariate Gaussian models, cf. [5, 6]. In the current literature, the majority of IVA algorithms are based on optimizing certain contrast functions, cf. [5, 7-9]. The main difficulty of these contrast function based approaches lies in estimating the unknown distribution of the sources, which usually requires a large number of observations [10].

On the other hand, tensorial approaches are efficient and richly available to solve both the ICA and ISA problems. In particular, joint block diagonalization approaches are shown to be effective methods for solving the ISA problem, cf. [11, 12], and are inherently applicable to IVA. However, such general joint block diagonalization approaches do not take the intrinsic structure of the IVA problem into account. A recent study [13] proposes a joint diagonalization approach of cross cumulant matrices to solve the complex IVA problem. More recently, the present authors have developed a similar approach that jointly diagonalizes both cross covariance and cross pseudo covariance matrices, cf. [14]. In this article, we extend the previous study in [14], and adapt the so-called complex oblique projective (COP) manifold, which has proven to be an appropriate setting for the standard instantaneous complex ICA problem [15], to the current scenario. Finally, an efficient conjugate gradient (CG) based IVA algorithm is proposed, and numerical experiments are provided to demonstrate the convergence properties of the proposed CG algorithm, and to compare its performance with two recently developed IVA algorithms in terms of separation quality.

2 Notations

Throughout the article, $(\cdot)^{\top}$ denotes the matrix transpose, $(\cdot)^{H}$ the Hermitian transpose, $\overline{(\cdot)}$ the entry-wise complex conjugate of a matrix, and $Gl(m)$ the set of all $m \times m$ invertible complex matrices. The Frobenius norm of a matrix $A \in \mathbb{C}^{m \times n}$ is denoted by $\|A\|_F := \sqrt{\operatorname{tr}(A A^{H})}$, where $\operatorname{tr}(\cdot)$ is the trace of a square matrix. Given a square matrix $Z \in \mathbb{C}^{m \times m}$, $\operatorname{ddiag}(Z)$ forms a diagonal matrix whose diagonal entries are those of $Z$, and $\operatorname{off}(Z)$ generates a matrix by setting all diagonal entries of $Z$ to zero, i.e. $\operatorname{off}(Z) := Z - \operatorname{ddiag}(Z)$.

In this study, we consider an $m$-dimensional complex signal $s(t) = [s_1(t), \ldots, s_m(t)]^{\top} \in \mathbb{C}^{m}$ as an $m$-dimensional complex stochastic process indexed by the variable $t$. The empirical expectation of a random variable $s$ is denoted by $\mathbb{E}[s(t)] = \frac{1}{T}\sum_{t=1}^{T} s(t)$, where $T$ is the number of samples. As usual for the standard ICA model, we assume without loss of generality that $\mathbb{E}[s(t)] = 0$. The empirical covariance and pseudo-covariance matrix of complex signals $s(t)$ are referred to as $\operatorname{cov}(s(t)) := \mathbb{E}[s(t)\,s(t)^{H}]$ and $\operatorname{pcov}(s(t)) := \mathbb{E}[s(t)\,s(t)^{\top}]$, respectively.

3 Problem description

It is known that convolutive BSS problems can be transformed into the frequency domain and solved as instantaneous complex BSS problems for every frequency simultaneously, when the demixing filter is sufficiently longer than the mixing filter, cf. [16, 17]. In this study, we consider the spectral time-frequency representation of a signal in terms of a short-time Fourier transformation that is centered at time $t$. Let $w_i(t,f) \in \mathbb{C}$ and $s_i(t,f) \in \mathbb{C}$ denote the coefficient of the center frequency $f$ of the $i$th observation $w_i(t)$ and the $i$th source signal $s_i(t)$, respectively. Then, for a given pair $(t, f)$, the Fourier coefficients of the observations and the sources obey the equality

$$w(t,f) = A_f\, s(t,f), \tag{1}$$

where $w(t,f) := [w_1(t,f), \ldots, w_m(t,f)]^{\top} \in \mathbb{C}^{m}$, $s(t,f) := [s_1(t,f), \ldots, s_m(t,f)]^{\top} \in \mathbb{C}^{m}$, and $A_f \in \mathbb{C}^{m \times m}$ serves as a complex mixing matrix. More compactly, for a fixed frequency $f$, we get a standard instantaneous complex BSS problem as

$$W(f) = A_f\, S(f), \tag{2}$$

where $W(f) := [w(1,f), \ldots, w(T,f)] \in \mathbb{C}^{m \times T}$ and $S(f) := [s(1,f), \ldots, s(T,f)] \in \mathbb{C}^{m \times T}$, with $T$ being the number of chosen time frames. One popular approach to solve the convolutive BSS problem is to solve the individual instantaneous BSS problem (2) at each frequency, and then assemble the results from each frequency to reconstruct the estimated signal in the time domain [18].

Let us denote the rows of $S(f)$ by $s_i(f) = [s_i(1,f), \ldots, s_i(T,f)] \in \mathbb{C}^{1 \times T}$ for $i = 1, \ldots, m$. Following the assumption of statistical independence between the sources, the complex valued signals $s_i(f)$ and $s_j(f)$ are statistically independent for $i \neq j$. In contrast, we assume that for a pair of frequencies $(f_p, f_q)$ with $f_p \neq f_q$, the complex signals $s_i(f_p)$ and $s_i(f_q)$ are statistically dependent for a given source. The development of IVA is inspired by this cross frequency structure. It aims to find a set of demixing matrices $\{X_f\} \subset Gl(m)$ via

$$Y(f) = X_f^{H}\, W(f), \tag{3}$$

such that

  1. all sub-ICA problems are solved, and

  2. the statistical alignment between groups is restored, i.e. the estimated $i$th signals $\{y_i(f)\}$ are mutually statistically dependent.

The main idea for our approach is to exploit the cross covariance matrices between groups of observations defined as

$$\operatorname{cov}(W(f_i), W(f_j)) := \frac{1}{T}\sum_{t=1}^{T} w(t,f_i)\, w(t,f_j)^{H} = A(f_i) \underbrace{\left( \frac{1}{T}\sum_{t=1}^{T} s(t,f_i)\, s(t,f_j)^{H} \right)}_{=:\ \operatorname{cov}(S(f_i), S(f_j))} A(f_j)^{H}. \tag{4}$$

Similarly, the so-called pseudo cross covariance, defined as

$$\operatorname{pcov}(W(f_i), W(f_j)) := \frac{1}{T}\sum_{t=1}^{T} w(t,f_i)\, w(t,f_j)^{\top} = A(f_i)\, \operatorname{pcov}(S(f_i), S(f_j))\, A(f_j)^{\top}, \tag{5}$$

also provides additional information about the second-order statistics of the involved signals. In this study, we assume that cross covariances between sources in all groups do not vanish. The assumption of statistical independence between the source signals implies that the cross covariance matrix $\operatorname{cov}(S(f_i), S(f_j))$ and the pseudo cross covariance matrix $\operatorname{pcov}(S(f_i), S(f_j))$ are diagonal for all pairs $(i, j)$. With a further assumption on the sources being non-stationary, which has been exploited in [19], we arrive at a problem of jointly diagonalizing two sets of cross covariance and pseudo cross covariance matrices at different time instances.
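The factorizations (4) and (5) can be checked numerically. The following is a minimal sketch in Python with NumPy, using random complex matrices in place of real STFT data; the helper names `crandn`, `ccov`, and `cpcov` are hypothetical and not part of the original work.

```python
import numpy as np

rng = np.random.default_rng(1)
m, T = 3, 500  # number of sources and number of time frames (illustrative values)

def crandn(*shape):
    """Random complex array with standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def ccov(U, V):
    """Empirical cross covariance (1/T) * sum_t u(t) v(t)^H."""
    return U @ V.conj().T / U.shape[1]

def cpcov(U, V):
    """Empirical pseudo cross covariance (1/T) * sum_t u(t) v(t)^T."""
    return U @ V.T / U.shape[1]

# Sources and mixing matrices at two frequency bins f_i and f_j
S_i, S_j = crandn(m, T), crandn(m, T)
A_i, A_j = crandn(m, m), crandn(m, m)
W_i, W_j = A_i @ S_i, A_j @ S_j          # instantaneous mixtures as in (2)

lhs_cov = ccov(W_i, W_j)
rhs_cov = A_i @ ccov(S_i, S_j) @ A_j.conj().T    # right-hand side of (4)
lhs_pcov = cpcov(W_i, W_j)
rhs_pcov = A_i @ cpcov(S_i, S_j) @ A_j.T         # right-hand side of (5)
```

Both identities are purely algebraic, so they hold for any data. Diagonality of the source cross covariances is a separate statistical assumption that this sketch does not test.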

To summarize, we are interested in solving the following problem. For a complex IVA problem with $k$ subproblems, we consider the cross covariance and pseudo cross covariance matrices at $n$ time instances, i.e. for all $i,j = 1, \ldots, k$ and $t = 1, \ldots, n$, a set of matrices $\{C_{ij}(t)\}_{i<j}$ and a set of complex symmetric matrices $\{R_{ij}(t)\}_{i<j}$, which are constructed by

$$C_{ij}(t) = A_i\, \Omega_{ij}(t)\, A_j^{H} \quad \text{and} \quad R_{ij}(t) = A_i\, \tilde{\Omega}_{ij}(t)\, A_j^{\top}, \tag{6}$$

where $\Omega_{ij}(t), \tilde{\Omega}_{ij}(t) \in \mathbb{C}^{m \times m}$ are diagonal. The task is to find a set of matrices $\{X_i\}_{i=1}^{k} \subset Gl(m)$ such that

$$X_i^{H}\, C_{ij}(t)\, X_j \quad \text{and} \quad X_i^{H}\, R_{ij}(t)\, \overline{X}_j, \tag{7}$$

for all $i < j$ and $t = 1, \ldots, n$, are simultaneously, or approximately simultaneously, diagonalized. In this study, we consider the noise free IVA problem as defined in (2), and neglect the cross covariance matrix estimation errors due to the finite sample size effect. In other words, we assume that both sets of $C_{ij}(t)$'s and $R_{ij}(t)$'s are jointly diagonalizable.
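To make the problem statement concrete, the sketch below builds exactly jointly diagonalizable matrices and checks that one particular choice, $X_i = A_i^{-H}$, diagonalizes both sets. It assumes the pseudo cross covariance has the form $A_i \tilde{\Omega}_{ij}(t) A_j^{\top}$ as in (5); the sizes and helper names (`crandn`, `off`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, n = 3, 3, 4  # dimension, number of subproblems, number of time instances

def crandn(*shape):
    """Random complex array with standard normal real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def off(Z):
    """Set the diagonal of Z to zero: off(Z) = Z - ddiag(Z)."""
    return Z - np.diag(np.diag(Z))

A = [crandn(m, m) for _ in range(k)]
# One exact joint diagonalizer: X_i = A_i^{-H}, so that X_i^H A_i = I.
X = [np.linalg.inv(a).conj().T for a in A]

max_off = 0.0
for i in range(k):
    for j in range(i + 1, k):
        for _ in range(n):
            C = A[i] @ np.diag(crandn(m)) @ A[j].conj().T   # C_ij(t)
            R = A[i] @ np.diag(crandn(m)) @ A[j].T          # R_ij(t)
            DC = X[i].conj().T @ C @ X[j]                   # first matrix in (7)
            DR = X[i].conj().T @ R @ X[j].conj()            # second matrix in (7)
            max_off = max(max_off,
                          np.linalg.norm(off(DC)),
                          np.linalg.norm(off(DR)))
```

Here `max_off` stays at the level of rounding error. Any column-wise rescaling or common column permutation of the $X_i$ diagonalizes the data equally well, which is the ambiguity the later sections remove.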

Note that the above problem is similar to the simultaneous SVD formulation proposed in [20], where only the situation with two transform matrices is studied, i.e. $k = 2$. In contrast, our current setting deals with the case of multiple transform matrices $\{X_i\}_{i=1,\ldots,k}$, which are not restricted to be unitary. Finally, instead of considering second-order cross covariance matrices, our approach can be generalized to higher-order cross cumulants. We refer to [17] for further details.

4 Diagonality measure and the COP manifold

Our cost function to tackle problem (7) originates from the popular off-norm function that measures the squared Frobenius norm of the off-diagonal entries of the involved matrices. We develop an appropriate mathematical setting on the subsequently defined complex oblique projective (COP) manifold to provide its critical point analysis.

4.1 Derivation of the cost function

For legibility reasons, from now on, we only consider the problem of simultaneously diagonalizing the covariance matrices, i.e. the first condition in (7). The combination with the additional requirement that also the pseudo cross covariance matrices may be used for estimating the demixing matrix is straightforwardly adapted to our setting and not further discussed here.

Let us define the off-norm function as

$$g \colon (Gl(m))^{k} \to \mathbb{R}, \quad g(X_1, \ldots, X_k) := \sum_{i<j}^{k} \sum_{t=1}^{n} \frac{1}{2} \left\| \operatorname{off}\!\left( X_i^{H} C_{ij}(t) X_j \right) \right\|_F^{2}. \tag{8}$$

Due to the noise-free assumption and since we neglect finite sample size effects, the set of joint diagonalizers $(X_1, \ldots, X_k)$ of the $C_{ij}(t)$ in Equation (7) is a global minimum of $g$, that is, $g(X_1, \ldots, X_k) = 0$. It is clear that a minimization approach without further constraints on the $X_i$ would drive all diagonalizers to zero. In order to avoid such trivial solutions and to regularize the minimization problem, the authors in [21] propose to restrict all columns of transform matrices to have unit norm. This set is known as the oblique manifold, which has been shown to be an appropriate setting for matrix diagonalization, cf. [22]. Its complex counterpart is the so-called complex oblique manifold

$$Ob(m) := \left\{ X \in Gl(m) \mid \operatorname{ddiag}(X^{H} X) = I_m \right\}, \tag{9}$$

and we denote by $Ob^{k}(m)$ the product manifold of $k$ copies of $Ob(m)$. The restriction of the off-norm cost function (8) is denoted by

$$g_1 \colon Ob^{k}(m) \to \mathbb{R}, \quad g_1(X_1, \ldots, X_k) := \sum_{i<j}^{k} \sum_{t=1}^{n} \frac{1}{2} \left\| \operatorname{off}\!\left( X_i^{H} C_{ij}(t) X_j \right) \right\|_F^{2}. \tag{10}$$

Now denote the $p$th column of $X_i$ by $x_{ip}$. It is obvious that the function $g_1$ is invariant with respect to the phase difference of each column $x_{ip}$, which reflects the well-known scaling ambiguity of complex ICA problems. By a further calculation, $g_1$ has the form

$$\begin{aligned} g_1(X_1, \ldots, X_k) &= \frac{1}{2} \sum_{i<j}^{k} \sum_{t=1}^{n} \sum_{p \neq q}^{m} \left| x_{ip}^{H} C_{ij}(t) x_{jq} \right|^{2} \\ &= \frac{1}{2} \sum_{i<j}^{k} \sum_{t=1}^{n} \sum_{p \neq q}^{m} \left( x_{ip}^{H} C_{ij}(t) x_{jq} \right) \left( x_{ip}^{H} C_{ij}(t) x_{jq} \right)^{H} \\ &= \frac{1}{2} \sum_{i<j}^{k} \sum_{t=1}^{n} \sum_{p \neq q}^{m} \operatorname{tr}\!\left( x_{ip} x_{ip}^{H}\, C_{ij}(t)\, x_{jq} x_{jq}^{H}\, C_{ij}(t)^{H} \right). \end{aligned} \tag{11}$$

Instead of fixing a phase for each $x_{ip}$, in this study we employ an elegant mathematical setting for the problem. Recall the fact that each $x_{ip} x_{ip}^{H}$ defines a Hermitian rank-one projector, the set of which identifies the $(m-1)$-dimensional complex projective space $\mathbb{CP}^{m-1}$, i.e.

$$\mathbb{CP}^{m-1} = \left\{ P \in \mathbb{C}^{m \times m} \mid P^{H} = P,\ P^{2} = P,\ \operatorname{tr}(P) = 1 \right\}. \tag{12}$$

By doing so for each column and by maintaining the fact that the columns of $X_i$ form a complex basis (i.e. invertibility of $X_i$), we naturally arrive at the following set, which we refer to as the complex oblique projective (COP) manifold,

$$Q(m) := \left\{ (P_1, \ldots, P_m) \;\middle|\; P_i \in \mathbb{CP}^{m-1},\ \det\!\left( \sum_{i=1}^{m} P_i \right) > 0 \right\}. \tag{13}$$

The off-norm cost function $g_1$ now induces the following function $g_2$ on the COP manifold. Namely, if $Q^{k}(m)$ denotes the $k$-times product of $Q(m)$, $g_2$ is given by

$$g_2 \colon Q^{k}(m) \to \mathbb{R}, \quad g_2(P_1, \ldots, P_k) := \sum_{i<j}^{k} \sum_{t=1}^{n} \sum_{p \neq q}^{m} \operatorname{tr}\!\left( P_{ip}\, C_{ij}(t)\, P_{jq}\, C_{ij}(t)^{H} \right), \tag{14}$$

with $P_i := (P_{i1}, \ldots, P_{im})$.
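The relation between the two cost functions can be illustrated numerically. The sketch below evaluates $g_1$ on unit-norm columns and $g_2$ on the induced rank-one projectors. Taking (10) and (14) as printed, (14) carries no factor $\frac{1}{2}$, so in this sketch $g_2 = 2 g_1$. The data layout (a dictionary keyed by pairs $(i, j)$ with $i < j$) and all function names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
m, k, n = 3, 3, 2

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def off(Z):
    return Z - np.diag(np.diag(Z))

def g1(X, C):
    """Off-norm cost (10); C[(i, j)] is a list of n matrices for i < j."""
    return sum(0.5 * np.linalg.norm(off(X[i].conj().T @ Ct @ X[j])) ** 2
               for (i, j), Cs in C.items() for Ct in Cs)

def projectors(Xi):
    """Rank-one projectors x_p x_p^H built from the unit-norm columns of Xi."""
    return [np.outer(x, x.conj()) for x in Xi.T]

def g2(P, C):
    """Cost (14) evaluated on lists of projectors P[i][p]."""
    total = 0.0
    for (i, j), Cs in C.items():
        for Ct in Cs:
            for p in range(m):
                for q in range(m):
                    if p != q:
                        total += np.trace(P[i][p] @ Ct @ P[j][q] @ Ct.conj().T).real
    return total

def normalize(Xi):
    """Scale every column to unit Euclidean norm, as required by (9)."""
    return Xi / np.linalg.norm(Xi, axis=0)

X = [normalize(crandn(m, m)) for _ in range(k)]
C = {(i, j): [crandn(m, m) for _ in range(n)]
     for i in range(k) for j in range(i + 1, k)}
P = [projectors(Xi) for Xi in X]

# Multiply every column by an arbitrary unit-modulus phase factor.
X_phase = [Xi * np.exp(1j * rng.uniform(0, 2 * np.pi, m)) for Xi in X]
P_phase = [projectors(Xi) for Xi in X_phase]
```

The projectors in `P_phase` coincide with those in `P`, which is why working with projectors removes the phase ambiguity without fixing a phase by hand.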

4.2 The geometry of the complex oblique projective manifold

In this section, we recall some basic facts and concepts that are necessary for developing a Riemannian CG algorithm on the COP manifold, cf. [23]. In particular, we require a formula for the parallel transport and the geodesics of the COP manifold. We endow Q(m) with the standard Riemannian metric

$$\langle (A_1, \ldots, A_m), (B_1, \ldots, B_m) \rangle := \sum_{i} \operatorname{tr}(A_i B_i), \tag{15}$$

inherited from the Euclidean metric of the $m$-fold product of Hermitian matrices. With this, $Q(m)$ is an open and dense Riemannian submanifold of the $m$-times product of $\mathbb{CP}^{m-1}$ with the standard metric, i.e.

$$\overline{Q(m)} = \left( \mathbb{CP}^{m-1} \right)^{m}, \tag{16}$$

where $\overline{Q(m)}$ denotes the closure of $Q(m)$. Accordingly, the tangent spaces, the geodesics, and the parallel transport for $Q(m)$ and $(\mathbb{CP}^{m-1})^{m}$ coincide locally and thus are easily derived from the geometry of $\mathbb{CP}^{m-1}$. We refer to [24] for further discussions and details about $\mathbb{CP}^{m-1}$.

Let us denote by

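$$\mathfrak{u}(m) := \left\{ \Omega \in \mathbb{C}^{m \times m} \mid \Omega = -\Omega^{H} \right\} \tag{17}$$

the set of skew-Hermitian matrices. The tangent space at $P$ in $\mathbb{CP}^{m-1}$ is given by

$$T_P \mathbb{CP}^{m-1} = \left\{ [P, \Omega] \mid \Omega \in \mathfrak{u}(m) \right\}, \tag{18}$$

where $[A, B] := AB - BA$ is the matrix commutator. Then, the tangent space at $P = (P_1, \ldots, P_m) \in Q(m)$ is simply the Cartesian product

$$T_P Q(m) \cong T_{P_1} \mathbb{CP}^{m-1} \times \cdots \times T_{P_m} \mathbb{CP}^{m-1}. \tag{19}$$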

With the above metric, the geodesics through $P \in \mathbb{CP}^{m-1}$ in direction $Z \in T_P \mathbb{CP}^{m-1}$ are given by

$$\gamma_{P,Z} \colon \mathbb{R} \to \mathbb{CP}^{m-1}, \quad \gamma_{P,Z}(t) := e^{t[Z,P]}\, P\, e^{-t[Z,P]}, \tag{20}$$

where $e^{(\cdot)}$ denotes the matrix exponential. Thus, the (local$^{a}$) geodesic through $P \in Q(m)$ in direction $Z := (Z_1, \ldots, Z_m) \in T_P Q(m)$ is given by

$$\gamma_{P,Z}(t) = \left( \gamma_{P_1,Z_1}(t), \ldots, \gamma_{P_m,Z_m}(t) \right). \tag{21}$$

The parallel transport of $\Psi := (\Psi_1, \ldots, \Psi_m) \in T_P Q(m)$ with respect to the Levi-Civita connection along the geodesic $\gamma_{P,Z}(t)$ is

$$\tau_{P,Z}(\Psi) := \left( \tau_{P_1,Z_1}(\Psi_1), \ldots, \tau_{P_m,Z_m}(\Psi_m) \right), \tag{22}$$

with $\tau_{P,Z}$ being the parallel transport of $\Psi \in T_P \mathbb{CP}^{m-1}$ with respect to the Levi-Civita connection along the geodesic $\gamma_{P,Z}(t)$, i.e.

$$\tau_{P,Z}(\Psi) = e^{[Z,P]}\, \Psi\, e^{-[Z,P]}. \tag{23}$$

The natural or Riemannian gradient of a function that is the restriction of some globally defined function to a sub-manifold is simply the orthogonal projection of the Euclidean gradient onto the corresponding tangent space. For the complex projective space, this projection is given by

$$\pi_P \colon \mathbb{C}^{m \times m} \to T_P \mathbb{CP}^{m-1}, \quad A \mapsto \left[ P, \left[ P, \tfrac{1}{2}(A + A^{H}) \right] \right]. \tag{24}$$

It is easily seen that the operator $\pi_P$ is an orthogonal projector onto the tangent space $T_P \mathbb{CP}^{m-1}$, i.e. that $\pi_P \circ \pi_P(A) = \pi_P(A)$ and that the null space of $\pi_P$ is orthogonal to its image. Here, $\circ$ denotes the composition of functions. The formulas for the tangent spaces, the geodesics, the parallel transport, and the projection onto the tangent spaces of $Q^{k}(m)$ follow directly from the product manifold structure.
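The geometric ingredients above (tangent projection, geodesic, and parallel transport on a single factor $\mathbb{CP}^{m-1}$) can be sketched in a few lines of NumPy. The matrix exponential of the skew-Hermitian matrix $[Z, P]$ is computed here through the eigendecomposition of the Hermitian matrix $i[Z,P]$, which is an implementation choice of this sketch. The function names and the extra time parameter `t` in `transport` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 4

def comm(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    return A @ B - B @ A

def proj_tangent(P, A):
    """Tangent projection pi_P(A) = [P, [P, (A + A^H)/2]] from (24)."""
    return comm(P, comm(P, 0.5 * (A + A.conj().T)))

def exp_skew(K, t=1.0):
    """exp(t K) for skew-Hermitian K, using that i*K is Hermitian."""
    w, V = np.linalg.eigh(1j * K)          # i*K = V diag(w) V^H
    return (V * np.exp(-1j * t * w)) @ V.conj().T

def geodesic(P, Z, t):
    """gamma_{P,Z}(t) = exp(t[Z,P]) P exp(-t[Z,P]) from (20)."""
    U = exp_skew(comm(Z, P), t)
    return U @ P @ U.conj().T

def transport(P, Z, Psi, t=1.0):
    """Parallel transport of Psi along gamma_{P,Z} up to time t, cf. (23)."""
    U = exp_skew(comm(Z, P), t)
    return U @ Psi @ U.conj().T

# A random point of CP^{m-1} and a random tangent vector at that point
x = rng.standard_normal(m) + 1j * rng.standard_normal(m)
x /= np.linalg.norm(x)
P = np.outer(x, x.conj())
A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
Z = proj_tangent(P, A)

Q = geodesic(P, Z, 0.7)                    # a point further along the geodesic
Z_moved = transport(P, Z, Z, 0.7)          # the velocity carried along to Q
h = 1e-6
velocity = (geodesic(P, Z, h) - geodesic(P, Z, -h)) / (2 * h)
```

The finite-difference `velocity` reproduces `Z`, confirming that the curve in (20) starts at $P$ with initial velocity $Z$, and `Q` remains a Hermitian rank-one projector.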

5 Critical point analysis of the cost function

In this section, we conduct a critical point analysis of the cost function $g_2$ on the product COP manifold. We show that the joint diagonalizers are a non-degenerate global minimum of $g_2$. This is an important fact, since in many cases the speed of convergence relies on the non-degeneracy of the minima. First of all, we present a lemma which originates from the derivation of the cost from the off-norm function.

Lemma 1

Let us assume that all $C_{ij}(t)$ are jointly diagonalizable. If $(X_1, \ldots, X_k) \in Ob^{k}(m)$ minimizes the cost function $g_1$, as defined in (10), i.e. $X_i^{H} C_{ij}(t) X_j = D_{ij}(t) = \operatorname{diag}(d_{ij1}(t), \ldots, d_{ijm}(t))$ is diagonal for all $t = 1, \ldots, n$ and $i,j = 1, \ldots, k$, then the set of corresponding Hermitian projectors $P = (P_1, \ldots, P_k) \in Q^{k}(m)$ with $P_{ip} := x_{ip} x_{ip}^{H} \in \mathbb{CP}^{m-1}$ minimizes the cost function $g_2$ defined in (14), and

$$P_{ip}\, C_{ij}(t)\, P_{jq} = \begin{cases} 0, & \text{for } p \neq q, \\ d_{ijp}(t)\, P_{ip}, & \text{for } p = q. \end{cases} \tag{25}$$

The above lemma follows directly from the condition of $X_i^{H} C_{ij}(t) X_j$ being diagonal, i.e. its $(p,q)$th entry is computed as

$$x_{ip}^{H}\, C_{ij}(t)\, x_{jq} = \begin{cases} 0, & \text{for } p \neq q, \\ d_{ijp}(t), & \text{for } p = q. \end{cases} \tag{26}$$

Now, let $P := (P_1, \ldots, P_k) \in Q^{k}(m)$ be arbitrary. We compute the first derivative of $g_2$ at $P \in Q^{k}(m)$ in direction $Z := (Z_1, \ldots, Z_k) \in T_P Q^{k}(m)$ as

$$D g_2(P)\, Z = \sum_{i<j}^{k} \sum_{p \neq q}^{m} \sum_{t=1}^{n} \left[ \operatorname{tr}\!\left( Z_{ip}\, C_{ij}(t)\, P_{jq}\, C_{ij}(t)^{H} \right) + \operatorname{tr}\!\left( P_{ip}\, C_{ij}(t)\, Z_{jq}\, C_{ij}(t)^{H} \right) \right]. \tag{27}$$

By recalling the structure of the tangent space of $Q^{k}(m)$ and the result in Lemma 1, it is straightforward to see that the first derivative of $g_2$ vanishes at the point $P$ of Lemma 1, which corresponds to the correct joint diagonalizers.

The remainder of this section addresses the characterization of the Hessian of $g_2$ at the joint diagonalizers. To that end, we denote by $\operatorname{off}(m) := \{ Z \in \mathbb{C}^{m \times m} \mid z_{ii} = 0,\ i = 1, \ldots, m \}$ the set of matrices with zero diagonal. Let $\Pi$ be the natural projection

$$\Pi \colon Ob(m) \to Q(m), \quad \Pi(X) = \left( x_1 x_1^{H}, \ldots, x_m x_m^{H} \right) \tag{28}$$

and let $\mu_X$ be defined as

$$\mu_X \colon \operatorname{off}(m) \to Ob(m), \quad Z \mapsto X (I_m + Z)\, \operatorname{diag}\!\left( \frac{1}{\| X (e_1 + z_1) \|}, \ldots, \frac{1}{\| X (e_m + z_m) \|} \right), \tag{29}$$

where $e_j$ denotes the $j$th standard basis vector. Note that $\mu_X$ defines a locally injective but not bijective mapping. The composition of $\Pi$ and $\mu_X$, however, yields a local diffeomorphism. With the shorthand notation $P := \Pi(X)$, the mapping

$$\phi_P \colon \operatorname{off}(m) \to Q(m), \quad Z \mapsto \Pi \circ \mu_X(Z) \tag{30}$$

is a local parametrization around $P$ and thus permits a local parametrization of $Q^{k}(m)$ via

$$\Phi_P \colon \operatorname{off}^{k}(m) \to Q^{k}(m), \quad (Z_1, \ldots, Z_k) \mapsto \left( \phi_{P_1}(Z_1), \ldots, \phi_{P_k}(Z_k) \right), \tag{31}$$

with $\Phi_P(0) = P := (P_1, \ldots, P_k)$. The associated tangent map $T\Phi_P$ is given as

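$$T\Phi_P \colon T \operatorname{off}^{k}(m) \cong \operatorname{off}^{k}(m) \to T_P Q^{k}(m), \quad (\Theta_1, \ldots, \Theta_k) \mapsto \left( x_{11}\, \xi(x_{11})\, \theta_{11}^{H} + \theta_{11}\, \xi(x_{11})\, x_{11}^{H}, \ldots, x_{km}\, \xi(x_{km})\, \theta_{km}^{H} + \theta_{km}\, \xi(x_{km})\, x_{km}^{H} \right), \tag{32}$$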

where $\xi(x_{ip}) := I_m - x_{ip} x_{ip}^{H}$ is the orthogonal projection operator onto the orthogonal complement of $x_{ip}$. Let $Z := T\Phi_P(\Theta) \in T_P Q^{k}(m)$. Then, we can compute the Hessian of $g_2$ at the critical points $P \in Q^{k}(m)$, i.e. the symmetric bilinear form $\mathsf{H}_{g_2}(P) \colon T_P Q^{k}(m) \times T_P Q^{k}(m) \to \mathbb{R}$, via

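$$\begin{aligned} \mathsf{H}_{g_2}(P)(Z, Z) &= \mathsf{H}_{g_2}(P)\big( T\Phi_P(\Theta), T\Phi_P(\Theta) \big) = \frac{d^{2}}{ds^{2}} (g_2 \circ \Phi_P)(s\Theta) \Big|_{s=0} \\ &= \sum_{i<j}^{k} \sum_{p \neq q}^{m} \sum_{t=1}^{n} 2 \operatorname{tr}\!\left( Z_{ip}\, C_{ij}(t)\, Z_{jq}\, C_{ij}(t)^{H} \right) \\ &= \sum_{i<j}^{k} \sum_{p \neq q}^{m} \sum_{t=1}^{n} 2 \left| d_{ijp}(t)\, \theta_{pq}^{(i)} + d_{ijq}(t)\, \theta_{qp}^{(j)} \right|^{2}. \end{aligned} \tag{33}$$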

The last equality follows from Lemma 1, i.e. $X_i^{H} C_{ij}(t) X_j = D_{ij}(t)$, which is equivalent to $C_{ij}(t) = X_i^{-H} D_{ij}(t) X_j^{-1}$, and from the fact that $Z_{ip} = (I_m - P_{ip}) X_i \theta_{ip}$. It can easily be seen that the Hessian form (33) is positive definite if and only if all $(2 \times 2)$-matrices

$$\sum_{t=1}^{n} \begin{bmatrix} |d_{ijp}(t)|^{2} & d_{ijq}(t)\, \overline{d_{ijp}(t)} \\ d_{ijp}(t)\, \overline{d_{ijq}(t)} & |d_{ijq}(t)|^{2} \end{bmatrix} \tag{34}$$

are positive definite. Since this is a generic assumption on the data, we have the following result.

Theorem 1

Generically, the global minimizer of the cost function $g_2$ is non-degenerate.

6 A CG IVA algorithm

In this section, we introduce a general form of CG algorithms on matrix manifolds. After computing the Riemannian gradient of the cost function on the COP manifold, we develop a CG based IVA algorithm. The CG methods on matrix manifolds are shortly reviewed here. They form the backbone of the algorithm for our optimization problem on the COP manifold and explain the use of the differential geometric concepts derived in the previous sections. For an in-depth introduction on optimization on matrix manifolds, we refer the interested reader to [23].

Let $M$ be a submanifold of some Euclidean space with inner product $\langle \cdot, \cdot \rangle$ and let $f \colon M \to \mathbb{R}$ be smooth. The CG method is initialized by some $x_0 \in M$ and the descent direction $H_0 := -\operatorname{grad} f(x_0)$ given by the Riemannian gradient. If $f$ is the restriction of a globally defined function $\hat{f}$ to $M$, the Riemannian gradient is just the orthogonal projection of the gradient of $\hat{f}$ onto the tangent space, i.e.

$$\operatorname{grad} f(x) = \pi_x\!\left( \nabla \hat{f}(x) \right), \tag{35}$$

where $\nabla \hat{f}(x)$ denotes the Euclidean gradient of $\hat{f}$, and $\pi_x$ is the orthogonal projection onto $T_x M$. Subsequently, sweeps are iterated that consist of two steps, a line search in a given direction (i.e. along a geodesic in that direction) followed by an update of the search direction. Several different possibilities for these steps lead to different CG methods. Assume now that $x_i$, $H_i$, and $G_i := \operatorname{grad} f(x_i)$ are given.

Given a geodesic $\gamma_i$ with $\gamma_i(0) = x_i$ and $\dot{\gamma}_i(0) = H_i$, the line search aims to find $\lambda_i \in \mathbb{R}$ that minimizes $f \circ \gamma_i \colon \mathbb{R} \to \mathbb{R}$. A generic approach for the step-size selection is a Riemannian adaption of the backtracking line search and several modifications, cf. [23, 25]. Here, we present a closed form solution for the step-size selection that works particularly well for our problem due to the quadratic nature of our cost function, cf. [26]. It is based on the assumption that a one-dimensional Newton step along $f \circ \gamma$ yields a good approximation for its minimizer. Explicitly, we choose the step-size as

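$$\lambda_i := - \frac{ \frac{d}{d\lambda} (f \circ \gamma)(\lambda) \big|_{\lambda = 0} }{ \left| \frac{d^{2}}{d\lambda^{2}} (f \circ \gamma)(\lambda) \big|_{\lambda = 0} \right| }. \tag{36}$$

The absolute value in the denominator is chosen for the following reason. While being an unaltered one-dimensional Newton step in a neighborhood of a minimum, the step size is the negative of a regular Newton step if $\frac{d^{2}}{d\lambda^{2}} (f \circ \gamma)(\lambda) \big|_{\lambda = 0} < 0$ and thus yields non-attractiveness for critical points that are not minima.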
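The effect of the absolute value can be seen on a one-dimensional toy problem. The sketch below assumes the step size has the form $-\phi'(0)/|\phi''(0)|$ for $\phi = f \circ \gamma$ and approximates both derivatives by central finite differences, which is a simplification made here; the function `newton_step` and the two test functions are hypothetical.

```python
def newton_step(phi, h=1e-4):
    """Step size -phi'(0) / |phi''(0)| using central finite differences."""
    d1 = (phi(h) - phi(-h)) / (2 * h)
    d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h ** 2
    return -d1 / abs(d2)

def convex(lam):
    """Positive curvature, minimum at lam = 2."""
    return (lam - 2.0) ** 2

def concave(lam):
    """Negative curvature, maximum at lam = 2."""
    return -(lam - 2.0) ** 2

step_convex = newton_step(convex)     # plain Newton step, lands on the minimum
step_concave = newton_step(concave)   # sign flipped relative to a Newton step
```

For the convex function the step equals the ordinary Newton step and reaches the minimizer. For the concave function an ordinary Newton step would jump to the maximum at 2, whereas the modified step points the other way and lowers the function value.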

In order to compute the new search direction $H_{i+1} \in T_{x_{i+1}} M$, we need to transport $H_i$ and $G_i$, which are tangent to $x_i$, to the tangent space $T_{x_{i+1}} M$. This is done via parallel transport along the geodesic $\gamma$, which we denote by

$$\tau \colon T_{x_i} M \to T_{x_{i+1}} M. \tag{37}$$

The updated search direction is now chosen according to a Riemannian adaption of the Hestenes-Stiefel, the Polak-Ribière, or the Fletcher-Reeves update. Here, we choose a different formulation that performs slightly better in our situation than the aforementioned ones, namely

$$\gamma_i := \frac{ \langle G_{i+1},\, G_{i+1} - \tau G_i \rangle }{ \langle H_i,\, G_i \rangle }. \tag{38}$$

Despite the good performance in applications, the convergence analysis of CG methods on smooth manifolds is still an open problem. Partial convergence results for CG methods on manifolds can be found in [27, 28], and a recent result in [29].

As is clear from the above, the first step towards formulating a CG algorithm for minimizing the cost function $g_2$ is to compute its Riemannian gradient. Let us denote by $\hat{g}_2$ the continuation of $g_2$ to the embedding space $\mathbb{C}^{m \times m \times m \times k}$. Following the computation in Equation (27), we have the Euclidean gradient of $\hat{g}_2$ at $P \in Q^{k}(m)$, i.e. $\nabla \hat{g}_2(P) := (J_1, \ldots, J_k)$, with each element $J_{ip} \in \mathbb{C}^{m \times m}$ given by

$$J_{ip} = \sum_{j > i}^{k} \sum_{q \neq p}^{m} \sum_{t=1}^{n} C_{ij}(t)\, P_{jq}\, C_{ij}(t)^{H} + \sum_{j < i} \sum_{q \neq p}^{m} \sum_{t=1}^{n} C_{ji}(t)^{H}\, P_{jq}\, C_{ji}(t). \tag{39}$$

By projecting it onto the tangent space $T_P Q^{k}(m)$, we get the Riemannian gradient of $g_2$ at $P \in Q^{k}(m)$, i.e. $\operatorname{grad} g_2(P) := (G_1, \ldots, G_k) \in T_P Q^{k}(m)$, with each element $G_{ip} \in T_{P_{ip}} \mathbb{CP}^{m-1}$ given by

$$G_{ip} = \left[ P_{ip}, \left[ P_{ip},\ \sum_{j > i}^{k} \sum_{q \neq p}^{m} \sum_{t=1}^{n} C_{ij}(t)\, P_{jq}\, C_{ij}(t)^{H} + \sum_{j < i} \sum_{q \neq p}^{m} \sum_{t=1}^{n} C_{ji}(t)^{H}\, P_{jq}\, C_{ji}(t) \right] \right]. \tag{40}$$
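A gradient formula of this kind is easy to get wrong in the indices, so a finite-difference check is useful. The sketch below derives the Euclidean gradient directly from the directional derivative (27): each term $\operatorname{tr}(P_{ip} C_{ij}(t) P_{jq} C_{ij}(t)^{H})$ contributes $C_{ij}(t) P_{jq} C_{ij}(t)^{H}$ to the block of $P_{ip}$ and $C_{ij}(t)^{H} P_{ip} C_{ij}(t)$ to the block of $P_{jq}$. It then compares the inner product of the projected gradient with a random tangent direction against a central difference of $g_2$. The data layout and function names are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
m, k, n = 3, 3, 2

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def comm(A, B):
    return A @ B - B @ A

def g2(P, C):
    """Cost (14); P[i][p] are m x m matrices, C[(i, j)] lists of matrices."""
    return sum(np.trace(P[i][p] @ Ct @ P[j][q] @ Ct.conj().T).real
               for (i, j), Cs in C.items() for Ct in Cs
               for p in range(m) for q in range(m) if p != q)

def euclid_grad(P, C):
    """Euclidean gradient blocks J[i][p], accumulated term by term from (27)."""
    J = [[np.zeros((m, m), dtype=complex) for _ in range(m)] for _ in range(k)]
    for (i, j), Cs in C.items():
        for Ct in Cs:
            for p in range(m):
                for q in range(m):
                    if p != q:
                        J[i][p] += Ct @ P[j][q] @ Ct.conj().T   # from the slot of P_ip
                        J[j][q] += Ct.conj().T @ P[i][p] @ Ct   # from the slot of P_jq
    return J

def riem_grad(P, C):
    """Riemannian gradient blocks G_ip = [P_ip, [P_ip, J_ip]]."""
    J = euclid_grad(P, C)
    return [[comm(P[i][p], comm(P[i][p], J[i][p])) for p in range(m)]
            for i in range(k)]

def rand_projectors():
    X = crandn(m, m)
    X /= np.linalg.norm(X, axis=0)
    return [np.outer(x, x.conj()) for x in X.T]

P = [rand_projectors() for _ in range(k)]
C = {(i, j): [crandn(m, m) for _ in range(n)]
     for i in range(k) for j in range(i + 1, k)}

# Random tangent direction: Z_ip = [P_ip, [P_ip, Hermitian part of a random matrix]]
Z = []
for i in range(k):
    row = []
    for p in range(m):
        A = crandn(m, m)
        row.append(comm(P[i][p], comm(P[i][p], 0.5 * (A + A.conj().T))))
    Z.append(row)

def shifted(s):
    """The point P + s Z in the embedding space."""
    return [[P[i][p] + s * Z[i][p] for p in range(m)] for i in range(k)]

h = 1e-6
fd = (g2(shifted(h), C) - g2(shifted(-h), C)) / (2 * h)      # directional derivative
G = riem_grad(P, C)
inner = sum(np.trace(G[i][p] @ Z[i][p]).real for i in range(k) for p in range(m))
```

Agreement between `fd` and `inner` indicates that the projected gradient represents the derivative of $g_2$ on tangent directions with respect to the metric (15).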

The above formula for the Riemannian gradient now allows us to implement the geometric CG algorithm for minimizing the function $g_2$ as defined in (14) in a straightforward way. Pseudocode is provided in Algorithm 1.

Algorithm 1 A CG IVA algorithm

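Input: A set of matrices $\{C_{ij}(t)\} \subset \mathbb{C}^{m \times m}$ for $i,j = 1, \ldots, k$ and $t = 1, \ldots, n$;

Step 1: Generate an initial guess $P^{(0)} = [P_1^{(0)}, \ldots, P_k^{(0)}] \in Q^{k}(m)$ and set $i = 0$;

Step 2: Compute $G^{(0)} = \operatorname{grad} g_2(P^{(0)})$ using Equation (40) and set $H^{(0)} = [H_1, \ldots, H_k] \leftarrow -G^{(0)}$;

Step 3: Set $i = i + 1$;

Step 4: Update $P^{(i)} \leftarrow \left( \gamma_{P_1, H_1}(\lambda_{i-1}), \ldots, \gamma_{P_k, H_k}(\lambda_{i-1}) \right)$, the geodesic through $P^{(i-1)}$ in direction $H^{(i-1)}$, where $\lambda_{i-1}$ is computed via (36);

Step 5: Update $H^{(i)} \leftarrow -G^{(i)} + \gamma_{i-1}\, \tau(H^{(i-1)})$, where

$G^{(i)} = \operatorname{grad} g_2(P^{(i)})$,

$\tau$ is the parallel transport along the geodesic of Step 4, and $\gamma_{i-1}$ is chosen according to Equation (38);

Step 6: If $i \bmod (2km(m-1) - 1) = 0$, set $H^{(i)} \leftarrow -G^{(i)}$;

Step 7: If $\| G^{(i)} \|$ is small enough, stop. Otherwise, go to Step 3;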
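The steps of Algorithm 1 can be sketched roughly as follows. This is a simplified illustration under stated assumptions and not the authors' implementation: the direction update uses the Fletcher-Reeves coefficient mentioned in the text in place of (38), the second derivative in the step size (36) is approximated by finite differences, a backtracking safeguard is added so that the cost never increases, and all names (`g2`, `grad`, `move`, `inner`) as well as the synthetic data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
m, k, n = 3, 3, 3

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

def comm(A, B):
    return A @ B - B @ A

def exp_skew(K):
    # exp(K) for skew-Hermitian K via the Hermitian eigenproblem of i*K
    w, V = np.linalg.eigh(1j * K)
    return (V * np.exp(-1j * w)) @ V.conj().T

def inner(A, B):
    # Metric (15) summed over all projectors: sum_ip tr(A_ip B_ip)
    return np.einsum('ipab,ipba->', A, B).real

def g2(P, C):
    return sum(np.trace(P[i, p] @ Ct @ P[j, q] @ Ct.conj().T).real
               for (i, j), Cs in C.items() for Ct in Cs
               for p in range(m) for q in range(m) if p != q)

def grad(P, C):
    # Euclidean gradient accumulated from (27), then projected as in (24)
    J = np.zeros_like(P)
    for (i, j), Cs in C.items():
        for Ct in Cs:
            for p in range(m):
                for q in range(m):
                    if p != q:
                        J[i, p] += Ct @ P[j, q] @ Ct.conj().T
                        J[j, q] += Ct.conj().T @ P[i, p] @ Ct
    G = np.empty_like(P)
    for i in range(k):
        for p in range(m):
            G[i, p] = comm(P[i, p], comm(P[i, p], J[i, p]))
    return G

def move(P, H, lam):
    # Geodesic step of length lam in direction H, plus parallel transport of H
    P_new, H_new = np.empty_like(P), np.empty_like(H)
    for i in range(k):
        for p in range(m):
            U = exp_skew(lam * comm(H[i, p], P[i, p]))
            P_new[i, p] = U @ P[i, p] @ U.conj().T
            H_new[i, p] = U @ H[i, p] @ U.conj().T
    return P_new, H_new

# Exactly jointly diagonalizable data C_ij(t) = A_i Omega_ij(t) A_j^H
A = [crandn(m, m) for _ in range(k)]
C = {(i, j): [A[i] @ np.diag(crandn(m)) @ A[j].conj().T for _ in range(n)]
     for i in range(k) for j in range(i + 1, k)}
X0 = crandn(k, m, m)
X0 /= np.linalg.norm(X0, axis=1, keepdims=True)      # unit-norm columns
P = np.einsum('iap,ibp->ipab', X0, X0.conj())        # P[i, p] = x_ip x_ip^H

G = grad(P, C)
H = -G
history = [g2(P, C)]
restart = 2 * k * m * (m - 1) - 1
for it in range(1, 51):
    if inner(G, H) >= 0:                  # safeguard: H is not a descent direction
        H = -G
    nrm = np.sqrt(inner(H, H))
    if nrm < 1e-12:
        break
    phi = lambda s: g2(move(P, H, s / nrm)[0], C)    # cost along unit-speed geodesic
    h = 1e-4
    d1 = inner(G, H) / nrm
    d2 = (phi(h) - 2 * history[-1] + phi(-h)) / h ** 2
    s = -d1 / abs(d2) if d2 != 0 else 1e-2           # step size in the spirit of (36)
    P_new, H_t = move(P, H, s / nrm)
    while g2(P_new, C) >= history[-1] and s > 1e-14: # backtracking (added safeguard)
        s *= 0.5
        P_new, H_t = move(P, H, s / nrm)
    if g2(P_new, C) >= history[-1]:
        break
    G_new = grad(P_new, C)
    # Fletcher-Reeves coefficient, reset periodically as in Step 6
    beta = 0.0 if it % restart == 0 else inner(G_new, G_new) / inner(G, G)
    P, G, H = P_new, G_new, -G_new + beta * H_t
    history.append(g2(P, C))
```

The list `history` records a strictly decreasing sequence of cost values, and the iterates stay on the product of projective spaces because every update is a unitary conjugation. How fast the cost approaches zero depends on the random initial guess, so no convergence rate should be read off this sketch.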

7 Numerical experiments

In our experiment, we investigate the performance of our method in terms of both local convergence property and accuracy of estimating the joint diagonalizers.

7.1 Experiment one

The first task of our experiment is to jointly diagonalize two sets of complex matrices, $\{C_{ij}(t)\}_{i<j}$ and $\{R_{ij}(t)\}_{i<j}$, which are constructed by

$$C_{ij}(t) = A_i\, \Omega_{ij}(t)\, A_j^{H} + \epsilon N_{ij}^{H} \quad \text{and} \quad R_{ij}(t) = A_i\, \hat{\Omega}_{ij}(t)\, A_j^{\top} + \epsilon N_{ij}^{S}, \tag{41}$$

where the matrices $A_i \in Gl(m)$ are randomly picked, both real and imaginary parts of the diagonal entries of $\Omega_{ij}(t)$ and $\hat{\Omega}_{ij}(t)$ are drawn from a uniform distribution on the interval $(0, 10)$, the matrices $N_{ij}^{H} \in \mathbb{C}^{m \times m}$ and $N_{ij}^{S} \in \mathbb{C}^{m \times m}$ are a Hermitian and a complex symmetric matrix, respectively, whose real and imaginary parts are generated from a uniform distribution on the interval $(-0.5, 0.5)$, representing additive stationary noise, and $\epsilon \in \mathbb{R}$ is the noise level.

In our experiments, we set m=3, k=3, n=3. First of all, we choose the noise level ϵ=0. A typical local convergence curve of our proposed algorithm is shown in Figure 1. A tendency of superlinear convergence can be observed.

Figure 1. Convergence behavior of the proposed CG algorithm.

In order to investigate the performance of the proposed algorithm in terms of estimation accuracy, we restrict $\epsilon \in \{0.1, 0.5, 1.0\}$, and run 50 tests. The performance index is chosen to be the averaged Amari error, proposed in [30]. Generally, the smaller the Amari error, the better the separation. The quartile based boxplot of the averaged Amari errors of our proposed algorithm for the three noise levels is shown in Figure 2. The performance of our CG algorithm degrades correspondingly as the noise level increases.
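For orientation, an Amari-type index can be computed from the product of a demixing and a mixing matrix. The text does not restate the exact definition or normalization from [30], so the version below, scaled to lie in $[0, 1]$, is one common convention and should be read as an assumption; the function name `amari_error` is hypothetical.

```python
import numpy as np

def amari_error(G):
    """Amari-type index of G = X^H A; zero iff G is a scaled permutation matrix.

    Assumed normalization: divide by 2 m (m - 1) so that the value lies in [0, 1].
    """
    B = np.abs(G)
    m = B.shape[0]
    rows = (B / B.max(axis=1, keepdims=True)).sum(axis=1) - 1   # row-wise deviation
    cols = (B / B.max(axis=0, keepdims=True)).sum(axis=0) - 1   # column-wise deviation
    return (rows.sum() + cols.sum()) / (2 * m * (m - 1))

# Perfect separation up to permutation and complex scaling
perm_scaled = np.array([[0, 2j, 0],
                        [0, 0, -3],
                        [0.5, 0, 0]])
# Worst case: every output is an equal mixture of all sources
generic = np.ones((3, 3))

err_perfect = amari_error(perm_scaled)
err_worst = amari_error(generic)
```

The index ignores permutation and scaling, which matches the inherent ambiguities of the separation problem. In an IVA setting one such value per group could be averaged, but the averaging used in the experiments is not specified here.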

Figure 2. Performance of the proposed CG algorithm.

7.2 Experiment two

In this experiment, we compare our CG based IVA approach, referred to as IVA-CG, with two second-order statistics based IVA algorithms. We refer to one contrast optimization based IVA algorithm as IVA-CO, cf. [5, 6], and the other matrix joint diagonalization based approach as IVA-JD, cf. [13]. The task of this experiment is to separate two groups of complex valued signals. We take three real audio source signals with 480,000 samples, and apply the short time Fourier transform to the sources with the number of FFT points being 1,024. By doing so, we end up with a complex IVA problem with 513 groups of statistically dependent complex signals.
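The count of 513 groups follows from the FFT length: a real-valued frame of 1,024 samples has 1024/2 + 1 = 513 non-redundant frequency bins. A minimal check with NumPy is given below; the Hann window and the random frame are assumptions made for illustration, since the window type and hop size are not stated in the text.

```python
import numpy as np

n_fft = 1024
frame = np.random.default_rng(7).standard_normal(n_fft)   # one real-valued frame
spectrum = np.fft.rfft(frame * np.hanning(n_fft))          # one-sided spectrum
n_bins = spectrum.shape[0]
```

Each of the `n_bins` frequency bins then defines one instantaneous complex BSS problem of the form (2).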

For a practical implementation of our method, note that computing and jointly diagonalizing all possible cross covariance and pseudo covariance matrices between the 513 groups is prohibitively expensive. We overcome this issue by only taking two neighboring frequency bins randomly at one time. The sources from each frequency bin are mixed independently by multiplication with a mixing matrix whose entries are drawn from a normal distribution. We run the experiment 100 times, and show the boxplot of the averaged Amari errors of the three studied algorithms in Figure 3. It clearly shows that our proposed IVA-CG algorithm consistently outperforms the other two.

Figure 3. Comparison of separation performance.

8 Conclusion

We propose a matrix joint diagonalization approach to solve the complex IVA problem that relies neither on a pre-whitening step nor on the estimation of the unknown distribution of the sources. A mathematical setting is derived that allows a formulation without ambiguity on the set of unknown parameters, i.e. the dimension of the search space is maximally reduced. This leads in a natural way to a smooth manifold structure that we call the complex oblique projective manifold, due to its close relation to the oblique manifold, which consists of invertible matrices with normalized columns. We propose to solve the complex IVA problem by minimizing a cost function that is based on the well-known off-norm function for measuring joint diagonality. We show that our setting leads to a non-degenerate Hessian for the solution of the IVA problem. This is an important result for the design of minimization methods, since in many cases the speed of convergence relies on the non-degeneracy of the minima. We develop a geometric CG method for solving the IVA problem and conclude by providing some numerical experiments.


$^{a}$Note that $Q(m)$ is not a geodesically complete manifold.


  1. Cardoso JF: Multidimensional independent component analysis. In Proceedings of the 23rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4. (Seattle, WA, USA; 1998).

    Google Scholar 

  2. Hyvärinen A, Hoyer PO: Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Comput 2000, 12(7):1705-1720. 10.1162/089976600300015312

    Article  Google Scholar 

  3. Araki S, Mukai R, Makino S, Nishikawa T, Saruwatari H: The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech. IEEE Trans. Speech Audio Process 2003, 11(2):109-116. 10.1109/TSA.2003.809193

    Article  Google Scholar 

  4. Lee I, Kim T, Lee TW: Independent vector analysis for convolutive blind speech separation. In Blind Speech Separation, Signals and Communication Technology. Edited by: Makino S, Lee TW, Sawada H. (Springer, Netherlands; 2007).

    Google Scholar 

  5. Anderson M, Li XL, Adalı T: Complex-valued independent vector analysis: application to multivariate Gaussian model. Signal Process 2012, 92(8):1821-1831. 10.1016/j.sigpro.2011.09.034

    Article  Google Scholar 

  6. Anderson M, Adalı T, Li XL: Joint blind source separation with multivariate Gaussian model: algorithms and performance analysis. IEEE Trans. Signal Process 2012, 60(4):1672-1683.

    Article  MathSciNet  Google Scholar 

  7. Kim T: Real-time independent vector analysis for convolutive blind source separation. IEEE Trans. Circ. Syst. I: Regular Papers 2010, 57(7):1431-1438.

    Article  Google Scholar 

  8. Hao J, Lee I, Lee TW, Sejnowski TJ: Independent vector analysis for source separation using a mixture of Gaussians prior. Neural Comput 2010, 22(6):1646-1673. 10.1162/neco.2010.11-08-906

    Article  MathSciNet  MATH  Google Scholar 

  9. Lee I, Jang GJ: Independent vector analysis based on overlapped cliques of variable width for frequency-domain blind signal separation. EURASIP J. Adv. Signal Process 2012, 113: 1-12.

    Article  Google Scholar 

  10. Bermejo S: Finite sample effects in higher order statistics contrast functions for sequential blind source separation. IEEE Signal Process. Lett 2005, 12(6):481-484.

  11. Ghennioui H, Fadaili EM, Thirion-Moreau N, Adib A, Moreau E: A nonunitary joint block diagonalization algorithm for blind separation of convolutive mixtures of sources. IEEE Signal Process. Lett 2007, 14(11):860-863.

  12. Ghennioui H, Thirion-Moreau N, Moreau E, Aboutajdine D: Gradient-based joint block diagonalization algorithms: application to blind separation of FIR convolutive mixtures. Signal Process 2010, 90(6):1836-1849. 10.1016/j.sigpro.2009.12.002

  13. Li XL, Adalı T, Anderson M: Joint blind source separation by generalized joint diagonalization of cumulant matrices. Signal Process 2011, 91(10):2314-2322. 10.1016/j.sigpro.2011.04.016

  14. Shen H, Kleinsteuber M: A matrix joint diagonalization approach for complex independent vector analysis. In Proceedings of the 10th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), vol. 7191 Lecture Notes in Computer Science. Edited by: Theis F, Cichocki A, Yeredor A, Zibulevsky M. (Springer-Verlag, Berlin/Heidelberg; 2012).

  15. Shen H, Kleinsteuber M: Complex blind source separation via simultaneous strong uncorrelating transform. In Lecture Notes in Computer Science, Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation, vol. 6365. (Springer-Verlag, Berlin/Heidelberg; 2010).

  16. Smaragdis P: Blind separation of convolved mixtures in the frequency domain. Neurocomputing 1998, 22: 21-34. 10.1016/S0925-2312(98)00047-2

  17. Comon P, Jutten C: Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press Inc, San Diego, USA; 2010.

  18. Makino S, Lee TW, Sawada H: Blind Speech Separation, Signals and Communication Technology. Springer, Netherlands; 2007.

  19. Hosseini S, Deville Y, Saylani H: Blind separation of linear instantaneous mixtures of non-stationary signals in the frequency domain. Signal Process 2009, 89: 819-830. 10.1016/j.sigpro.2008.10.024

  20. Maehara T, Murota K: Simultaneous singular value decomposition. Linear Alg. Appl 2011, 435: 106-116. 10.1016/j.laa.2011.01.007

  21. Absil PA, Gallivan KA: Joint diagonalization on the oblique manifold for independent component analysis. In Proceedings of the 31st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 5. (Toulouse, France; 2006).

  22. Afsari B: Sensitivity analysis for the problem of matrix joint diagonalization. SIAM J. Matrix Anal. Appl 2008, 30(3):1148-1171. 10.1137/060655997

  23. Absil PA, Mahony R, Sepulchre R: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ; 2008.

  24. Helmke U, Hüper K, Trumpf J: Newton’s method on Graßmann manifolds. 2007.

  25. Nocedal J, Wright SJ: Numerical Optimization. Springer, New York; 2006.

  26. Kleinsteuber M, Hüper K: An intrinsic CG algorithm for computing dominant subspaces. In Proceedings of the 32nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4. (Hawaii, USA; 2007).

  27. Smith ST: Optimization techniques on Riemannian manifolds. In Hamiltonian and Gradient Flows, Algorithms and Control, Fields Institute Communications, vol. 3. Edited by: Bloch A. (American Mathematical Society, Providence, RI; 1994).

  28. Gabay D: Minimizing a differentiable function over a differential manifold. J. Optimiz. Theory Appl 1982, 37(2):177-219. 10.1007/BF00934767

  29. Ring W, Wirth B: Optimization methods on Riemannian manifolds and their application to shape space. SIAM J. Optimiz 2012, 22(2):596-627. 10.1137/11082885X

  30. Amari SI, Cichocki A, Yang HH: A new learning algorithm for blind signal separation. In Advances in Neural Information Processing Systems (NIPS), vol. 8. Edited by: Touretzky DS, Mozer MC, Hasselmo ME. (The MIT Press, Cambridge, MA, USA; 1996).


This study was supported by the Cluster of Excellence CoTeSys (Cognition for Technical Systems), funded by the Deutsche Forschungsgemeinschaft (DFG).

Author information

Corresponding author

Correspondence to Martin Kleinsteuber.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access: This article is distributed under the terms of the Creative Commons Attribution 2.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Shen, H., Kleinsteuber, M. Non-unitary matrix joint diagonalization for complex independent vector analysis. EURASIP J. Adv. Signal Process. 2012, 241 (2012).
