Non-unitary matrix joint diagonalization for complex independent vector analysis

Shen, Hao; Kleinsteuber, Martin

doi:10.1186/1687-6180-2012-241

Research
Open access
Published: 21 November 2012


EURASIP Journal on Advances in Signal Processing, volume 2012, Article number: 241 (2012)


Abstract

Independent vector analysis (IVA) is a special form of independent component analysis (ICA) that has demonstrated strong performance in solving convolutive blind source separation (BSS) problems in the frequency domain. Most IVA algorithms are based on optimizing certain contrast functions, where the main difficulty lies in finding a reliable and fast estimate of the unknown distribution of the sources. Despite the rich availability of efficient tensorial approaches to the standard ICA problem, these methods have not been explored much for IVA. In this article, we propose a matrix joint diagonalization approach to solve the complex IVA problem. The new factorization neither relies on a whitening process, nor does it require an estimate of the joint probability distribution of the dependent signal groups. The latter is in contrast to most IVA approaches to date. The underlying geometry of the problem is investigated together with a critical point analysis of the resulting cost function. A conjugate gradient algorithm on the appropriate manifold setting is developed.

1 Introduction

Independent component analysis (ICA) is a standard statistical tool for solving the blind source separation (BSS) problem. BSS aims to recover source signals from the observed mixtures, without knowing either the distribution of the sources or the mixing process. Application of the standard ICA model is often limited, since it requires mutual statistical independence between all individual components. However, in many applications, there exist groups of signals of interest, where components from different groups are mutually statistically independent indeed, but where mutual statistical dependence occurs between components in the same group. Such problems can be tackled by a technique now referred to as multidimensional independent component analysis (MICA) [1], or independent subspace analysis (ISA) [2].

A special form of ISA arises in solving the BSS problem with convolutive mixtures [3]. After transferring the convolutive observations into the frequency domain via short-time Fourier transforms, the convolutive BSS problem results in a collection of instantaneous complex BSS problems, one in each frequency bin. After solving the subproblems individually, the final stage faces the challenge of aligning all statistically dependent components from different groups, which is referred to as the permutation problem. To avoid this problem, a new approach named independent vector analysis (IVA) has been proposed in [4]. Besides its application to the convolutive BSS problem, IVA has also recently been applied to analyze multivariate Gaussian models, cf. [5, 6]. In the current literature, the majority of IVA algorithms are based on optimizing certain contrast functions, cf. [5, 7–9]. The main difficulty of these contrast function based approaches lies in estimating the unknown distribution of the sources, which usually requires a large number of observations [10].

On the other hand, tensorial approaches are efficient and richly available to solve both the ICA and ISA problems. In particular, joint block diagonalization approaches have been shown to be effective methods for solving the ISA problem, cf. [11, 12], and are inherently applicable to IVA. However, such general joint block diagonalization approaches do not take the intrinsic structure of the IVA problem into account. A recent study [13] proposes a joint diagonalization approach of cross cumulant matrices to solve the complex IVA problem. More recently, the present authors have developed a similar approach of jointly diagonalizing both cross covariance and cross pseudo covariance matrices, cf. [14]. In this article, we extend the previous study in [14], and adapt the so-called complex oblique projective (COP) manifold, which has proven to be an appropriate setting for the standard instantaneous complex ICA problem [15], to the current scenario. Finally, an efficient conjugate gradient (CG) based IVA algorithm is proposed, and numerical experiments are provided to demonstrate the convergence properties of the proposed CG algorithm, and to compare its performance with two recently developed IVA algorithms in terms of separation quality.

2 Notations

Throughout the article, (·)^⊤ denotes the matrix transpose, (·)^H the Hermitian transpose, $\bar{(\cdot)}$ the entry-wise complex conjugate of a matrix, and Gl(m) the set of all m×m invertible complex matrices. The Frobenius norm of a matrix $A \in C^{m \times n}$ is denoted by $∥ A ∥_{F} : = \sqrt{t r (A A^{H})}$ , where tr(·) is the trace of a square matrix. Given a square matrix $Z \in C^{m \times m}$ , ddiag(Z) forms a diagonal matrix whose diagonal entries are those of Z, and off(Z) generates a matrix by setting all diagonal entries of Z to zero, i.e. off(Z):=Z−ddiag(Z).
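As a small illustration of this notation, the following NumPy sketch implements ddiag(·), off(·), and the Frobenius norm. The function names `ddiag`, `off`, and `frob_norm` are chosen here for illustration only and are not taken from any reference implementation.

```python
import numpy as np

def ddiag(Z):
    """Diagonal matrix carrying the diagonal entries of the square matrix Z."""
    return np.diag(np.diag(Z))

def off(Z):
    """Z with all diagonal entries set to zero, i.e. Z - ddiag(Z)."""
    return Z - ddiag(Z)

def frob_norm(A):
    """Frobenius norm sqrt(tr(A A^H)) of a complex matrix A."""
    return np.sqrt(np.trace(A @ A.conj().T).real)

# Small complex example matrix (arbitrary values).
Z = np.array([[1 + 2j, 3.0], [4j, 5.0]])
print(off(Z))
print(frob_norm(Z))
```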

In this study, we consider an m-dimensional complex signal $s (t) = {[s_{1} (t), \dots, s_{m} (t)]}^{⊤} \in C^{m}$ as an m-dimensional complex stochastic process indexed by the variable t. The empirical expectation of a random variable s is denoted by $E [s (t)] = \frac{1}{T} \sum_{t = 1}^{T} s (t)$ , where T is the number of samples. As usual for the standard ICA model, we assume without loss of generality that $E [s (t)] = 0$ . The empirical covariance and pseudo-covariance matrix of complex signals s(t) are referred to as $cov (s (t)) : = E [s (t) s {(t)}^{H}]$ and $pcov (s (t)) : = E [s (t) s {(t)}^{⊤}]$ , respectively.
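The empirical covariance and pseudo-covariance can be sketched in a few lines of NumPy. The sketch assumes that the samples are stored as the columns of an m×T array; the toy Gaussian data and the function names `cov` and `pcov` are illustrative assumptions.

```python
import numpy as np

def cov(S):
    """Empirical covariance E[s s^H] of zero-mean samples stored as columns of S."""
    return S @ S.conj().T / S.shape[1]

def pcov(S):
    """Empirical pseudo-covariance E[s s^T] (transpose without conjugation)."""
    return S @ S.T / S.shape[1]

rng = np.random.default_rng(0)
S = rng.standard_normal((3, 1000)) + 1j * rng.standard_normal((3, 1000))
S = S - S.mean(axis=1, keepdims=True)   # enforce E[s(t)] = 0 empirically

C, R = cov(S), pcov(S)
```

By construction, `C` is Hermitian and `R` is complex symmetric, which is the structural difference between the two statistics.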

3 Problem description

It is known that convolutive BSS problems can be transformed into the frequency domain, and can be solved as instantaneous complex BSS problems for every frequency simultaneously, when the demixing filter is sufficiently longer than the mixing filter, cf. [16, 17]. In this study, we consider the spectral time-frequency representation of a signal in terms of a short-time Fourier transformation that is centered at time t. Let $w_{i} (t, f) \in C$ and $s_{i} (t, f) \in C$ denote the coefficient of the center frequency f of the i th observation w_i(t) and the i th source signal s_i(t), respectively. Then, for a given pair (t, f), the Fourier coefficients of the observations and the sources obey the equality

w (t, f) = A_{f} s (t, f),

(1)

where $w (t, f) : = {[w_{1} (t, f), \dots, w_{m} (t, f)]}^{⊤} \in C^{m}$ , $s (t, f) : = {[s_{1} (t, f), \dots, s_{m} (t, f)]}^{⊤} \in C^{m}$ , and $A_{f} \in C^{m \times m}$ serves as a complex mixing matrix. More compactly, for a fixed frequency f , we get a standard instantaneous complex BSS problem as

W (f) = A_{f} S (f),

(2)

where $W (f) : = [w (1, f), \dots, w (T, f)] \in C^{m \times T}$ and $S (f) : = [s (1, f), \dots, s (T, f)] \in C^{m \times T}$ , with T being the number of chosen time frames. One popular approach to solve the convolutive BSS problem is to solve the individual instantaneous BSS problem (2) at each frequency, and then assemble the results from each frequency to reconstruct the estimated signal in the time domain [18].

Let us denote the rows of S(f) by $s_{i} (f) = [s_{i} (1, f), \dots, s_{i} (T, f)] \in C^{1 \times T}$ for i = 1,…,m. Following the assumption of statistical independence between the sources, the complex valued signals s_i(f) and s_j(f) are statistically independent for i ≠ j. In contrast, we assume that for a pair of frequencies (f_p, f_q) with f_p ≠ f_q, the complex signals s_i(f_p) and s_i(f_q) are statistically dependent for a given source. The development of IVA is inspired by this cross frequency structure. It aims to find a set of demixing matrices $\{X_{f}\} \subset G l (m)$ via

Y (f) = X_{f}^{H} W (f),

(3)

such that

(1) all sub-ICA problems are solved, and

(2) the statistical alignment between groups is restored, i.e. the estimated i th signals {y_i(f)} are mutually statistically dependent.

The main idea for our approach is to exploit the cross covariance matrices between groups of observations defined as

\begin{matrix} cov (W (f_{i}), W (f_{j})) : = \frac{1}{T} \sum_{t = 1}^{T} w (t, f_{i}) w {(t, f_{j})}^{H} \\ = A (f_{i}) \underbrace{\frac{1}{T} \sum_{t = 1}^{T} s (t, f_{i}) s {(t, f_{j})}^{H}}_{= : cov (S (f_{i}), S (f_{j}))} A {(f_{j})}^{H} . \end{matrix}

(4)

Similarly, the so-called pseudo cross covariance, defined as

\begin{matrix} pcov (W (f_{i}), W (f_{j})) : = \frac{1}{T} \sum_{t = 1}^{T} w (t, f_{i}) w {(t, f_{j})}^{⊤} \\ = A (f_{i}) pcov (S (f_{i}), S (f_{j})) A {(f_{j})}^{⊤}, \end{matrix}

(5)

also provides additional information about the second-order statistics of the involved signals. In this study, we assume that cross covariances between sources in all groups do not vanish. The assumption of statistical independence between the source signals implies that the cross covariance matrix cov(S(f_i),S(f_j)) and the pseudo cross covariance matrix pcov(S(f_i),S(f_j)) are diagonal for all pairs (i, j). With a further assumption on the sources being non-stationary, which has been exploited in [19], we arrive at a problem of jointly diagonalizing two sets of cross covariance and pseudo cross covariance matrices at different time instances.
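The structure in (4) and (5) can be checked numerically. The sketch below uses synthetic data, not the audio data of Section 7: the sources of two frequency bins are made dependent through a simple linear coupling (an assumption for illustration), mixed with random matrices, and the empirical cross statistics are compared against the factorized form.

```python
import numpy as np

def cross_cov(Wi, Wj):
    """Empirical cross covariance (1/T) sum_t w(t,f_i) w(t,f_j)^H, cf. (4)."""
    return Wi @ Wj.conj().T / Wi.shape[1]

def cross_pcov(Wi, Wj):
    """Empirical pseudo cross covariance (1/T) sum_t w(t,f_i) w(t,f_j)^T, cf. (5)."""
    return Wi @ Wj.T / Wi.shape[1]

rng = np.random.default_rng(1)
m, T = 3, 2000

def crandn(*shape):
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Toy sources: row p of Sj depends on row p of Si, rows with different indices are independent.
Si = crandn(m, T)
Sj = 0.8 * Si + 0.6 * crandn(m, T)
Ai, Aj = crandn(m, m), crandn(m, m)     # mixing matrices of the two bins
Wi, Wj = Ai @ Si, Aj @ Sj

C_S = cross_cov(Si, Sj)                 # approximately diagonal for large T
C_W = cross_cov(Wi, Wj)                 # equals Ai C_S Aj^H exactly
R_W = cross_pcov(Wi, Wj)                # equals Ai pcov(Si, Sj) Aj^T exactly
```

The factorizations hold exactly for the empirical quantities, whereas the diagonality of `C_S` only holds up to finite sample effects.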

To summarize, we are interested in solving the following problem. For a complex IVA problem with k subproblems, we consider the cross covariance and pseudo cross covariance matrices at n time instances, i.e. for all i,j=1,…,k and t=1,…,n, a set of matrices $\{C_{i j}^{(t)}\}_{i < j}$ and a set of complex symmetric matrices $\{R_{i j}^{(t)}\}_{i < j}$ , which are constructed by

C_{i j}^{(t)} = A_{i} Ω_{i j}^{(t)} A_{j}^{H} and R_{i j}^{(t)} = A_{i} {\tilde{Ω}}_{i j}^{(t)} A_{j}^{⊤},

(6)

where $Ω_{i j}^{(t)}, {\tilde{Ω}}_{i j}^{(t)} \in C^{m \times m}$ are diagonal. The task is to find a set of matrices ${X_{i}}_{i = 1}^{k} \subset Gl (m)$ such that

X_{i}^{H} C_{i j}^{(t)} X_{j} and X_{i}^{H} R_{i j}^{(t)} {\bar{X}}_{j},

(7)

for all i < j and t = 1,…,n, are simultaneously, or approximately simultaneously, diagonalized. In this study, we consider the noise-free IVA problem as defined in (2), and neglect the estimation errors of the cross covariance matrices due to the finite sample size. In other words, we assume that both sets of $C_{i j}^{(t)}$ ’s and $R_{i j}^{(t)}$ ’s are jointly diagonalizable.

Note that the above problem is similar to the simultaneous SVD formulation proposed in [20], where only the situation with two transform matrices is studied, i.e. k = 2. In contrast, our current setting deals with multiple transform matrices $\{X_{i}\}_{i = 1, \dots, k}$ , which are not restricted to be unitary. Finally, instead of considering second-order cross covariance matrices, our approach can be generalized to higher-order cross cumulants. We refer to [17] for further details.

4 Diagonality measure and the COP manifold

Our cost function to tackle problem (7) originates from the popular off-norm function that measures the squared Frobenius norm of the off-diagonal entries of the involved matrices. We develop an appropriate mathematical setting on the subsequently defined complex oblique projective (COP) manifold to provide its critical point analysis.

4.1 Derivation of the cost function

For legibility, from now on we only consider the problem of simultaneously diagonalizing the cross covariance matrices, i.e. the first condition in (7). Including the pseudo cross covariance matrices in the estimation of the demixing matrices is a straightforward adaptation of our setting and is not discussed further here.

Let us define the off-norm function as

\begin{array}{l} g : {(Gl (m))}^{k} \to R, \\ g (X_{1}, \dots, X_{k}) : = \sum_{i < j}^{k} \sum_{t = 1}^{n} \frac{1}{2} {∥off (X_{i}^{H} C_{i j}^{(t)} X_{j})∥}_{F}^{2} . \end{array}

(8)

Due to the noise-free assumption and since we neglect finite sample size effects, the set of joint diagonalizers $(X_{1}^{*}, \dots, X_{k}^{*})$ of the $C_{i j}^{(t)}$ in Equation (7) is a global minimizer of g, that is $g (X_{1}^{*}, \dots, X_{k}^{*}) = 0$ . It is clear that a minimization approach without further constraints on the X_i would drive all diagonalizers to zero. In order to avoid such trivial solutions and to regularize the minimization problem, the authors in [21] propose to restrict all columns of the transform matrices to have unit norm. This set is known as the oblique manifold, which has been shown to be an appropriate setting for matrix diagonalization, cf. [22]. Its complex counterpart is the so-called complex oblique manifold

O b (m) : = \{X \in G l (m) |ddiag (X^{H} X) = I_{m}\}

(9)

and we denote by Ob^k(m) the product manifold of k copies of Ob(m). The restriction of the off-norm cost function (8) is denoted by

\begin{array}{l} g_{1} : O b^{k} (m) \to R, \\ g_{1} (X_{1}, \dots, X_{k}) : = \sum_{i < j}^{k} \sum_{t = 1}^{n} \frac{1}{2} {∥off (X_{i}^{H} C_{i j}^{(t)} X_{j})∥}_{F}^{2} . \end{array}

(10)
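The restricted off-norm cost (10) can be sketched as follows. The data layout, a dictionary `C` that maps each pair (i, j) with i < j to a list of n matrices, and the complex Gaussian test data are assumptions made for this illustration.

```python
import numpy as np

def off(Z):
    return Z - np.diag(np.diag(Z))

def normalize_columns(X):
    """Scale every column of X to unit norm, so that ddiag(X^H X) = I_m as in (9)."""
    return X / np.linalg.norm(X, axis=0, keepdims=True)

def g1(Xs, C):
    """Off-norm cost (10); C[(i, j)] is a list over t of m x m matrices, i < j."""
    total = 0.0
    for (i, j), mats in C.items():
        for Cij in mats:
            M = Xs[i].conj().T @ Cij @ Xs[j]
            total += 0.5 * np.linalg.norm(off(M)) ** 2
    return total

rng = np.random.default_rng(2)
m, k, n = 3, 3, 3
crandn = lambda *s: rng.standard_normal(s) + 1j * rng.standard_normal(s)

# Noise-free, jointly diagonalizable matrices C_ij^(t) = A_i Omega A_j^H.
A = [crandn(m, m) for _ in range(k)]
C = {(i, j): [A[i] @ np.diag(crandn(m)) @ A[j].conj().T for _ in range(n)]
     for i in range(k) for j in range(i + 1, k)}

X_star = [normalize_columns(np.linalg.inv(Ai).conj().T) for Ai in A]  # exact diagonalizers
X_rand = [normalize_columns(crandn(m, m)) for _ in range(k)]          # arbitrary point
print(g1(X_star, C), g1(X_rand, C))
```

The column normalization does not affect diagonality, so the rescaled inverses of the mixing matrices remain global minimizers with cost zero up to rounding.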

Now denote the p th column of X_i by x_ip. It is obvious that the function g₁ is invariant with respect to a phase change of each column x_ip, which reflects the well-known scaling ambiguity of complex ICA problems. By a further calculation, g₁ has the form

\begin{array}{l} g_{1} (X_{1}, \dots, X_{k}) = \frac{1}{2} \sum_{i < j}^{k} \sum_{t = 1}^{n} \sum_{p \neq q}^{m} {|x_{i p}^{H} C_{i j}^{(t)} x_{j q}|}^{2} \\ = \frac{1}{2} \sum_{i < j}^{k} \sum_{t = 1}^{n} \sum_{p \neq q}^{m} x_{i p}^{H} C_{i j}^{(t)} x_{j q} {(x_{i p}^{H} C_{i j}^{(t)} x_{j q})}^{H} \\ = \frac{1}{2} \sum_{i < j}^{k} \sum_{t = 1}^{n} \sum_{p \neq q}^{m} tr ((x_{i p} x_{i p}^{H}) C_{i j}^{(t)} (x_{j q} x_{j q}^{H}) C_{i j}^{(t) H}) . \end{array}

(11)

Instead of fixing a phase for each x_ip, in this study we employ an elegant mathematical setting for the problem. Recall the fact that each $x_{i p} x_{i p}^{H}$ defines a Hermitian rank-one projector, the set of which identifies the (m − 1)- dimensional complex projective space ${C P}^{m - 1}$ , i.e.

{C P}^{m - 1} = \{P \in C^{m \times m} |P^{H} = P, P^{2} = P, tr (P) = 1\} .

(12)

By doing so for each column and by maintaining the fact that the columns of X_i form a complex basis (i.e. invertibility of X_i), we naturally arrive at the following set, which we refer to as the complex oblique projective (COP) manifold,

Q (m) : = \{(P_{1}, \dots, P_{m}) | P_{i} \in {C P}^{m - 1}, det (\sum_{i = 1}^{m} P_{i}) > 0\} .

(13)

The off-norm cost function g₁ now induces the following function g₂ on the COP manifold. Namely, if $Q^{k} (m)$ denotes the k-times product of $Q (m)$ , g₂ is given by

\begin{array}{l} g_{2} : Q^{k} (m) \to R, \\ g_{2} (P_{1}, \dots, P_{k}) : = \sum_{i < j}^{k} \sum_{t = 1}^{n} \sum_{p \neq q}^{m} tr P_{i p} C_{i j}^{(t)} P_{j q} C_{i j}^{(t) H}, \end{array}

(14)

with P_i:=(P_{i 1},…,P_im).
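The projector formulation can be illustrated numerically. The sketch below evaluates (14) from rank-one projectors and compares it with the column-based expression (11); since (14) as written carries no factor 1/2 whereas (11) does, the two values differ by a factor of two. The random test data and all function names are illustrative assumptions.

```python
import numpy as np

def projectors(X):
    """Rank-one Hermitian projectors x_p x_p^H built from the normalized columns of X."""
    Xn = X / np.linalg.norm(X, axis=0, keepdims=True)
    return [np.outer(x, x.conj()) for x in Xn.T]

def g2(Ps, C):
    """Cost (14): sum over i < j, t and p != q of tr(P_ip C P_jq C^H)."""
    total = 0.0
    for (i, j), mats in C.items():
        for Cij in mats:
            for p, Pip in enumerate(Ps[i]):
                for q, Pjq in enumerate(Ps[j]):
                    if p != q:
                        total += np.trace(Pip @ Cij @ Pjq @ Cij.conj().T).real
    return total

def g1(Xs, C):
    """Cost (11), evaluated directly from unit-norm columns."""
    total = 0.0
    for (i, j), mats in C.items():
        for Cij in mats:
            M = Xs[i].conj().T @ Cij @ Xs[j]
            total += 0.5 * np.sum(np.abs(M - np.diag(np.diag(M))) ** 2)
    return total

rng = np.random.default_rng(3)
m, k, n = 3, 3, 2
crandn = lambda *s: rng.standard_normal(s) + 1j * rng.standard_normal(s)
C = {(i, j): [crandn(m, m) for _ in range(n)] for i in range(k) for j in range(i + 1, k)}

Xs = [X / np.linalg.norm(X, axis=0, keepdims=True) for X in (crandn(m, m) for _ in range(k))]
Ps = [projectors(X) for X in Xs]

# Multiplying each column by an arbitrary unit-modulus phase leaves the projectors unchanged.
Xs_phase = [X * np.exp(1j * rng.uniform(0, 2 * np.pi, m)) for X in Xs]
Ps_phase = [projectors(X) for X in Xs_phase]
```

This makes the removal of the phase ambiguity explicit: the cost depends on the columns only through their projectors.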

4.2 The geometry of the complex oblique projective manifold

In this section, we recall some basic facts and concepts that are necessary for developing a Riemannian CG algorithm on the COP manifold, cf. [23]. In particular, we require a formula for the parallel transport and the geodesics of the COP manifold. We endow $Q (m)$ with the standard Riemannian metric

〈 (A_{1}, \dots, A_{m}), (B_{1}, \dots, B_{m}) 〉 : = \sum_{i} t r (A_{i} B_{i}),

(15)

inherited from the Euclidean metric of the m-fold product of Hermitian matrices. With this, $Q (m)$ is an open and dense Riemannian submanifold of the m-times product of ${C P}^{m - 1}$ with the standard metric, i.e.

\bar{Q (m)} = {({C P}^{m - 1})}^{m},

(16)

where $\bar{Q (m)}$ denotes the closure of $Q (m)$ . Accordingly, the tangent spaces, the geodesics, and the parallel transport for $Q (m)$ and ${({C P}^{m - 1})}^{m}$ coincide locally and thus are easily derived from the geometry of ${C P}^{m - 1}$ . We refer to [24] for further discussions and details about ${C P}^{m - 1}$ .

Let us denote by

u (m) : = \{Ω \in C^{m \times m} |Ω = - Ω^{H}\}

(17)

the set of skew-Hermitian matrices. The tangent space at P in ${C P}^{m - 1}$ is given by

T_{P} {C P}^{m - 1} = \{[P, Ω] |Ω \in u (m)\}

(18)

where [A,B]:=AB−BA is the matrix commutator. Then, the tangent space at $P = (P_{1}, \dots, P_{m}) \in Q (m)$ is simply the Cartesian product

T_{P} Q (m) ≅ T_{P_{1}} {C P}^{m - 1} \times \dots \times T_{P_{m}} {C P}^{m - 1} .

(19)

With the above metric, the geodesics through $P \in {C P}^{m - 1}$ in direction $Z \in T_{P} {C P}^{m - 1}$ are given by

γ_{P, Z} : R \to {C P}^{m - 1}, γ_{P, Z} (t) : = e^{t [Z, P]} P e^{- t [Z, P]},

(20)

where e^(·) denotes the matrix exponential. Thus, the (local^a) geodesic through $P \in Q (m)$ in direction $Z : = (Z_{1}, \dots, Z_{m}) \in T_{P} Q (m)$ is given by

γ_{P, Z} (t) = (γ_{P_{1}, Z_{1}} (t), \dots, γ_{P_{m}, Z_{m}} (t)) .

(21)

The parallel transport of $Ψ : = (Ψ_{1}, \dots, Ψ_{m}) \in T_{P} Q (m)$ with respect to the Levi-Civita connection along the geodesic γ_P,Z(t) is

τ_{P, Z} (Ψ) : = (τ_{P_{1}, Z_{1}} (Ψ_{1}), \dots, τ_{P_{m}, Z_{m}} (Ψ_{m})),

(22)

with τ_P,Z being the parallel transport of $Ψ \in T_{P} {C P}^{m - 1}$ with respect to the Levi-Civita connection along the geodesic γ_P,Z(t), i.e.

τ_{P, Z} (Ψ) = e^{[Z, P]} Ψ e^{- [Z, P]} .

(23)

The natural or Riemannian gradient of a function that is the restriction of some globally defined function to a sub-manifold is simply the orthogonal projection of the Euclidean gradient onto the corresponding tangent space. For the complex projective space, this projection is given by

π_{P} : C^{m \times m} \to T_{P} {C P}^{m - 1}, A \mapsto [P, [P, \frac{1}{2} (A + A^{H})]] .

(24)

It is easily seen that the operator π_P is an orthogonal projector onto the tangent space $T_{P} {C P}^{m - 1}$ , i.e. that π_P∘π_P(A)=π_P(A) and that the null space of π_P is orthogonal to its image. Here, ∘ denotes the composition of functions. The formulas for the tangent spaces, the geodesics, the parallel transport, and the projection onto the tangent spaces of $Q^{k} (m)$ follow directly from the product manifold structure.
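The geometric ingredients (20), (23), and (24) on a single copy of the complex projective space can be sketched as follows. Since [Z, P] is skew-Hermitian for Hermitian Z and P, the matrix exponential is computed here from the eigendecomposition of the Hermitian matrix i[Z, P]; this is an implementation choice of the sketch, and the function names are illustrative.

```python
import numpy as np

def comm(A, B):
    """Matrix commutator [A, B] = AB - BA."""
    return A @ B - B @ A

def proj_tangent(P, A):
    """Orthogonal projection (24) of A onto the tangent space at P."""
    return comm(P, comm(P, 0.5 * (A + A.conj().T)))

def expm_skew(K):
    """exp(K) for skew-Hermitian K, using the eigendecomposition of the Hermitian matrix iK."""
    w, V = np.linalg.eigh(1j * K)
    return (V * np.exp(-1j * w)) @ V.conj().T

def geodesic(P, Z, t):
    """Geodesic (20) through P in tangent direction Z, evaluated at t."""
    U = expm_skew(t * comm(Z, P))
    return U @ P @ U.conj().T

def transport(P, Z, Psi):
    """Parallel transport (23) of the tangent vector Psi along the geodesic, from t = 0 to t = 1."""
    U = expm_skew(comm(Z, P))
    return U @ Psi @ U.conj().T

rng = np.random.default_rng(3)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
x /= np.linalg.norm(x)
P = np.outer(x, x.conj())                       # a point of CP^2
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Z = proj_tangent(P, A)                          # a tangent vector at P
Q = geodesic(P, Z, 0.3)                         # stays a rank-one Hermitian projector
```

Because the geodesic acts by unitary conjugation, the point `Q` is again a Hermitian projector of trace one, and transported vectors are tangent at the new point.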

5 Critical point analysis of the cost function

In this section, we conduct a critical point analysis of the cost function g₂ on the product COP manifold. We show that the joint diagonalizers are a non-degenerate global minimum of g₂. This is an important fact, since in many cases the speed of convergence relies on the non-degeneracy of the minima. First of all, we present a lemma which originates from the derivation of the cost from the off-norm function.

Lemma 1

Let us assume that all $C_{i j}^{(t)}$ ’s are jointly diagonalizable. If $(X_{1}^{*}, \dots, X_{k}^{*}) \in O b^{k} (m)$ minimizes the cost function g₁, as defined in (10), i.e. $X_{i}^{* H} C_{i j}^{(t)} X_{j}^{*} = D_{i j}^{(t)} = diag (d_{i j 1}^{(t)}, \dots, d_{ijm}^{(t)})$ being diagonal for all t=1,…,n and i,j=1,…,k, then the set of corresponding Hermitian projectors $P^{*} = (P_{1}^{*}, \dots, P_{k}^{*}) \in Q^{k} (m)$ with $P_{ip}^{*} : = x_{ip}^{*} x_{ip}^{* H} \in {C P}^{m - 1}$ minimizes the cost function g₂, defined in (14) and

\{\begin{matrix} P_{i p}^{*} C_{i j}^{(t)} P_{j q}^{*} = 0, & for p \neq q, \\ P_{i p}^{*} C_{i j}^{(t)} P_{j q}^{*} = d_{i j p}^{(t)} P_{i p}^{*}, & for p = q . \end{matrix}

(25)

The above lemma follows directly from the condition of $X_{i}^{* H} C_{i j}^{(t)} X_{j}^{*}$ being diagonal, i.e. its (p,q)th entry is computed as

x_{i p}^{* H} C_{i j}^{(t)} x_{j q}^{*} = \{\begin{matrix} 0, & for p \neq q, \\ d_{i j p}^{(t)}, & for p = q . \end{matrix}

(26)

Now, let $P : = (P_{1}, \dots, P_{k}) \in Q^{k} (m)$ be arbitrary. We compute the first derivative of g₂ at $P \in Q^{k} (m)$ in direction $Z : = (Z_{1}, \dots, Z_{k}) \in T_{P} Q^{k} (m)$ as

\begin{array}{l} D g_{2} (P) Z = \sum_{i < j}^{k} \sum_{p \neq q}^{m} \sum_{t = 1}^{n} tr Z_{i p} C_{i j}^{(t)} P_{j q} C_{i j}^{(t) H} \\ + tr P_{i p} C_{i j}^{(t)} Z_{j q} C_{i j}^{(t) H} . \end{array}

(27)

By recalling the structure of the tangent space of $Q^{k} (m)$ and the result in Lemma 1, it is trivial to see that the first derivative of g₂ vanishes at $P^{*}$ , which corresponds to the correct joint diagonalizers.

The remainder of this section addresses the characterization of the Hessian of g₂ at the joint diagonalizers. To that end, we denote by $o f f (m) : = \{Z \in C^{m \times m} | z_{ii} = 0, i = 1, \dots, m\}$ the set of matrices with zero diagonal. Let Π be the natural projection

Π : Ob (m) \to Q (m), Π (X) = (x_{1} x_{1}^{H}, \dots, x_{m} x_{m}^{H})

(28)

and let μ_X be defined as

\begin{array}{l} μ_{X} : o f f (m) \to O b (m), \\ Z \mapsto X (I_{m} + Z) diag (\frac{1}{∥ X (e_{1} + z_{1}) ∥}, \dots, \frac{1}{∥ X (e_{m} + z_{m}) ∥}), \end{array}

(29)

where e_j denotes the j th standard basis vector. Note that μ_X defines a locally injective but not bijective mapping. The composition of Π and μ_X, however, yields a local diffeomorphism. With the shorthand notation P := Π(X), the mapping

ϕ_{P} : o f f (m) \to Q (m), Z \mapsto Π \circ μ_{X} (Z)

(30)

is a local parametrization around P and thus permits a local parameterization of $Q^{k} (m)$ via

\begin{array}{l} Φ_{P} : {o f f}^{k} (m) \to Q^{k} (m), \\ (Z_{1}, \dots, Z_{k}) \mapsto (ϕ_{P_{1}} (Z_{1}), \dots, ϕ_{P_{k}} (Z_{k})), \end{array}

(31)

with $Φ_{P} (0) = P : = (P_{1}, \dots, P_{k})$ . The associated tangent map $T Φ_{P}$ is given as

\begin{matrix} T Φ_{P} : T {o f f}^{k} (m) ≅ {o f f}^{k} (m) \to T_{P} Q^{k} (m), \\ (Θ_{1}, \dots, Θ_{k}) \mapsto \\ (x_{11} ξ (x_{11}) θ_{11}^{H} + θ_{11} ξ (x_{11}) x_{11}^{H}, \dots, \\ x_{k m} ξ (x_{k m}) θ_{k m}^{H} + θ_{k m} ξ (x_{k m}) x_{k m}^{H}), \end{matrix}

(32)

where $ξ (x_{ip}) : = I_{m} - x_{ip} x_{ip}^{H}$ is the orthogonal projection operator onto the complement space of x_ip. Let $Z : = T Φ_{P^{*}} (Θ) \in T_{P^{*}} Q^{k} (m)$ . Then, we can compute the Hessian of g₂ at the critical points $P^{*} \in Q^{k} (m)$ , i.e. the symmetric bilinear form $H_{g_{2}} (P^{*}) : T_{P^{*}} Q^{k} (m) \times T_{P^{*}} Q^{k} (m) \to R$ via

\begin{array}{l} H_{g_{2}} (P^{*}) (Z, Z) = H_{g_{2}} (P^{*}) (T Φ_{P^{*}} (Θ), T Φ_{P^{*}} (Θ)) \\ = {\frac{d^{2}}{d t^{2}} (g_{2} \circ Φ_{P^{*}}) (t Θ)|}_{t = 0} \\ = \sum_{i < j}^{k} \sum_{p \neq q}^{m} \sum_{t = 1}^{n} 2 tr Z_{i p}^{*} C_{i j}^{(t)} Z_{j q}^{*} C_{i j}^{(t) H} \\ = \sum_{i < j}^{k} \sum_{p \neq q}^{m} \sum_{t = 1}^{n} 2 {|d_{i j p}^{(t)} θ_{p q}^{(i)} + d_{i j q}^{(t)} θ_{q p}^{(j)}|}^{2} . \end{array}

(33)

The last equality holds by following the results in Lemma 1, i.e. $X_{i}^{* H} C_{i j}^{(t)} X_{j}^{*} = D_{i j}^{(t)}$ , which is equivalent to $C_{i j}^{(t)} = X_{i}^{* - H} D_{i j}^{(t)} X_{j}^{* - 1}$ and the fact that $Z_{ip}^{*} = (I_{m} - P_{i 1}) X_{i}^{*} θ_{ip}$ . It can easily be seen that the Hessian form (33) is positive definite if and only if all (2×2)-matrices

\sum_{t = 1}^{n} \begin{bmatrix} | d_{i j p}^{(t)} |^{2} & d_{i j q}^{(t)} \bar{d_{i j p}^{(t)}} \\ d_{i j p}^{(t)} \bar{d_{i j q}^{(t)}} & | d_{i j q}^{(t)} |^{2} \end{bmatrix}

(34)

are positive definite. Since this is a generic assumption on the data, we have the following result.

Theorem 1

Generically, the global minimizer of the cost function g₂ is non-degenerate.
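The condition on the (2×2)-matrices (34) is easy to test numerically for one fixed pair (i, j). Each summand in (34) is a rank-one matrix, so a single time instance (n = 1) always yields a singular matrix, whereas generic data with n ≥ 2 yields a positive definite one. The sketch below assumes that the diagonal entries are stored in an n×m array `d`; the name of the function and the tolerance are illustrative choices.

```python
import numpy as np

def hessian_blocks_positive_definite(d, tol=1e-10):
    """Check (34) for one pair (i, j); d[t, p] holds the diagonal entry d_{ijp}^{(t)}."""
    n, m = d.shape
    for p in range(m):
        for q in range(p + 1, m):
            V = np.conj(d[:, [p, q]])        # row t is (conj(d_p), conj(d_q))
            M = V.T @ V.conj()               # sum over t of the rank-one matrices in (34)
            if np.linalg.eigvalsh(M).min() <= tol:
                return False
    return True

rng = np.random.default_rng(6)
d_three = rng.uniform(0, 10, (3, 3)) + 1j * rng.uniform(0, 10, (3, 3))   # n = 3
d_one = d_three[:1]                                                      # n = 1

print(hessian_blocks_positive_definite(d_three))
print(hessian_blocks_positive_definite(d_one))
```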

6 A CG IVA algorithm

In this section, we introduce a general form of CG algorithms on matrix manifolds. After computing the Riemannian gradient of the cost function on the COP manifold, we develop a CG based IVA algorithm. The CG methods on matrix manifolds are shortly reviewed here. They form the backbone of the algorithm for our optimization problem on the COP manifold and explain the use of the differential geometric concepts derived in the previous sections. For an in-depth introduction on optimization on matrix manifolds, we refer the interested reader to [23].

Let M be a submanifold of some Euclidean space with inner product 〈·,·〉 and let $f : M \to R$ be smooth. The CG method is initialized by some x₀∈M and the descent direction H₀:=−grad f(x₀) given by the Riemannian gradient. If f is the restriction of a globally defined function $\hat{f}$ to M, the Riemannian gradient is just the orthogonal projection of the gradient of $\hat{f}$ to the tangent space, i.e.

grad f (x) = π_{x} (\nabla \hat{f} (x)),

(35)

where $\nabla \hat{f} (x)$ denotes the Euclidean gradient of $\hat{f}$ , and π_x is the orthogonal projection onto T_xM. Subsequently, sweeps are iterated that consist of two steps, a line search in a given direction (i.e. along a geodesic in that direction) followed by an update of the search direction. Several different possibilities for these steps lead to different CG methods. Assume now that x_i, H_i, and G_i:=grad f(x_i) are given.

Given a geodesic γ_i with γ_i(0)=x_i and ${\dot{γ}}_{i} (0) = H_{i}$ , the line search aims to find $λ_{i} \in R$ that minimizes $f \circ γ_{i} : R \to R$ . A generic approach for the step-size selection is a Riemannian adaptation of the backtracking line search and several of its modifications, cf. [23, 25]. Here, we present a closed-form solution for the step-size selection that works particularly well for our problem due to the quadratic nature of our cost function, cf. [26]. It is based on the assumption that a one-dimensional Newton step along f∘γ_i yields a good approximation of its minimizer. Explicitly, we choose the step-size as

λ_{i} : = - \frac{\frac{d}{d λ} (f \circ γ) (λ) |_{λ = 0}}{| \frac{d^{2}}{d λ^{2}} (f \circ γ) (λ) |_{λ = 0} |} .

(36)

The absolute value in the denominator is chosen for the following reason. In a neighborhood of a minimum, where the second derivative is positive, (36) is an unaltered one-dimensional Newton step. If, however, $\frac{d^{2}}{d λ^{2}} (f \circ γ) (λ) |_{λ = 0} < 0$ , the step size is the negative of a regular Newton step, which makes critical points that are not minima non-attractive.
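The effect of the absolute value in (36) can be seen on two scalar toy functions. The sketch below approximates both derivatives by central finite differences, which is an assumption made for brevity and not necessarily how the derivatives are obtained in practice; `phi` plays the role of f∘γ.

```python
def newton_step_size(phi, h=1e-4):
    """Step size (36) for a scalar function phi, with finite-difference derivatives at 0."""
    d1 = (phi(h) - phi(-h)) / (2 * h)
    d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h ** 2
    return -d1 / abs(d2)

# Convex case: minimum at t = 2, the step is the plain Newton step and lands on it.
lam_min = newton_step_size(lambda t: (t - 2.0) ** 2)

# Concave case: maximum at t = 2, a plain Newton step would jump onto the maximum,
# whereas (36) moves in the opposite (descent) direction.
lam_max = newton_step_size(lambda t: -(t - 2.0) ** 2)

print(lam_min, lam_max)
```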

In order to compute the new search direction $H_{i + 1} \in T_{x_{i + 1}} M$ , we need to transport H_i and G_i, which are tangent vectors at x_i, to the tangent space $T_{x_{i + 1}} M$ . This is done via parallel transport along the geodesic γ, which we denote by

τ : T_{x_{i}} M \to T_{x_{i + 1}} M.

(37)

The updated search direction is now chosen according to a Riemannian adaptation of the Hestenes-Stiefel, the Polak-Ribière, or the Fletcher-Reeves update. Here, we choose a different formulation that performs slightly better in our situation than the aforementioned ones, namely

γ_{i} : = - \frac{〈 G_{i + 1}, G_{i + 1} - τ G_{i} 〉}{〈 H_{i}, G_{i} 〉} .

(38)
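The update coefficient (38) and the resulting search direction can be sketched as follows, with tangent vectors stored as lists of Hermitian matrices and the inner product taken as the sum of traces, cf. (15). The transported quantities are passed in as arguments; in the toy example the transport is replaced by the identity, which is a placeholder and not the parallel transport (23).

```python
import numpy as np

def inner(A, B):
    """Inner product sum_i tr(A_i B_i) of two tuples of Hermitian matrices."""
    return sum(np.trace(a @ b).real for a, b in zip(A, B))

def cg_coefficient(G_new, G_old_transported, H_old, G_old):
    """Coefficient (38): -<G_{i+1}, G_{i+1} - tau G_i> / <H_i, G_i>."""
    diff = [g - tg for g, tg in zip(G_new, G_old_transported)]
    return -inner(G_new, diff) / inner(H_old, G_old)

def next_direction(G_new, H_old_transported, coeff):
    """New search direction -G_{i+1} + coeff * tau H_i."""
    return [-g + coeff * th for g, th in zip(G_new, H_old_transported)]

rng = np.random.default_rng(7)

def rand_herm(m):
    A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    return A + A.conj().T

G_old = [rand_herm(3) for _ in range(2)]
G_new = [rand_herm(3) for _ in range(2)]
H_old = [-g for g in G_old]                       # steepest descent direction
coeff = cg_coefficient(G_new, G_old, H_old, G_old)   # identity used as placeholder transport
H_new = next_direction(G_new, H_old, coeff)
```

If the previous direction is the steepest descent direction, H_i = −G_i, the denominator equals −〈G_i, G_i〉 and (38) reduces to the Polak-Ribière coefficient.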

Despite their good performance in applications, the convergence analysis of CG methods on smooth manifolds is still an open problem. Partial convergence results for CG methods on manifolds can be found in [27, 28], and a recent result in [29].

As it is clear from the above, the first step towards formulating a CG algorithm for minimizing the cost function g₂ is to compute its Riemannian gradient. Let us denote by $\hat{g_{2}}$ the continuation of g₂ to the embedding space $C^{m \times m \times m \times k}$ . Following the computation in Equation (27), we have the Euclidean gradient of $\hat{g_{2}}$ at $P \in Q^{k} (m)$ , i.e. $\nabla \hat{g_{2}} (P) : = (J_{1}, \dots, J_{k})$ , for each element $J_{i p} \in C^{m \times m}$ , as

J_{i p} = \sum_{j > i}^{k} \sum_{q \neq p}^{m} \sum_{t = 1}^{n} C_{i j}^{(t)} P_{j q} C_{i j}^{(t) H} + \sum_{j < i}^{k} \sum_{q \neq p}^{m} \sum_{t = 1}^{n} C_{j i}^{(t) H} P_{j q} C_{j i}^{(t)} .

(39)

By projecting it onto the tangent space $T_{P} Q^{k} (m)$ , we get the Riemannian gradient of g₂ at $P \in Q^{k} (m)$ , i.e. $grad g_{2} (P) : = (G_{1}, \dots, G_{k}) \in T_{P} Q^{k} (m)$ , for each element $G_{i p} \in T_{P_{ip}} {C P}^{m - 1}$ , as

\begin{matrix} G_{i p} = [P_{i p}, [P_{i p}, \sum_{j > i}^{k} \sum_{q \neq p}^{m} \sum_{t = 1}^{n} C_{i j}^{(t)} P_{j q} C_{i j}^{(t) H} \\ + \sum_{j < i}^{k} \sum_{q \neq p}^{m} \sum_{t = 1}^{n} C_{j i}^{(t) H} P_{j q} C_{j i}^{(t)}]] . \end{matrix}

(40)
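The gradient computation can be checked numerically. The sketch below differentiates (14) as in (27): the slot (i, p) collects $C_{i j}^{(t)} P_{j q} C_{i j}^{(t) H}$ from pairs with i < j and $C_{j i}^{(t) H} P_{j q} C_{j i}^{(t)}$ from pairs with j < i, over all q ≠ p, and the result is projected via (24). The storage of the projectors in an array of shape (k, m, m, m) and the random test data are assumptions of this illustration.

```python
import numpy as np

def comm(A, B):
    return A @ B - B @ A              # works slot-wise on stacked matrices

def g2(P, C):
    """Cost (14); P[i, p] is the m x m matrix P_ip, C[(i, j)] a list over t, i < j."""
    m = P.shape[1]
    return sum(np.trace(P[i, p] @ M @ P[j, q] @ M.conj().T).real
               for (i, j), Ms in C.items() for M in Ms
               for p in range(m) for q in range(m) if p != q)

def euclid_grad(P, C):
    """Euclidean gradient of the extension of g2, obtained from the derivative (27)."""
    m = P.shape[1]
    J = np.zeros_like(P)
    for (i, j), Ms in C.items():
        for M in Ms:
            for p in range(m):
                for q in range(m):
                    if p != q:
                        J[i, p] += M @ P[j, q] @ M.conj().T
                        J[j, q] += M.conj().T @ P[i, p] @ M
    return J

def riem_grad(P, C):
    """Riemannian gradient: projection (24) applied in every slot (J is Hermitian)."""
    return comm(P, comm(P, euclid_grad(P, C)))

rng = np.random.default_rng(4)
k, m, n = 3, 3, 2
crandn = lambda *s: rng.standard_normal(s) + 1j * rng.standard_normal(s)
C = {(i, j): [crandn(m, m) for _ in range(n)] for i in range(k) for j in range(i + 1, k)}

X = crandn(k, m, m)
X /= np.linalg.norm(X, axis=1, keepdims=True)        # unit-norm columns
P = np.einsum('kap,kbp->kpab', X, X.conj())          # P[i, p] = x_ip x_ip^H
J = euclid_grad(P, C)
G = riem_grad(P, C)
```

Since g₂ is quadratic in the projectors, a central finite difference of the extended cost reproduces the directional derivative 〈J, Z〉 up to rounding, which gives a simple consistency check.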

The above formula for the Riemannian gradient now allows us to implement the geometric CG algorithm for minimizing the function g₂ as defined in (14) in a straightforward way. Pseudocode is provided in Algorithm 1.

Algorithm 1 A CG IVA algorithm

Input: A set of matrices $\{C_{i j}^{(t)}\} \subset C^{m \times m}$ for 1 ≤ i < j ≤ k and t=1,…,n;

Step 1: Generate an initial guess $P^{(1)} = [P_{1}^{(1)}, \dots, P_{k}^{(1)}] \in Q^{k} (m)$ and set i=0;

Step 2: Compute $G^{(1)} \leftarrow grad g_{2} (P^{(1)})$ using Equation (40) and set $H^{(1)} = [H_{1}^{(1)}, \dots, H_{k}^{(1)}] \leftarrow - G^{(1)}$ ;

Step 3: Set i=i + 1;

Step 4: Update $P^{(i + 1)} \leftarrow (γ_{P_{1}^{(i)}, H_{1}^{(i)}} (λ_{i}), \dots, γ_{P_{k}^{(i)}, H_{k}^{(i)}} (λ_{i}))$ , where λ_i is computed according to Equation (36);

Step 5: Update $H^{(i + 1)} \leftarrow - G^{(i + 1)} + γ_{i} τ (H^{(i)})$ , where τ denotes the parallel transport (37) along the geodesic from $P^{(i)}$ to $P^{(i + 1)}$ ,

G^{(i + 1)} = grad g_{2} (P^{(i + 1)}),

(41)

and γ_i is chosen according to Equation (38);

Step 6: If i mod (2km(m−1)−1)=0, set $H^{(i + 1)} \leftarrow - G^{(i + 1)}$ ;

Step 7: If $∥G^{(i + 1)}∥$ is small enough, stop. Otherwise, go to Step 3.
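The steps of Algorithm 1 can be sketched in NumPy as follows. This is a minimal illustration under several assumptions that are not part of the algorithm as stated: the second derivative in (36) is approximated by a finite difference, a simple step-halving safeguard is added so that the cost never increases, the direction is reset to steepest descent whenever it fails to be a descent direction, and only the cross covariance matrices are used. All names are illustrative.

```python
import numpy as np

def comm(A, B):
    return A @ B - B @ A

def herm(U):
    return np.conj(np.swapaxes(U, -1, -2))

def expm_skew(K):
    # exp(K) for skew-Hermitian K via the eigendecomposition of the Hermitian matrix iK
    w, V = np.linalg.eigh(1j * K)
    return (V * np.exp(-1j * w)) @ V.conj().T

def inner(A, B):
    return np.einsum('kpab,kpba->', A, B).real       # metric (15) over all slots

def cost(P, C):
    m = P.shape[1]
    return sum(np.trace(P[i, p] @ M @ P[j, q] @ M.conj().T).real
               for (i, j), Ms in C.items() for M in Ms
               for p in range(m) for q in range(m) if p != q)

def grad(P, C):
    m = P.shape[1]
    J = np.zeros_like(P)
    for (i, j), Ms in C.items():
        for M in Ms:
            for p in range(m):
                for q in range(m):
                    if p != q:
                        J[i, p] += M @ P[j, q] @ M.conj().T
                        J[j, q] += M.conj().T @ P[i, p] @ M
    return comm(P, comm(P, J))                        # projection (24) in every slot

def move(P, H, lam):
    """Point reached along the geodesic (21) and the unitaries that realize the transport (23)."""
    K = lam * comm(H, P)
    U = np.array([[expm_skew(K[i, p]) for p in range(P.shape[1])] for i in range(P.shape[0])])
    return U @ P @ herm(U), U

def cg_iva(C, P, iters=50, tol=1e-8, h=1e-4):
    k, m = P.shape[:2]
    G = grad(P, C)
    H = -G
    for it in range(1, iters + 1):
        phi = lambda t: cost(move(P, H, t)[0], C)
        f0, d1 = phi(0.0), inner(G, H)
        if d1 == 0:
            return P
        d2 = (phi(h) - 2 * f0 + phi(-h)) / h ** 2     # finite-difference assumption
        lam = -d1 / abs(d2) if d2 != 0 else 1.0       # step size (36)
        while phi(lam) > f0 and lam > 1e-12:          # added safeguard, not in Algorithm 1
            lam *= 0.5
        P_new, U = move(P, H, lam)
        G_new = grad(P_new, C)
        if np.sqrt(inner(G_new, G_new)) < tol:
            return P_new
        beta = -inner(G_new, G_new - U @ G @ herm(U)) / d1      # coefficient (38)
        H = -G_new + beta * (U @ H @ herm(U))
        if it % (2 * k * m * (m - 1) - 1) == 0 or inner(G_new, H) >= 0:
            H = -G_new                                # periodic reset, or non-descent direction
        P, G = P_new, G_new
    return P

rng = np.random.default_rng(5)
m, k, n = 3, 3, 3
crandn = lambda *s: rng.standard_normal(s) + 1j * rng.standard_normal(s)
A = [crandn(m, m) for _ in range(k)]
C = {(i, j): [A[i] @ np.diag(rng.uniform(0, 10, m) + 1j * rng.uniform(0, 10, m)) @ A[j].conj().T
              for _ in range(n)] for i in range(k) for j in range(i + 1, k)}
X0 = crandn(k, m, m)
X0 /= np.linalg.norm(X0, axis=1, keepdims=True)
P0 = np.einsum('kap,kbp->kpab', X0, X0.conj())        # random initial point
P_hat = cg_iva(C, P0)
print(cost(P0, C), cost(P_hat, C))
```

Because every update is a unitary conjugation, the iterates remain Hermitian rank-one projectors without any explicit re-normalization.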

7 Numerical experiments

In our experiment, we investigate the performance of our method in terms of both local convergence property and accuracy of estimating the joint diagonalizers.

7.1 Experiment one

The first task of our experiment is to jointly diagonalize two sets of complex matrices, ${C_{i j}^{(t)}}_{i < j}$ and ${R_{i j}^{(t)}}_{i < j}$ , which are constructed by

C_{i j}^{(t)} = A_{i} Ω_{i j}^{(t)} A_{j}^{H} + ϵ N_{i j}^{H} and R_{i j}^{(t)} = A_{i} {\hat{Ω}}_{i j}^{(t)} A_{j}^{⊤} + ϵ N_{i j}^{S}

(42)

where the matrices A_i∈Gl(m) are randomly picked, both real and imaginary parts of the diagonal entries of $Ω_{i j}^{(t)}$ and ${\hat{Ω}}_{i j}^{(t)}$ are drawn from a uniform distribution on the interval (0,10), the matrices $N_{i j}^{H} \in C^{m \times m}$ and $N_{i j}^{S} \in C^{m \times m}$ are a Hermitian and a complex symmetric matrix, respectively, whose real and imaginary parts are generated from a uniform distribution on the interval (−0.5,0.5), representing additive stationary noise, and $ϵ \in R$ is the noise level.
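The construction (42) can be sketched as follows. Two details are assumptions of this sketch, since the text does not specify them: the mixing matrices are drawn with complex Gaussian entries, and the noise matrices are made Hermitian and complex symmetric by symmetrizing a matrix with uniformly distributed entries.

```python
import numpy as np

def make_problem(m, k, n, eps, rng):
    """Return mixing matrices A and dictionaries C, R with the matrices of (42) for i < j."""
    def crandn(*s):
        return rng.standard_normal(s) + 1j * rng.standard_normal(s)

    def cunif(lo, hi, *s):
        return rng.uniform(lo, hi, s) + 1j * rng.uniform(lo, hi, s)

    A = [crandn(m, m) for _ in range(k)]
    C, R = {}, {}
    for i in range(k):
        for j in range(i + 1, k):
            N = cunif(-0.5, 0.5, m, m)
            NH = (N + N.conj().T) / 2          # Hermitian noise, the same for all t
            N = cunif(-0.5, 0.5, m, m)
            NS = (N + N.T) / 2                 # complex symmetric noise, the same for all t
            C[(i, j)] = [A[i] @ np.diag(cunif(0, 10, m)) @ A[j].conj().T + eps * NH
                         for _ in range(n)]
            R[(i, j)] = [A[i] @ np.diag(cunif(0, 10, m)) @ A[j].T + eps * NS
                         for _ in range(n)]
    return A, C, R

rng = np.random.default_rng(8)
A, C, R = make_problem(m=3, k=3, n=3, eps=0.0, rng=rng)
X = [np.linalg.inv(Ai).conj().T for Ai in A]   # exact joint diagonalizers for eps = 0
```

In the noise-free case, the matrices `X[i]` diagonalize both sets in the sense of (7), which provides a sanity check of the construction.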

In our experiments, we set m=3, k=3, n=3. First of all, we choose the noise level ϵ=0. A typical local convergence curve of our proposed algorithm is shown in Figure 1. A tendency of superlinear convergence can be observed.

In order to investigate the performance of the proposed algorithm in terms of estimation accuracy, we choose ϵ∈{0.1,0.5,1.0}, and run 50 tests. The performance index is the averaged Amari error proposed in [30]. Generally, the smaller the Amari error, the better the separation. The quartile-based boxplot of the averaged Amari errors of our proposed algorithm for the three noise levels is shown in Figure 2. As expected, the performance of our CG algorithm degrades as the noise level increases.
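For reference, a common form of the Amari error can be sketched as follows. It is evaluated on the global matrix G = X^H A of one subproblem and vanishes exactly when G is a scaled permutation matrix. The normalization by 2m(m−1) used here is one of several conventions in the literature and is an assumption of this sketch; the averaged index is then the mean over the k subproblems.

```python
import numpy as np

def amari_error(G):
    """Amari-type index of a square matrix G; zero iff G is a scaled permutation matrix."""
    M = np.abs(G)
    m = M.shape[0]
    rows = (M / M.max(axis=1, keepdims=True)).sum(axis=1) - 1
    cols = (M / M.max(axis=0, keepdims=True)).sum(axis=0) - 1
    return (rows.sum() + cols.sum()) / (2 * m * (m - 1))

def averaged_amari_error(Xs, As):
    """Mean index over all subproblems, with global matrices X_i^H A_i."""
    return float(np.mean([amari_error(X.conj().T @ A) for X, A in zip(Xs, As)]))

perm_scaled = np.array([[0, 2j, 0], [0, 0, -3.0], [0.5, 0, 0]])   # perfect separation
all_ones = np.ones((3, 3))                                        # no separation at all
print(amari_error(perm_scaled), amari_error(all_ones))
```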

7.2 Experiment two

In this experiment, we compare our CG based IVA approach, referred to as IVA-CG, with two second-order statistics based IVA algorithms. We refer to one contrast optimization based IVA algorithm as IVA-CO, cf. [5, 6], and the other matrix joint diagonalization based approach as IVA-JD, cf. [13]. The task of this experiment is to separate two groups of complex valued signals. We take three real audio source signals with 480,000 samples, and apply the short time Fourier transform to the sources with the number of FFT points being 1,024. By doing so, we end up with a complex IVA problem with 513 groups of statistically dependent complex signals.

For a practical implementation of our method, note that computing and jointly diagonalizing all possible cross covariance and pseudo covariance matrices between the 513 groups is prohibitively expensive. We overcome this issue by randomly selecting only two neighboring frequency bins at a time. The sources from each frequency bin are mixed independently by multiplication with a mixing matrix whose entries are drawn from a normal distribution. We run the experiment 100 times, and show the boxplot of averaged Amari errors of the three studied algorithms in Figure 3. It clearly shows that our proposed IVA-CG algorithm consistently outperforms the other two.

8 Conclusion

We propose a matrix joint diagonalization approach to solve the complex IVA problem that relies neither on a pre-whitening step nor on the estimation of the unknown distribution of the sources. A mathematical setting is derived that allows a formulation without ambiguity on the set of unknown parameters, i.e. the dimension of the search space is maximally reduced. This leads in a natural way to a smooth manifold structure that we call the complex oblique projective manifold, due to its close relation to the oblique manifold, which consists of invertible matrices with normalized columns. We propose to solve the complex IVA problem by minimizing a cost function that is based on the well-known off-norm function for measuring joint diagonality. We show that our setting leads to a non-degenerate Hessian at the solution of the IVA problem. This is an important result for the design of minimization methods, since in many cases the speed of convergence relies on the non-degeneracy of the minima. We develop a geometric CG method for solving the IVA problem and conclude by providing some numerical experiments.

Endnote

^a Note that $Q(m)$ is not a geodesically complete manifold.

References

[1] Cardoso JF: Multidimensional independent component analysis. In Proceedings of the 23rd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4. (Seattle, WA, USA; 1998).
[2] Hyvärinen A, Hoyer PO: Emergence of phase and shift invariant features by decomposition of natural images into independent feature subspaces. Neural Comput 2000, 12(7):1705-1720. 10.1162/089976600300015312
[3] Araki S, Mukai R, Makino S, Nishikawa T, Saruwatari H: The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech. IEEE Trans. Speech Audio Process 2003, 11(2):109-116. 10.1109/TSA.2003.809193
[4] Lee I, Kim T, Lee TW: Independent vector analysis for convolutive blind speech separation. In Blind Speech Separation, Signals and Communication Technology. Edited by: Makino S, Lee TW, Sawada H. (Springer, Netherlands; 2007).
[5] Anderson M, Li XL, Adalı T: Complex-valued independent vector analysis: application to multivariate Gaussian model. Signal Process 2012, 92(8):1821-1831. 10.1016/j.sigpro.2011.09.034
[6] Anderson M, Adalı T, Li XL: Joint blind source separation with multivariate Gaussian model: algorithms and performance analysis. IEEE Trans. Signal Process 2012, 60(4):1672-1683.
[7] Kim T: Real-time independent vector analysis for convolutive blind source separation. IEEE Trans. Circ. Syst. I: Regular Papers 2010, 57(7):1431-1438.
[8] Hao J, Lee I, Lee TW, Sejnowski TJ: Independent vector analysis for source separation using a mixture of Gaussians prior. Neural Comput 2010, 22(6):1646-1673. 10.1162/neco.2010.11-08-906
[9] Lee I, Jang GJ: Independent vector analysis based on overlapped cliques of variable width for frequency-domain blind signal separation. EURASIP J. Adv. Signal Process 2012, 113: 1-12.
[10] Bermejo S: Finite sample effects in higher order statistics contrast functions for sequential blind source separation. IEEE Signal Process. Lett 2005, 12(6):481-484.
[11] Ghennioui H, Fadaili EM, Thirion-Moreau N, Adib A, Moreau E: A nonunitary joint block diagonalization algorithm for blind separation of convolutive mixtures of sources. IEEE Signal Process. Lett 2007, 14(11):860-863.
[12] Ghennioui H, Thirion-Moreau N, Moreau E, Aboutajdine D: Gradient-based joint block diagonalization algorithms: application to blind separation of FIR convolutive mixtures. Signal Process 2010, 90(6):1836-1849. 10.1016/j.sigpro.2009.12.002
[13] Li XL, Adalı T, Anderson M: Joint blind source separation by generalized joint diagonalization of cumulant matrices. Signal Process 2011, 91(10):2314-2322. 10.1016/j.sigpro.2011.04.016
[14] Shen H, Kleinsteuber M: A matrix joint diagonalization approach for complex independent vector analysis. In Proceedings of the 10th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA), vol. 7191 Lecture Notes in Computer Science. Edited by: Theis F, Cichocki A, Yeredor A, Zibulevsky M. (Springer-Verlag, Berlin/Heidelberg; 2012).
[15] Shen H, Kleinsteuber M: Complex blind source separation via simultaneous strong uncorrelating transform. In Lecture Notes in Computer Science, Proceedings of the 9th International Conference on Latent Variable Analysis and Signal Separation, vol. 6365. (Springer-Verlag, Berlin/Heidelberg; 2010).
[16] Smaragdis P: Blind separation of convolved mixtures in the frequency domain. Neurocomputing 1998, 22: 21-34. 10.1016/S0925-2312(98)00047-2
[17] Comon P, Jutten C: Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press Inc, San Diego, USA; 2010.
[18] Makino S, Lee TW, Sawada H: Blind Speech Separation, Signals and Communication Technology. Springer, Netherlands; 2007.
[19] Hosseini S, Deville Y, Saylani H: Blind separation of linear instantaneous mixtures of non-stationary signals in the frequency domain. Signal Process 2009, 89: 819-830. 10.1016/j.sigpro.2008.10.024
[20] Maehara T, Murota K: Simultaneous singular value decomposition. Linear Alg. Appl 2011, 435: 106-116. 10.1016/j.laa.2011.01.007
[21] Absil PA, Gallivan KA: Joint diagonalization on the oblique manifold for independent component analysis. In Proceedings of the 31st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 5. (Toulouse, France; 2006).
[22] Afsari B: Sensitivity analysis for the problem of matrix joint diagonalization. SIAM J. Matrix Anal. Appl 2008, 30(3):1148-1171. 10.1137/060655997
[23] Absil PA, Mahony R, Sepulchre R: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ; 2008.
[24] Helmke U, Hüper K, Trumpf J: Newton’s method on Graßmann manifolds. 2007.
[25] Nocedal J, Wright SJ: Numerical Optimization. Springer, New York; 2006.
[26] Kleinsteuber M, Hüper K: An intrinsic CG algorithm for computing dominant subspaces. In Proceedings of the 32nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4. (Hawaii, USA; 2007).
[27] Smith ST: Optimization techniques on Riemannian manifolds. In Hamiltonian and Gradient Flows, Algorithms and Control, Fields Institute Communications, vol. 3. Edited by: Bloch A. (American Mathematical Society, Providence, RI; 1994).
[28] Gabay D: Minimizing a differentiable function over a differential manifold. J. Optimiz. Theory Appl 1982, 37(2):177-219. 10.1007/BF00934767
[29] Ring W, Wirth B: Optimization methods on Riemannian manifolds and their application to shape space. SIAM J. Optimiz 2012, 22(2):596-627. 10.1137/11082885X
[30] Amari SI, Cichocki A, Yang HH: A new learning algorithm for blind signal separation. In Advances in Neural Information Processing Systems (NIPS), vol. 8. Edited by: Touretzky DS, Mozer MC, Hasselmo ME. (The MIT Press, Cambridge, MA, USA; 1996).


Acknowledgements

This study was supported by the Cluster of Excellence CoTeSys (Cognition for Technical Systems), funded by the Deutsche Forschungsgemeinschaft (DFG).

Author information

Authors and Affiliations

The Department of Electrical Engineering and Information Technology, Technische Universität München, München, Germany
Hao Shen & Martin Kleinsteuber

Authors

Hao Shen
Martin Kleinsteuber

Corresponding author

Correspondence to Martin Kleinsteuber.

Additional information

Competing interests

The authors declare that they have no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Shen, H., Kleinsteuber, M. Non-unitary matrix joint diagonalization for complex independent vector analysis. EURASIP J. Adv. Signal Process. 2012, 241 (2012). https://doi.org/10.1186/1687-6180-2012-241

Received: 21 May 2012
Accepted: 31 October 2012
Published: 21 November 2012
DOI: https://doi.org/10.1186/1687-6180-2012-241
