On the conditions for valid objective functions in blind separation of independent and dependent sources

Caiafa, Cesar F

doi:10.1186/1687-6180-2012-255

Research
Open access
Published: 11 December 2012

Cesar F Caiafa^1,2

EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 255 (2012)

Abstract

It is well known that independent sources can be blindly detected and separated, one by one, from linear mixtures by identifying local extrema of certain objective functions (contrasts), such as negentropy, non-Gaussianity (NG) measures, kurtosis, etc. It was also suggested by Donoho in 1981, and verified in practice by Caiafa et al., that some of these measures remain useful for particular cases with dependent sources, but little work has been done in this respect and a rigorous theoretical foundation is still lacking. In this article, it is shown that, if a specific type of pairwise dependence among sources exists, called the linear conditional expectation (LCE) law, then a family of objective functions is valid for their separation. Interestingly, this particular type of dependence arises in modeling material abundances in the spectral unmixing problem of remotely sensed images. In this study, a novel theoretical approach is used to analyze the Shannon entropy (SE), the NG measure and absolute moments of arbitrary order β, i.e. generic absolute moments, for the separation of sources that are allowed to be dependent. We provide theoretical results that show the conditions under which sources are isolated by searching for a maximum or a minimum. Also, simple and efficient algorithms based on Parzen window estimates of probability density functions and Newton–Raphson iterations are proposed for the separation of dependent or independent sources. A set of simulation results on synthetic data and an application to the blind spectral unmixing problem are provided in order to validate our theoretical results and compare these algorithms against FastICA and a very recently proposed algorithm for dependent sources, the bounded component analysis algorithm. It is shown that, for dependent sources verifying the LCE law, the NG measure provides the best separation results.

Introduction

In signal processing, a generic problem is how to separate signals that are linearly combined in the measurements. Blind source separation (BSS) consists of isolating n sources from a set of m linear mixtures:

x_{t} = A s_{t},

(1)

where $x_{t} \in R^{m}$ is a column vector containing the mixtures (observed measurements), $s_{t} \in R^{n}$ is a column vector containing the unknown source signals (sources) and $A \in R^{m \times n}$ is the unknown mixing matrix containing the mixing coefficients. The parameter t is an index that can be related to the position in time or space (pixel index) depending on the application. The model of Equation (1) is commonly referred to as the noiseless instantaneous mixing model.

When sources are statistically independent and m ≥ n (overdetermined case), the problem is well posed in the sense that sources can be identified up to some unimportant indeterminacies[1]. This result allowed the development of a family of independent component analysis (ICA) algorithms which have been successfully and widely used in engineering problems[2–4]. Many criteria have been proposed in the context of ICA. For example, it is known that sources can be detected by identifying the local minima of the SE in the space of the mixing parameters, keeping the variance constant, because of a classic result from information theory: the entropy of a sum of independent variables is larger than the entropy of the individual variables[1, 3, 5–7]. More generally, Comon[1] introduced the definition of contrasts for ICA, i.e. objective functions whose global maxima correspond to the separation of all sources. Besides negentropy (negative SE), other contrasts have been proposed for ICA, such as higher order cumulants, of which the fourth order cumulant is a particular case[3, 8–10], the convex perimeter for bounded sources[11], the L²-distance non-Gaussianity (NG) measure[12], the least absolute end-point (LAE)[13], and others. For an up-to-date review of existing algorithms for ICA, the reader may refer to[2].

Unlike the ICA case, the separation of dependent sources, or dependent component analysis (DCA), has not been fully studied in the past and has proved more difficult. Hyvärinen & Hoyer[14] proposed independent subspace analysis (ISA) as an extension of ICA where components in different subspaces are assumed independent whereas components in the same subspace have dependencies. When the sources and the mixing matrix are restricted to be non-negative, the problem can be seen as a non-negative matrix factorization (NMF) problem for which many algorithms have been developed[15]. However, NMF suffers from non-uniqueness of the solutions and the separation is not guaranteed unless additional constraints are assumed, for example, by imposing sparsity of sources[16, 17]. Bedini et al.[18] developed algorithms for the separation of correlated sources found in astrophysical applications based on multiple-lag data covariance matrices, i.e. by exploiting the time structure of sources. In[12, 19], an algorithm called MaxNG based on the maximization of an NG measure was proposed and tested on dependent sources extracted from remotely sensed images. In[20, 21], some DCA methods were tested on astrophysical sources. Cruces[11] proposed bounded component analysis (BCA) as an alternative method for BSS which relies on the bounded support of sources. In BCA, the separation is guaranteed when the convex hull of the sources domain can be written as the Cartesian product of the convex hulls of the individual source supports, which is a very restrictive assumption. Recently, Eldogar[22] showed that a particular type of dependent sources generated according to a copula-t distribution are perfectly separated by BCA for a wide range of the correlation coefficient.

In all the previously mentioned DCA methods, when the independence assumption is relaxed, the success of the separation relies on alternative, and usually very restrictive, conditions on the sources. Moreover, it was suggested in[23], and verified in practice in[12, 19, 24], that some measures used in ICA, such as negentropy, NG and kurtosis, remain useful for particular cases with dependent sources, but little work has been done in this respect and a rigorous theoretical foundation is still lacking.

In this work, we propose a unified theoretical framework to study the capability of any objective function to detect each of the sources as a local maximum (or minimum) in the space of coefficients. From our analysis, it turns out that many new objective functions having this property can be proposed if a particular type of dependence holds among the sources. In particular, we analyze generic absolute (GA) moments, the SE and the NG measure as valid objective functions. We introduce simple and efficient algorithms for the separation of sources using Parzen window estimates of pdfs and Newton–Raphson (N–R) iterations. We analyze the performance of these measures and compare them against FastICA and the BCA algorithm.

This article is organized as follows: in Section 2, the theoretical aspects are introduced, followed by a detailed analysis of the independent sources case (Section 3) and the dependent sources case (Section 4); in Section 5, using this theory, a set of particular cases is rigorously analyzed and illustrated by simulations; in Section 6, algorithms for source separation with n = 2 sources are derived by using Parzen window estimation of pdfs and the N–R iterative method; in Section 7, several simulation experiments are presented showing the performance of the proposed algorithms and comparing them against FastICA and BCA; finally, in Section 8, the main conclusions of this work and a discussion of our results are presented.

Notation and assumptions

We use capital letters to denote random variables, for example, S₁, S₂,…, S_n are the random variables associated to the sources which have a joint probability density function (pdf) denoted by $f_{S_{1} S_{2} \dots S_{n}} (s_{1}, s_{2}, \dots, s_{n})$ . Obviously, when sources are independent, the joint pdf factorizes, i.e.

f_{S_{1} S_{2} \dots S_{n}} (s_{1}, s_{2}, \dots, s_{n}) = f_{S_{1}} (s_{1}) f_{S_{2}} (s_{2}) \dots f_{S_{n}} (s_{n}),

(2)

where $f_{S_{i}} (s_{i})$ is called the marginal pdf of variable S_i. In this work, we are also interested in the case of having dependent sources where such a factorization of the joint pdf does not exist.

We also define the conditional pdf of a random variable S₁ given that S₂ = x as follows: $f_{S_{1} | S_{2}} (s_{1}, x) = f_{S_{1}, S_{2}} (s_{1}, x) / f_{S_{2}} (x)$ . Accordingly, we define the first and second order conditional expectations as follows: $E [S_{1} | S_{2} = x] = \int s_{1} f_{S_{1} | S_{2}} (s_{1}, x) d s_{1}$ and $E [S_{1}^{2} | S_{2} = x] = \int s_{1}^{2} f_{S_{1} | S_{2}} (s_{1}, x) d s_{1}$ . In the case of having only two sources, we can simplify the notation by using E[S₁|x] ≡ E[S₁|S₂ = x] and $E [S_{1}^{2} | x] \equiv E [S_{1}^{2} | S_{2} = x]$ . Since conditional expectations are functions of x, we use the following simplified notation for their derivatives: $E^{'} [S_{1} | x] \equiv \frac{d}{d x} E [S_{1} | x]$ and $E^{'} [S_{1}^{2} | x] \equiv \frac{d}{d x} E [S_{1}^{2} | x]$ .

In several parts of this article, when we apply the differentiation operator under the integral sign, i.e. $\frac{d}{dτ} \int g (x, τ) d x = \int \frac{d}{dτ} g (x, τ) d x$ we will assume that the function g(x, τ) is sufficiently nicely behaved in order to allow this operation. Basically, we assume that g(x, τ) and $\frac{d}{dτ} g (x, τ)$ are continuous for x in the range of integration and there are upper bounds |g(x, τ)| ≤ A(x) and $| \frac{d}{dτ} g (x, τ) | \leq B (x)$ independent of τ such that the integrals ∫A(x)dx, ∫ B(x)dx do exist.

A motivation for this work: the blind spectral unmixing problem

Blind spectral unmixing is a specific application of BSS to the analysis of hyper-spectral remotely sensed images. In this case, it is known that, at any fixed pixel, the linear mixing model of Equation (1) is valid. The vector of mixtures x represents the sensor measurements at different wavelengths, matrix A contains in its columns the spectral signatures of the endmembers (materials) existing in the covered area and vector s contains the endmember fractional abundances at the given pixel[25]. When the spectral signatures are unknown this is a blind problem and the objective is to estimate both the matrix A and the endmember fractional abundances s.

Most of the existing spectral unmixing algorithms exploit geometrical concepts by using the fact that, due to the linear mixing equation, the mixed pixels have to lie inside the convex hull of the endmembers. This convex hull forms a simplex in the spectral space, with the endmembers as spanning vertices. Some algorithms then try to look for the largest volume embedded simplex (e.g. simplex growing algorithm (SGA)[26], simplex-projection unmixing (SPU)[27]) or try to identify extreme points in the data cloud (e.g. vertex component analysis (VCA)[28]). Other recently proposed methods are based on the NMF with sparsity assumptions (e.g. S-measure constrained NMF algorithm (NMF-SMC)[29]).

As a BSS problem, sources in the blind spectral unmixing problem are clearly not independent since they are related to fractional abundances; in fact, they are constrained to sum to one, i.e. $\sum_{i = 1}^{n} S_{i} = 1$ . Additionally, in order to avoid scale indeterminacy and remove constant values, we have to work with normalized sources S_i, i.e. $S_{i} \leftarrow (S_{i} - μ_{i}) / σ_{i}$ where μ_i and σ_i are the mean and standard deviation associated with source i. Under these conditions, the blind spectral unmixing task can be approached by solving a BSS problem with the additional constraint[19]:

\sum_{i = 1}^{n} S_{i} = 0,

(3)

with $E [S_{i}] = 0, E [S_{i}^{2}] = 1 (i = 1, 2, \dots n) .$

The dependence between any pair of normalized sources can be characterized, for example, by the conditional expectations E[S_i|S_j] and E[S_j|S_i] (i ≠ j). Clearly, when sources are independent, we have E[S_i|S_j] = E[S_i] = 0. On the other hand, when sources are dependent, the conditional expectation E[S_i|S_j] (i ≠ j) is a function of S_j. We can try to determine these functions in order to satisfy Equation (3). By applying the conditional expectation to this equation with respect to S_i (i = 1, 2,…, n) we obtain a system of n equations with n² − n unknowns (the conditional expectations E[S_i|S_j] ∀ i ≠ j). Thus, for n > 2 there is no unique choice of the conditional expectations, and we have to obtain this information from the observation of real data. In[19], it was shown that hyper-spectral data can be well modeled as having linear conditional expectations (LCEs), i.e. E[S_i|S_j] = a S_j + b. The following theorem provides the values of the constants a and b.

Theorem 1

([19], Theorem 3): Given a pair of dependent normalized sources S_i, S_j, if the conditional expectation E[S_i|S_j] is linear in S_j, that is E[S_i|S_j] = a S_j + b, then a = E[S_iS_j] and b = 0.
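One way to see where these constants come from, using only the zero-mean and unit-variance normalization together with the law of total expectation, is sketched below. This is an illustrative argument and not necessarily the proof given in [19].

```latex
\begin{align}
0 = E[S_i] &= E\big[E[S_i \mid S_j]\big] = a\,E[S_j] + b = b, \\
E[S_i S_j] &= E\big[S_j\,E[S_i \mid S_j]\big] = a\,E[S_j^{2}] + b\,E[S_j] = a .
\end{align}
```

The first line uses E[S_j] = 0 and the second uses E[S_j²] = 1.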

In other words, in the blind spectral unmixing problem, the normalized sources can be modeled as verifying the following condition:

E [S_{i} | S_{j}] = ρ_{ij} S_{j}, with ρ_{ij} = E [S_{i} S_{j}],

(4)

which is called the LCE law.
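The LCE law can be checked empirically on simulated abundances. The sketch below assumes, purely for illustration, three abundances drawn uniformly on the simplex (a hypothetical model, not the data used in [19]); for that model each abundance has mean 1/3 and variance 1/18, and the correlation between normalized sources is −1/2. The helper names are illustrative as well.

```python
import math
import random


def sample_normalized_abundances(n_samples, seed=0):
    """Draw abundances uniformly on the 3-simplex, then normalize each source."""
    rng = random.Random(seed)
    mean = 1.0 / 3.0                 # E[S_i] for the uniform simplex (assumed model)
    std = math.sqrt(1.0 / 18.0)      # standard deviation of each S_i for that model
    samples = []
    for _ in range(n_samples):
        # Normalized exponential draws are uniformly distributed on the simplex.
        e = [rng.expovariate(1.0) for _ in range(3)]
        total = sum(e)
        samples.append([(v / total - mean) / std for v in e])
    return samples


def conditional_mean_gap(target, given, rho, edges):
    """Largest |E[target | bin] - rho * E[given | bin]| over bins of `given`."""
    worst = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        pairs = [(t, g) for t, g in zip(target, given) if lo <= g < hi]
        if pairs:
            mean_t = sum(t for t, _ in pairs) / len(pairs)
            mean_g = sum(g for _, g in pairs) / len(pairs)
            worst = max(worst, abs(mean_t - rho * mean_g))
    return worst


samples = sample_normalized_abundances(100_000)
s1 = [s[0] for s in samples]
s2 = [s[1] for s in samples]
rho_hat = sum(a * b for a, b in zip(s1, s2)) / len(s1)   # estimate of E[S_1 S_2]
max_sum = max(abs(sum(s)) for s in samples)              # constraint of Equation (3)
edges = [-1.4 + 0.5 * k for k in range(7)]               # bins covering [-1.4, 1.6]
gap = conditional_mean_gap(s1, s2, rho_hat, edges)
print(rho_hat, max_sum, gap)
```

Under these assumptions the binned conditional means of S₁ should follow the line ρ S₂ up to sampling noise, and the normalized sources sum to zero because all three share the same standard deviation.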

As we demonstrate in this article, when this particular type of dependence between sources is valid, their separation from linear mixtures can be obtained by maximizing (or minimizing) different types of objective functions.

Relationship with previous works and new contributions

In[12], we proposed the Parzen-based NG measure and developed the MaxNG algorithm, a DCA method which proved useful for separating dependent sources extracted from images and also astrophysical dependent sources[20], outperforming classical ICA algorithms. However, in those articles the NG measure was not rigorously justified. Later, in[19], the NG measure was proposed as a solution for the blind spectral unmixing problem and a partial theoretical justification was given in terms of the LCE condition (guarantee of a local extremum). In[24], we reported the results obtained by extending our method to other measures of NG, such as negentropy and moment-based measures, and applying them to synthetic and real datasets in the context of the blind spectral unmixing problem. However, a complete theoretical foundation was still lacking and several questions arose from those experimental results, for example: (1) why did the kurtosis-based measure fail to separate one specific type of dependent sources? (2) given a particular measure, how can we determine whether it can be used for the separation of sources by local maximization or minimization? and (3) under which conditions on the sources can the separation be guaranteed? In the present manuscript, we present a unified theoretical framework for the study of different measures for the separation of independent and dependent sources verifying the LCE condition. The main contribution of this manuscript is to fill the gaps among previous works and give rigorous theoretical answers to the above open questions. In particular, it is shown that the kurtosis-based measure (i.e. the GA moment with β = 4) has zero second order derivative for the constrained dependent sources, which makes it useless for the separation, as our empirical results showed in[24]. We also provide a precise condition (see Equation (33)) that establishes whether independent sources are separated by maximizing or minimizing the corresponding GA moment μ_β(θ). On the other hand, for dependent sources, it is necessary to know the second order conditional expectations, i.e. $E [S_{i}^{2} | S_{j}]$ . Another contribution of the present article is a new algorithm based on an N–R search for local extrema which has quadratic convergence, and is therefore much faster than the algorithms proposed in[12, 19, 20], which used a steepest ascent method with a fixed update step. Additionally, in the present manuscript we compare our algorithms against a recently proposed state-of-the-art DCA algorithm, namely BCA[22].

Detection of sources by maximizing or minimizing objective functions

In this article, we focus only on sequential methods, also known as deflation methods, that extract normalized sources one after another by searching for local extrema of a predefined objective function. This simple idea was already used in the ICA context[2, 3] and can be introduced as follows. When the matrix A in Equation (1) is full-column rank, the sources can be expressed as a linear combination of the mixtures by premultiplying Equation (1) by a pseudoinverse matrix, such as the Moore–Penrose pseudoinverse A^†, i.e.

s_{t} = A^{†} x_{t} .

(5)
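Equations (1) and (5) can be illustrated with a small non-blind example in which the mixing matrix is known. The sketch below assumes n = 2 sources and m = 3 mixtures, with a hypothetical matrix `A` and helper names chosen only for illustration; it uses the closed form A^† = (AᵀA)⁻¹Aᵀ, which holds for real full-column-rank matrices.

```python
import random


def pseudo_inverse(A):
    """Moore-Penrose pseudoinverse (A^T A)^{-1} A^T of a full-column-rank m x 2 matrix."""
    g11 = sum(row[0] * row[0] for row in A)
    g12 = sum(row[0] * row[1] for row in A)
    g22 = sum(row[1] * row[1] for row in A)
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("A must be full-column rank")
    # Entries of the inverse of the 2 x 2 Gram matrix A^T A.
    i11, i12, i22 = g22 / det, -g12 / det, g11 / det
    return [
        [i11 * row[0] + i12 * row[1] for row in A],
        [i12 * row[0] + i22 * row[1] for row in A],
    ]


def apply(matrix, vector):
    """Matrix-vector product for matrices stored as lists of rows."""
    return [sum(c * v for c, v in zip(row, vector)) for row in matrix]


rng = random.Random(0)
A = [[1.0, 0.5], [0.3, 1.2], [-0.7, 0.4]]           # hypothetical 3 x 2 mixing matrix
sources = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(5)]
mixtures = [apply(A, s) for s in sources]           # Equation (1)
A_pinv = pseudo_inverse(A)
recovered = [apply(A_pinv, x) for x in mixtures]    # Equation (5)
max_error = max(
    abs(r - s) for rec, src in zip(recovered, sources) for r, s in zip(rec, src)
)
print(max_error)
```

In the blind setting A is of course not available, which is why the rows of the unmixing matrix have to be found by searching the space of coefficients.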

Since matrix A is unknown, a reasonable strategy could be to search in the space of coefficients, for those points which correspond to each one of the sources. In other words, if we denote by S_i the random variable associated to the source ${s_{i}}_{t}$ then we need to analyze the behavior of the mixture random variable X defined as:

X = α_{1} S_{1} + α_{2} S_{2} + \dots + α_{n} S_{n} .

(6)

We say that variable S_i is separated from the mixture when all coefficients are zero except α_i, i.e. α_i = 1 and α_j = 0 for every j ≠ i. Here we arrive at the main question we want to answer in this article: how can we discriminate a single source from any linear combination of two or more sources?

We introduce some important objective functions that allow us to answer this question. Any valid candidate for an objective function should involve the pdf of the mixture f_X(x) which depends on the mixture coefficients α_i. In particular we consider the SE:

g_{SE} = - \int f_{X} (x) log (f_{X} (x)) d x,

(7)

the NG measure defined as follows[12, 19]:

g_{N G} = \int {[f_{X} (x) - ϕ (x)]}^{2} d x,

(8)

where $ϕ (x) = \frac{1}{\sqrt{2 π}} \exp (- \frac{1}{2} x^{2})$ , and the GA moment of order β which is defined as follows:

μ_{β} = E [| X |^{β}] = \int | x |^{β} f_{X} (x) d x .

(9)
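The three objective functions of Equations (7) to (9) can be evaluated numerically for any pdf given in closed form. The following sketch uses a plain trapezoidal rule on a truncated interval; the integration range, the step count, the function names and the two unit-variance test densities (Gaussian and Laplace) are illustrative assumptions.

```python
import math


def integrate(func, lo=-20.0, hi=20.0, steps=40000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h


def gaussian_pdf(x):
    """Standard Gaussian pdf, i.e. phi(x) in Equation (8)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def laplace_pdf(x):
    """Zero-mean, unit-variance Laplace pdf (scale 1/sqrt(2))."""
    return math.exp(-math.sqrt(2.0) * abs(x)) / math.sqrt(2.0)


def shannon_entropy(pdf):
    """Equation (7): -int f log f dx."""
    def integrand(x):
        p = pdf(x)
        return -p * math.log(p) if p > 0.0 else 0.0
    return integrate(integrand)


def ng_measure(pdf):
    """Equation (8): squared L2 distance to the standard Gaussian pdf."""
    return integrate(lambda x: (pdf(x) - gaussian_pdf(x)) ** 2)


def ga_moment(pdf, beta):
    """Equation (9): E[|X|^beta]."""
    return integrate(lambda x: abs(x) ** beta * pdf(x))


se_gauss, se_laplace = shannon_entropy(gaussian_pdf), shannon_entropy(laplace_pdf)
ng_gauss, ng_laplace = ng_measure(gaussian_pdf), ng_measure(laplace_pdf)
mu4_gauss, mu4_laplace = ga_moment(gaussian_pdf, 4.0), ga_moment(laplace_pdf, 4.0)
print(se_gauss, se_laplace, ng_gauss, ng_laplace, mu4_gauss, mu4_laplace)
```

For unit variance, the Gaussian has the largest entropy and zero NG, while the Laplace density has a smaller entropy, a positive NG and a larger fourth-order moment.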

The SE is a well known measure already used in ICA; on the other hand, the GA moment has not been used before for BSS except in the particular case β = 4, which is closely related to the kurtosis[3, 9]. In fact, kurtosis is defined as $κ = μ_{4} / μ_{2}^{2}$ . In this sense, our analysis generalizes existing ICA methods, providing insightful interpretations of the results. Note that, for β = 2, we obtain the second order moment of the variable which, in our case, is fixed to μ₂ = 1. However, we can choose any order β provided that the integral in Equation (9) exists. In the following section we prove that these objective functions are valid for ICA.

Independent sources case

Let us focus on Equation (6) and analyze whether it is possible to isolate a source S_i from the rest by only looking at the statistical behavior of the mixture variable X, i.e. by studying how the pdf f_X(x;α) varies according to the mixing parameters in the vector α = (α₁, α₂, …, α_n). For ease of presentation we start with the analysis of the two independent sources case. Since we have to generate all possible mixtures of two independent sources while keeping the variance constant, we use the following parameterization: X = α₁(θ)S₁ + α₂(θ)S₂ with α₁(θ) = cos(θ) and α₂(θ) = sin(θ), which means that the separation is obtained at θ = kπ / 2 (k = 0, ± 1, ± 2,…) (ignoring the scaling ambiguity). We first introduce a new result characterizing the pdf of a mixture of independent random variables as a function of the mixing parameter θ, i.e. f_X(x;θ).

Lemma 1

(Two independent sources case): Given two zero-mean and unit-variance independent source variables S₁ and S₂, the pdf f_X(x;θ) of the mixture variable X = α₁(θ)S₁ + α₂(θ)S₂ with α₁(θ) = cos(θ) and α₂(θ) = sin(θ), has zero derivative with respect to θ for every x at θ = kπ / 2 (k = 0, ±1, ±2,…), i.e.

{\frac{\partial f_{X} (x; θ)}{∂θ}|}_{θ = kπ / 2} = 0 .

(10)

Proof

Let us first prove the zero-derivative condition (maximum or minimum) at θ = π / 2 (α₀ = (α₁, α₂) = (0, 1)), which corresponds to the separation of source S₂. In the neighborhood of α₀, i.e. α = α₀ + δ, we can write the pdf of the mixture as the convolution:

f_{X} (x; α) = \frac{1}{α_{2}} \int f_{S_{1}} (s_{1}) f_{S_{2}} (\frac{x - α_{1} s_{1}}{α_{2}}) d s_{1} .

(11)

By using the chain rule of derivatives we obtain

\frac{\partial f_{X} (x; α)}{∂θ} = \frac{\partial f_{X} (x; α)}{\partial α_{1}} α_{1}^{'} (θ) + \frac{\partial f_{X} (x; α)}{\partial α_{2}} α_{2}^{'} (θ) .

(12)

We compute the partial derivatives in the last equation evaluated at α₁=0 and α₂=1 and, by inserting the derivative operator inside the integral, we obtain:

{\frac{\partial f_{X} (x; α)}{\partial α_{1}}|}_{α_{1} = 0} = - f_{S_{2}}^{'} (x) \int s_{1} f_{S_{1}} (s_{1}) d s_{1} = - f_{S_{2}}^{'} (x) E [S_{1}] = 0,

(13)

{\frac{\partial f_{X} (x; α)}{\partial α_{2}}|}_{α_{2} = 1} = - f_{S_{2}} (x) - x f_{S_{2}}^{'} (x) = - {(x f_{S_{2}} (x))}^{'},

(14)

Now, taking into account that $α_{1}^{'} (π / 2) = - 1$ and $α_{2}^{'} (π / 2) = 0$ , and substituting Equations (13) and (14) into Equation (12), we arrive at

{\frac{\partial f_{X} (x; α)}{∂θ}|}_{θ = π / 2} = 0 \times (- 1) - {(x f_{S_{2}} (x))}^{'} \times 0 = 0,

(15)

for every x. Using a similar procedure but working in a neighborhood of α₀ = (α₁, α₂) = (1, 0), i.e. by considering $f_{X} (x; α) = \frac{1}{α_{1}} \int f_{S_{2}} (s_{2}) f_{S_{1}} (\frac{x - α_{2} s_{2}}{α_{1}}) d s_{2}$ instead of Equation (11), we can prove the zero-derivative condition at θ = 0, which corresponds to the separation of source S₁. Finally, it is easy to see that the zero-derivative condition also holds for any integer multiple of π/2, because the resulting mixture becomes ±S₁ or ±S₂ and the same reasoning used before applies. Thus, the zero-derivative condition is valid for every x at θ = kπ/2 (k = 0, ±1, ±2, …) as claimed by this lemma. □

In Figure 1, a graphical interpretation of this lemma is shown in terms of the shape of the pdf for a mixture of two independent sub-Gaussian variables at θ₀ = π/2 and θ = θ₀ ± δθ.
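Lemma 1 can also be checked numerically. The sketch below assumes, for illustration only, that both sources follow the same bimodal (sub-Gaussian) density, an equal-weight mixture of two Gaussians centred at ±M with standard deviation S chosen so that the variance is one. For such sources the pdf of X = cos(θ)S₁ + sin(θ)S₂ is available in closed form as a four-component Gaussian mixture, and its derivative with respect to θ is approximated by a central difference.

```python
import math

M = 0.9                        # half-distance between the two modes (assumed)
S = math.sqrt(1.0 - M * M)     # component std so that each source has unit variance


def normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, theta):
    """pdf of X = cos(theta) S1 + sin(theta) S2 for two independent bimodal sources."""
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for a in (-1.0, 1.0):          # mode selected by S1
        for b in (-1.0, 1.0):      # mode selected by S2
            total += 0.25 * normal_pdf(x, M * (a * c + b * s), S)
    return total


def theta_derivative(x, theta, h=1e-4):
    """Central-difference approximation of d f_X(x; theta) / d theta."""
    return (mixture_pdf(x, theta + h) - mixture_pdf(x, theta - h)) / (2.0 * h)


grid = [-3.0 + 0.05 * k for k in range(121)]
max_at_separation = max(abs(theta_derivative(x, math.pi / 2.0)) for x in grid)
max_at_mixture = max(abs(theta_derivative(x, 1.0)) for x in grid)
print(max_at_separation, max_at_mixture)
```

At the separation point θ = π/2 the derivative vanishes for every x on the grid (up to rounding), whereas at a generic mixing angle such as θ = 1 it does not.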

In the following, using this fundamental property of independent variables (Lemma 1), we can easily prove that g_SE(θ) and μ_β(θ) have local extrema at the desired separation points, i.e. $g_{SE}^{'} (kπ / 2) = μ_{β}^{'} (kπ / 2) = 0$ , for $k \in \mathbb{Z}$ .

Theorem 2

Local extrema of SE (independent sources case): Given two zero-mean and unit-norm source variables S_i (i = 1, 2), the SE (g_SE(θ)) of the mixture variable X = cos(θ)S₁ + sin(θ)S₂ has local extrema at θ = kπ/2 ($k \in \mathbb{Z}$).

Proof

As in the proof of Lemma 1, here it suffices to prove that the derivative of the SE, with respect to the parameter θ, vanishes at θ = π/2. From the definition of SE in Equation (7), if we take its derivative with respect to θ we have:

g_{SE}^{'} (θ) = - \int \frac{d f_{X} (x; θ)}{dθ} (log (f_{X} (x; θ)) + 1) d x .

(16)

Now, using Lemma 1 we see that the derivative of the pdf is zero at θ = π/2 for every x $(\frac{d f_{X} (x; θ)}{dθ} = 0 \; \forall x)$ and therefore we conclude that $g_{SE}^{'} (π / 2) = 0$ . □

Theorem 3

Local extrema of the GA moment measure (independent sources case):

Given two zero-mean and unit-norm source variables S_i (i = 1, 2), the GA moment of order β, μ_β(θ), of the mixture variable X = cos(θ)S₁ + sin(θ)S₂ has local extrema at θ = kπ/2 ($k \in \mathbb{Z}$).

Proof

We need to prove that the derivative of Equation (9), with respect to the parameter θ, is zero at θ = π/2. If we take the derivative of this equation we obtain:

μ_{β}^{'} (θ) = \int | x |^{β} \frac{d f_{X} (x; θ)}{dθ} d x,

(17)

where, by using Lemma 1, we see that $μ_{β}^{'} (π / 2) = 0$ . □

It is interesting to note that Theorem 3 shows that local extrema are found at the desired locations for any chosen value of the parameter β.
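Theorems 2 and 3 can be illustrated numerically by evaluating g_SE(θ) and μ_β(θ) around θ = π/2. The sketch below makes the same illustrative assumption as a bimodal test case: two independent, identically distributed sources, each an equal-weight mixture of Gaussians centred at ±M with unit total variance. All names, the integration range and the step sizes are choices made for this example.

```python
import math

M = 0.9                        # half-distance between the two modes (assumed)
S = math.sqrt(1.0 - M * M)     # component std so that each source has unit variance


def mixture_pdf(x, theta):
    """pdf of X = cos(theta) S1 + sin(theta) S2 for two independent bimodal sources."""
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for a in (-1.0, 1.0):
        for b in (-1.0, 1.0):
            z = (x - M * (a * c + b * s)) / S
            total += 0.25 * math.exp(-0.5 * z * z) / (S * math.sqrt(2.0 * math.pi))
    return total


def integrate(func, lo=-6.0, hi=6.0, steps=2400):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h


def shannon_entropy(theta):
    def integrand(x):
        p = mixture_pdf(x, theta)
        return -p * math.log(p) if p > 0.0 else 0.0
    return integrate(integrand)


def ga_moment(theta, beta):
    return integrate(lambda x: abs(x) ** beta * mixture_pdf(x, theta))


def slope(func, theta, h=1e-3):
    """Central-difference approximation of the derivative with respect to theta."""
    return (func(theta + h) - func(theta - h)) / (2.0 * h)


sep = math.pi / 2.0                                   # separation of source S2
se_slope = slope(shannon_entropy, sep)
mu4_slope = slope(lambda t: ga_moment(t, 4.0), sep)
mu4_at_sep = ga_moment(sep, 4.0)
se_gain = shannon_entropy(sep + 0.05) - shannon_entropy(sep)
mu4_gain = ga_moment(sep + 0.05, 4.0) - mu4_at_sep
print(se_slope, mu4_slope, mu4_at_sep, se_gain, mu4_gain)
```

For these sub-Gaussian sources both first derivatives vanish at the separation point, and both the SE and the fourth-order GA moment increase when moving away from it, i.e. the source sits at a local minimum of each measure.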

In[19], it was shown that the NG measure defined in Equation (8) also has local extrema at the separation points. Moreover, it is clear that, by using the same line of reasoning, we can prove the existence of local extrema for many other objective functionals, for example the Renyi entropy, which was already proposed and studied for ICA[6, 7].

It is important to note that a local extremum at (α₁, α₂) = (0, 1) is a necessary condition to separate source S₂, but it is not a sufficient condition. The existence of local extrema which do not correspond to a separation of sources, also known as mixing local extrema or spurious local extrema, has been a topic of research in the ICA setting. Moreover, some theoretical results are available showing the existence of spurious local minima for the entropy measure when sources have multimodal distributions[30, 31]. Vrins and Verleysen[32] have shown that kurtosis-based contrast functions are more robust than information-theoretic ones when the source distributions are multimodal.

Relaxing independence: DCA

In the previous section, we have shown that Lemma 1 suffices to guarantee the validity of the SE, the GA moment and the NG measure as objective functions for ICA. We are now interested in the problem of separating potentially dependent sources. A natural question then arises: what kind of dependence should the sources have in order to guarantee the same behavior of the pdf as in the ICA scenario? The following result provides a necessary and sufficient condition.

Lemma 2 (n dependent sources case)

Given a set of zero-mean and unit-variance source variables S_i (i = 1, 2, …, n), the pdf f_X(x;α) of the mixture variable X = α₁S₁ + α₂S₂ + ⋯ + α_nS_n, constrained to have unit variance E[X²] = 1, has local extrema for every x at the separation points (α_k = 1 and α_i = 0 for all i ≠ k) iff the LCE law defined in Equation (4) holds.

Proof

Here it is only necessary to prove the local extrema condition (maximum or minimum) for only one point so we arbitrarily choose the case α_n=1 and α_i=0 for all i≠n. In this case, in the neighborhood of α₀=(0,0,…,1), i.e. α=α₀ + δ, we can write the pdf of the mixture as follows:

f_{X} (x; α) = \frac{1}{α_{n}} \int \dots \int f_{S_{1} \dots S_{n}} (s_{1}, \dots, s_{n - 1}, \frac{x - α_{1} s_{1} - \dots - α_{n - 1} s_{n - 1}}{α_{n}}) d s_{1} \dots d s_{n - 1} .

(18)

Following the Lagrangian method, the condition for the existence of a local extremum at α = α₀ under the unit-variance constraint is as follows:

\nabla L (α_{0}) = 0,

(19)

with

L (α) = f_{X} (x; α) + λ (\sum_{i = 1}^{n} α_{i}^{2} + 2 \sum_{i < k} ρ_{ik} α_{i} α_{k} - 1),

(20)

where λ is the Lagrange multiplier and ρ_ik=E[S_iS_k] is the correlation coefficient between sources i and k.

Now, we take the derivatives of the pdf in Equation (18) with respect to the coefficients α_i with i = 1, 2, …, n − 1 which, evaluated at α = α₀, give:

{\frac{\partial f_{X} (x; α)}{\partial α_{i}}|}_{α = α_{0}} = - \int \frac{\partial f_{S_{i} S_{n}} (s_{i}, x)}{∂x} s_{i} d s_{i} = - {(E [S_{i} | S_{n} = x] f_{S_{n}} (x))}^{'} .

(21)

Similarly, the derivative of the pdf in Equation (18) with respect to the coefficient α_n is:

{\frac{\partial f_{X} (x; α)}{\partial α_{n}}|}_{α = α_{0}} = - {(x f_{S_{n}} (x))}^{'} .

(22)

Using Equations (20), (21) and (22) in Equation (19) we obtain the following set of conditions:

{\frac{∂L}{\partial α_{i}}|}_{α = α_{0}} = - {(E [S_{i} | S_{n} = x] f_{S_{n}} (x))}^{'} + 2 λ ρ_{in} = 0 \quad \text{for } i = 1, 2, \dots, n - 1,

(23)

{\frac{∂L}{\partial α_{n}}|}_{α = α_{0}} = - {(x f_{S_{n}} (x))}^{'} + 2 λ = 0 .

(24)

The last equation determines the Lagrange multiplier, i.e. $λ = (1 / 2) {(x f_{S_{n}} (x))}^{'}$ and, by inserting it into (23), we arrive at the desired condition:

E [S_{i} | S_{n} = s_{n}] = ρ_{in} s_{n} .

(25)

□
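The final step of the proof can be spelled out as follows. This is a sketch that assumes both products below are absolutely integrable over the real line, which holds for sources with finite first moments. Substituting λ from Equation (24) into Equation (23) and integrating with respect to x gives:

```latex
\begin{align}
\big(E[S_i \mid S_n = x]\, f_{S_n}(x)\big)' &= \rho_{in}\,\big(x f_{S_n}(x)\big)', \\
E[S_i \mid S_n = x]\, f_{S_n}(x) &= \rho_{in}\, x f_{S_n}(x) + c .
\end{align}
```

Since E[S_i|S_n = x] f_{S_n}(x) and x f_{S_n}(x) are both integrable over the real line (they integrate to E[S_i] = 0 and E[S_n] = 0, respectively), the constant c must vanish, and dividing by f_{S_n}(x) wherever it is positive yields Equation (25).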

It is important to note that the LCE condition is also valid for the particular case of independent sources, i.e. E[S₁|S₂ = s₂] = E[S₁] = 0 and ρ = 0. In Figure 2, some examples of sources are given, indicating whether or not they follow the LCE law.

Before proceeding with our additional results, we have to solve a technical problem: since our sources are now dependent, they are allowed to be correlated, and thus the parameterization X = cos(θ)S₁ + sin(θ)S₂ no longer preserves the variance of the mixture variable X. Let us consider the general linear mixture X = α₁S₁ + α₂S₂. If we are constrained to the unit-variance case E[X²] = 1, then $α_{1}^{2} + α_{2}^{2} + 2 ρ α_{1} α_{2} = 1$ , where we used ρ = E[S₁S₂] to denote the correlation coefficient between sources. Then, the following parameterization preserves the variance and uses only one parameter τ:

α_{1} (τ) = τ and α_{2} (τ) = - τρ + \sqrt{τ^{2} (ρ^{2} - 1) + 1} .

(26)
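The parameterization of Equation (26) is easy to verify numerically. The following sketch, with illustrative function names, computes the pair (α₁, α₂) for a given τ and ρ and checks that the variance α₁² + α₂² + 2ρα₁α₂ of the mixture stays equal to one.

```python
import math


def unit_variance_coefficients(tau, rho):
    """Coefficients (alpha_1, alpha_2) of Equation (26) for a given tau and rho."""
    radicand = tau * tau * (rho * rho - 1.0) + 1.0
    if radicand < 0.0:
        raise ValueError("tau is outside the range where alpha_2 is real")
    return tau, -tau * rho + math.sqrt(radicand)


def mixture_variance(alpha1, alpha2, rho):
    """E[X^2] for X = alpha1 S1 + alpha2 S2 with unit-variance sources."""
    return alpha1 ** 2 + alpha2 ** 2 + 2.0 * rho * alpha1 * alpha2


variances = [
    mixture_variance(*unit_variance_coefficients(tau, rho), rho)
    for rho in (-0.5, 0.0, 0.3, 0.8)
    for tau in (-0.9, -0.2, 0.0, 0.4, 1.0)
]
print(variances)
```

Note that τ = 0 gives (α₁, α₂) = (0, 1), i.e. the separation of source S₂, and that the square root is real only for τ² ≤ 1/(1 − ρ²).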

The following results can be considered as generalizations of Theorems 2 and 3 to the case of two dependent sources.

Theorem 4

Local extrema of SE (general case): Given two zero-mean and unit-norm source variables S_i (i = 1, 2) following the LCE law with respect to S₂, the SE of the mixture variable X = α₁S₁ + α₂S₂, constrained to the unit-variance case E[X²] = 1, has a local extremum at (α₁, α₂) = (0, 1).

Proof

The proof can be obtained identically to the proof of Theorem 2, taking into account the parameterization (26) and using the fact that the LCE condition implies the existence of local extrema of the pdf as stated by Lemma 2. □

Theorem 5

Local extrema of the GA moment (general case): Given two zero-mean and unit-norm source variables S_i (i = 1, 2) following the LCE law with respect to S₂, the GA moment of order β of the mixture variable X = α₁S₁ + α₂S₂, constrained to the unit-variance case E[X²] = 1, has a local extremum at (α₁, α₂) = (0, 1).

Proof

The proof can be obtained identically to the proof of Theorem 3, taking into account the parameterization (26) and using the fact that the LCE condition implies the existence of local extrema of the pdf as stated by Lemma 2. □

Detailed analysis of SE and GA moments

In previous sections, we proved that some objective functions applied to a unit-variance mixture of sources verifying the LCE law, have local extrema when only one of the coefficients is non-zero, which means that we can separate those sources by searching for local extrema. Nevertheless, a more detailed analysis is required in order to determine if each local extremum corresponds to a maximum or a minimum.

Here, we compute the second order derivative of the objective function with respect to τ for the special cases of the SE and GA moments of order β. As we will show, the condition of a maximum or minimum depends on the second order conditional expectation of sources and on their marginal pdfs. First we need to compute the second order derivative of the pdf with respect to the parameter τ which is as follows (its derivation is included in Appendix 1):

{\frac{\partial^{2} f_{X} (x; α_{1} (τ), α_{2} (τ))}{\partial τ^{2}}|}_{τ = 0} = {(f_{S_{2}} (x) E [S_{1}^{2} | x])}^{″} + (1 - 3 ρ^{2}) f_{S_{2}} (x) + x (1 - 5 ρ^{2}) f_{S_{2}}^{'} (x) - ρ^{2} x^{2} f_{S_{2}}^{″} (x) .

(27)

We note that the second order derivative explicitly depends on the second order conditional expectation $E [S_{1}^{2} | S_{2} = x]$ and the marginal pdf $f_{S_{2}} (x)$ .
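A simple sanity check of Equation (27) is the jointly Gaussian case: any unit-variance mixture of two jointly Gaussian sources is again a standard Gaussian, so the pdf does not depend on τ and the right-hand side must vanish for every x. The sketch below assumes a jointly Gaussian pair with correlation `RHO`, for which E[S₁|x] = ρx (the LCE law) and E[S₁²|x] = 1 − ρ² + ρ²x², and evaluates the derivatives by finite differences; the function names are illustrative.

```python
import math

RHO = 0.6   # assumed correlation coefficient of the Gaussian pair


def gaussian_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def gaussian_second_moment(x):
    """E[S_1^2 | S_2 = x] for a jointly Gaussian pair with correlation RHO."""
    return 1.0 - RHO ** 2 + (RHO * x) ** 2


def first_derivative(func, x, h=1e-3):
    return (func(x + h) - func(x - h)) / (2.0 * h)


def second_derivative(func, x, h=1e-3):
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)


def rhs_equation_27(x, rho, marginal_pdf, second_moment):
    """Right-hand side of Equation (27), with derivatives taken by finite differences."""
    def product(u):
        return marginal_pdf(u) * second_moment(u)
    return (second_derivative(product, x)
            + (1.0 - 3.0 * rho ** 2) * marginal_pdf(x)
            + x * (1.0 - 5.0 * rho ** 2) * first_derivative(marginal_pdf, x)
            - rho ** 2 * x ** 2 * second_derivative(marginal_pdf, x))


grid = [-3.0 + 0.1 * k for k in range(61)]
max_abs = max(
    abs(rhs_equation_27(x, RHO, gaussian_pdf, gaussian_second_moment)) for x in grid
)
print(max_abs)
```

The residual is at the level of the finite-difference error, consistent with the fact that Gaussian sources cannot be separated by any of these measures.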

Using this result, we are able to obtain the second order derivatives of the objective function as follows:

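(1) SE measure: to obtain the second order derivative of the SE we take the derivative of Equation (16) with respect to the parameter τ. At the separation point τ = 0, where the first order derivative of the pdf vanishes (Lemma 2) and f_X(x;0) = f_{S₂}(x), this gives:
$g_{SE}^{″} (0) = - \int {\frac{d^{2} f_{X} (x; τ)}{d τ^{2}}|}_{τ = 0} (log (f_{S_{2}} (x)) + 1) d x,$
(28)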

and, by substituting Equation (27) into the last equation and taking into account that the LCE law holds, i.e. E[S₁|x]=ρx, we obtain (see its derivation in Appendix 2):
$g_{SE}^{″} (0) = \int (E [S_{1}^{2} | x] - ρ^{2} x^{2}) \frac{{(f_{S_{2}}^{'} (x))}^{2}}{f_{S_{2}} (x)} d x - \int E^{″} [S_{1}^{2} | x] f_{S_{2}} (x) d x + (3 ρ^{2} - 1) .$
(29)
(2) GA moment: to compute the second order derivative of the GA moment, we take the derivative of Equation (17) with respect to the parameter τ, arriving at:
$μ_{β}^{″} (τ) = \int | x |^{β} \frac{d^{2} f_{X} (x; τ)}{d τ^{2}} d x .$
(30)

Again, by substituting Equation (27) into the last equation and using the LCE law, we obtain (see its derivation in Appendix 3):
$μ_{β}^{″} (0) = β (β - 1) \int | x |^{β - 2} f_{S_{2}} (x) E [S_{1}^{2} | x] d x - β μ_{β} (1 + ρ^{2} (β - 2)),$
(31)

which is valid only when the integral $\int | x |^{β - 2} f_{S_{2}} (x) E [S_{1}^{2} | x] d x$ exists.
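As an illustration, Equation (31) can be evaluated directly from samples. The sketch below assumes that the integral in Equation (31) is taken with respect to the marginal pdf of S₂, in which case it equals E[|S₂|^{β−2} S₁²] by the law of total expectation and can be replaced by a sample average. The function name `ga_second_derivative`, the uniform test signals and the sample size are illustrative assumptions, not part of the original work; the LCE law is assumed to hold and is not checked.

```python
import numpy as np

def ga_second_derivative(s1, s2, beta):
    """Sample-based evaluation of Equation (31) at the separation point.

    Assumes s1 and s2 are zero-mean, unit-variance samples that follow
    the LCE law with respect to s2 (assumption, not verified here).
    """
    rho = np.mean(s1 * s2)                        # correlation coefficient
    mu_beta = np.mean(np.abs(s2) ** beta)         # GA moment of order beta of S2
    # E[|S2|^(beta-2) S1^2] stands in for the integral of Equation (31)
    cross = np.mean(np.abs(s2) ** (beta - 2) * s1 ** 2)
    return beta * (beta - 1) * cross - beta * mu_beta * (1 + rho ** 2 * (beta - 2))

rng = np.random.default_rng(0)
T = 200_000
a = np.sqrt(3.0)                                  # uniform on [-a, a] has unit variance
n1 = rng.uniform(-a, a, T)
n2 = rng.uniform(-a, a, T)
s1, s2 = n1 * n2, n2                              # uncorrelated but dependent sources
beta = 4.0
estimate = ga_second_derivative(s1, s2, beta)
# For these sources E[S1^2 | x] = x^2 and rho = 0, so (31) reduces to beta*(beta-2)*mu_beta
closed_form = beta * (beta - 2) * np.mean(np.abs(s2) ** beta)
```

A positive value indicates a local minimum of the GA moment at the separation point, a negative value a local maximum.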

Some particular cases

In this section, we analyze selected examples to illustrate our theoretical results applied to different types of independent and dependent sources.

(1) Independent sources: Let us consider the simplest case of two independent sources S₁ and S₂. The LCE law (Equation (4)) holds since ρ=0 and E[S₁|S₂=s₂]=E[S₁]=0, which means that the SE, the GA moment and the NG measure have a local extremum at τ=0 using the parameterization of Equation (26). Additionally, we note that the second order conditional expectation is $E [S_{1}^{2} | S_{2} = s_{2}] = E [S_{1}^{2}] = 1$ and then the second order derivative of SE using Equation (29) becomes:
$g_{SE}^{″} (0) = \int \frac{{(f_{S_{2}}^{'} (x))}^{2}}{f_{S_{2}} (x)} d x - 1,$
(32)

which is always greater than zero except for the Gaussian distribution, for which it is equal to zero (the integral $\int \frac{{(f_{S_{2}}^{'} (x))}^{2}}{f_{S_{2}} (x)} d x$ is the Fisher information; see for example [33], p. 23). This confirms the fact that, at the separation point, we have a local minimum of the SE. Now, using Equation (31), we evaluate the second order derivative of the GA moment, which is
$μ_{β}^{″} (0) = β [(β - 1) μ_{β - 2} - μ_{β}] .$
(33)

Let us now analyze different cases corresponding to different values of β. For example, if we consider the fourth order moment case (β=4), we obtain $μ_{4}^{′′} (0) = 4 [3 - μ_{4}]$, which means that, for sources with μ₄>3 (super-Gaussian), the second order derivative is negative and the fourth order moment of the mixture has a maximum at τ=0. On the other hand, for sources with μ₄<3 (sub-Gaussian), the second order derivative is positive and a minimum of the fourth order moment of the mixture is found. More interestingly, we can evaluate any arbitrary order β and Equation (33) will tell us if we need to search for a maximum or a minimum to attain the separation.
(2) Uncorrelated but dependent sources: We consider here two sources S₁ and S₂ generated as follows: S₁=N₁N₂ and S₂=N₂, where N₁ and N₂ are independent non-Gaussian random variables with E[N₁]=E[N₂]=0 and $E [N_{1}^{2}] = E [N_{2}^{2}] = 1$ . We see that S₁ and S₂ are highly dependent but are uncorrelated because $ρ = E [S_{1} S_{2}] = E [N_{1} N_{2}^{2}] = E [N_{1}] E [N_{2}^{2}] = 0$ . The first order conditional expectation is zero, i.e. E[S₁|S₂=s₂]=E[N₁]s₂=0. We also compute the second order conditional expectation which is $E [S_{1}^{2} | S_{2} = s_{2}] = E [N_{1}^{2} N_{2}^{2} | N_{2} = s_{2}] = s_{2}^{2} E [N_{1}^{2}] = s_{2}^{2}$ . Then, by using Equation (29), the second order derivative of SE at τ=0 becomes:
$g_{SE}^{″} (0) = \int x^{2} \frac{{(f_{S_{2}}^{'} (x))}^{2}}{f_{S_{2}} (x)} d x - 3 .$
(34)

It is interesting to note that SE could have a maximum at τ=0 if the integral in the last equation is smaller than three, as in the case of our example in Figure 3d.
Figure 3
Computation of SE, NG measure and GA moments for different types of independent and dependent sources S₁ and S₂. After a de-correlation step (whitening) the measures are computed using the polar parameterization y(θ)=cos(θ)X₁ + sin(θ)X₂, where X₁ and X₂ are the whitened variables. The corresponding scatter plots are shown in the 1st row. The theoretical positions of the sources (in polar coordinates) are shown as red arrows. The measures were normalized in order to cover the range [0,1]. We used signals with a total number of samples T=10⁶ but we used only a subset of 10,000 samples to compute SE and NG measure to avoid the extremely high computational demand. For the generation of sub-Gaussian and super-Gaussian sources we used the transformations sinh⁻¹(x) and sinh(x) applied to a Gaussian variable x, respectively.

Regarding the GA moment, for these sources, Equation (31) becomes:
$μ_{β}^{″} (0) = β (β - 2) μ_{β},$
(35)

and we conclude that we have a minimum at the separation point ( $μ_{β}^{′′} (0) > 0$ ) for every β>2.
(3) A simplified model for material abundances in spectral unmixing (dependent sources): A simple model to generate a special type of sources which are dependent, correlated and constrained to have their sum constant is as follows [19]. First, we generate P>2 independent, nonnegative random variables N₁, N₂,…,N_P; then, we define the following random variables: $U_{i} = N_{i} / \sum_{p = 1}^{P} N_{p}$ , for i=1,2,…,P. We note that these signals meet the constraint $\sum_{i = 1}^{P} U_{i} = 1$ as in the spectral unmixing application. Now, we define our sources by normalizing two of these constrained sources, i.e.: $S_{i} = (U_{i} - \bar{U_{i}}) / σ_{U_{i}}$ , i=1,2. It is not hard to prove that these sources meet the LCE law since E[S₁S₂]=ρ=−1/(P−1) and E[S₁|S₂=s₂]=ρs₂=−s₂/(P−1). Additionally, it is not difficult to prove that, for this particular type of sources, the GA moment of order β=4 is constant, which makes it unsuitable as an objective function for this case. This behavior was already observed in [24] but no theoretical explanation was available until now. In Section 7.3, we generate data and test ICA/DCA algorithms using a more realistic model for material abundances in hyperspectral images by computing directly the material percentages per pixel in a real image.
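The simplified abundance model of case (3) can be simulated in a few lines. The sketch below is only illustrative: it assumes P=4 identically distributed uniform variables on [0,1], and the helper name `constrained_sources`, the number of bins and the sample size are hypothetical choices. It estimates ρ and compares a binned estimate of E[S₁|S₂] with the straight line ρs₂ predicted by the LCE law.

```python
import numpy as np

def constrained_sources(P, T, rng):
    """Two normalized sources from the sum-to-one model U_i = N_i / sum_p N_p."""
    n = rng.uniform(0.0, 1.0, size=(P, T))        # independent nonnegative variables
    u = n / n.sum(axis=0)                         # columns sum to one
    two = u[:2]
    s = (two - two.mean(axis=1, keepdims=True)) / two.std(axis=1, keepdims=True)
    return s[0], s[1]

rng = np.random.default_rng(1)
P, T = 4, 200_000
s1, s2 = constrained_sources(P, T, rng)
rho = np.mean(s1 * s2)                            # expected to be close to -1/(P-1)

# Binned estimate of E[S1 | S2]: 20 equally populated bins of S2
edges = np.quantile(s2, np.linspace(0.0, 1.0, 21))
idx = np.digitize(s2, edges[1:-1])                # bin index in 0..19
centers = np.array([s2[idx == b].mean() for b in range(20)])
cond_mean = np.array([s1[idx == b].mean() for b in range(20)])
max_dev = np.max(np.abs(cond_mean - rho * centers))   # deviation from the LCE line
```

A small `max_dev` is consistent with a linear conditional expectation; it is a sanity check on simulated data, not a proof.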

In order to illustrate these theoretical results, in Figure 3, plots for SE, the NG measure, and GA moments with several values of β are shown for the following types of datasets using a sample size of T=10⁶ (except for SE and NG, for which we used T=10⁴): (a) Independent sub-Gaussian sources, generated by applying the function sinh⁻¹(u) to zero-mean Gaussian independent signals; (b) Independent super-Gaussian sources, generated by applying the function sinh(u) to zero-mean Gaussian independent signals; (c) Independent bimodal sources, where each of the independent sources was generated by mixing two Gaussians with (μ₁,σ₁)=(0.5,0.2) and (μ₂,σ₂)=(−0.5,0.2), respectively; (d) Dependent uncorrelated sources, generated by using ${s_{1}}_{t} = {n_{1}}_{t} {n_{2}}_{t}$ and ${s_{2}}_{t} = {n_{2}}_{t}$ where ${n_{1}}_{t}, {n_{2}}_{t}$ were generated as independent zero-mean uniform distributions; and (e) Dependent constrained sources, generated by using ${s_{i}}_{t} = \frac{{n_{i}}_{t}}{\sum_{p = 1}^{4} {n_{p}}_{t}}$ with i=1,2, where signals ${n_{p}}_{t}$ (p=1,2,…,4) were generated using independent uniform distributions in [0,1].

We see that for the cases (a), (b), (c) and (e), the separation of each source is attained at the minima of the SE and the maxima of the NG measure. Interestingly, the sources in case (d) (dependent and uncorrelated) show that one of the sources is detected at a maximum of the SE and a minimum of the NG measure. It is important to note that the SE also has spurious local minima for the case of bimodal distributions (case (c)). This behavior in information theoretic measures was already analyzed in [30–32] for the independent sources case. On the other hand, in our results, we see that the NG measure and GA moments are more robust, having no spurious local extrema. We also note that, for sub-Gaussian independent sources (a), the GA moments have local minima at the source locations, whereas for super-Gaussian sources (b) the sources are located at local maxima. Nevertheless, it is important to note that for large orders (β=4 and β=7) one of the local maxima is less evident because moments of a large order are affected by outliers (see scatter plot in Figure 3b). In case (e), we observe that GA moments provide a local maximum for β=3 and local minima for β=7,10; for β=4 the second order derivative is in theory zero and for that reason the local extrema are not clear.
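The maximum/minimum rule for independent sources can be checked numerically. The following sketch compares the right-hand side of Equation (33), estimated from samples, with a finite-difference estimate of the curvature of the GA moment of the mixture at the separation point. It assumes ρ=0, so that the unit-variance mixture can be written as X=sin(τ)S₁+cos(τ)S₂; this specific parameterization, the helper names and the step size h are assumptions made for the example only.

```python
import numpy as np

def eq33(s2, beta):
    """Right-hand side of Equation (33) with moments estimated from samples of S2."""
    mu_b = np.mean(np.abs(s2) ** beta)
    mu_b2 = np.mean(np.abs(s2) ** (beta - 2))
    return beta * ((beta - 1) * mu_b2 - mu_b)

def curvature_fd(s1, s2, beta, h=1e-2):
    """Central second difference of the GA moment of X(tau) at tau = 0."""
    def mu_mix(tau):
        return np.mean(np.abs(np.sin(tau) * s1 + np.cos(tau) * s2) ** beta)
    return (mu_mix(h) - 2.0 * mu_mix(0.0) + mu_mix(-h)) / h ** 2

def standardize(v):
    return (v - v.mean()) / v.std()

rng = np.random.default_rng(2)
T = 200_000
g1, g2 = rng.standard_normal(T), rng.standard_normal(T)

# Sub-Gaussian sources via arcsinh, super-Gaussian sources via sinh
sub1, sub2 = standardize(np.arcsinh(g1)), standardize(np.arcsinh(g2))
sup1, sup2 = standardize(np.sinh(g1)), standardize(np.sinh(g2))

sub_eq33, sub_fd = eq33(sub2, 4.0), curvature_fd(sub1, sub2, 4.0)
sup_eq33, sup_fd = eq33(sup2, 4.0), curvature_fd(sup1, sup2, 4.0)
```

Under these assumptions both estimates are positive for the sub-Gaussian pair (minimum at the source) and negative for the super-Gaussian pair (maximum at the source).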

Parzen windows based algorithms for source separation

The Parzen windows method is a non-parametric technique used to estimate a pdf from a set of samples [34]. Using Parzen windows we can obtain the following estimators for SE [6] and the NG measure [12]:

ĝ_{SE} (θ) = - \sum_{t_{1} = 1}^{T} log [\frac{1}{Th} \sum_{t_{2} = 1}^{T} ϕ (\frac{y_{t_{1}} (θ) - y_{t_{2}} (θ)}{h})],

(36)

ĝ_{N G} (θ) = - \frac{2}{T \sqrt{h^{2} + 1}} \sum_{t_{1} = 1}^{T} ϕ (\frac{y_{t_{1}} (θ)}{\sqrt{h^{2} + 1}}) + \frac{1}{T^{2} h \sqrt{2}} \sum_{t_{1} = 1}^{T} \sum_{t_{2} = 1}^{T} ϕ (\frac{y_{t_{1}} (θ) - y_{t_{2}} (θ)}{\sqrt{2} h}) + \frac{1}{2 \sqrt{π}},

(37)

where T is the number of samples,

y_{t} (θ) = cos (θ) {x_{1}}_{t} + sin (θ) {x_{2}}_{t},

(38)

is the projected variable^a sampled at time t (x₁ and x₂ are assumed uncorrelated, i.e. whitened), ϕ(·) is the kernel function (typically a Gaussian kernel) and h is a parameter which determines the size of the windows (we adopt $h = 1.06 \times T^{- \frac{1}{5}}$ , the value that minimizes the mean integrated square error (MISE) [34]). From Equations (36) and (37) we see that their computational complexity is quadratic in the number of available samples ( $O (T^{2})$ ).
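A direct, unoptimized transcription of Equations (36) and (37) is sketched below. It assumes that ϕ is the standard Gaussian pdf and that the projected samples are zero-mean with unit variance; the function names are illustrative. Equation (36) is implemented as written, i.e. as a sum over samples, so dividing by T gives a per-sample entropy estimate.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian pdf, used here as the kernel function."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def parzen_measures(y):
    """Return the (SE, NG) estimates of Equations (36) and (37) for samples y."""
    T = y.size
    h = 1.06 * T ** (-1.0 / 5.0)                  # window size used in the text
    diff = y[:, None] - y[None, :]                # T x T pairwise differences: O(T^2)
    density = gaussian_kernel(diff / h).sum(axis=1) / (T * h)
    se = -np.sum(np.log(density))                 # Equation (36)
    s = np.sqrt(h ** 2 + 1.0)
    ng = (-2.0 / (T * s) * gaussian_kernel(y / s).sum()
          + gaussian_kernel(diff / (np.sqrt(2.0) * h)).sum() / (T ** 2 * h * np.sqrt(2.0))
          + 1.0 / (2.0 * np.sqrt(np.pi)))         # Equation (37)
    return se, ng

rng = np.random.default_rng(3)
T = 1000
gauss = rng.standard_normal(T)                                  # Gaussian reference
unif = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), T)              # unit-variance uniform
se_gauss, ng_gauss = parzen_measures(gauss)
se_unif, ng_unif = parzen_measures(unif)
```

The T×T matrix of pairwise differences makes the quadratic cost explicit, which is why only a subset of samples is practical for SE and NG.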

On the other hand, for the estimation of GA moments we can use the ergodic average formula:

{\hat{μ}}_{β} (θ) = \frac{1}{T} \sum_{t = 1}^{T} | y_{t} (θ) |^{β} .

(39)

Clearly, a big advantage of GA moments over the other measures is their lower computational cost, since it is linear in the number of samples, i.e. $O (T)$ .
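For comparison, the estimator of Equation (39) needs a single pass over the samples. A minimal sketch follows; the function name `ga_moment` and the toy data are illustrative.

```python
import numpy as np

def ga_moment(theta, x1, x2, beta):
    """Sample GA moment of order beta of the projection y(theta)."""
    y = np.cos(theta) * x1 + np.sin(theta) * x2   # Equation (38)
    return np.mean(np.abs(y) ** beta)             # Equation (39), O(T)

x1 = np.array([1.0, -1.0, 2.0, -2.0])
x2 = np.zeros(4)
value = ga_moment(0.0, x1, x2, 3.0)               # (1 + 1 + 8 + 8) / 4
```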

As usual, in order to simplify the search of the maximum (or minimum), we first apply a whitening filter, i.e. $x_{t} \leftarrow T x_{t}$ , after which we obtain E[x x^T]=I. The filter matrix is given by $T = Θ^{- \frac{1}{2}} U^{T}$ , with Θ and U being the diagonal matrix of singular values and the matrix of singular vectors of the covariance matrix $C_{x x} = E [x x^{T}]$ , respectively [3, 12].
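The whitening step can be sketched as follows, using the sample covariance in place of the expectation. The function name `whiten`, the mixing matrix and the test signals are assumptions made for the example.

```python
import numpy as np

def whiten(x):
    """Whiten centered mixtures x of shape (n, T) using the SVD of the covariance."""
    T = x.shape[1]
    c_xx = x @ x.T / T                            # sample covariance matrix
    u, sing, _ = np.linalg.svd(c_xx)              # c_xx = U diag(sing) V^T
    w = np.diag(sing ** -0.5) @ u.T               # filter matrix Theta^(-1/2) U^T
    return w @ x, w

rng = np.random.default_rng(4)
s = rng.uniform(-1.0, 1.0, size=(2, 50_000))      # illustrative sources
a = np.array([[1.0, 0.6], [0.3, 1.0]])            # illustrative mixing matrix
x = a @ s
x = x - x.mean(axis=1, keepdims=True)             # centering
z, w = whiten(x)
cov_z = z @ z.T / z.shape[1]                      # should be the identity
```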

The search for a local extremum θ^∗ can be done by iteratively evaluating the objective function and/or its derivatives at a current estimate θ^(k) and by generating a sequence θ⁽¹⁾,θ⁽²⁾,…,θ^(k) that converges to θ^∗. Note that the derivatives of the measures can be easily computed from Equations (36), (37) and (39). The simplest way to generate this sequence could be to use a steepest ascent/descent method, i.e. $θ^{(k + 1)} = θ^{(k)} \pm ε g^{'} (θ^{(k)})$ . In this case the step size ε must be chosen in order to guarantee convergence in a few steps, which is not a simple task. To avoid this problem, we consider here a simple and efficient algorithm based on the N–R iteration which, in the one dimensional case, is equivalent to the steepest ascent/descent method with an adaptive step size defined by $ε_{k} = \frac{1}{| g^{′′} (θ^{(k)}) |}$ , i.e.:

θ^{(k + 1)} = θ^{(k)} \pm \frac{g^{'} (θ^{(k)})}{| g^{″} (θ^{(k)}) |},

(40)

where g(θ) could be any of $ĝ_{SE} (θ)$ , $ĝ_{N G} (θ)$ or ${\hat{μ}}_{β} (θ)$ , and the sign “ + ” or “−” must be chosen for the case of a maximum or minimum, respectively; g^′(θ) and g^′′(θ) are the first and second order derivatives, respectively, whose formulae can be derived from Equations (36), (37) or (39), with similar computational complexity. A great advantage of the N–R algorithm is that it is proven to converge quickly in general (quadratic convergence). A potential drawback of the N–R method is that a second order derivative close to zero can make the method diverge. In our simulations, however, convergence was always very fast, suggesting that the zero second order derivative condition is not likely to occur in general.^b
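As an example of Equation (40), the sketch below applies the N–R iteration to the GA moment of Equation (39), whose derivatives with respect to θ have a simple closed form because the second derivative of y(θ) equals −y(θ). The function names, the starting point and the synthetic data (two independent unit-variance uniform sources rotated by 0.3 rad, so that they are already white) are assumptions of this example; the derivative formulas assume β≥2.

```python
import numpy as np

def ga_derivatives(theta, x1, x2, beta):
    """First and second derivatives of the sample GA moment w.r.t. theta (beta >= 2)."""
    y = np.cos(theta) * x1 + np.sin(theta) * x2
    dy = -np.sin(theta) * x1 + np.cos(theta) * x2     # dy/dtheta; d2y/dtheta2 = -y
    a = np.abs(y)
    g1 = beta * np.mean(a ** (beta - 1) * np.sign(y) * dy)
    g2 = beta * (beta - 1) * np.mean(a ** (beta - 2) * dy ** 2) - beta * np.mean(a ** beta)
    return g1, g2

def newton_raphson(theta0, x1, x2, beta, maximize, tol=1e-8, k_max=100):
    """Iterate Equation (40): '+' sign for a maximum, '-' sign for a minimum."""
    theta = theta0
    sign = 1.0 if maximize else -1.0
    for _ in range(k_max):
        g1, g2 = ga_derivatives(theta, x1, x2, beta)
        step = sign * g1 / abs(g2)
        theta += step
        if abs(step) < tol:
            break
    return theta

rng = np.random.default_rng(5)
T = 100_000
s1 = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), T)
s2 = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), T)
phi = 0.3                                             # true position of source 1
x1 = np.cos(phi) * s1 - np.sin(phi) * s2
x2 = np.sin(phi) * s1 + np.cos(phi) * s2
# Uniform sources are sub-Gaussian, so the fourth order moment has a minimum at the source
theta_hat = newton_raphson(0.5, x1, x2, beta=4.0, maximize=False)
```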

Algorithm 1: DCA algorithm (two-sources case)

Require: mixtures x_t (t=1,2,…,T) (centered), tolerance tol, max. number of iterations K_max, attempts N_att.
Ensure: estimated sources ${\hat{s_{1}}}_{t}$ and ${\hat{s_{2}}}_{t}$ .

1: $C_{xx} = \frac{1}{T} \sum_{t = 1}^{T} x_{t} x_{t}^{T}$ ; Covariance matrix.
2: $U Θ V^{T} = C_{xx}$ ; Singular Value Decomposition (SVD).
3: $x_{t} = Θ^{- 1 / 2} U^{T} x_{t}$ , (t=1,2,…,T); Whitening.
4: Search for first extremum
5: θ⁽⁰⁾=2πu; Initialization: u is a random number uniformly distributed in [0,1].
6: $d_{θ} = + ∞$ , k=0;
7: while d_θ>tol and k<K_max do
8: $θ^{(k + 1)} = θ^{(k)} \pm \frac{g^{'} (θ^{(k)})}{| g^{''} (θ^{(k)}) |}$ ; N–R iteration^c.
9: d_θ=|θ^(k+1)−θ^(k)|;
10: k=k + 1;
11: end while
12: θ₁=θ^(k−1); First local extremum found
13: Search for second extremum
14: θ⁽⁰⁾=θ₁ + π/2; Initialization
15: Repeat STEPs 6-11;
16: n=1;^d
17: while |θ^(k−1)−θ₁|<tol and n<N_att do
18: θ⁽⁰⁾=2πu; Initialization: u is a random number uniformly distributed in [0,1].
19: Repeat STEPs 6-11;
20: n=n + 1;
21: end while
22: θ₂=θ^(k−1); Second local extremum found
23: return ${\hat{s_{i}}}_{t} = cos (θ_{i}) {x_{1}}_{t} + sin (θ_{i}) {x_{2}}_{t}$ (i=1,2);
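The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative implementation under several assumptions that are not in the original: derivatives are approximated by central differences of a user-supplied objective `g` (instead of the analytical formulae), the latest iterate is returned from each search, two extrema are considered the same if their angles coincide modulo π within a hypothetical threshold `same_tol`, and all names and default values are hypothetical.

```python
import numpy as np

def dca_two_sources(x, g, maximize, tol=1e-6, k_max=200, n_att=20,
                    h=1e-3, same_tol=1e-3, rng=None):
    """x: centered mixtures, shape (2, T); g maps a projection y to a scalar objective."""
    rng = np.random.default_rng() if rng is None else rng
    T = x.shape[1]
    # Steps 1-3: covariance matrix, SVD and whitening
    u, sing, _ = np.linalg.svd(x @ x.T / T)
    z = np.diag(sing ** -0.5) @ u.T @ x

    def obj(theta):
        return g(np.cos(theta) * z[0] + np.sin(theta) * z[1])

    def search(theta):
        # Steps 6-11: N-R iteration with finite-difference derivatives (assumption)
        sign = 1.0 if maximize else -1.0
        for _ in range(k_max):
            up, mid, down = obj(theta + h), obj(theta), obj(theta - h)
            g1 = (up - down) / (2.0 * h)
            g2 = (up - 2.0 * mid + down) / h ** 2
            step = sign * g1 / abs(g2)
            theta += step
            if abs(step) < tol:
                break
        return theta

    def same(a, b):
        # Angles differing by a multiple of pi give the same source up to sign
        d = (a - b) % np.pi
        return min(d, np.pi - d) < same_tol

    theta1 = search(2.0 * np.pi * rng.uniform())      # steps 4-12
    theta2 = search(theta1 + np.pi / 2.0)             # steps 13-15
    n = 1
    while same(theta1, theta2) and n < n_att:         # steps 16-21
        theta2 = search(2.0 * np.pi * rng.uniform())
        n += 1
    thetas = np.array([theta1, theta2])
    s_hat = np.cos(thetas)[:, None] * z[0] + np.sin(thetas)[:, None] * z[1]   # step 23
    return s_hat, thetas

rng = np.random.default_rng(6)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 50_000))   # independent sources
x = np.array([[1.0, 0.6], [0.3, 1.0]]) @ s                       # illustrative mixing
x = x - x.mean(axis=1, keepdims=True)
# Sub-Gaussian sources: search for minima of the fourth order moment
s_hat, thetas = dca_two_sources(x, lambda y: np.mean(y ** 4), maximize=False, rng=rng)
corr = np.abs(np.corrcoef(np.vstack([s_hat, s]))[:2, 2:])        # |correlation| with true sources
```

Each row of `corr` should contain one entry close to one, indicating which true source (up to sign) each estimate recovered.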

Algorithm 1 shows the algorithm for the case n=m=2 (two mixtures and two sources). In this case, after the first local extremum is found, the algorithm searches for the second local extremum starting from an initial guess θ⁽⁰⁾=θ₁ + π/2 which, in the case of independent sources, would correspond exactly to the location of the second source (orthogonal case). In the general dependent sources case, it is possible that this procedure finds the same local extremum again. In order to avoid this situation, the algorithm restarts the local extremum search using different random initial guesses until a different local extremum is found. The maximum number of attempts N_att is a parameter which was set to N_att=20 in our simulations.

It is important to highlight that, if we generalize Algorithm 1 to the case of an arbitrary number of sources with m=n>2, we may apply a deflation step that eliminates every local extremum after it is detected, preventing multiple detections. However, this deflation step is not trivial in the dependent case since the sources are not orthogonal and the classical deflation technique used in ICA is no longer valid. For the particular case of the NG measure, in [12] a special deflation step was developed by transforming the data in order to make it Gaussian at the location of any detected source.

We highlight that computing the derivatives of the SE based on Parzen windows produces numerically unstable results because $\frac{d}{dθ} log (f (x, θ)) = \frac{1}{f (x, θ)} \frac{df (x, θ)}{dθ}$ ; thus, the errors in the estimation of the pdf are amplified in the derivative wherever the estimated pdf is small. On the other hand, the estimation of the derivatives for GA moments and the NG measure does not suffer from this problem and proved to be numerically stable in our simulations.

Source separation experiments

Separation performance evaluation on different datasets

In this section, we show the results of applying our N–R algorithm based on GA moments (order β=0.5,1,1.5,2.5,…,10) and the NG measure (MaxNG) compared with FastICA^e (with g(x)=x³ and g(x)=tanh(x) nonlinearities) and the BCA algorithm recently proposed in [22]. FastICA is a classic, very fast algorithm developed for ICA, whereas the BCA algorithm is a powerful geometric method for ICA/DCA based on the idea that the mixture of bounded sources increases the volume of the support of random variables. BCA obtains the separation by minimizing the volume of the support of the estimated sources, assuming that the support of the sources is equal to the cartesian product of the individual supports [11]. The last condition is valid for independent sources and can be seen as a strong condition for dependent sources; for instance, sources found in blind spectral unmixing do not meet this condition, as Figure 2 illustrates.

In Figure 4, we present the performance results in terms of the obtained signal to interference ratio (SIR), which is defined as ${SIR}_{i} = - 10 \log_{10} (\frac{1}{T} \sum_{t = 1}^{T} {({\hat{s_{i}}}_{t} - {s_{i}}_{t})}^{2})$ . We used the following datasets: (a) Independent sub-Gaussian sources, generated by applying the function sinh⁻¹(u) to zero-mean Gaussian independent signals; (b) Independent super-Gaussian sources, generated by applying the function sinh(u) to zero-mean Gaussian independent signals; (c) Independent bimodal sources, where each of the independent sources was generated by mixing two Gaussians with (μ₁,σ₁)=(0.5,0.2) and (μ₂,σ₂)=(−0.5,0.2), respectively; (d) Independent and uniformly distributed zero-mean sources; (e) Dependent constrained sources, generated by using ${s_{i}}_{t} = \frac{{n_{i}}_{t}}{\sum_{p = 1}^{4} {n_{p}}_{t}}$ with i=1,2, where signals ${n_{p}}_{t}$ (p=1,2,…,4) were generated as independent uniform distributions in [0,1]; (f) Dependent sources with Copula-t distributions, where ${s_{1}}_{t}$ and ${s_{2}}_{t}$ were generated from a Copula-t with 4 degrees of freedom and with linear correlation ρ=0.8, which makes them highly dependent.^f We observe that, for the case of sub-Gaussian independent sources (a), GA moments with β=3,4,…,10 give a performance similar to FastICA and MaxNG. For case (b) (super-Gaussian independent sources), the performance of GA moments is slightly lower than that of FastICA and MaxNG. For bimodal independent sources (c) and uniformly distributed independent sources (d), the performance of GA moments is similar to FastICA and MaxNG for values β=1.0,1.5,2.5,…,6.5. For constrained dependent sources (e), the best performance is obtained for β=6.0,6.5,…,10 and MaxNG, with a SIR of approximately 40 dB. It is noted that the LCE condition holds exactly, thus the separation is almost perfect using the NG measure.
On the other hand, in case (f) the sources are modelled with a Copula-t distribution with correlation ρ=0.8, for which the LCE condition holds only approximately, as Figure 2c illustrates. For this reason, the quality of separation using the NG measure is degraded (SIR of approximately 20 dB), and BCA outperforms all the other methods because the sources fulfil the BCA conditions. It is important to mention that dataset (e) fulfils the assumptions neither of FastICA (independence) nor of BCA (the support of the sources is not equal to the cartesian product of the individual supports). It is also interesting to note that, for β=4, the performance drops because the second order derivative is zero (neither a maximum nor a minimum). The lower performance of BCA for cases (a), (b), (e) and (d) can be attributed to the fact that these sources do not fulfil the conditions for BCA, i.e. either they do not have bounded support or their support cannot be written as the cartesian product of individual supports.
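The SIR figure of merit defined above is straightforward to compute. The sketch below assumes that each estimated source has already been matched to its true source and normalized in sign and scale before the comparison; this alignment step is an assumption of the example, and the function name `sir_db` is illustrative.

```python
import numpy as np

def sir_db(s_hat, s):
    """Signal to interference ratio in dB between an estimate and the true source."""
    return -10.0 * np.log10(np.mean((s_hat - s) ** 2))

s = np.array([1.0, -1.0, 0.5, -0.5])
s_hat = s + 0.1                  # constant error of 0.1, mean squared error 0.01
value = sir_db(s_hat, s)         # approximately 20 dB
```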

Robustness to the sample size T

We have theoretically proved that several objective functions are valid to separate sources verifying the LCE condition. Nevertheless, in practice, the GA moments and the NG measure are estimated from the available samples, which implies that the measures are sensitive to the size of the dataset T. In Figure 5, the robustness of the measures is shown by evaluating the mean SIR of the separation versus the sample size T. For small datasets, the errors in the estimation of the moments and their derivatives can be significant, whereas the NG measure proved to be significantly more robust.

Blind spectral unmixing example

In order to have a realistic set of sources for testing our method in the context of the blind spectral unmixing problem, we used a set of material abundances generated as follows. Based on a real ground-truth map (see Figure 6 (left)) of a selected area of the city of Rome, we assign a source to each one of the classes. For the estimation of each source (abundance), we divide the map into 8×8 pixel subareas and calculate the material abundances as the percentages of the classes within each subarea. As a result we obtained nine sources with a total of T=2814 (67×42) samples each (scatter plots for some pairs of sources are shown in Figure 7a). In Figure 7b, the performance results are shown for the MaxNG, FastICA and BCA algorithms applied to different combinations of two sources, using randomly selected mixing matrices over a total of 50 simulations. We note that the results with GA moments are not included because their performance was poor (similar to FastICA). We think this is because the sample size is too small (T=2814) and the distributions are very irregular. On the other hand, MaxNG showed the best performance. BCA and FastICA perform worse than MaxNG because the sources do not fulfil the conditions required by these algorithms, i.e. they are not independent and their support cannot be written as the cartesian product of individual supports.
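The construction of abundances from a classified map can be sketched as follows. The function name `block_abundances` and the toy 16×16 map are hypothetical; the real ground-truth map is not reproduced here. With 8×8 subareas, a map of 536×336 pixels would give the 67×42=2814 samples mentioned above.

```python
import numpy as np

def block_abundances(class_map, n_classes, block=8):
    """Fraction of pixels of each class inside every block x block subarea.

    class_map: 2-D integer array of class labels in 0..n_classes-1.
    Returns an array of shape (n_classes, number_of_blocks).
    """
    h, w = class_map.shape
    hb, wb = h // block, w // block
    trimmed = class_map[:hb * block, :wb * block]
    blocks = (trimmed.reshape(hb, block, wb, block)
                     .transpose(0, 2, 1, 3)
                     .reshape(hb * wb, block * block))
    return np.stack([(blocks == c).mean(axis=1) for c in range(n_classes)])

class_map = np.zeros((16, 16), dtype=int)
class_map[:, 8:] = 1             # right half of the map is class 1
class_map[:4, :8] = 2            # top four rows of the left half are class 2
abund = block_abundances(class_map, n_classes=3)
```

By construction the abundances of every subarea sum to one, which is the constraint that makes these sources dependent.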

Conclusions and discussion

This article sheds light on the theoretical aspects of the separation of independent and dependent sources based on the maximization (or minimization) of objective functions, filling gaps among previous works and giving rigorous theoretical answers to important questions. Furthermore, this new theoretical framework opens the possibility of analyzing new objective functions for BSS problems. We have shown that, under the LCE assumption, several objective functions such as GA moments, the NG measure and SE are valid for the separation of dependent sources. However, among these measures, we showed that GA moments are less robust to the sample size T than the NG measure but have much lower computational complexity. We have also shown that simple and efficient algorithms can be developed based on these measures by using the Parzen windows technique combined with an N–R iterative search of local extrema. Nevertheless, it was noted that estimations of derivatives of the SE based on Parzen windows become numerically unstable.

Another disadvantage of the GA moments is that additional information about the sources is needed in order to determine if the separation is obtained at a maximum or a minimum. When sources are independent, we can determine the sign of the second order derivative by just evaluating Equation (33), which can be done quickly and easily from data. For dependent sources, on the other hand, it is necessary to know the second order conditional expectations, i.e. $E [S_{i}^{2} | S_{j}]$ . Additionally, the proper order β must be chosen, which may not be simple and is out of the scope of this article. In contrast, the NG measure does not require any extra parameter, it is very robust to the sample size T and usually the separation is obtained at local maxima (except in pathological cases as shown in our example in Figure ??).

As a main conclusion, we have found that the separation of dependent sources is possible but additional constraints, or assumptions, on the type of dependence among sources must be taken into account. For example, if we know that the support of sources can be written as the cartesian product of the individual supports, then an elegant and very efficient method is to apply the BCA algorithm, or if sources have LCE, as in the case of abundances in the blind spectral unmixing application, then the methods presented in this article are the most appropriate.

Appendix 1

Applying the differentiation operator under the integral sign in Equation (18) for the case of n=2 sources, we obtain the partial derivatives of the pdf evaluated at (α₁,α₂)=(0,1) as follows:

{\frac{\partial f_{X} (x)}{\partial α_{1}}|}_{α = α_{0}} = - {(f_{S_{2}} (x) E [S_{1} | x])}^{'},

(41)

{\frac{\partial^{2} f_{X} (x)}{\partial α_{1}^{2}}|}_{α = α_{0}} = {(f_{S_{2}} (x) E [S_{1}^{2} | x])}^{″},

(42)

{\frac{\partial f_{X} (x)}{\partial α_{2}}|}_{α = α_{0}} = - {(x f_{S_{2}} (x))}^{'},

(43)

{\frac{\partial^{2} f_{X} (x)}{\partial α_{2}^{2}}|}_{α = α_{0}} = 2 f_{S_{2}} (x) + 4 x f_{S_{2}}^{'} (x) + x^{2} f_{S_{2}}^{″} (x),

(44)

{\frac{\partial^{2} f_{X} (x)}{\partial α_{1} \partial α_{2}}|}_{α = α_{0}} = 2 {(f_{S_{2}} (x) E [S_{1} | x])}^{'} + x {(f_{S_{2}} (x) E [S_{1} | x])}^{″} .

(45)

Using the chain rule of derivatives we have that

\frac{d^{2} f_{X} (x; τ)}{d τ^{2}} = \frac{\partial^{2} f}{\partial α_{1}^{2}} {(α_{1}^{'} (τ))}^{2} + 2 \frac{\partial^{2} f}{\partial α_{1} \partial α_{2}} α_{1}^{'} (τ) α_{2}^{'} (τ) + \frac{\partial^{2} f}{\partial α_{2}^{2}} {(α_{2}^{'} (τ))}^{2} + \frac{\partial f}{\partial α_{1}} α_{1}^{″} (τ) + \frac{\partial f}{\partial α_{2}} α_{2}^{″} (τ) .

(46)

And, using the fact that

α_{1}^{'} (0) = 1, α_{1}^{″} (0) = 0, α_{2}^{'} (0) = - ρ, α_{2}^{″} (0) = ρ^{2} - 1;

(47)

we obtain the desired result of Equation (27).
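For completeness, the substitution can be written out as follows. This is a worked sketch under the LCE law E[S₁|x]=ρx, writing f≡f_{S₂}(x) and using that the second derivative with respect to α₂ equals (x² f)″ = 2f + 4xf′ + x²f″; the intermediate grouping of terms is ours.

```latex
\begin{align}
{\frac{\partial^{2} f_{X}}{\partial α_{1} \partial α_{2}}|}_{α = α_{0}}
  &= 2 ρ (x f)' + ρ x (x f)'' = ρ (2 f + 4 x f' + x^{2} f''), \\
{\frac{d^{2} f_{X}}{d τ^{2}}|}_{τ = 0}
  &= (f E[S_{1}^{2}|x])'' - 2 ρ^{2} (2 f + 4 x f' + x^{2} f'')
     + ρ^{2} (2 f + 4 x f' + x^{2} f'') + (1 - ρ^{2}) (f + x f') \\
  &= (f E[S_{1}^{2}|x])'' + (1 - 3 ρ^{2}) f + (1 - 5 ρ^{2}) x f' - ρ^{2} x^{2} f'' .
\end{align}
```

The four terms in the second line come, in order, from the α₁² term, the mixed term with α₁′(0)α₂′(0)=−ρ, the α₂² term with (α₂′(0))²=ρ², and the first order term in α₂ with α₂″(0)=ρ²−1; the first order term in α₁ vanishes because α₁″(0)=0.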

Appendix 2

The second order derivative of the SE at τ=0 is:

g_{SE}^{″} (0) = - \int \frac{d^{2} f_{X} (x; 0)}{d τ^{2}} (log (f_{X} (x; 0)) + 1) d x .

(48)

In the following, in order to simplify the notation, we write f(x)≡f_{S₂}(x) and $g_{SE} \equiv - \int f (x) log (f (x)) d x$ .

Now, by substituting Equation (27) into (48) and taking into account the following results:

\begin{array}{l} \int {(f (x) E [S_{1}^{2} | x])}^{″} (log (f (x)) + 1) d x = - \int {(f (x) E [S_{1}^{2} | x])}^{'} \frac{f^{'} (x)}{f (x)} d x = - \int E [S_{1}^{2} | x] \frac{{(f^{'} (x))}^{2}}{f (x)} d x + \int E^{″} [S_{1}^{2} | x] f (x) d x, \\ \int f (x) (log (f (x)) + 1) d x = 1 - g_{SE}, \\ \int x f^{'} (x) (log (f (x)) + 1) d x = g_{SE}, \\ \int x^{2} f^{″} (x) (log (f (x)) + 1) d x = - 2 g_{SE} - \int x^{2} \frac{{(f^{'} (x))}^{2}}{f (x)} d x, \end{array}

we finally arrive at the desired result of Equation (29).

Appendix 3

By substituting Equation (27) into (30) and taking into account the following results (with f(x)≡f_{S₂}(x)):

\begin{array}{l} \int {(f (x) E [S_{1}^{2} | x])}^{″} | x |^{β} d x = β (β - 1) \int f (x) E [S_{1}^{2} | x] | x |^{β - 2} d x, \\ \int x | x |^{β} f^{'} (x) d x = - (β + 1) μ_{β}, \\ \int x^{2} | x |^{β} f^{″} (x) d x = (β + 2) (β + 1) μ_{β}, \end{array}

we finally arrive at the desired result of Equation (31).
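The last step can be made explicit by collecting the coefficients of μ_β. This is a worked sketch using the three integrals above together with the coefficients (1−3ρ²), (1−5ρ²) and −ρ² of Equation (27):

```latex
\begin{align}
μ_{β}^{″} (0) &= β (β - 1) \int | x |^{β - 2} f (x) E [S_{1}^{2} | x] d x
  + [ (1 - 3 ρ^{2}) - (1 - 5 ρ^{2}) (β + 1) - ρ^{2} (β + 2) (β + 1) ] μ_{β}, \\
(1 - 3 ρ^{2}) - (1 - 5 ρ^{2}) (β + 1) - ρ^{2} (β + 2) (β + 1)
  &= - β + ρ^{2} (2 β - β^{2}) = - β (1 + ρ^{2} (β - 2)) .
\end{align}
```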

Endnotes

^a For ease of presentation, we consider here only the case of two sources, which corresponds to having only one parameter θ. For the case of n>2, a hyper-spherical coordinate system can be used as shown in [12].
^b In order to solve the problem of possible zero second order derivatives, more sophisticated methods well known in the literature can be implemented, for example the Conjugate Gradient method.
^c g^′(.) and g^′′(.) are the first and second order derivatives of a selected measure and can be computed by taking derivatives of Equations (36), (37) or (39) for the case of SE, NG or GA moment, respectively. Sign ‘+’ and sign ‘-’ correspond to maximum or minimum search, respectively.
^d If the same local extremum is found then a new search starts (up to N_att attempts).
^e FastICA package was downloaded from the author's webpage http://research.ics.tkk.fi/ica/fastica/.
^f We used the Matlab command s=copularnd('t',0.8,4,T).

References

Comon P: Independent component analysis, a new concept. Signal Process 1994, 36(3):287-314. 10.1016/0165-1684(94)90029-9
Article MATH Google Scholar
Comon P, Jutten C: Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, Oxford Burlington; 2010.
Google Scholar
Hyvärinen A, Karhunen J, Oja E: Independent Component Analysis. Wiley, New York; 2001.
Book Google Scholar
Cichocki A, Amari SI: Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications. Wiley, Chichester; 2002.
Book Google Scholar
Cruces S, Cichocki A, Amari S: The minimum entropy and cumulants based contrast functions for blind source extraction. In 6th International Work-Conference on Artificial and Natural Neural Networks: Bio-inspired Applications of Connectionism-Part II. (Granada; 2001):786-793.
Google Scholar
Erdogmus D, Hild I, Kenneth E, Principe J: Blind source separation using Renyi’s [alpha]-marginal entropies. Neurocomputing 2002, 49(1–4):25-38.
Article MATH Google Scholar
Pham D, Vrins F, Verleysen M: On the risk of using Renyi’s entropy for blind source separation. IEEE Trans. Signal Process 2008, 56(10):4611-4620.
Article MathSciNet Google Scholar
Cardoso J: High-order contrasts for independent component analysis. Neural Comput 1999, 11: 157-192. 10.1162/089976699300016863
Article Google Scholar
Hyvärinen A: Fast and robust fixed-point algorithms for independent component analysis. IEEE Trans. Neural Netw 2002, 10(3):626-634.
Article Google Scholar
Zarzoso V, Phlypo R, Comon P: A contrast for independent component analysis with priors on the source kurtosis signs. IEEE Signal Process. Lett 2008, 15: 501-504.
Article Google Scholar
Cruces S: Bounded component analysis of linear mixtures: a criterion of minimum convex perimeter. IEEE Trans. Signal Process 2010, 58(4):2141-2154.
Article MathSciNet Google Scholar
Caiafa C, Proto A: Separation of statistically dependent sources using an L2-distance non-Gaussianity measure Signal Process. 2006, 86(11):3404-3420.
MATH Google Scholar
Lee J, Vrins F, Verleysen M: Blind source separation based on endpoint estimation with application to the MLSP 2006 data competition. Neurocomputing 2008, 72(1–3):47-56.
Article Google Scholar
Hyvärinen A, Hoyer P: Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces. Neural Comput 2000, 12(7):1705-1720. 10.1162/089976600300015312
Article Google Scholar
Lee DD, Seung HS: Learning the parts of objects by nonnegative matrix factorization. Nature 1999, 401: 788-791. 10.1038/44565
Article Google Scholar
Theis F, Stadlthanner K, Tanaka T: First results on uniqueness of sparse non-negative matrix factorization. IEEE Trans. Image Process 2005, 15: 81-88.
Google Scholar
Caiafa C, Cichocki A: Estimation of sparse nonnegative sources from noisy overcomplete mixtures using MAP. Neural Comput 2009, 21(12):3487-3518. 10.1162/neco.2009.08-08-846
Article MathSciNet MATH Google Scholar
Bedini L, Herranz D, Salerno E, Baccigalupi C, Kuruoǧlu E: A Tonazzini, Separation of correlated astrophysical sources using multiple-lag data covariance matrices. EURASIP J. Appl. Signal Process 2005, 2005: 2400-2412. 10.1155/ASP.2005.2400
Article MATH Google Scholar
Caiafa C, Salerno E, Proto A, Fiumi L: Blind spectral unmixing by local maximization of non-Gaussianity. Signal Process 2008, 88: 50-68. 10.1016/j.sigpro.2007.07.011
Article MATH Google Scholar
Caiafa C, Kuruoglu E: A minimax entropy method for blind separation of dependent components in astrophysical images. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering(AIP Conference Proceedings). (Paris; 2006):81-88.
Google Scholar
Kuruoglu E: Dependent component analysis for cosmology: a case study. In Latent Variable Analysis and Signal Separation. St. Malo; 2010):538-545.
Chapter Google Scholar
Erdogan A: A family of bounded component analysis algorithms. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2012, ICASSP 2012. (Kyoto; 2012):1-4.
Google Scholar
Donoho DL: On minimum entropy deconvolution. Appl. Time Ser. Anal 1981, 2: 564-608.
Google Scholar
Caiafa C, Salerno E, Proto A: Blind source separation applied to spectral unmixing: comparing different measures of nongaussianity. In Knowledge-Based Intelligent Information and Engineering Systems. (Vietri Sul Mare; 2010):1-8.
Google Scholar
Keshava N, Mustard JF: Spectral unmixing. IEEE Signal Process. Mag 2002, 19: 44-57. 10.1109/79.974727
Article Google Scholar
Chang CI, Wu CC, Liu W, Ouyang YC: A new growing method for simplex-based endmember extraction algorithm. IEEE Trans. Geosci. Remote Sens 2006, 44(10):2804-2819.
Heylen R, Burazerovic D, Scheunders P: Fully constrained least squares spectral unmixing by simplex projection. IEEE Trans. Geosci. Remote Sens 2011, 49(11):4112-4122.
Nascimento JMP, Dias JMB: Vertex component analysis: a fast algorithm to unmix hyperspectral data. IEEE Trans. Geosci. Remote Sens 2005, 43(4):898-910.
Yang Z, Zhou G, Xie S, Ding S, Yang JM, Zhang J: Blind spectral unmixing based on sparse nonnegative matrix factorization. IEEE Trans. Image Process 2011, 20(4):1112-1125.
Vrins F, Pham D, Verleysen M: Mixing and non-mixing local minima of the entropy contrast for blind source separation. IEEE Trans. Inf. Theory 2007, 53(3):1030-1042.
Pham DT, Vrins F: Local minima of information-theoretic criteria in blind source separation. IEEE Signal Process. Lett 2005, 12(11):788-791.
Vrins F, Verleysen M: Information theoretic versus cumulant-based contrasts for multimodal source separation. IEEE Signal Process. Lett 2005, 12(3):190-193.
Johnson O: Information Theory and Central Limit Theorem. Imperial College Press, River Edge London; 2004.
Silverman BW: Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, Boca Raton; 1985.

Acknowledgements

We thank Dr. Alper Erdogan for providing the Matlab code implementing the BCA algorithm used in his recent article [22]. We are also grateful to Dr. Ercan Kuruoğlu from ISTI, Consiglio Nazionale delle Ricerche (CNR), Pisa, Italy, for his useful comments and discussions on a seminal technical report on which this work was based. We also thank the anonymous reviewers for their useful comments. This work was developed under the scope of the CONICET project PIP 2012-2014, number 11420110100021.

Author information

Authors and Affiliations

Instituto Argentino de Radioastronomía (CCT La Plata, CONICET), C.C.5, 1894, Villa Elisa, Buenos Aires, Argentina
Cesar F Caiafa
Facultad de Ingeniería, Universidad de Buenos Aires, Paseo Colón 850, Buenos Aires, C1063ACV, Argentina
Cesar F Caiafa

Authors

Cesar F Caiafa

Corresponding author

Correspondence to Cesar F Caiafa.

Additional information

Competing interests

The author declares that he has no competing interests.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Caiafa, C.F. On the conditions for valid objective functions in blind separation of independent and dependent sources. EURASIP J. Adv. Signal Process. 2012, 255 (2012). https://doi.org/10.1186/1687-6180-2012-255

Received: 26 April 2012
Accepted: 15 October 2012
Published: 11 December 2012
DOI: https://doi.org/10.1186/1687-6180-2012-255
