Contribution of statistical tests to sparseness-based blind source separation

We address the problem of blind source separation in the underdetermined mixture case. Two statistical tests are proposed to reduce the number of empirical parameters involved in standard sparseness-based underdetermined blind source separation (UBSS) methods. The first test performs multisource selection of the suitable time–frequency points for source recovery and is fully automatic. The second one is dedicated to autosource selection for mixing matrix estimation and requires fixing two parameters only, regardless of the instrumented SNRs. We experimentally show that the use of these tests incurs no performance loss and even improves the performance of standard weak-sparseness UBSS approaches.


I. INTRODUCTION
Source separation is aimed at reconstructing multiple sources from multiple observations (mixtures) captured by an array of sensors. In what follows, we assume these sensors to be linear, which is acceptable in many applications. The problem is said to be blind when the observations are linearly mixed by the transfer medium and no prior knowledge on the transfer medium or the sources is available. Blind source separation (BSS) is an important research topic in a variety of fields, including radar processing [1], medical imaging [2], communication [3], [4], and speech and audio processing [5]. BSS problems can be classified according to the nature of the mixing process (instantaneous, convolutive) and the ratio between the number of sources and the number of sensors of the problem (underdetermined, overdetermined). If the sources are assumed to be statistically independent, solutions to the BSS problem are calculated so as to optimize separation criteria based on higher order statistics [6], [7]. Otherwise, when the sources have temporal coherency [8], are nonstationary [9], or possibly cyclostationary [10], the separation criteria to optimize are based on second-order statistics.
Although BSS algorithms exist in great profusion, the underdetermined case (UBSS for underdetermined blind source separation), where the number of sensors is smaller than the number of sources, is less addressed than the overdetermined case, where the number of sensors is greater than or equal to the number of sources. Therefore, the UBSS problem is still challenging.
In the UBSS case, one way to deal with the lack of information is to use an Expectation-Maximization-based method [11] to obtain a maximum likelihood estimation of the mixing matrix and sources. However, such an approach requires prior knowledge of the source distributions. In contrast, sparseness-based methods solve the UBSS problem [12]-[20] without prior knowledge on the source distribution, by exploiting the sparseness of the non-stationary sources in the time-frequency domain. Roughly speaking, sparseness-based approaches [21] involve transforming the mixtures into an appropriate representation domain. The transformed sources are then estimated thanks to their sparseness and, finally, the sources are reconstructed by inverse transform. A source is said to be sparse in a given signal representation domain if most of its coefficients, in this domain, are (almost) zero and only a few of them are big.
In the instantaneous mixture case, where each observation consists of a sum of sources with different signal intensity in presence of noise, the sparseness-based methods introduced in [12]-[17], among others, rely on parameters that are chosen empirically. The general question addressed in this paper is then to what extent this empirical parameter choice can be by-passed thanks to statistical methods, specifically designed to cope with sparse representations. This question is particularly relevant because a whole family of sparseness-based UBSS algorithms relies on assumptions very similar to those employed in theoretical frameworks dedicated to the detection and estimation of sparse signals. Our contribution to this question is then the following.
The UBSS algorithms proposed in [12]-[17] estimate the unknown mixing matrix by assuming the presence of only one single source at each time-frequency point. In practice, a selection of time-frequency points that probably pertain to one single source is expected to improve performance of the mixing matrix estimation. The mixing matrix estimate is then used to recover the source signals. Rejecting time-frequency points of noise alone and, thus, selecting and processing the time-frequency points where the possibly multiple sources are present only, should also improve the overall performance of the methods. Our contribution is then to perform the selection processes mentioned in the foregoing, by considering them as statistical decision problems and reducing the number of empirical parameters for better robustness. Sparseness hypotheses are then particularly suitable for detecting the time-frequency points needed by the separation procedure, whereas such hypotheses are useless for selecting the time-frequency points used by the mixing matrix estimation.

hal-00739565, version 1 - 8 Oct 2012
More specifically, Section II recalls the source recovery and mixing matrix estimation steps in classical UBSS methods based on sparseness assumptions. By so proceeding, we highlight the empirical parameters required by these steps. Then, Section III is the main core of the paper because it introduces the statistical tests for the selection of the time-frequency points needed by source recovery and mixing matrix estimation. For source recovery, the selection of the time-frequency points relies on a weak notion of sparseness, exploited through an estimate-and-plug-in detector: we begin by estimating the noise standard deviation via the d-Dimensional Amplitude Trimmed Estimator (DATE), recently introduced in [22] and especially designed for coping with noisy representations of weakly-sparse signals; then, the noise standard deviation estimate is used instead of the unknown true value in the expression of a statistical test, specifically designed for noisy representations of weakly-sparse signals as well. For the mixing matrix estimation, the physics of the signal suggest introducing a novel strategy. Indeed, the problem is to select time-frequency points whose energy is big enough in noise to consider that they pertain to one single source. We thus introduce a tolerance above which the energy of these relevant points must be regardless of noise. A statistical test involving this tolerance and based on Signal Norm Testing (SNT), recently introduced in [23], is then used to select these points in presence of noise.
Summarizing, we thus extend significantly [24], by introducing three new features of importance.
First, we replace the Modified Complex Essential Supremum Estimate (MC-ESE) of the noise standard deviation by the DATE, which is as accurate, relies on an even stronger theoretical background and has a significantly lower computational cost. Second, the selection of the time-frequency points of interest for source recovery is performed by using a thresholding test, as in [24], but the value of the detection threshold is determined automatically on the basis of the results provided in [25] for the detection of signals satisfying the weak-sparseness model in noise. Third, the mixing matrix estimation is carried out by taking the physical nature of the signals into account.
In Section IV, we apply the statistical tests of Section III to several standard UBSS methods [15], [16], [18], [26], [27] in the instantaneous mixture case. We thus show that our statistical algorithms reduce the number of empirical parameters and improve the overall performance of the UBSS methods under consideration. For instance, by using these statistical algorithms, the subspace-based method presented in [15] can be significantly automatized so as to involve two parameters only. These two parameters are adjusted once for all possible SNRs, in contrast to standard UBSS methods.
In Section V, these results are discussed. In particular, the convolutive mixture case is addressed for its importance in practice. Some perspectives of this work are then presented in the concluding Section VI.

A. Principles
We consider the instantaneous mixing system:

$$x(t) = A\,s(t) + n(t), \tag{1}$$

where $t$ ranges in some finite set of sampling times such that, for every $t$ in this set of sampling times, $x(t) = [x_1(t), \ldots, x_M(t)]^T$ is the vector of the $M$ mixtures, $s(t) = [s_1(t), \ldots, s_N(t)]^T$ is the vector of the $N$ sources, $A = [a_1, \ldots, a_N]$ is the $M \times N$ mixing matrix and $n(t) = [n_1(t), \ldots, n_M(t)]^T$ is the noise vector, whose components are Gaussian processes, mutually decorrelated and independent of the sources. In the sequel, we address the underdetermined case where $N > M$. Without loss of generality, we assume that the column vectors of $A$ have all unit norm, i.e., $\|a_i\| = 1$ for all $i \in \{1, 2, \ldots, N\}$. One well-known time-frequency representation, and the most used in practice, is the short-time discrete Fourier transform (STFT). The mixing process can be modeled in the time-frequency domain via the STFT as:

$$S_x(t,f) = A\,S_s(t,f) + S_n(t,f), \tag{2}$$

where $S_x(t,f)$, $S_s(t,f)$ and $S_n(t,f)$ are the vectors of the STFT coefficients at time-frequency bin $(t,f)$ of the mixtures, the sources and noise, respectively.
Given $x(t)$, our purpose is to recover $s(t)$ or equivalently $S_s(t,f)$. As formalized in [28], the UBSS problem is generally decomposed in two separate subproblems. First, in the so-called mixing matrix estimation, the normalized columns $(a_i)_{1 \le i \le N}$ are estimated so as to obtain an estimate of $A$. Then, on the basis of this estimate, the second step, called signal recovery, provides a solution to equation (2). We now detail the mixing matrix estimation and the source recovery based on sparseness assumptions.
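The mixing model of this subsection can be illustrated with a minimal sketch. It assumes real-valued sources and white Gaussian noise for simplicity; the helper names `normalize_columns` and `mix` and the numerical values are hypothetical and only serve the illustration.

```python
import math
import random

def normalize_columns(A):
    """Scale every column of A (a list of M rows of length N) to unit Euclidean norm."""
    M, N = len(A), len(A[0])
    norms = [math.sqrt(sum(A[m][i] ** 2 for m in range(M))) for i in range(N)]
    return [[A[m][i] / norms[i] for i in range(N)] for m in range(M)]

def mix(A, s, sigma, rng):
    """Return x(t) = A s(t) + n(t); s holds N source signals of T real samples each."""
    M, N, T = len(A), len(A[0]), len(s[0])
    return [[sum(A[m][i] * s[i][t] for i in range(N)) + rng.gauss(0.0, sigma)
             for t in range(T)]
            for m in range(M)]

rng = random.Random(0)
# Underdetermined case: M = 2 sensors, N = 3 sources (illustrative values only).
A = normalize_columns([[1.0, 0.5, -0.3], [0.2, 1.0, 0.9]])
s = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(3)]
x = mix(A, s, sigma=0.1, rng=rng)
```

Since the STFT is linear, applying it to each row of `x` yields coefficients that obey the same mixing relation in the time-frequency domain.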

B. Mixing matrix estimation
The UBSS methods based on sparse signal representations in the time-frequency domain share the following main assumption:
Assumption 1: For each source, there exists a set of time-frequency points where this source exists alone.
The elements of this set can be assumed to be isolated time-frequency points as in DUET (Degenerate Unmixing Estimation Technique) [26] and [15] or to form a time-frequency box as in TIFROM (TIme-Frequency Ratio Of Mixtures) [16] and TIFCORR (TIme-Frequency CORRelation) [27]. Assumption 1 is often reasonable thanks to the sparseness of the time-frequency representation of the sources, especially when the number of sources is moderate.
As mentioned above, the first step in UBSS methods is to estimate the mixing matrix $A$ to achieve source recovery. In most two-step source separation algorithms [12], [13], [15]-[18], an autosource selection is performed. By autosource selection, it is meant the detection of regions where only one source occurs. The methods for estimating $A$ on the basis of assumption 1 can then be summarized as follows.
Jourjine et al. [26] present the DUET method, which is restricted to two mixtures ($M = 2$). They address the anechoic case, where source transmission attenuations and delays between sensors are taken into account. The columns of the mixing matrix are estimated by finding peaks in a 2D histogram of amplitude-delay estimates.
In [16], the mixing matrix estimation of the TIFROM method is based on the complex ratios $S_{x_j}(t,f) / S_{x_k}(t,f)$ where, given $m \in \{1, 2, \ldots, M\}$, $S_{x_m}(t,f)$ stands for the $m$th coordinate of $S_x(t,f)$. These ratios are computed for each time-frequency point and for two arbitrarily chosen indices $j$ and $k$ in $\{1, 2, \ldots, M\}$.
A first limitation of this method is to assume non-null matrix coefficients. A second limitation is the use of an empirical threshold to select the smallest empirical variances of these ratios.
In TIFCORR [27], the mixing matrix estimation proceeds similarly, by selecting the empirical covariance coefficients above a manually chosen threshold.
The subspace-based UBSS (SUBSS) method [15] relies on another type of mixing matrix estimation.
Let $\Omega_k$ stand for the set of all the time-frequency points $(t,f)$ where the $k$th source is present alone and $\Omega$ stand for the union of all these sets $\Omega_k$ for $k = 1, 2, \ldots, N$. According to assumption 1, the sets $\Omega_k$ are non-empty and so is $\Omega$. For $(t,f) \in \Omega_k$, (2) reduces to

$$S_x(t,f) = a_k\,S_{s_k}(t,f) + S_n(t,f). \tag{3}$$

According to this result, the mixing matrix can be estimated as follows. First, all the spatial direction vectors $d(t,f) = S_x(t,f) / \|S_x(t,f)\|$, with $(t,f) \in \Omega$, are clustered by using an unsupervised clustering algorithm and taking into account that the number of sources is supposed to be known. Since (3) shows that, for all the time-frequency points $(t,f)$ of $\Omega_k$, the STFT vectors $S_x(t,f)$ have, up to the noise term, the same spatial direction $a_k$, the column vectors of the mixing matrix $A$ are then estimated as the centroids of the $N$ classes returned by the clustering algorithm. In [15], the authors propose the use of the k-means algorithm but other techniques could be employed. The set $\Omega$ required for the clustering procedure is determined by comparing the ratio $\|S_x(t,f)\| / \max_{\xi} \|S_x(t,\xi)\|$ to a threshold height, whose value is chosen empirically.
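The empirical selection of $\Omega$ and the computation of the spatial direction vectors can be sketched as follows. This is only an illustration under stated assumptions: the STFT is stored as nested lists indexed by frame and frequency bin, the function names are hypothetical, and the phase normalization in `direction` (making the first entry real and non-negative so that directions of a same source can be compared) is a common convention assumed here, not a step stated in the text above. The clustering step itself is not shown.

```python
import math

def norm(v):
    """Euclidean norm of a complex vector given as a list."""
    return math.sqrt(sum(abs(c) ** 2 for c in v))

def select_omega(Sx, threshold):
    """Sx[t][f] is the M-dimensional STFT vector at bin (t, f).
    Keep (t, f) when ||Sx(t, f)|| / max_xi ||Sx(t, xi)|| exceeds the threshold."""
    omega = []
    for t, frame in enumerate(Sx):
        peak = max(norm(v) for v in frame)
        if peak == 0.0:
            continue  # empty frame: nothing to select
        for f, v in enumerate(frame):
            if norm(v) / peak > threshold:
                omega.append((t, f))
    return omega

def direction(v):
    """Unit-norm spatial direction of v, rotated so that its first entry is
    real and non-negative (assumed convention to remove the source phase)."""
    n = norm(v)
    d = [c / n for c in v]
    if abs(d[0]) > 0.0:
        phase = d[0] / abs(d[0])
        d = [c / phase for c in d]
    return d

# One frame with two bins: a strong single-source bin and a weak bin.
a_k = [0.6, 0.8]
frame = [[a_k[0] * 2j, a_k[1] * 2j], [0.01 + 0j, 0j]]
omega = select_omega([frame], threshold=0.5)
d = direction(frame[0])
```

The threshold passed to `select_omega` is precisely the kind of empirical parameter that the tests of Section III aim at replacing.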

C. Source recovery
This section presents a number of techniques used in the source recovery stage of two-step UBSS algorithms. In the underdetermined case, the system (2) has fewer equations than unknowns, and thus it has (in general) infinitely many solutions. In order to recover the original sources, additional assumptions are needed.
The DUET method [26] assumes the sources to be (approximately) W-disjoint orthogonal in the time-frequency domain, that is, the supports of the STFTs of any two sources present in the observations are disjoint. The source recovery is performed by partitioning the time-frequency plane using the mixing parameter estimates. This procedure assigns a source to each time-frequency point, even if this point is due to noise alone, which is detrimental to the overall performance of the method.
Although TIFROM and TIFCORR do not require the sources to be W-disjoint orthogonal for source recovery, they suffer from the same limitation as DUET in that they also assign time-frequency points of noise alone to sources.
Bofill and Zibulevsky [18] use the $\ell_1$-norm minimization to recover the sources. In the noiseless case, this can be accomplished by solving the convex optimization problem

$$\min_{S_s(t,f)} \|S_s(t,f)\|_1 \quad \text{subject to} \quad S_x(t,f) = A\,S_s(t,f), \tag{4}$$

where $\|\cdot\|_1$ is the $\ell_1$ norm. In presence of noise, the foregoing constraint must be modified so as to take the noise standard deviation into account. In practice, this noise standard deviation is unknown and must be estimated.
For the SUBSS approach in [15], the source recovery is based on the following assumptions:
Assumption 2: The number of active sources at any $(t,f)$ is strictly less than the number $M$ of sensors.
Assumption 3: Any $M \times M$ sub-matrix of the mixing matrix has full rank, that is, for all $J \subset \{1, 2, \ldots, N\}$ with cardinality less than $M$, $(a_j)_{j \in J}$ are linearly independent.
The subspace approach then performs multisource selection, that is, the selection of time-frequency points pertaining to a mixture, and then identifies the sources present at each multisource time-frequency point.
Thanks to assumption 2, the method then involves solving the resulting locally overdetermined linear problem. By construction, the method requires rejecting time-frequency points of noise alone. In [15], the time-frequency points with energy below some empirically chosen threshold are rejected.

III. STATISTICAL TESTS FOR SPARSENESS-BASED UBSS
This section is the main core of the paper since it is dedicated to a series of improvements brought to the classical UBSS methods presented in Section II. These improvements concern the selection of the time-frequency points of interest for source separation (multisource selection) and the selection of the time-frequency points suitable for mixing matrix estimation (autosource selection). The crux of the approach followed below is to consider the aforementioned selections of time-frequency points as statistical testing problems of accepting or rejecting the presence of sources in noise. These two hypothesis testing problems are different in that mixing matrix estimation requires selecting points where only one single source is present, whereas this constraint is useless for denoising and source recovery.
The issue in these binary hypothesis testing problems is twofold. On the one hand, the observation in each problem has unknown distribution because basically the possible source signal distributions are themselves unknown. On the other hand, the noise standard deviation is unknown as well. Because of this lack of prior knowledge, standard likelihood theory or extensions such as generalized likelihood ratios or invariance-based approaches do not apply.
For source recovery, our solution is an estimate-and-plug-in detector. Based on a weak-sparseness model for the signal sources in noise, it begins by estimating the noise standard deviation via the DATE introduced in [22]. Then, the noise standard deviation estimate is used instead of the unknown true value in the expression of a statistical test, also designed for noisy sparse signal representations.
For mixing matrix estimation, we exploit the physical nature of the signals to [...] and additive complex Gaussian noise, where $S_n(t,f) \sim \mathcal{N}_c(0, \sigma^2)$ and $\Theta(t,f)$ stands for the mixture of signals possibly present at time-frequency point $(t,f)$.
The issue is then the following. Although $S_x(t,f)$ can reasonably be modeled as a random complex variable, the distribution of $S_x(t,f)$ can hardly be known and standard likelihood theory thus becomes useless. This difficulty can however be overcome by resorting to a weak-sparseness model that can be introduced as follows.
Figure 3-(a) displays the spectrogram obtained by STFT of a mixture of audio signals. This spectrogram exhibits many time-frequency components with small or even null amplitudes. When this mixture is corrupted by additive and independent noise as in Figure 3-(b), small components are masked and only big ones are still visible. We must also note that the proportion of these big components remains seemingly less than or equal to one half. In other words, it is reasonable to assume that 1) the signal components are either present or absent in the time-frequency domain with a probability of presence less than or equal to one half and 2) when present, the signal components are relatively big in that their amplitude is above some minimum value. These two assumptions specify the weak-sparseness model by bounding our lack of prior knowledge on the signal distribution. The weak-sparseness model slightly differs from the "strong" sparsity model encountered in compressive sensing, where it is assumed that the non-null significant signal components are very few. In the weak-sparseness model, we do not restrict our attention to very small proportions of big time-frequency components.

To take the weak-sparseness model into account in our binary hypothesis problem statement, we assume that 1) the probability of occurrence of hypothesis $H_1$ is less than or equal to one half and 2) there exists some positive real value $\alpha$ such that $|\Theta(t,f)| > \alpha$. The value $\alpha$ can be regarded as the minimum signal amplitude. We thus write that

$$\begin{cases} H_0: \; S_x(t,f) = S_n(t,f), \\ H_1: \; S_x(t,f) = \Theta(t,f) + S_n(t,f), \end{cases} \tag{5}$$

with $S_n(t,f) \sim \mathcal{N}_c(0, \sigma^2)$, $|\Theta(t,f)| > \alpha$ and $P(H_1) \le 1/2$. Furthermore, we do not assume that the probability distribution of $\Theta(t,f)$ is known. In what follows, we prefer summarizing this testing problem by introducing a Bernoulli distributed random variable $\varepsilon(t,f)$, valued in $\{0, 1\}$, independent of $\Theta(t,f)$ and $S_n(t,f)$, but defined on the same probability space, so as to write that

$$S_x(t,f) = \varepsilon(t,f)\,\Theta(t,f) + S_n(t,f). \tag{6}$$

We thus have $P(H_1) = P[\varepsilon(t,f) = 1]$. Given any test $T$, that is, any measurable map of $\mathbb{C}^M$ into $\{0, 1\}$, we then say that $T$ accepts (resp. rejects) the null hypothesis $H_0$ if $T(S_x(t,f)) = 0$ (resp. $T(S_x(t,f)) = 1$). In other words, $T$ returns the index of the hypothesis it decides to be true. The error probability of $T$ is then defined as the probability $P[T(S_x(t,f)) \neq \varepsilon(t,f)]$.

According to [25, Theorem VII.1], the decision should then be performed by using the thresholding test with threshold height $\lambda_D(\alpha, \sigma) = (\sigma/\sqrt{2})\,\xi(\alpha\sqrt{2}/\sigma)$ where, for any positive $\rho$, $\xi(\rho) = I_0^{-1}(e^{\rho^2/2})/\rho$ and $I_0$ is the zeroth order modified Bessel function of the first kind. By thresholding test with threshold height $h \in [0, \infty)$, we mean the test $T_h$ such that

$$T_h(x) = \begin{cases} 1 & \text{if } |x| > h, \\ 0 & \text{otherwise.} \end{cases}$$

The reasons for which this test is recommended are the following ones. Let $L_{\mathrm{MPE}}$ be the Minimum-Probability-of-Error (MPE) test, that is, the likelihood ratio test that guarantees the least possible probability of error among all possible tests and that could be computed if the probability distribution of $\Theta(t,f)$ and the prior probability of presence $P(H_1)$ were known. Two facts follow from [...]

To carry out this test, we must choose an appropriate value for $\alpha$ and perform an estimate of $\sigma$.
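The threshold height $\lambda_D(\alpha, \sigma)$ can be evaluated numerically with a short sketch. The code below relies only on the expressions of $\lambda_D$ and $\xi$ given above; the power series for $I_0$ and the bisection used to invert it are implementation choices made for this illustration, and the function names are hypothetical.

```python
import math

def bessel_i0(x):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    term, total, k, q = 1.0, 1.0, 0, (x / 2.0) ** 2
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def bessel_i0_inverse(y):
    """Solve I0(x) = y for x >= 0 by bisection; requires y >= 1."""
    lo, hi = 0.0, 1.0
    while bessel_i0(hi) < y:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bessel_i0(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def xi(rho):
    """xi(rho) = I0^{-1}(exp(rho^2 / 2)) / rho for rho > 0."""
    return bessel_i0_inverse(math.exp(rho ** 2 / 2.0)) / rho

def detection_threshold(alpha, sigma):
    """lambda_D(alpha, sigma) = (sigma / sqrt(2)) * xi(alpha * sqrt(2) / sigma)."""
    return (sigma / math.sqrt(2.0)) * xi(alpha * math.sqrt(2.0) / sigma)

def threshold_test(x, h):
    """Thresholding test T_h: returns 1 when |x| > h and 0 otherwise."""
    return 1 if abs(x) > h else 0

lam = detection_threshold(alpha=2.0, sigma=1.0)
decision = threshold_test(3 + 4j, lam)
```

Because $e^{\rho^2/2}$ is evaluated in double precision, this sketch is limited to moderate values of $\rho$ (roughly $\rho < 37$).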
The value of $\alpha$ is fixed by following the same reasoning as in [29] and considering that the minimum amplitude of the signal to detect is the noise maximum value. More specifically, given $m$ random variables $X_1, X_2, \ldots, X_m$ that are independent and identically distributed with $X_k \sim \mathcal{N}(0, \sigma^2)$, we have

$$\lim_{m \to \infty} P\left[\lambda_u - \frac{\sigma \ln \ln m}{\ln m} \le \max_{1 \le k \le m} |X_k| \le \lambda_u\right] = 1,$$

where $\lambda_u = \sigma\sqrt{2 \ln m}$ is often called the universal threshold [33]. The maximum amplitude of $(X_k)_{1 \le k \le m}$ has thus a strong probability of being close to $\lambda_u$ when $m$ is large and the universal threshold can be regarded as the noise maximum amplitude of $m$ noise samples. In our case, we have $M$ sensors so that each observation $S_x(t,f)$ is an $M$-dimensional complex vector. Let $L$ stand for the number of time-frequency points $(t,f)$ obtained for each sensor. We thus have $M \times L$ time-frequency points $(t,f)$ and, therefore, $2ML$ random variables (the real and imaginary parts of $S_n(t,f)$) that are $\mathcal{N}(0, \sigma^2/2)$. The maximum amplitude of these $2ML$ Gaussian independent and identically distributed random variables with standard deviation $\sigma/\sqrt{2}$ will then be considered as the minimum signal amplitude so that we set $\alpha = \sigma\sqrt{\log(2ML)}$. The threshold height used to detect the relevant time-frequency points is then $\lambda_D(\sigma) = \sigma\,\xi(\sqrt{\log(2ML)})$, which is henceforth called the detection threshold.
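The choice of $\alpha$ follows from plugging the standard deviation $\sigma/\sqrt{2}$ and $m = 2ML$ into the universal threshold. The small check below makes this arithmetic explicit; the function names and the numerical values of $\sigma$, $M$ and $L$ are arbitrary and only serve the illustration.

```python
import math

def universal_threshold(std, m):
    """Universal threshold std * sqrt(2 ln m) for m i.i.d. N(0, std^2) samples."""
    return std * math.sqrt(2.0 * math.log(m))

def minimum_amplitude(sigma, M, L):
    """alpha = sigma * sqrt(ln(2 M L)), the minimum signal amplitude."""
    return sigma * math.sqrt(math.log(2 * M * L))

sigma, M, L = 0.5, 3, 4096
alpha = minimum_amplitude(sigma, M, L)
# Same value obtained from the universal threshold of 2ML real Gaussian
# variables with standard deviation sigma / sqrt(2).
alpha_check = universal_threshold(sigma / math.sqrt(2.0), 2 * M * L)
```

The two values coincide because $(\sigma/\sqrt{2})\sqrt{2\ln(2ML)} = \sigma\sqrt{\ln(2ML)}$.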
As far as the estimation of the noise standard deviation is concerned, usual solutions based on standard robust estimators such as the MAD (Median Absolute Deviation) [34], the trimmed or the winsorized estimators [35] do not apply. Indeed, by considering the spectrogram of Figure 3-(b), it can easily be guessed that such standard estimators would fail because the proportion of significant noisy time-frequency points pertaining to the signals is large. Therefore, the noisy time-frequency points are not very few and cannot play the role of outliers with respect to the main core data distribution. In a recent paper [22], a new noise standard deviation estimator called the DATE has been proposed. This estimator relies on the weak-sparseness model presented before. An exhaustive presentation of the theoretical background on which this estimator is based is beyond the scope of the present paper and the reader is asked to refer to [22] for a heuristic presentation and a complete mathematical description of the DATE. In the context addressed in the present paper, this algorithm applies as follows.
With the notation used so far, each $S_x(t,f)$ is an $M$-dimensional complex vector. Let $S_{x_j}(t,f)$, $j = 1, 2, \ldots, M$, be the components of $S_x(t,f)$. For any given $j = 1, 2, \ldots, M$, we assume that the $L$ time-frequency components $S_{x_j}(t,f)$ for the $j$th sensor are independent and that each time-frequency component obeys the binary hypothesis model of (5) with $\alpha = \sigma\sqrt{\log(2ML)}$. According to [22] and where $\Gamma$ is the standard Gamma function, there exists a specific convergence criterion, for which we have: [...] when the number $L$ of time-frequency bins $(t,f)$ is large enough. In the previous equation, [...] (7) is specified in [22] and is not given here because of its intricateness. It also turns out that the noise standard deviation $\sigma$ is the unique solution of (9) with respect to the convergence criterion involved. Therefore, the DATE basically performs an estimate of the noise standard deviation by solving (7) with regard to this convergence criterion. The several steps involved in the computation are then the following ones.

The DATE: [...] According to Bienaymé-Chebyshev's inequality and since the probabilities of presence of the signals are assumed to be less than or equal to one half, the probability that the number of observations due to noise alone is above $k_{\min}$ is larger than or equal to $Q$. In the experimental results presented below, $Q$ was set to 0.95 for the computation of $k_{\min}$.

2) [Existence]:
IF there exists a smallest integer $k$ in $\{k_{\min}, \ldots, L\}$ such that [...], set $k^* = k$.
ELSE, set $k^* = k_{\min}$.
3) [Value]: The estimate $\sigma_j^*$ of the noise standard deviation on the $j$th sensor is then [...]. The final estimate $\hat{\sigma}$ of the noise standard deviation is then obtained by averaging the values $\sigma_j^*$ so that $\hat{\sigma} = (1/M) \sum_{j=1}^{M} \sigma_j^*$.
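The steps above can be sketched for one sensor as follows. Several elements are assumptions made for this illustration, because the corresponding expressions are not available in the text: the lower bound $k_{\min} = L(1/2 - h)$ with $h = 1/\sqrt{4L(1-Q)}$ (one choice compatible with the Bienaymé-Chebyshev argument), the existence condition comparing the sorted amplitudes to $\xi(\rho)$ times the running mean divided by $\lambda = \sqrt{2}\,\Gamma(3/2)/\Gamma(1) = \sqrt{\pi/2}$, and the final factor $\sqrt{2}$ that converts the per-component standard deviation into $\sigma$. The function names are hypothetical and the reader should refer to [22] for the actual algorithm.

```python
import math
import random

def bessel_i0(x):
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    term, total, k, q = 1.0, 1.0, 0, (x / 2.0) ** 2
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def xi(rho):
    """xi(rho) = I0^{-1}(exp(rho^2 / 2)) / rho, with I0 inverted by bisection."""
    target, lo, hi = math.exp(rho ** 2 / 2.0), 0.0, 1.0
    while bessel_i0(hi) < target:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if bessel_i0(mid) < target else (lo, mid)
    return 0.5 * (lo + hi) / rho

def date_complex(samples, rho, Q=0.95):
    """Assumed form of the DATE for L complex scalars (one sensor).
    rho is the minimum amplitude divided by the per-component noise std."""
    L = len(samples)
    norms = sorted(abs(z) for z in samples)
    lam = math.sqrt(math.pi / 2.0)               # assumed constant for d = 2
    h = 1.0 / math.sqrt(4.0 * L * (1.0 - Q))     # assumed Chebyshev-based bound
    k_min = max(1, int(L * (0.5 - h)))
    factor = xi(rho) / lam
    partial = sum(norms[:k_min])                 # sum of the k_min smallest amplitudes
    mu_star = partial / k_min                    # fallback value: k* = k_min
    for k in range(k_min, L + 1):
        if k > k_min:
            partial += norms[k - 1]
        mu = partial / k                         # mean of the k smallest amplitudes
        upper = norms[k] if k < L else math.inf
        if norms[k - 1] <= factor * mu < upper:  # assumed existence condition
            mu_star = mu
            break
    return math.sqrt(2.0) * mu_star / lam        # estimate of sigma for N_c(0, sigma^2)

# Synthetic check: complex noise with sigma = 1, signals present 30% of the time.
rng = random.Random(1)
sigma, L = 1.0, 4000
samples = []
for _ in range(L):
    z = complex(rng.gauss(0.0, sigma / math.sqrt(2.0)),
                rng.gauss(0.0, sigma / math.sqrt(2.0)))
    if rng.random() < 0.3:
        phase = rng.uniform(0.0, 2.0 * math.pi)
        z += 6.0 * complex(math.cos(phase), math.sin(phase))
    samples.append(z)
estimate = date_complex(samples, rho=math.sqrt(2.0 * math.log(2 * L)))
```

For $M$ sensors, the same function would be applied to each sensor and the $M$ values averaged, as in step 3.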

B. Signal source detection for mixing matrix estimation (autosource selection)
In this section, we propose a test for selecting the time-frequency points where one signal source is probably present alone.

1) The case of signals with low overlapping rate: Since the sources have low overlapping rate, we suppose that the observations detected by the thresholding test of Section III-A mostly pertain to one signal source. In other words, we neglect the effect on the matrix estimation performance of the few points where sources may overlap, inasmuch as the impact of such time-frequency points is further reduced by the averaging effect inherent to any mixing matrix estimation method.
2) The case of signals with high overlapping rate: When signals overlap significantly in the time-frequency domain, the time-frequency detection of Section III-A is now inappropriate. Indeed, the statistical procedure of Section III-A is aimed at detecting time-frequency points where signal sources are present, whatever the number of these sources, whereas it is now required to discriminate points where one single source is present from points where multiple sources occur. We assume that in case of different sources present at time-frequency point $(t,f)$, they are uncorrelated and incoherently combined.
The resulting energy at (t, f ) is thus supposed to be smaller than the energy attained at the time-frequency points where one single source is present only.
Our purpose is thus to detect the time-frequency points where the signal energy is big enough in presence of noise. Basically, this problem amounts to deciding whether $|A S_s(t,f)|$ is above some value $\tau$ or not. The value $\tau^2$ thus represents the minimum energy level above which we consider that the signal energy is big enough to assume that one single source is actually present at $(t,f)$. For any $\lambda \in (0, \infty)$, it follows from [23, Lemma 4, statement (iii)] that [...] where $F_{\chi^2_d(\delta)}(\cdot)$ stands for the cumulative distribution function of the non-central chi-square distribution with $d$ degrees of freedom and non-centrality parameter $\delta$. The degree of freedom in (11) is $2M$ since each $S_x(t,f)$ is an $M$-dimensional complex random vector and, thus, a $2M$-dimensional real random vector.
Given some level $\gamma \in (0, 1)$, it then suffices to choose [...] to guarantee a "false alarm probability" $P\big[\,|S_x(t,f)| > \lambda \;\big|\; |A S_s(t,f)| < \tau\,\big]$ less than or equal to $\gamma$. Therefore, for a given time-frequency point $(t,f)$, the decision is that [...] For mixing matrix estimation, we then keep the time-frequency points $(t,f)$ such that $|S_x(t,f)| \ge \lambda(\tau, \gamma)$, which are considered as time-frequency points pertaining to one single source. In practice, since the actual value of $\sigma$ is unknown, we replace this true value by its estimate $\hat{\sigma}$ provided by the DATE.
Although the two parameters $\gamma$ and $\tau$ must be fixed, there is no need to choose them for each signal-to-noise ratio. Parameter $\tau$, which is independent of the noise level, can be fixed via a small noiseless database. Similarly, level $\gamma$ can be determined via a few preliminary tests on a small representative database.
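A threshold of the kind described in this subsection can be computed with the sketch below. It is an illustration under explicit assumptions, not the expression of the paper: the real and imaginary parts of each noise component are taken to be independent $\mathcal{N}(0, \sigma^2/2)$, so that $2|S_x(t,f)|^2/\sigma^2$ is non-central chi-square with $2M$ degrees of freedom and non-centrality $2|A S_s(t,f)|^2/\sigma^2$, and the threshold is taken as the $(1-\gamma)$ quantile at non-centrality $2\tau^2/\sigma^2$. The function names are hypothetical, and the series used for the distribution function is only suitable for moderate non-centrality values.

```python
import math

def ncx2_cdf_even(x, dof, delta, terms=200):
    """CDF at x of the non-central chi-square law with an even number `dof` of
    degrees of freedom, written as a Poisson(delta / 2) mixture of central laws."""
    half, j = x / 2.0, dof // 2
    term, surv = math.exp(-half), 0.0
    for i in range(j):                 # survival function of the central law, dof = 2j
        surv += term
        term *= half / (i + 1)
    weight, cdf = math.exp(-delta / 2.0), 0.0
    for k in range(terms):
        cdf += weight * (1.0 - surv)   # central CDF with dof + 2k degrees of freedom
        surv += term
        term *= half / (j + k + 1)
        weight *= (delta / 2.0) / (k + 1)
    return cdf

def ncx2_quantile_even(p, dof, delta):
    """Quantile of order p, obtained by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while ncx2_cdf_even(hi, dof, delta) < p:
        hi *= 2.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if ncx2_cdf_even(mid, dof, delta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def snt_threshold(tau, gamma, sigma, M):
    """Assumed form of lambda(tau, gamma) for M-dimensional complex observations."""
    scale = sigma ** 2 / 2.0           # variance of each real noise component
    delta = tau ** 2 / scale
    if delta > 200.0:
        raise ValueError("non-centrality too large for this simple series")
    return math.sqrt(scale * ncx2_quantile_even(1.0 - gamma, 2 * M, delta))

def keep_for_mixing_matrix(Sx_vector, threshold):
    """Autosource selection rule: keep the point when |S_x(t, f)| >= threshold."""
    return math.sqrt(sum(abs(c) ** 2 for c in Sx_vector)) >= threshold

lam = snt_threshold(tau=1.0, gamma=0.05, sigma=math.sqrt(2.0), M=1)
```

In practice, `sigma` would be replaced by the estimate returned by the DATE, while `tau` and `gamma` are fixed once, as explained above.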

IV. SIMULATION RESULTS
In most of the following simulations, the mixing matrix is chosen according to [14, Eq. (38 [...] The source separation performance is measured by the normalized mean square error (NMSE): [...] Throughout this section, NMSEs are calculated over 100 Monte-Carlo runs.
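As an illustration of how such a performance figure can be computed, the sketch below assumes one common definition of the NMSE, namely the error energy normalized by the source energy, averaged over the Monte-Carlo runs and expressed in dB. This definition and the function name are assumptions made here; the exact expression used in the experiments may differ, for instance by a scaling or permutation correction.

```python
import math

def nmse_db(true_source, estimates):
    """Assumed NMSE: mean over runs of ||s_hat - s||^2 / ||s||^2, in dB.
    `estimates` holds one estimated signal per Monte-Carlo run."""
    energy = sum(abs(v) ** 2 for v in true_source)
    errors = [sum(abs(e - v) ** 2 for e, v in zip(est, true_source)) / energy
              for est in estimates]
    return 10.0 * math.log10(sum(errors) / len(errors))

s = [1.0, -2.0, 0.5]
runs = [[1.1 * v for v in s], [1.1 * v for v in s]]  # two runs with a 10% gain error
value = nmse_db(s, runs)
```

Note that this sketch is undefined for a perfect estimate, since the logarithm of zero does not exist.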

A. SUBSS method
The modified SUBSS algorithm is obtained by using both the DATE and SNT for source recovery and [...] In Figure 7, the performance of the modified SUBSS algorithm, with and without denoising, is compared to that obtained by the original SUBSS algorithm of [15]. The denoising mentioned above is described in appendix A as a standard linear estimation.
The modified SUBSS algorithm outperforms the original SUBSS algorithm [15], which relies on thresholds that are manually chosen for each input SNR. Moreover, modified SUBSS without denoising yields performance measurements that do not significantly depart from those attained by the original subspace-based UBSS algorithm. In addition, Figure 7 [...] SUBSS and the original SUBSS when the number of sources increases and for SNR = 10 dB and SNR = 20 dB. In both figures, the NMSEs degrade, because an increase of the source interference invalidates assumption 1.
We now consider the case of complex chirp signals. These were generated by slightly modifying the MATLAB routine MakeSignal.m of the WAVELAB toolbox, so as to obtain complex chirp signals. The 4 chirp signals we use as sources are [...] and $s_4(t) = e^{i \frac{2}{3} \pi T t}$, where $t \in [0, 1]$ and $T = 8192$ is the number of samples for each signal. Two of these chirp signals are LFM ones and one is a pure sine. Figure 10 then displays the spectrograms of the four chirp signals under consideration, whereas Figure 11 presents the spectrogram of a mixture of these sources when matrix $A$ is applied and SNR = 10 dB. The spectrograms of the other mixtures are not displayed for the same reasons as those given previously for the speech signal mixtures.
The experimental procedure for assessing the modified SUBSS in comparison to the original SUBSS method is then the same as above. As specified in Section III-B1, the thresholds used for the mixing matrix estimation are the detection ones. Therefore, no additional parameter is needed. The results obtained in Figure 12 show the relevance of this choice for the thresholds, explained by the fact that chirp signals present very few overlapping time-frequency components.

B. Other methods
As described in Sections III-A and III-B, the DATE and SNT can be used to perform multisource and autosource selections, respectively. Said otherwise, the statistical tests of the aforementioned sections make it possible to obtain the time-frequency points where noisy mixtures are present and the set of time-frequency points where only one single source exists. In this subsection, we comment on the results we obtain by so proceeding with respect to the several UBSS methods addressed in Section II and other than SUBSS.
In the underdetermined case, TIFROM achieves partial source separation only. Therefore, to better assess the contribution of our statistical tests to TIFROM, we consider the determined case where four source signals from four speakers are mixed. The mixing matrix is now 4 × 4 with independent Gaussian entries. In Figure 13, we present the NMSEs obtained by the TIFROM, SNT-TIFROM and Modified SNT-TIFROM. Specifically, SNT-TIFROM uses SNT to select time-frequency points where a source exists alone. SNT-TIFROM, as TIFROM, performs no multisource selection for source recovery. In contrast, the modified SNT-TIFROM performs multisource selection and forces to zero the unselected time-frequency points. These results show that SNT makes it possible to actually select the autosource time-frequency points, with no performance loss and without resorting to the empirical threshold required by the original TIFROM. The performance yielded by the modified SNT-TIFROM further emphasizes that the detection threshold adjusted with the DATE selects appropriate multisource time-frequency points for source recovery. The gain for low SNRs is explained by the fact that this selection can be regarded as a non-linear denoising. The gain brought by this denoising effect decays when the SNR increases.
Another contribution of our statistical approach to sparseness-based methods is the estimation of the noise standard deviation. Indeed, some methods need an estimate or the true value of the noise standard deviation. For instance, Bofill and Zibulevsky, in [18], use $\ell_1$-norm minimization to recover the sources, and the optimization problem they propose to solve in the noisy case involves this standard deviation. Because of the weak sparseness of the sources in noise, we hereafter prefer following [37], which is dedicated to the stable recovery of signals that are not exactly sparse. We therefore solve the optimization problem of Eq. (14). This approach can then be improved in two ways: first, by solving this optimization problem only on the time-frequency points selected by the multisource procedure propounded in Section III-A; second, by replacing the unknown true value of the noise standard deviation by its estimate provided by the DATE.
In this respect, Figure 14 displays the performance measurements obtained by the original method based on the $\ell_1$-criterion of Eq. (4) (L1 Minimization) in comparison to the modified $\ell_1$-criterion of Eq. (14) applied to the outcome of the multisource selection when the noise standard deviation is estimated by the DATE (Modified L1 Minimization). As expected, the gain brought by multisource selection and Eq. (14), both adjusted by the noise standard deviation estimate provided by the DATE, is significant. It is also worth noticing that the DATE estimation error does not significantly impact the separation performance in comparison to the case where the noise standard deviation is perfectly known. This can also be seen in Figure 14, where the performance measurements are given when the multisource selection and the $\ell_1$-criterion of Eq. (14) are both adjusted with the actual value of the noise standard deviation (Oracle Modified L1 Minimization). In contrast, there is a significant performance loss when the multisource selection and Eq. (14) are calculated by using the MAD instead of the DATE (MAD Modified L1 Minimization). The reason again relates to the fact that the DATE is more robust to weak sparseness than the MAD.
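The sensitivity of the MAD to weak sparseness can be illustrated on a toy example. The sketch below is not the experiment of the paper: it uses real-valued Gaussian noise rather than complex time-frequency coefficients, the contamination model (a fraction of samples shifted by ±10) is hypothetical, and the DATE itself is not implemented. It only shows that the usual MAD estimate (median absolute deviation divided by 0.6745, the consistency factor for Gaussian data) drifts upwards when the proportion of signal-bearing samples grows.

```python
import numpy as np

def mad_sigma(y):
    """MAD-based estimate of the standard deviation of real Gaussian noise."""
    y = np.asarray(y, dtype=float)
    return np.median(np.abs(y - np.median(y))) / 0.6745

rng = np.random.default_rng(0)
sigma, n = 1.0, 100_000
noise = sigma * rng.standard_normal(n)

for frac in (0.0, 0.1, 0.4):
    y = noise.copy()
    k = int(frac * n)
    # Hypothetical strong signal components on a fraction `frac` of the samples.
    y[:k] += 10.0 * rng.choice([-1.0, 1.0], size=k)
    print(f"signal fraction {frac:.1f}: MAD estimate = {mad_sigma(y):.2f}")
```

With no signal the estimate stays close to the true value of 1, whereas it grows with the signal fraction, since the median is then taken over a sample that is no longer dominated by noise alone.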
The multisource selection based on the detection threshold adjusted by the estimate provided by the DATE can be further exploited by the DUET reconstruction, as illustrated in Figure 15. In this simulation, the input signals are the chirp signals considered above, so that the W-disjoint orthogonality assumption is satisfied. Moreover, the mixing matrix A is now assumed to be known. On the one hand, we perform the DUET source recovery by considering the whole time-frequency plane. On the other hand, we consider the modified DUET, that is, the DUET source recovery applied to the selected multisource time-frequency points only. The results are similar to those obtained above by TIFROM and its modified versions. Here, the gain brought by the multisource selection, which acts as a denoising, is larger over a wider SNR range because the time-frequency representation of chirp signals is sparser than that of audio signals.

A. Assessment
The algorithms we propose are very general. They are not dedicated to a given sparseness-based BSS method. They are simple to apply without any adjustment. From the results of Section IV, our procedures can therefore be used to improve, simplify or bring robustness to the standard sparseness-based BSS methods considered in the paper.
More specifically, the weak-sparseness-based time-frequency detection procedure of Section III-A can be used as an automated pre-processing for multisource selection. For example, the time-frequency detection in [15] requires one threshold value for each instrumented SNR. The detection procedure of Section III-A then makes it possible to avoid this empirical parameter choice, which brings robustness and significant simplification. Used as a pre-processing for TIFROM [16], which basically involves no selection of time-frequency points, the multisource selection we propound can improve the separation performance.
For mixing matrix estimation, our approach described in Section III-B relies on no weak-sparseness assumption and involves two parameters only, that is, the tolerance and the false-alarm probability. These parameters are valid over the whole signal-to-noise ratio (SNR) range, in contrast to [15] for instance. Furthermore, the assumptions made by TIFROM can be relaxed by using the autosource selection of Section III-B. It is also worth noticing that the two parameters we need for mixing matrix estimation have a physical meaning, which is not the case for some standard sparseness-based BSS methods.

B. Convolutive mixture case
There exists a great variety of possible strategies for dealing with the convolutive mixture case, which is more realistic than the instantaneous one. In the convolutive mixture case, exhibiting a well-established family of methods such as that considered above in the instantaneous mixture case is hardly feasible. However, despite this variety, the statistical framework proposed in this paper can be expected to apply to the convolutive mixture case, at least for methods based on time-frequency representations, for which separating time-frequency points of noise alone from those of noisy signals can be helpful. For instance, the detection procedure for multisource selection can be used straightforwardly to detect the time-frequency points required by the convolutive SUBSS presented in [38]. The modified convolutive SUBSS thus obtained discards the empirical threshold required in [38] for multisource selection. This entails no significant performance loss, as illustrated by Figure 16. Studying the added value brought by SNT in the convolutive mixture case requires further analysis, which is left for future work.
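For readers who wish to reproduce a convolutive setting of the kind used for Figure 16, where each mixture is a sum of source signals filtered by randomly chosen FIR filters of order 4, a minimal sketch follows. Several choices are assumptions made for illustration only: the function name `convolutive_mix`, the Gaussian distribution of the filter taps, the reading of "order 4" as five taps, the truncation of each convolution to the signal length, and the use of white noise as a stand-in for the source signals.

```python
import numpy as np

def convolutive_mix(sources, n_sensors, order=4, rng=None):
    """Mix sources through random FIR filters: x_i = sum_k h_ik * s_k.

    sources : real array (n_sources, n_samples)
    Returns the mixtures (n_sensors, n_samples) and the filters
    (n_sensors, n_sources, order + 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    n_sources, n_samples = sources.shape
    filters = rng.standard_normal((n_sensors, n_sources, order + 1))
    mixtures = np.zeros((n_sensors, n_samples))
    for i in range(n_sensors):
        for k in range(n_sources):
            # Truncate the full convolution to the original signal length.
            mixtures[i] += np.convolve(sources[k], filters[i, k])[:n_samples]
    return mixtures, filters

rng = np.random.default_rng(1)
s = rng.standard_normal((4, 8192))   # stand-in for four source signals
x, h = convolutive_mix(s, n_sensors=3, order=4, rng=rng)
print(x.shape, h.shape)              # (3, 8192) (3, 4, 5)
```

The numbers of sources, sensors and samples mirror the instantaneous setup of the simulation section; the text does not state that the convolutive experiment uses the same dimensions.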

VI. CONCLUSION AND PERSPECTIVES
The algorithms presented in this paper contribute to blind source separation in the underdetermined mixture case by avoiding the empirical choices of parameters required by the family of weak-sparseness-based methods. Our first algorithm, aimed at selecting the suitable time-frequency points for source recovery, is fully automatic. The second, dedicated to mixing matrix estimation, requires fixing two parameters only, regardless of the instrumented SNRs.
The question is now to what extent the statistical tests used above in the instantaneous mixture case can be exploited in the convolutive mixture case, especially in complement to the results discussed in Section V-B. One may also wonder whether these tests can be extended so as to deal with colored noise. Work on this topic is in progress. The theoretical and experimental results of this paper show that the subfunctions of the source separation methods considered above, completed with the statistical tests we have proposed, can be regarded as elementary components that can be interchanged and associated to provide new algorithms for source separation in different application contexts. This opens new practical prospects. For instance, it would be desirable to construct a toolbox involving all these elementary components for further developments and studies. Such a toolbox would also make it possible to carry out exhaustive experimental assessments on large databases of signals via the BSSEval toolbox, downloadable from [39].

APPENDIX A DENOISING-BASED SOURCE RECOVERY
The SUBSS method presented in [15] estimates the index set of the sources present at a given time-frequency point $(t, f)$. Let us denote by $J$ this set of indexes. Then, equation (2) reduces to Eq. (15), and the STFT coefficients of these active sources can be recovered using Eq. (16), where $A_J^{\#} = (A_J^H A_J)^{-1} A_J^H$ is the Moore-Penrose pseudoinverse of $A_J$. We propose to use the noise standard deviation estimate provided by the DATE to jointly denoise and separate the sources on the basis of the time-frequency points selected by the statistical test of Section III-A. So, instead of performing the source separation as specified by Eq. (16), the source separation is now carried out by computing Eq. (17), where $\sigma$ is the noise standard deviation estimate returned by the DATE and $R_{sJ} = E[S_{sJ}(t, f) S_{sJ}^H(t, f)]$. The derivation of the optimal linear estimator of (17) is standard. It involves minimizing the risk $E[\| S_{sJ}(t, f) - D S_x(t, f) \|^2]$ when $D$ ranges over the space of the $\mathrm{card}(J) \times M$ matrices and under the assumption that the sources are spatially decorrelated. In practice, matrix $R_{sJ}$ is unknown and must be estimated. We proceeded as follows. On the one hand, we have $R_x = A R_s A^H + \sigma^2 I_M$. On the other hand, $R_x$ can be estimated by $\hat{R}_x = \frac{1}{\#t} \sum_t S_x(t, f) S_x(t, f)^H$, where $\#t$ stands for the number of time windows on which the STFT is calculated. Since estimates of $A$ and $\sigma$ are known, we derive from the expressions of $R_x$ and $\hat{R}_x$ an estimate $\hat{R}_s$ of $R_s$. An estimate of $R_{sJ}$ follows by picking the appropriate columns in $\hat{R}_s$.
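The difference between the pseudoinverse recovery and the denoising-based recovery can be sketched numerically. The displayed form of the linear estimator is not available in this text, so the sketch assumes the standard linear minimum mean-square error solution $D = R_{sJ} A_J^H (A_J R_{sJ} A_J^H + \sigma^2 I_M)^{-1}$, which is what the risk minimization described above leads to when the observation at $(t, f)$ is modeled as $A_J S_{sJ}(t, f)$ plus white noise of variance $\sigma^2$. The function names, the source powers and the random matrix used below are hypothetical.

```python
import numpy as np

def recover_pinv(A_J, x):
    """Least-squares recovery A_J^# x, with A_J^# = (A_J^H A_J)^{-1} A_J^H."""
    return np.linalg.pinv(A_J) @ x

def recover_lmmse(A_J, R_sJ, sigma, x):
    """Assumed LMMSE recovery: R_sJ A_J^H (A_J R_sJ A_J^H + sigma^2 I_M)^{-1} x."""
    M = A_J.shape[0]
    R_x = A_J @ R_sJ @ A_J.conj().T + sigma**2 * np.eye(M)
    # Solve R_x z = x rather than forming the inverse explicitly.
    return R_sJ @ A_J.conj().T @ np.linalg.solve(R_x, x)

rng = np.random.default_rng(0)
M, n_active = 3, 2
A_J = rng.standard_normal((M, n_active)) + 1j * rng.standard_normal((M, n_active))
R_sJ = np.diag([4.0, 1.0])        # decorrelated sources, hypothetical powers
s = np.array([2.0 + 1.0j, -0.5j])
x = A_J @ s                       # noiseless observation, used as a sanity check

print(recover_pinv(A_J, x))               # recovers s up to rounding
print(recover_lmmse(A_J, R_sJ, 0.5, x))   # regularized by the noise level
print(recover_lmmse(A_J, R_sJ, 1e-3, x))  # close to the pseudoinverse solution
```

As the noise level tends to zero, the assumed estimator tends to the pseudoinverse solution, which is why the two recoveries are expected to differ mostly at low SNRs.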

Figure 1 presents the flowchart of such a two-step approach.
Fig. 3. (a) Noiseless audio signal mixture in the time-frequency domain. Many time-frequency coefficients are close to 0. (b) Noisy audio signal mixture in the time-frequency domain. The time-frequency coefficients with small amplitudes are masked by noise. Only large time-frequency coefficients remain visible. They are not really affected by noise as long as the signal-to-noise ratio is large enough. The proportion of these significant coefficients is less than one half.
25, Theorem VII.1]. First, the error probability of $T_{\lambda_D(\alpha,\sigma)}$ is at least the error probability of the Minimum-Probability-of-Error (MPE) test and less than or equal to an explicit function $V(\alpha\sqrt{2}/\sigma)$, whose expression is not needed in the sequel. Second, $V(\alpha\sqrt{2}/\sigma)$ is a sharp upper bound since it is attained by the error probabilities of tests $L_{\mathrm{MPE}}$ and $T_{\lambda_D(\alpha,\sigma)}$ in the least favorable case where $P[\varepsilon = 1] = 1/2$ and $\Theta(t, f) = \alpha e^{i\Phi(t,f)}$, with $\Phi(t, f)$ uniformly distributed in $[0, 2\pi)$ and $i$ the imaginary unit ($i^2 = -1$).

1(|S_xj(t, f)| λ_D(σ)) stands for the indicator function of the event |S_xj(t, f)| λ_D(σ). The specific convergence criterion involved in ( )
, ..., Y_j(L) be the L values |S_xj(t, f)| sorted in ascending order. 1) [Search interval]: a) Choose some positive real value Q less than or equal to 1 −
)] so as to model $N$ sources arriving at the sensor array at different angles $\theta_1, \theta_2, \ldots, \theta_N$. The entries of matrix $A$ are therefore $a_{j,k} = e^{i\pi(j-1)\sin(\theta_k)}$ for $j \in \{1, \ldots, M\}$ and $k \in \{1, \ldots, N\}$. In the sequel, we proceed by choosing four sources ($N = 4$), three sensors ($M = 3$), $\theta_1 = 15^\circ$, $\theta_2 = 30^\circ$, $\theta_3 = 45^\circ$ and $\theta_4 = 75^\circ$. Unless specified otherwise, the source signals are speech signals randomly chosen in the TI-digits database [36]. This large speech database, collected in a quiet environment, is commonly used in speech processing. In this paper, the chosen speech signals were downsampled to 8 kHz. All signals involve 8192 samples. In Figure 4, the left four subplots (a)-(d) show the time-domain representations of the original source signals and the right four subplots (e)-(h) represent their corresponding spectrograms. Figure 5 displays a spectrogram of a mixture of these speech signals when the mixing matrix $A$ is applied to them at SNR = 10 dB. The spectrograms of the other mixtures are not presented because the differences between any two of them are not visually noticeable, since the mixing matrix $A$ involves no null entry. The two parameters required for the mixing matrix estimation are then fixed to $\tau = 4$ and $\gamma = 10^{-3}$.
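The array mixing matrix described above, whose entry in row j and column k is exp(iπ(j−1) sin θ_k), can be built in a few lines. This is a minimal sketch of that formula with the stated values (three sensors, angles of 15, 30, 45 and 75 degrees); the function name `steering_matrix` is hypothetical.

```python
import numpy as np

def steering_matrix(angles_deg, n_sensors):
    """Entries a[j, k] = exp(i*pi*(j-1)*sin(theta_k)), j = 1..M, k = 1..N."""
    theta = np.deg2rad(np.asarray(angles_deg, dtype=float))
    j = np.arange(n_sensors).reshape(-1, 1)   # holds j - 1 = 0, ..., M - 1
    return np.exp(1j * np.pi * j * np.sin(theta))

A = steering_matrix([15, 30, 45, 75], n_sensors=3)
print(A.shape)      # (3, 4)
print(np.abs(A))    # every entry has unit modulus, hence no null entry
```

Since every entry has unit modulus, each source contributes to each mixture with the same magnitude, which is in line with the remark that the spectrograms of the different mixtures are visually indistinguishable.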

Figure 6. The left four subplots (a)-(d) show the time-domain representations of the recovered source signals in the noiseless case (input SNR = 45 dB), and the right four subplots (e)-(h) represent time-domain representations of the recovered source signals with input SNR = 10 dB.

Fig. 4. (a)-(d) show the waveforms of the original source signals in the time domain, (e)-(h) display the spectrograms of these source signals in the time-frequency domain.

Fig. 5. Speech mixture spectrogram when mixing matrix A is applied to the four sources of Figure 4 (SNR = 10dB).

Fig. 6. Simulation results: (a)-(d) show the waveforms of the source signals recovered by modified SUBSS with input SNR = 45 dB, (e)-(h) show the waveforms of the source signals recovered by modified SUBSS with input SNR = 10 dB.

Fig. 7. Comparison between SUBSS, modified SUBSS with and without denoising, modified SUBSS with MAD estimate instead of DATE and without denoising: NMSE versus SNR.

Fig. 11. Chirp signal mixture spectrogram when mixing matrix A is applied to the chirp signals of Figure 10 (SNR = 10 dB).

Fig. 14. Comparison of performance (NMSE versus SNR) between the original Bofill and Zibulevsky's method based on the $\ell_1$-criterion of Eq. (4) (L1 Minimization) and the modified $\ell_1$-criterion of Eq. (14) after multisource selection when the noise standard deviation is known (Oracle Modified L1 Minimization) or estimated via either the DATE (Modified L1 Minimization) or the MAD (MAD Modified L1 Minimization).

Fig. 15. Comparison of performance between DUET reconstruction and Modified DUET reconstruction on chirp signals.

Fig. 16. Comparison of performance between standard convolutive SUBSS and modified convolutive SUBSS: the signals used are the same audio signals as those considered in the simulation section. Each mixture is a sum of filtered source signals, where each filter is a randomly chosen FIR filter of order 4.

Time-frequency signal processing provides effective tools for analyzing nonstationary signals, whose frequency contents vary in time. It involves representing signals in a two-dimensional space, that is, the joint time-frequency domain, hence providing a distribution of the signal energy versus time and frequency simultaneously. The sparseness of the time-frequency coefficients of the source signals is one of the main keys to solve the UBSS problem.
To perform this selection, we make the distinction between signals with either low or high overlapping rate in the time-frequency domain. Chirp signals (resp. audio signals) are typical examples of signals with low (resp. high) overlapping rate. It is worth noticing that the estimation procedures proposed below for each class have reasonable computational costs.