Spatial location priors for Gaussian model based reverberant audio source separation

We consider the Gaussian framework for reverberant audio source separation, where the sources are modeled in the time-frequency domain by their short-term power spectra and their spatial covariance matrices. We propose two alternative probabilistic priors over the spatial covariance matrices which are consistent with the theory of statistical room acoustics and we derive expectation-maximization algorithms for maximum a posteriori (MAP) estimation. We argue that these algorithms provide a statistically principled solution to the permutation problem and to the risk of overfitting resulting from conventional maximum likelihood (ML) estimation. We show experimentally that in a semi-informed scenario where the source positions and certain room characteristics are known, the MAP algorithms outperform their ML counterparts. This opens the way to rigorous statistical treatment of this family of models in other scenarios in the future.


Introduction
We consider the task of reverberant audio source separation, that is, extracting individual sound sources from a multichannel microphone array recording. Many approaches have been proposed in the literature, and they typically operate in the time-frequency domain via the short-time Fourier transform (STFT) [1][2][3]. One category of approaches models the mixture STFT coefficients as the product of the source STFT coefficients and complex-valued mixing vectors, which are estimated by frequency-domain independent component analysis (FDICA) [4,5] or by clustering [6,7]. In under-determined conditions, when the number of sources is greater than the number of channels, the source STFT coefficients are then obtained via binary masking [6], soft masking [7], or ℓ1-norm minimization [8]. Lately, a Gaussian framework has emerged where the mixture STFT coefficients are modeled as a function of the power spectra and the spatial covariance matrices of the sources, and separation is achieved by multichannel Wiener filtering [9][10][11]. These covariance matrices may equivalently be expressed as the outer product of subsource mixing matrices, which reduce to mixing vectors when the spatial covariance matrices have rank 1 [12]. Full-rank matrices have been shown to improve separation performance in reverberant conditions by modeling not only the spatial position of the sources but also their spatial width [11].
*Correspondence: emmanuel.vincent@inria.fr, Inria, 54600 Villers-lès-Nancy, France. Full list of author information is available at the end of the article.
While a number of deterministic [12][13][14] and probabilistic [15][16][17] priors have been proposed over the source spectra, the mixing vectors and the source spatial covariance matrices are usually estimated in an unconstrained manner. The lack of a constraint relating these quantities across frequency causes a permutation problem, which has been addressed by reordering the estimates in each frequency bin without modifying their values [7,18]. More crucially, the estimated mixing vectors and source spatial covariance matrices in a given frequency bin are likely to suffer from overfitting when the corresponding sources are only weakly active in that bin.
Building upon the studies for instantaneous mixtures in [19,20] and the deterministic subspace constraints in [21,22], a few algorithms have been designed that exploit soft penalties or probabilistic priors over the mixing vectors for increased estimation accuracy. These algorithms typically target semi-informed scenarios such as formal meetings or in-car speech, where the spatial locations of the sources are known, and they rely on the assumption that the mixing vectors are close to the steering vectors representing the direct path from the sources to the microphones. Squared Euclidean penalties over the blocking vectors are a common choice for FDICA [21,23]. An inverse-Wishart prior over the outer product of the mixing vectors was also employed in [24]. These penalties and priors were not designed according to the actual statistics of reverberation. Moreover, to the best of our knowledge, no such priors have been designed for full-rank matrices.
http://asp.eurasipjournals.com/content/2013/1/149
In this article, we propose two probabilistic priors over the source spatial covariance matrices or the subsource mixing matrices which are consistent with the theory of statistical room acoustics. One of them was briefly introduced in our preliminary paper [25]. We extend the two Gaussian expectation-maximization (EM) algorithms in [12,26] so as to perform maximum a posteriori (MAP) estimation. We then compare the resulting separation performance with conventional maximum likelihood (ML) estimation and with two baseline approaches in an underdetermined full-rank semi-informed scenario where the source positions and certain room characteristics are known. For clarity, we do not assume any other constraint on the model parameters, which allows us to assess the improvement resulting from these priors alone.
The structure of the article is as follows. In Section 2, we recall the Gaussian framework for audio source separation and we present a result of the theory of statistical room acoustics. We introduce an EM algorithm using an inverse-Wishart prior in Section 3 and an EM algorithm using a Gaussian prior in Section 4. We evaluate their separation performance in Section 5 and we conclude in Section 6.

Gaussian modeling for source separation
Let us consider a mixture signal x(t) = [x_1(t), ..., x_I(t)]^T recorded by an array of I microphones. Denoting by J the number of sources, the mixing process is expressed as [27]
$$\mathbf{x}(t) = \sum_{j=1}^{J} \mathbf{c}_j(t), \tag{1}$$
where c_j(t) = [c_1j(t), ..., c_Ij(t)]^T is the spatial image of the jth source, that is, its contribution to the signals recorded at the microphones. The STFT coefficients c_j(n, f) of the source spatial images in each time frame n and each frequency bin f are modeled as zero-mean Gaussian random vectors
$$\mathbf{c}_j(n,f) \sim \mathcal{N}_c\big(\mathbf{0},\, v_j(n,f)\,\mathbf{R}_j(f)\big), \tag{2}$$
where v_j(n, f) are scalar nonnegative variances encoding the short-term power spectra of the sources and R_j(f) are I × I spatial covariance matrices encoding their spatial position and their spatial width [9,11]. Under the assumption that the sources are uncorrelated, the mixture covariance matrix Σ_x(n, f) is equal to
$$\boldsymbol{\Sigma}_x(n,f) = \sum_{j=1}^{J} v_j(n,f)\,\mathbf{R}_j(f). \tag{3}$$
The log-likelihood is then given by [26]
$$\log L = \sum_{n,f} \Big[-\operatorname{tr}\big(\boldsymbol{\Sigma}_x^{-1}(n,f)\,\widehat{\mathbf{R}}_x(n,f)\big) - \log\big|\pi\,\boldsymbol{\Sigma}_x(n,f)\big|\Big], \tag{4}$$
where tr(.) and |.| denote the trace and the determinant of a square matrix, and R̂_x(n, f) is the empirical mixture covariance matrix obtained by local averaging of x(n, f)x^H(n, f) over the neighborhood of each time-frequency bin
$$\widehat{\mathbf{R}}_x(n,f) = \frac{\sum_{n',f'} w_{nf}(n',f')\,\mathbf{x}(n',f')\,\mathbf{x}^H(n',f')}{\sum_{n',f'} w_{nf}(n',f')}, \tag{5}$$
where w_nf is a bi-dimensional window specifying the shape of the neighborhood [26]. Source separation can then be achieved by estimating the model parameters θ = {v_j(n, f), R_j(f)} in the ML sense and by deriving the spatial images of all sources in the minimum mean square error sense via multichannel Wiener filtering of the mixture STFT coefficients x(n, f):
$$\widehat{\mathbf{c}}_j(n,f) = v_j(n,f)\,\mathbf{R}_j(f)\,\boldsymbol{\Sigma}_x^{-1}(n,f)\,\mathbf{x}(n,f). \tag{6}$$
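As an illustration, the mixture covariance, the per-bin contribution to the log-likelihood (4), and the multichannel Wiener filter can be evaluated for a single time-frequency bin with a few lines of NumPy. This is a minimal sketch rather than the authors' Matlab code: the function names are hypothetical and the random parameters are placeholders. The final check illustrates that the Wiener estimates of all sources add back up to the mixture.

```python
import numpy as np

def mixture_covariance(v, R):
    """Sigma_x = sum_j v_j R_j for one time-frequency bin.

    v: (J,) nonnegative source variances, R: (J, I, I) spatial covariance matrices.
    """
    return np.einsum("j,jab->ab", v, R)

def log_likelihood_bin(v, R, Rx_hat):
    """Contribution of one bin to log L in (4): -tr(Sigma_x^{-1} Rx_hat) - log|pi Sigma_x|."""
    Sigma_x = mixture_covariance(v, R)
    _, logdet = np.linalg.slogdet(np.pi * Sigma_x)
    trace_term = np.trace(np.linalg.solve(Sigma_x, Rx_hat)).real
    return -trace_term - logdet

def wiener_images(v, R, x):
    """Multichannel Wiener estimates c_j = v_j R_j Sigma_x^{-1} x, returned as (J, I)."""
    y = np.linalg.solve(mixture_covariance(v, R), x)   # Sigma_x^{-1} x
    return np.einsum("j,jab,b->ja", v, R, y)

# Usage with random (hypothetical) parameters: I = 2 channels, J = 3 sources.
rng = np.random.default_rng(0)
I, J = 2, 3
A = rng.standard_normal((J, I, I)) + 1j * rng.standard_normal((J, I, I))
R = A @ A.conj().transpose(0, 2, 1) + 0.1 * np.eye(I)   # Hermitian positive definite
v = rng.random(J) + 0.1
x = rng.standard_normal(I) + 1j * rng.standard_normal(I)
images = wiener_images(v, R, x)
print(np.allclose(images.sum(axis=0), x))  # True: the filters sum to the identity
```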

A result from the theory of statistical room acoustics
In a scenario such as in [21][22][23], the distance and the orientation of the sources and the microphones with respect to each other (i.e., the scene geometry) are assumed to be known, but their absolute position in the room is unknown. According to the theory of statistical room acoustics [28,29], the mean spatial covariance matrix of a source over all possible source and microphone positions and orientations, under the constraint that the scene geometry remains fixed, can be expressed as
$$\boldsymbol{\mu}_{\mathbf{R}_j}(f) = \mathbf{d}_j(f)\,\mathbf{d}_j^H(f) + \sigma_{\mathrm{rev}}^2\,\boldsymbol{\Omega}(f), \tag{7}$$
where (.)^H denotes conjugate transposition. The first term of this expression models the contribution of direct sound, where
$$\mathbf{d}_j(f) = \big[d_{1j}(f), \ldots, d_{Ij}(f)\big]^T, \qquad d_{ij}(f) = \frac{1}{\sqrt{4\pi}\,r_{ij}}\,e^{-2\mathrm{i}\pi f r_{ij}/c}, \tag{8}$$
is the steering vector representing the direct paths from the source to the microphones, with c the sound velocity, r_ij the distance from the jth source to the ith microphone, and i = √−1. The second term of this expression models the contribution of echoes and reverberation, which are assumed to come from all possible directions on average over all absolute positions: σ²_rev is the power of echoes and reverberation and Ω(f) is the covariance matrix of a diffuse sound field.
The entries Ω_ii'(f) of Ω(f) depend on the microphone directivity patterns and on the distance d_ii' between the ith and the i'th microphone. For omnidirectional microphones, this quantity can be shown to be real-valued and equal to [28]
$$\Omega_{ii'}(f) = \frac{\sin\big(2\pi f d_{ii'}/c\big)}{2\pi f d_{ii'}/c}. \tag{9}$$
Moreover, the power of the reverberant part within a parallelepiped-shaped room with dimensions L_x, L_y, L_z is given by
$$\sigma_{\mathrm{rev}}^2 = \frac{4\beta^2}{A\,(1-\beta^2)}, \tag{10}$$
where A = 2(L_x L_y + L_x L_z + L_y L_z) is the total wall area and β the wall reflection coefficient computed from the room reverberation time T60 via Eyring's formula [29]
$$\beta = \exp\left\{-\frac{13.82}{\left(\frac{1}{L_x}+\frac{1}{L_y}+\frac{1}{L_z}\right) c\,T_{60}}\right\}. \tag{11}$$
In order to match the physics of reverberation, a prior over the source spatial covariance matrices or over the subsource mixing matrices should lead to a mean spatial covariance matrix μ_{R_j}(f) satisfying the constraint (7). This is not the case for the prior in [24], whose mean is equal to d_j(f)d_j^H(f) + εI_I, with I_I the identity matrix of size I and ε a small constant. Isotropic Gaussian priors over the subsource mixing matrices would not satisfy this constraint either, due to the interchannel correlation introduced by Ω(f). Fixed spatial covariance matrices set to the value in (7) were employed for single source localization in [29] and for source separation in [30]. Later work confirmed that the model (7) is valid on average over all absolute positions in the room but that R_j(f) varies with the absolute position, so that it must be estimated from the observed mixture signal [11].
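The mean spatial covariance matrix (7) can be computed directly from the scene geometry and the room characteristics. The sketch below assumes omnidirectional microphones, a speed of sound c = 343 m/s, and hypothetical microphone and source coordinates; only the room dimensions are taken from Section 5.1. With those dimensions, the closed-form reverberant power (10) obtained through Eyring's formula is about 14 dB larger at T60 = 500 ms than at T60 = 50 ms.

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in m/s (assumed value)

def eyring_beta(T60, Lx, Ly, Lz, c=C_SOUND):
    """Wall reflection coefficient from the reverberation time (Eyring's formula)."""
    return np.exp(-13.82 / ((1 / Lx + 1 / Ly + 1 / Lz) * c * T60))

def reverb_power(T60, Lx, Ly, Lz, c=C_SOUND):
    """sigma_rev^2 = 4 beta^2 / (A (1 - beta^2)), with A the total wall area."""
    beta = eyring_beta(T60, Lx, Ly, Lz, c)
    A = 2 * (Lx * Ly + Lx * Lz + Ly * Lz)
    return 4 * beta**2 / (A * (1 - beta**2))

def mean_spatial_covariance(f, mics, src, T60, room, c=C_SOUND):
    """mu_R(f) = d d^H + sigma_rev^2 Omega(f) for omnidirectional microphones.

    f: frequency in Hz, mics: (I, 3) and src: (3,) positions in meters,
    room: (Lx, Ly, Lz) in meters.
    """
    r = np.linalg.norm(mics - src, axis=1)                          # r_ij
    d = np.exp(-2j * np.pi * f * r / c) / (np.sqrt(4 * np.pi) * r)  # steering vector
    dist = np.linalg.norm(mics[:, None, :] - mics[None, :, :], axis=2)
    Omega = np.sinc(2 * f * dist / c)   # np.sinc(x) = sin(pi x) / (pi x)
    return np.outer(d, d.conj()) + reverb_power(T60, *room, c=c) * Omega

# Usage: room of Section 5.1, hypothetical positions (5 cm spacing, 50 cm distance).
room = (4.45, 3.55, 2.5)
mics = np.array([[2.0, 1.5, 1.4], [2.05, 1.5, 1.4]])
src = np.array([2.025, 2.0, 1.4])
mu_R = mean_spatial_covariance(1000.0, mics, src, 0.25, room)
drop_db = 10 * np.log10(reverb_power(0.5, *room) / reverb_power(0.05, *room))  # about 14 dB
```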

General EM algorithm
Assuming that the spatial covariance matrices R j ( f ) are full-rank, ML estimation can be achieved using the source image-based EM (SIEM) algorithm in [26] where the spatial images {c j (n, f )} n,f of all sources in all time-frequency bins are considered as hidden data. Strictly speaking, this algorithm is a generalized form of EM [31] because the M step increases but does not maximize the expectation of the log-likelihood of the hidden data. Since the priors proposed hereafter pertain to the spatial covariance matrices only, MAP estimation can be achieved via the same algorithm except for the corresponding update in the M step.
The resulting EM updates are listed in Algorithm 1. In the E step, the Wiener filter W_j(n, f) and the second-order raw moment R̂_cj(n, f) of the spatial images of all sources are computed (see endnote a). In the M step, v_j(n, f) and R_j(f) are updated. In the ML case, the update for R_j(f) in (17) is given by [26]
$$\mathbf{R}_j(f) = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{v_j(n,f)}\,\widehat{\mathbf{R}}_{c_j}(n,f), \tag{12}$$
where N is the total number of time frames.

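Algorithm 1 SIEM algorithm [26]
E step:
$$\mathbf{W}_j(n,f) = v_j(n,f)\,\mathbf{R}_j(f)\,\boldsymbol{\Sigma}_x^{-1}(n,f), \tag{13}$$
$$\boldsymbol{\Sigma}_x(n,f) = \sum_{j=1}^{J} v_j(n,f)\,\mathbf{R}_j(f), \tag{14}$$
$$\widehat{\mathbf{R}}_{c_j}(n,f) = \mathbf{W}_j(n,f)\,\widehat{\mathbf{R}}_x(n,f)\,\mathbf{W}_j^H(n,f) + \big(\mathbf{I}_I - \mathbf{W}_j(n,f)\big)\,v_j(n,f)\,\mathbf{R}_j(f). \tag{15}$$
M step:
$$v_j(n,f) = \frac{1}{I}\operatorname{tr}\big(\mathbf{R}_j^{-1}(f)\,\widehat{\mathbf{R}}_{c_j}(n,f)\big), \tag{16}$$
$$\text{Update } \mathbf{R}_j(f). \tag{17}$$

Given this algorithm, we now consider the design of suitable priors over R_j(f). In addition to the physical constraint (7), the priors must satisfy practical engineering constraints: they must be defined over the space of Hermitian positive definite matrices, have a small number of parameters, have a closed-form mean, and result in closed-form EM updates. The inverse-Wishart and the Wishart distributions satisfy these constraints. In this paper, we present only the inverse-Wishart prior, since we observed experimentally that the Wishart prior results in poorer separation performance than both the ML algorithm and the MAP algorithm using the inverse-Wishart prior.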
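One iteration of Algorithm 1, with the ML update (12) in place of step (17), can be sketched as follows for a single frequency bin. This is an illustrative NumPy rendering under the assumption that the empirical covariances R̂_x(n, f) have already been computed; all names are hypothetical. Because each M-step update maximizes the expected complete-data log-likelihood in its own variable, the log-likelihood (4) should not decrease from one iteration to the next, which the usage lines track.

```python
import numpy as np

def siem_iteration(v, R, Rx_hat):
    """One ML iteration of the SIEM algorithm for a single frequency bin.

    v: (J, N) source variances, R: (J, I, I) spatial covariance matrices,
    Rx_hat: (N, I, I) empirical mixture covariances computed as in (5).
    """
    J, N = v.shape
    I = R.shape[1]
    eye = np.eye(I)
    # E step: Wiener filters and second-order moments of the source images
    Sigma_x_inv = np.linalg.inv(np.einsum("jn,jab->nab", v, R))    # (N, I, I)
    Rc = np.empty((J, N, I, I), dtype=complex)
    for j in range(J):
        Sigma_c = v[j][:, None, None] * R[j]                        # v_j(n) R_j
        W = Sigma_c @ Sigma_x_inv                                   # Wiener filter W_j(n)
        Rc[j] = W @ Rx_hat @ W.conj().transpose(0, 2, 1) + (eye - W) @ Sigma_c
    # M step: variance update, then the ML spatial covariance update (12)
    v_new = np.empty_like(v)
    R_new = np.empty_like(R)
    for j in range(J):
        Rj_inv = np.linalg.inv(R[j])
        v_new[j] = np.einsum("ab,nba->n", Rj_inv, Rc[j]).real / I   # tr(R_j^{-1} Rc_j) / I
        R_new[j] = np.mean(Rc[j] / v_new[j][:, None, None], axis=0)
    return v_new, R_new

def log_likelihood(v, R, Rx_hat):
    """log L of (4) restricted to one frequency bin."""
    S = np.einsum("jn,jab->nab", v, R)
    trace = np.einsum("nab,nba->n", np.linalg.inv(S), Rx_hat).real
    return float(np.sum(-trace - np.linalg.slogdet(np.pi * S)[1]))

# Usage on random (hypothetical) data: I = 2, J = 3, N = 40 frames.
rng = np.random.default_rng(1)
I, J, N = 2, 3, 40
X = rng.standard_normal((N, 4, I)) + 1j * rng.standard_normal((N, 4, I))
Rx_hat = np.einsum("nka,nkb->nab", X, X.conj()) / 4                # local averages of x x^H
A = rng.standard_normal((J, I, I)) + 1j * rng.standard_normal((J, I, I))
R = A @ A.conj().transpose(0, 2, 1) + 0.1 * np.eye(I)
v = rng.random((J, N)) + 0.5
history = [log_likelihood(v, R, Rx_hat)]
for _ in range(5):
    v, R = siem_iteration(v, R, Rx_hat)
    history.append(log_likelihood(v, R, Rx_hat))
```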

Inverse-Wishart prior
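The inverse-Wishart distribution is the conjugate prior for the likelihood (4) of our model. This prior is defined as
$$P\big(\mathbf{R}_j(f)\big) = \mathcal{IW}\big(\mathbf{R}_j(f)\,\big|\,\boldsymbol{\Psi}_j(f), m\big), \tag{18}$$
where
$$\mathcal{IW}(\mathbf{R}\,|\,\boldsymbol{\Psi}, m) = \frac{|\boldsymbol{\Psi}|^{m}}{\tilde{\Gamma}_I(m)}\,|\mathbf{R}|^{-(m+I)}\exp\big(-\operatorname{tr}(\boldsymbol{\Psi}\mathbf{R}^{-1})\big) \tag{19}$$
is the inverse-Wishart density over Hermitian positive definite matrices R with positive definite inverse scale matrix Ψ, m degrees of freedom, and mean Ψ/(m − I) [32], with
$$\tilde{\Gamma}_I(m) = \pi^{I(I-1)/2}\prod_{i=1}^{I}\Gamma(m-i+1)$$
and Γ(.) the gamma function. This density is defined for m > I − 1, and its mean and its variance are finite for m > I and m > I + 1, respectively. We fix the inverse scale matrix Ψ_j(f) as
$$\boldsymbol{\Psi}_j(f) = (m - I)\,\boldsymbol{\mu}_{\mathbf{R}_j}(f), \tag{20}$$
so that the mean of R_j(f) is consistent with (7). The deviation allowed from the mean is controlled by the so-called number of degrees of freedom m, which is not necessarily an integer.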
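The complex inverse-Wishart log-density, log IW(R | Ψ, m) = m log|Ψ| − log Γ̃_I(m) − (m + I) log|R| − tr(ΨR^{-1}), with Γ̃_I the complex multivariate gamma function, and the choice of inverse scale matrix Ψ = (m − I)μ_R can be written as follows. This is a sketch with hypothetical names and a hypothetical 2 × 2 mean matrix. Note that the density peaks at Ψ/(m + I), a scaled-down version of its mean Ψ/(m − I).

```python
import math
import numpy as np

def log_complex_multigamma(a, I):
    """log ~Gamma_I(a) = (I(I-1)/2) log(pi) + sum_{i=1}^{I} log Gamma(a - i + 1)."""
    return 0.5 * I * (I - 1) * math.log(math.pi) + sum(
        math.lgamma(a - i + 1) for i in range(1, I + 1))

def log_inv_wishart(R, Psi, m):
    """Complex inverse-Wishart log-density log IW(R | Psi, m), defined for m > I - 1."""
    I = R.shape[0]
    logdet_Psi = np.linalg.slogdet(Psi)[1]
    logdet_R = np.linalg.slogdet(R)[1]
    trace_term = np.trace(Psi @ np.linalg.inv(R)).real
    return (m * logdet_Psi - log_complex_multigamma(m, I)
            - (m + I) * logdet_R - trace_term)

def inverse_scale(mu_R, m):
    """Psi = (m - I) mu_R, so that the prior mean Psi / (m - I) equals mu_R (needs m > I)."""
    return (m - mu_R.shape[0]) * mu_R

# Usage with a hypothetical 2 x 2 mean matrix and m = 6.5.
mu_R = np.array([[1.0, 0.3 + 0.2j], [0.3 - 0.2j, 0.8]])
m = 6.5
Psi = inverse_scale(mu_R, m)
mode = Psi / (m + mu_R.shape[0])   # location of the density peak
```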

Learning the hyper-parameter
In order to obtain the best fit between this prior and the actual distribution of spatial covariance matrices, we learn the number of degrees of freedom m from training data. We assume that m depends on the distance and the orientation of the microphones with respect to each other (i.e., the array geometry) and on the distance from the source to the center of the array, but not on the source direction of arrival. Given the microphone array geometry and the source distance, we generate training signals c_p(t), indexed by p, for a number of microphone array positions and orientations and for a number of source directions of arrival, by convolving the corresponding room impulse responses with a single-channel signal. We derive the spatial covariance matrix R_p(f) associated with each training signal in an oracle fashion [30] by alternately applying (16) and (12) to the empirical covariance matrices R̂_cp(n, f) computed as in (5). Such training data can be generated in any practical scenario where the source separation system is to be deployed in fixed, known environments, where the impulse responses can be pre-recorded or simulated via the image method [33].
Since R_p(f) is measured only up to an arbitrary positive scaling factor α_p(f), we jointly estimate the number of degrees of freedom m and the scaling factors in the ML sense by maximizing
$$\sum_{p,f}\log\Big[J_{\alpha_p(f)}\;\mathcal{IW}\big(\alpha_p(f)\,\mathbf{R}_p(f)\,\big|\,\boldsymbol{\Psi}_p(f), m\big)\Big], \tag{21}$$
where J_{α_p(f)} = α_p^{I²}(f) is the Jacobian of the scaling transform and Ψ_p(f) is the inverse scale matrix in (20), which depends on p. Maximization with respect to m can be achieved using a nonlinear optimization technique [34], where the optimal scaling factors for a given m are given by
$$\alpha_p(f) = \frac{\operatorname{tr}\big(\boldsymbol{\Psi}_p(f)\,\mathbf{R}_p^{-1}(f)\big)}{m\,I}. \tag{22}$$
The values of m learned for the geometrical setting and the reverberation times tested in Section 5 are shown in Table 1.
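Setting the derivative of one term of (21) with respect to α_p(f) to zero gives α_p(f) = tr(Ψ_p(f)R_p^{-1}(f))/(mI), so m can be learned by maximizing a one-dimensional profile objective. The sketch below does this with a golden-section search, a swapped-in technique (the experiments use Matlab's fmincon) that relies on the concavity observed in Section 5.2; the training matrices in the usage lines are synthetic placeholders rather than room simulations, and all names are hypothetical.

```python
import math
import numpy as np

def log_inv_wishart(R, Psi, m):
    """Complex inverse-Wishart log-density, as in (19)."""
    I = R.shape[0]
    log_gamma_I = 0.5 * I * (I - 1) * math.log(math.pi) + sum(
        math.lgamma(m - i + 1) for i in range(1, I + 1))
    return (m * np.linalg.slogdet(Psi)[1] - log_gamma_I
            - (m + I) * np.linalg.slogdet(R)[1]
            - np.trace(Psi @ np.linalg.inv(R)).real)

def optimal_scale(R_p, Psi_p, m):
    """Closed-form maximizer alpha_p = tr(Psi_p R_p^{-1}) / (m I) of one term of (21)."""
    return np.trace(Psi_p @ np.linalg.inv(R_p)).real / (m * R_p.shape[0])

def profile_objective(m, R_list, mu_list):
    """Objective (21) with every scaling factor set to its optimum for this m."""
    total = 0.0
    for R_p, mu_p in zip(R_list, mu_list):
        I = R_p.shape[0]
        Psi_p = (m - I) * mu_p                       # inverse scale matrix (20)
        alpha = optimal_scale(R_p, Psi_p, m)
        total += I**2 * math.log(alpha) + log_inv_wishart(alpha * R_p, Psi_p, m)
    return total

def learn_m(R_list, mu_list, lo, hi, iters=60):
    """Golden-section search for the m maximizing the profile objective on [lo, hi]."""
    f = lambda m: profile_objective(m, R_list, mu_list)
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:                 # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:                       # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

# Usage on synthetic training matrices scattered around a fixed mean.
rng = np.random.default_rng(2)
mu = np.array([[1.0, 0.5], [0.5, 1.0]], dtype=complex)
R_list, mu_list = [], []
for _ in range(50):
    B = rng.standard_normal((2, 6)) + 1j * rng.standard_normal((2, 6))
    R_list.append(rng.uniform(0.5, 2.0) * np.linalg.inv(B @ B.conj().T))
    mu_list.append(mu)
m_hat = learn_m(R_list, mu_list, lo=2.001, hi=200.0)
```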

MAP EM update
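Given the hyper-parameters Ψ_j(f) and m, the spatial covariance matrices R_j(f) can be estimated in the MAP sense in step (17) of Algorithm 1 by maximizing the expectation of the log-posterior of the hidden data
$$\mathcal{Q}_{IW}\big(\mathbf{R}_j(f)\big) = -\sum_{n=1}^{N}\Big[\frac{1}{v_j(n,f)}\operatorname{tr}\big(\mathbf{R}_j^{-1}(f)\,\widehat{\mathbf{R}}_{c_j}(n,f)\big) + \log\big|\mathbf{R}_j(f)\big|\Big] + \gamma\,\log\mathcal{IW}\big(\mathbf{R}_j(f)\,\big|\,\boldsymbol{\Psi}_j(f), m\big) + \text{const}, \tag{23}$$
where γ is a trade-off hyper-parameter determining the strength of the prior. Strictly speaking, MAP estimation corresponds to γ = 1. However, as in other fields of signal processing [35], a larger strength parameter is needed in practice in order to balance the absolute values of the prior and the likelihood, and this generalized rule is loosely referred to as MAP. By computing the partial derivatives of Q_IW with respect to each entry of R_j(f) and equating them to zero, we obtain the MAP update
$$\mathbf{R}_j(f) = \frac{\gamma\,\boldsymbol{\Psi}_j(f) + \sum_{n=1}^{N}\frac{1}{v_j(n,f)}\,\widehat{\mathbf{R}}_{c_j}(n,f)}{\gamma\,(m+I) + N}. \tag{24}$$
When γ = 0, the contribution of the prior is excluded and (24) becomes equal to the ML update (12). The setting of γ will be discussed in Section 5.3.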
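The MAP update, R_j(f) = [γΨ_j(f) + Σ_n R̂_cj(n, f)/v_j(n, f)] / [γ(m + I) + N], is a convex combination of the ML estimate (12) and the prior mode Ψ_j(f)/(m + I), with weights N and γ(m + I). A minimal NumPy sketch (hypothetical names, synthetic E-step outputs) makes the two limiting cases easy to check.

```python
import numpy as np

def map_update_R(v_j, Rc_j, Psi_j, m, gamma):
    """MAP update of R_j(f) for one source in one frequency bin.

    v_j: (N,) variances, Rc_j: (N, I, I) E-step moments, Psi_j: (I, I) inverse
    scale matrix, m: degrees of freedom, gamma: prior strength (0 gives ML).
    """
    N, I, _ = Rc_j.shape
    data_term = np.sum(Rc_j / v_j[:, None, None], axis=0)     # sum_n Rc_j(n) / v_j(n)
    return (gamma * Psi_j + data_term) / (gamma * (m + I) + N)

# Usage with hypothetical E-step outputs.
rng = np.random.default_rng(3)
N, I, m = 30, 2, 8.0
Z = rng.standard_normal((N, I, 3)) + 1j * rng.standard_normal((N, I, 3))
Rc_j = Z @ Z.conj().transpose(0, 2, 1)
v_j = rng.random(N) + 0.5
Psi_j = (m - I) * np.array([[1.0, 0.4j], [-0.4j, 1.0]])
R_ml = np.mean(Rc_j / v_j[:, None, None], axis=0)             # ML update (12)
R_map = map_update_R(v_j, Rc_j, Psi_j, m, gamma=100.0)
```

Large values of γ pull the estimate towards Ψ_j(f)/(m + I) rather than towards the prior mean μ_{R_j}(f), which is worth bearing in mind when γ is set well above 1.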

General EM algorithm
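Besides the SIEM algorithm, an alternative subsource-based EM (SSEM) algorithm was proposed for ML estimation in [12] that applies to spatial covariance matrices of any rank R_j. This algorithm relies on the non-unique representation of the source spatial images as
$$\mathbf{c}_j(n,f) = \mathbf{H}_j(f)\,\mathbf{s}_j(n,f), \tag{25}$$
where s_j(n, f) is the R_j × 1 vector of subsource STFT coefficients, modeled as zero-mean Gaussian with covariance v_j(n, f) I_{R_j}, and H_j(f) is an I × R_j complex-valued subsource mixing matrix satisfying the constraint [12]
$$\mathbf{R}_j(f) = \mathbf{H}_j(f)\,\mathbf{H}_j^H(f). \tag{26}$$
This subsource mixing matrix reduces to a mixing vector in the particular case when R_j(f) has rank 1. Overall, the mixture STFT coefficients are written as
$$\mathbf{x}(n,f) = \mathbf{H}(f)\,\mathbf{s}(n,f),$$
where H(f) = [H_1(f), ..., H_J(f)] is the I × R matrix gathering the subsource mixing matrices of all sources, with R = Σ_j R_j, and s(n, f) = [s_1^T(n, f), ..., s_J^T(n, f)]^T. The log-likelihood (4) can then be maximized by considering the set {x(n, f), s_j(n, f)}_{j,n} of observed mixture STFT coefficients and hidden subsource STFT coefficients in all time-frequency bins as complete data. Once again, MAP estimation can be achieved via the same algorithm except for the mixing matrix update in the M step.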
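The details of one iteration are summarized in Algorithm 2, where ℛ_j denotes the set of subsource indices associated with the jth source and ṽ_r(n, f) = v_j(n, f) if and only if r ∈ ℛ_j. In the E step, the Wiener filter W(n, f) and the second-order moments R̂_s(n, f) and R̂_xs(n, f) are computed. In the M step, v_j(n, f) and H(f) are updated. In the ML case, the update for H(f) in (34) is given by [12]
$$\mathbf{H}(f) = \Big(\sum_{n=1}^{N}\widehat{\mathbf{R}}_{xs}(n,f)\Big)\Big(\sum_{n=1}^{N}\widehat{\mathbf{R}}_{s}(n,f)\Big)^{-1}. \tag{27}$$

Algorithm 2 SSEM algorithm [12]
E step:
$$\boldsymbol{\Sigma}_s(n,f) = \operatorname{diag}\big(\tilde{v}_r(n,f)\big)_{r=1}^{R}, \tag{28}$$
$$\boldsymbol{\Sigma}_x(n,f) = \mathbf{H}(f)\,\boldsymbol{\Sigma}_s(n,f)\,\mathbf{H}^H(f), \tag{29}$$
$$\mathbf{W}(n,f) = \boldsymbol{\Sigma}_s(n,f)\,\mathbf{H}^H(f)\,\boldsymbol{\Sigma}_x^{-1}(n,f), \tag{30}$$
$$\widehat{\mathbf{R}}_s(n,f) = \mathbf{W}(n,f)\,\widehat{\mathbf{R}}_x(n,f)\,\mathbf{W}^H(n,f) + \big(\mathbf{I}_R - \mathbf{W}(n,f)\,\mathbf{H}(f)\big)\,\boldsymbol{\Sigma}_s(n,f), \tag{31}$$
$$\widehat{\mathbf{R}}_{xs}(n,f) = \widehat{\mathbf{R}}_x(n,f)\,\mathbf{W}^H(n,f), \tag{32}$$
where R = Σ_j R_j is the total number of subsources and I_R the identity matrix of size R.
M step:
$$v_j(n,f) = \frac{1}{R_j}\sum_{r\in\mathcal{R}_j}\big[\widehat{\mathbf{R}}_s(n,f)\big]_{rr}, \tag{33}$$
$$\text{Update } \mathbf{H}(f). \tag{34}$$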

Gaussian prior
The design of a suitable prior over H(f) is subject to the same practical engineering constraints as above, which leads us to propose a Gaussian prior. We model each column h_jr(f) of H_j(f) as an independent complex Gaussian random vector
$$\mathbf{h}_{jr}(f) \sim \mathcal{N}_c\big(\boldsymbol{\mu}_{\mathbf{h}_{jr}}(f),\,\boldsymbol{\Sigma}_{\mathbf{h}_{jr}}(f)\big) \tag{35}$$
with mean μ_{h_jr}(f) and covariance Σ_{h_jr}(f). Following the assumption in Section 2.2, echoes and reverberation cancel out on average over all orientations in the room, so that they appear only in the covariance, while only the part corresponding to direct sound appears in the mean. Without loss of generality, let us select H_j(f) such that direct sound is concentrated in the first subsource of each source, i.e., the first subsource includes direct sound, echoes, and reverberation, while the other subsources include echoes and reverberation only (see endnote b). The mean and the covariance of the prior can then be expressed as
$$\boldsymbol{\mu}_{\mathbf{h}_{j1}}(f) = \mathbf{d}_j(f), \qquad \boldsymbol{\mu}_{\mathbf{h}_{jr}}(f) = \mathbf{0} \quad (r \geq 2), \tag{36}$$
$$\boldsymbol{\Sigma}_{\mathbf{h}_{jr}}(f) = \sigma_r^2\,\boldsymbol{\Omega}(f), \tag{37}$$
where the echo and reverberation power of all subsources sums up to the total power in (10):
$$\sum_{r=1}^{R_j}\sigma_r^2 = \sigma_{\mathrm{rev}}^2. \tag{38}$$
Contrary to the inverse-Wishart prior, whose variance is governed by a single hyper-parameter m, this prior involves R_j − 1 free hyper-parameters σ²_r, r = 2, ..., R_j, which makes it potentially more flexible as soon as I ≥ R_j > 2. The two priors are distinct, however, in the sense that the Gaussian prior does not generalize the inverse-Wishart prior, whatever the choice of the hyper-parameters.
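Sampling from the Gaussian prior is a convenient way to check that it satisfies the physical constraint (7): the average of H_j(f)H_j^H(f) over many draws should approach d_j(f)d_j^H(f) + (Σ_r σ²_r)Ω(f). The sketch below assumes two omnidirectional microphones 5 cm apart, c = 343 m/s, f = 1 kHz, and hypothetical values for d_j(f) and σ²_r; the function name is also hypothetical.

```python
import numpy as np

def sample_subsource_mixing(d, Omega, sigma2, n_samples, rng):
    """Draw H_j(f) from the Gaussian prior: column r ~ N_c(mu_r, sigma2[r] * Omega).

    d: (I,) steering vector (mean of the first column only), Omega: (I, I) real
    diffuse-field covariance, sigma2: (R_j,) reverberation powers.
    Returns an array of shape (n_samples, I, R_j).
    """
    I, Rj = d.shape[0], sigma2.shape[0]
    L = np.linalg.cholesky(Omega)                       # Omega = L L^T
    w = (rng.standard_normal((n_samples, I, Rj))
         + 1j * rng.standard_normal((n_samples, I, Rj))) / np.sqrt(2)   # N_c(0, I)
    H = (L @ w) * np.sqrt(sigma2)                       # scale column r by sigma_r
    H[:, :, 0] += d                                     # direct sound in the first subsource
    return H

# Usage: two omnidirectional mics 5 cm apart at f = 1 kHz, c = 343 m/s (assumed).
f, spacing, c = 1000.0, 0.05, 343.0
s = np.sinc(2 * f * spacing / c)                        # sin(2 pi f d / c) / (2 pi f d / c)
Omega = np.array([[1.0, s], [s, 1.0]])
d = np.array([0.5, 0.5 * np.exp(-0.3j)])                # hypothetical steering vector
sigma2 = np.array([0.05, 0.03])                         # hypothetical sigma_1^2, sigma_2^2
rng = np.random.default_rng(4)
H = sample_subsource_mixing(d, Omega, sigma2, 100_000, rng)
empirical = np.mean(H @ H.conj().transpose(0, 2, 1), axis=0)
target = np.outer(d, d.conj()) + sigma2.sum() * Omega  # the mean spatial covariance (7)
```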

Learning the hyper-parameters
In order to fit the actual distribution of subsource mixing matrices, we learn these free hyper-parameters from training data. The training data consist of the spatial covariance matrices R_p(f) computed in Section 3.3 for different positions p, from which we derive the corresponding subsource mixing matrices H_p(f), with vectorization h_p(f), mean μ_{h_p}(f), and covariance Σ_h(f) under the prior (35). Since H_p(f) is determined only up to a multiplication factor α_p(f), the hyper-parameters and the multiplication factors are jointly estimated in the ML sense by maximizing
$$\sum_{p,f}\log\Big[J_{\alpha_p(f)}\;\mathcal{N}_c\big(\alpha_p(f)\,\mathbf{h}_p(f)\,\big|\,\boldsymbol{\mu}_{\mathbf{h}_p}(f),\,\boldsymbol{\Sigma}_{\mathbf{h}}(f)\big)\Big], \tag{42}$$
where J_{α_p(f)} = |α_p(f)|^{2I²} is the Jacobian of the multiplication. Maximization is achieved using a nonlinear optimization technique, where the optimal multiplication factors are found in closed form as a function of the hyper-parameters. The values of σ²_1 and σ²_2 learned in the setting of Section 5 (R_j = I = 2) are displayed in Table 1.
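One closed form for the optimal multiplication factors can be derived as follows, under the assumptions that α_p(f) is complex-valued and that each training term has the Gaussian form with Jacobian |α|^{2K}, K being the number of complex entries in h_p(f). Writing a = h^H Σ^{-1} h and b = h^H Σ^{-1} μ, the term to maximize is 2K log|α| − |α|²a + 2 Re(α* b) + const, whose maximizer has the phase of b and the modulus (|b| + √(|b|² + 4aK))/(2a). This is our own derivation, not necessarily the authors' expression, and the names below are hypothetical.

```python
import numpy as np

def optimal_multiplication_factor(h, mu, Sigma):
    """Complex alpha maximizing 2K log|alpha| - (alpha h - mu)^H Sigma^{-1} (alpha h - mu)."""
    K = h.shape[0]                                        # number of complex entries
    a = np.real(h.conj() @ np.linalg.solve(Sigma, h))    # h^H Sigma^{-1} h > 0
    b = h.conj() @ np.linalg.solve(Sigma, mu)            # h^H Sigma^{-1} mu
    rho = (abs(b) + np.sqrt(abs(b) ** 2 + 4 * a * K)) / (2 * a)   # optimal modulus
    phase = b / abs(b) if abs(b) > 0 else 1.0             # optimal phase (arbitrary if b = 0)
    return rho * phase

def scaled_log_objective(alpha, h, mu, Sigma):
    """The alpha-dependent part of one term of the Gaussian training objective."""
    K = h.shape[0]
    e = alpha * h - mu
    return 2 * K * np.log(abs(alpha)) - np.real(e.conj() @ np.linalg.solve(Sigma, e))

# Usage with hypothetical vectors (K = 4 entries, e.g. a vectorized 2 x 2 H_p).
rng = np.random.default_rng(5)
h = rng.standard_normal(4) + 1j * rng.standard_normal(4)
mu = rng.standard_normal(4) + 1j * rng.standard_normal(4)
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Sigma = B @ B.conj().T + np.eye(4)
alpha = optimal_multiplication_factor(h, mu, Sigma)
```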

MAP EM update
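Similarly to (39), let us denote by h(f) the vectorization of H(f) as an IR × 1 column vector, with R = Σ_j R_j the total number of subsources. The prior distribution (35) translates into
$$\mathbf{h}(f) \sim \mathcal{N}_c\big(\boldsymbol{\mu}_{\mathbf{h}}(f),\,\boldsymbol{\Sigma}_{\mathbf{h}}(f)\big), \tag{44}$$
where μ_h(f) is the IR × 1 vector obtained by concatenating μ_{h_jr}(f) for all j, r, and Σ_h(f) is the IR × IR block-diagonal matrix whose diagonal blocks are equal to Σ_{h_jr}(f) for all j, r.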
The MAP update for H(f) is derived by maximizing the expectation of the log-posterior of the complete data, which is equal up to a constant to
$$\mathcal{Q}_G\big(\mathbf{H}(f)\big) = \mathcal{Q}_{ML}\big(\mathbf{H}(f)\big) - \gamma\,\big(\mathbf{h}(f)-\boldsymbol{\mu}_{\mathbf{h}}(f)\big)^H\boldsymbol{\Sigma}_{\mathbf{h}}^{-1}(f)\big(\mathbf{h}(f)-\boldsymbol{\mu}_{\mathbf{h}}(f)\big), \tag{46}$$
where Q_ML(H(f)) denotes the expectation of the log-likelihood of the complete data (see Equation 18 in [12] for its expression) and γ is a trade-off hyper-parameter determining the strength of the prior. By rewriting the matrix quadratic form in the log-likelihood term of (46) as a vector quadratic form in terms of h(f), computing the gradient of Q_G, and equating it to zero, we obtain a closed-form update of h(f) (47), expressed in terms of the transposition (.)^T, the Kronecker product ⊗, and the operator vec(.), which concatenates the columns of a matrix into a single column vector. The mixing matrix H(f) is then obtained by devectorizing h(f). This update boils down to the ML update (27) when γ = 0.
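The key step behind the Gaussian MAP update is the identity tr(A H B H^H) = vec(H)^H (B^T ⊗ A) vec(H) for column-stacked vec(.). The sketch below checks this identity and solves the resulting vector problem for a generic quadratic log-likelihood term of the form −tr(A H B H^H) + 2 Re tr(A C H^H). The matrices A, B, and C are assumptions standing in for the quantities of [12] (B and C playing the roles of Σ_n R̂_s(n, f) and Σ_n R̂_xs(n, f), and A a hypothetical Hermitian weighting); with γ = 0, the solution is C B^{-1}, i.e., the form of the ML update (27), whatever A.

```python
import numpy as np

def vec(M):
    """Column-stacking vectorization vec(.)."""
    return M.reshape(-1, order="F")

def map_update_h(A, B, C, mu_h, P, gamma):
    """Maximize -tr(A H B H^H) + 2 Re tr(A C H^H) - gamma (h - mu_h)^H P (h - mu_h).

    A (I x I) and B (R x R) Hermitian positive definite, C (I x R),
    P: prior precision matrix (inverse covariance) of h = vec(H).
    """
    I, R = C.shape
    M = np.kron(B.T, A)                     # tr(A H B H^H) = h^H (B^T kron A) h
    rhs = vec(A @ C) + gamma * (P @ mu_h)   # tr(A C H^H) = h^H vec(A C)
    h = np.linalg.solve(M + gamma * P, rhs)
    return h.reshape(I, R, order="F")       # devectorize

# Usage with hypothetical matrices: I = 2 channels, R = 6 subsources.
rng = np.random.default_rng(6)
I, R = 2, 6

def random_hpd(n):
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return X @ X.conj().T + n * np.eye(n)

A, B, P = random_hpd(I), random_hpd(R), random_hpd(I * R)
C = rng.standard_normal((I, R)) + 1j * rng.standard_normal((I, R))
mu_h = rng.standard_normal(I * R) + 1j * rng.standard_normal(I * R)
H_ml = map_update_h(A, B, C, mu_h, P, gamma=0.0)
```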

Experimental evaluation
We evaluate the performance of the proposed MAP estimation algorithms compared to the conventional ML estimation algorithms and to two baseline approaches for the separation of two-channel convolutive mixtures of three sources. We target a semi-informed scenario where the relative positions of the sources and the microphones are known, but nothing is known about their absolute position in the room nor about the source signals. The reverberant character of the data calls for the use of full-rank spatial covariance matrices and subsource mixing matrices, i.e., R_j = 2 for all j. We do not constrain the source variances v_j(n, f), so as to measure the improvement due to the priors alone. The full Matlab code for our experiments can be downloaded from [36].

Data
The proposed priors can be applied in any scenario where the source separation system is to be deployed in fixed, known environments, where the impulse responses can be pre-recorded or simulated. In the following, we use simulated mixtures so as to test a wide range of room reverberation times. The use of simulated data is widespread in audio source separation, and it has been shown to yield separation performance comparable to that obtained on real-world data in general [37]. As a matter of fact, the results of the ML algorithms reported below are comparable to those previously reported on real-world recordings in Figure 6 of [11]. The positions of the sources and the microphones in the test data are illustrated in Figure 1. The room dimensions are 4.45 × 3.55 × 2.5 m as in [37], and the microphone spacing and the source-to-microphone distances are fixed to d = 5 cm and r = 50 cm, respectively. We generated room impulse responses via the image method [33] using the Roomsimove toolbox (see endnote c) for four reverberation times, T60 = 50, 130, 250, and 500 ms, which we convolved with 10 s speech signals sampled at 16 kHz. For each T60, 6 mixture signals were generated using speech signals from the Signal Separation Evaluation Campaign (SiSEC) [37]: 2 mixtures of English and Japanese male speech, 2 mixtures of English and Japanese female speech, and 2 mixtures of male and female speech, resulting in 24 mixture signals in total.
Training data were generated in a similar fashion by simulating room impulse responses for 20 random source directions of arrival for each of 20 random microphone pair positions and orientations for the same d and r as above. This resulted in a total of 400 source image signals indexed by p for each T 60 .

Learned hyper-parameter values
Regarding training, preliminary experiments showed that the functions (21) and (42) are concave in practice. Hence, we maximized them using Matlab's fmincon optimizer (Mathworks Inc., Natick, MA, USA). The resulting hyperparameter values are shown in Table 1.
As expected, the total power of echoes and reverberation σ²_rev = σ²_1 + σ²_2 strongly increases with T60, such that the direct-to-reverberant ratio is 14 dB lower when T60 = 500 ms than when T60 = 50 ms. The variance of the inverse-Wishart prior, which is inversely related to m [32], decreases with T60. The ratio σ²_1/σ²_rev decreases with T60, which indicates that the echoic and reverberant part of the impulse responses becomes more and more diffuse.

Tested algorithms and evaluation criteria
In addition to the proposed MAP versions of SIEM and SSEM (MAP inverse-Wishart and MAP Gaussian), we consider the conventional ML versions of these algorithms, where the initial values of R_j(f) and H(f) are either set to μ_{R_j}(f) and μ_H(f) given the scene geometry (ML geom. init) or blindly estimated via hierarchical clustering followed by permutation alignment as detailed in [11] (ML blind init). Subsequent permutation alignment of the sources after convergence of the ML algorithms was found not to improve performance and is therefore not used in the following. For comparison, we evaluate two baseline approaches, namely, binary masking and ℓ0-norm minimization, using the reference software in [38], where the mixing matrix in each frequency bin is estimated by hierarchical clustering followed by permutation alignment [11]. In order to assess the respective impact of the priors on solving the permutation problem and on reducing overfitting, we also report an upper bound on the performance of the MAP and the ML geom. init algorithms with oracle permutation alignment. In each frequency bin, the best possible permutation is found by considering all possible permutations of the estimated sources and selecting the one that yields the smallest mean square error with respect to the true source signals in that bin.
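The oracle permutation alignment used for the upper bounds can be sketched as an exhaustive search over the J! orderings in each frequency bin, which is cheap for J = 3. This NumPy version is illustrative (hypothetical names and synthetic data); it assumes the estimated and true source images of one bin are stored as arrays of shape (J, N, I).

```python
import itertools
import numpy as np

def oracle_permutation(est, ref):
    """Reorder estimated sources in one frequency bin to best match the references.

    est, ref: (J, N, I) estimated and true source-image STFT coefficients.
    Returns the best permutation and the reordered estimates.
    """
    J = est.shape[0]
    best_perm, best_err = None, np.inf
    for perm in itertools.permutations(range(J)):
        err = np.mean(np.abs(est[list(perm)] - ref) ** 2)   # mean square error
        if err < best_err:
            best_perm, best_err = perm, err
    return best_perm, est[list(best_perm)]

# Usage: references shuffled by a known permutation plus a little noise.
rng = np.random.default_rng(7)
ref = rng.standard_normal((3, 20, 2)) + 1j * rng.standard_normal((3, 20, 2))
est = ref[[2, 0, 1]] + 0.01 * rng.standard_normal((3, 20, 2))
perm, aligned = oracle_permutation(est, ref)   # perm == (1, 2, 0)
```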
We computed the STFT with half-overlapping sine windows of length 1024 and the empirical mixture covariance using a window w_nf of size 3 × 3, as in [26]. The trade-off parameter γ does not significantly affect the results, but we observed that γ = 100 and γ = 10 are good choices on average for SIEM and SSEM, respectively. The number of iterations was fixed to 10 for SIEM and 30 for SSEM, since SSEM typically converges more slowly.
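The local averaging that defines the empirical mixture covariance (5) with a 3 × 3 window can be sketched as below. The window shape is an assumption here (uniform weights, with the neighborhood truncated and renormalized at the spectrogram edges); the window actually used in [26] may differ, and the function name is hypothetical.

```python
import numpy as np

def empirical_covariance(X, half_width=1):
    """Local average of x x^H over a (2*half_width+1)^2 neighborhood of each bin.

    X: (F, N, I) mixture STFT coefficients; returns (F, N, I, I). Uniform weights
    are assumed, and neighbors falling outside the spectrogram are ignored.
    """
    F, N, _ = X.shape
    outer = np.einsum("fna,fnb->fnab", X, X.conj())      # x x^H in every bin
    Rx = np.zeros_like(outer)
    counts = np.zeros((F, N))
    for df in range(-half_width, half_width + 1):
        for dn in range(-half_width, half_width + 1):
            # destination bin (f, n) receives the outer product of bin (f + df, n + dn)
            dst = (slice(max(-df, 0), F - max(df, 0)), slice(max(-dn, 0), N - max(dn, 0)))
            src = (slice(max(df, 0), F - max(-df, 0)), slice(max(dn, 0), N - max(-dn, 0)))
            Rx[dst] += outer[src]
            counts[dst] += 1
    return Rx / counts[:, :, None, None]
```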
The priors did not significantly increase running time. Indeed, the MAP inverse-Wishart update has the same computational complexity as the ML SIEM update. The MAP Gaussian update has greater complexity than the ML SSEM update, but it occurs only once per iteration in each frequency bin, in contrast with the updates in the E step which occur in each time frame. For a typical number of time frames N, the computational complexity is therefore dominated by the E step, regardless of the priors.
We evaluated the separation quality via the signal-to-distortion ratio (SDR), signal-to-interference ratio (SIR), signal-to-artifact ratio (SAR), and source image-to-spatial distortion ratio (ISR) criteria in decibels (dB) [37], which account respectively for overall distortion, residual crosstalk, musical noise, and target distortion. These criteria were computed using version 3.0 of the BSS Eval toolbox (see endnote d) and averaged over all sources and all mixtures for each T60.

Results for source image-based EM algorithms
The results of the SIEM algorithms and the baselines are compared in Figure 2. Binary masking and ℓ0-norm minimization provide lower SDR than all other algorithms for all reverberation conditions. ML geom. init results in better performance than ML blind init in terms of SDR and SAR for all T60. Overall, MAP inverse-Wishart outperforms all other algorithms for all considered T60 in terms of SDR, SIR, and ISR. For instance, at T60 = 250 ms, it improves the SDR by 1.7, 1.6, 2.8, and 4.2 dB compared to ML blind init, ML geom. init, binary masking, and ℓ0-norm minimization, respectively. This confirms the benefit of the proposed inverse-Wishart spatial location prior and the associated MAP algorithm.
These results are compared with the corresponding upper bounds obtained with oracle permutation alignment in Table 2. Comparing the first two lines with the last two lines shows that ML geom. init and MAP inverse-Wishart both solve the permutation problem at low reverberation times up to T60 = 130 ms, and that only a small SDR improvement, from 0.2 to 0.4 dB, could be expected from better permutation at higher reverberation times. By contrast, comparison of the third and the fourth lines of the table indicates that even if the permutation problem were solved, MAP inverse-Wishart would still outperform ML geom. init by 1.8 dB at T60 = 250 ms, which can be attributed to better robustness to overfitting.

Results for subsource-based EM algorithms
The results of the SSEM algorithms are depicted in Figure 3. Again, ML geom. init results in significantly better performance than ML blind init in terms of all criteria for all T60, and it also offers higher SDR than binary masking and ℓ0-norm minimization for all T60. But the best performance is achieved by MAP Gaussian in terms of […]
These results are compared with the corresponding upper bounds obtained with oracle permutation alignment in Table 3. Again, MAP Gaussian significantly outperforms ML geom. init in the oracle case, meaning that overfitting of the ML estimates is better addressed by MAP estimates with a proper prior. On the other hand, MAP Gaussian does not fully solve the permutation problem in medium and high reverberation conditions, but the gap with the oracle permutation is small and slightly smaller than for ML geom. init.

Conclusions
We considered two classes of source separation algorithms grounded on the emerging Gaussian EM framework. In contrast with classical ML estimation of the spatial parameters, we proposed two priors exploiting a result from the theory of statistical room acoustics and we derived closed-form MAP updates. The SIEM algorithm with an inverse-Wishart prior and the SSEM algorithm with a Gaussian prior were shown to outperform their ML counterparts for all room reverberation times in a semi-informed scenario. We showed that this performance improvement can be mostly attributed to the greater robustness to overfitting of MAP compared to ML. The proposed MAP algorithms also provide a solution to the problem of permutation of the source estimates that is consistent with the statistics of sound fields. The resulting permutations and those obtained by ML estimation initialized with the known geometric setting are, however, comparably good.
The results in this paper can readily be used in certain real-world scenarios where the source positions are known from, e.g., physical constraints or visual input, and the reverberation characteristics can be learned from the environment [21][22][23]. Perhaps more importantly, they constitute a first step towards a full Bayesian treatment of this family of models in other blind or semi-blind scenarios in the future. In addition to blind estimation of the source positions and possibly of the microphone distance and directivity [39], robustness to erroneous estimates of these quantities and blind estimation of the hyper-parameters σ²_rev, m, and σ²_r pose significant challenges, which go beyond the scope of this paper. Future work will concentrate on these challenges by extending blind techniques for room reverberation time estimation [40]. The proposed Gaussian prior, which is also valid for rank-1 mixing vectors, may also be explored in the context of FDICA, the difficulty being to translate it into a prior over the blocking vectors, which are usually the parameters considered in that context.
Endnotes
a Note that in order to yield a nonzero likelihood, v_j(n, f) must be nonzero for at least one source j. Σ_x(n, f) in (14) is therefore the sum of Hermitian positive semi-definite matrices, at least one of which is positive definite, so it is Hermitian positive definite and invertible.
b If several μ_{h_jr}(f) are nonzero multiples of d_j(f), a unitary transform can be applied to H_j(f) in (25) such that only the first one remains nonzero.
c http://www.irisa.fr/metiss/members/evincent/Roomsimove.zip. This toolbox provides a command-line interface which, in contrast with the original GUI by D. R. Campbell, allows generation of a large amount of data.
d http://bass-db.gforge.inria.fr/bss_eval/