Independent vector analysis based on overlapped cliques of variable width for frequency-domain blind signal separation
- Intae Lee^{1} and
- Gil-Jin Jang^{2}
https://doi.org/10.1186/1687-6180-2012-113
© Lee and Jang; licensee Springer. 2012
Received: 15 June 2011
Accepted: 23 May 2012
Published: 23 May 2012
Abstract
A novel method is proposed to improve the performance of independent vector analysis (IVA) for blind signal separation of acoustic mixtures. IVA is a frequency-domain approach that successfully resolves the well-known permutation problem by applying a spherical dependency model to all pairs of frequency bins. The dependency model of IVA is equivalent to a single clique in an undirected graph; in graph theory, a clique is a subset of vertices in which every pair of vertices is connected by an undirected edge. Therefore, IVA imposes the same amount of statistical dependency on every pair of frequency bins, which may not match the characteristics of real-world signals. The proposed method allows variable amounts of statistical dependency according to the correlation coefficients observed in real acoustic signals and, hence, enables more accurate modeling of statistical dependencies. The new dependency graph consists of a number of cliques, so that neighboring frequency bins are assigned to the same clique, while distant bins are assigned to different cliques. The permutation ambiguity is resolved by the frequency bins that overlap between neighboring cliques. For speech signals, we observed especially strong correlations across neighboring frequency bins, and these correlations decreased as the distance between bins increased. The clique sizes are either fixed or determined by the reciprocal of the mel-frequency scale to impose a wider dependency on low-frequency components. Experimental results showed improved performance over conventional IVA: the signal-to-interference ratio (SIR) improved from 15.5 to 18.8 dB on average for seven different source locations. When the clique sizes were varied according to the observed correlations, the proposed method remained stable even with a large number of cliques.
1 Introduction
where the integers $j$, $M$, $N$, and $T$ are the microphone index, the number of sources, the number of microphones, and the order of the FIR filter, respectively. The time-domain sequences $x_j(t)$ and $s_i(t)$ are the signals recorded by microphone $j$ and generated by source $i$, respectively, and $a_{ji}(t)$ is the coefficient at time $t$ of the FIR filter for the transfer function from source $i$ to microphone $j$; it depends on the recording environment, including the source and microphone locations. To ensure that the linear transformation is invertible, the number of sources should equal the number of microphones, i.e., $N = M$ [4].
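Assuming the mixing equation has the standard convolutive form $x_j(t) = \sum_{i=1}^{M} \sum_{\tau=0}^{T} a_{ji}(\tau)\, s_i(t-\tau)$, with an FIR filter of order $T$ having $T+1$ taps, the following Python sketch computes the microphone signals directly. The function name `convolutive_mix` and the nested-list layout are illustrative choices, not part of the original method.

```python
def convolutive_mix(sources, filters):
    """Mix M source signals into N microphone signals with FIR filters.

    sources: list of M sequences, sources[i][t] = s_i(t)
    filters: N x M nested list, filters[j][i][tau] = a_ji(tau), tau = 0..T
    Returns N lists with x_j(t) = sum_i sum_tau a_ji(tau) * s_i(t - tau),
    truncated to the source length and assuming s_i(t) = 0 for t < 0.
    """
    length = len(sources[0])
    mixtures = []
    for taps_j in filters:                  # one row of filters per microphone j
        x_j = [0.0] * length
        for i, taps in enumerate(taps_j):   # contribution of source i
            for tau, a in enumerate(taps):
                # Delay source i by tau samples and scale it by the tap a_ji(tau)
                for t in range(tau, length):
                    x_j[t] += a * sources[i][t - tau]
        mixtures.append(x_j)
    return mixtures
```

Each tap delays a source by $\tau$ samples before scaling it, so every microphone hears several delayed copies of every source; after a short-time Fourier transform with frames long relative to $T$, each convolution becomes approximately a multiplication in every frequency bin.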
where $\mathbf{y}^b[k]$ is a vector of $M$ estimated independent sources and $\mathbf{W}^b$ is an $M \times N$ matrix. Ideally, when $\mathbf{W}^b = (\mathbf{A}^b)^{-1}$, the original sources are perfectly reconstructed by $\mathbf{y}^b[k] = (\mathbf{A}^b)^{-1}\mathbf{A}^b\mathbf{s}^b[k] = \mathbf{s}^b[k]$. However, all frequency-domain ICA algorithms inherently suffer from permutation and scaling ambiguities because they treat different frequency components as independent [4, 9]. Instantaneous ICA may assign individual frequency bins of a single source to different outputs, so the frequency components of each source signal must be regrouped for frequency-domain BSS to succeed [10]. One of the simplest solutions is smoothing the frequency-domain filter [10–12], at the expense of performance because frequency resolution is lost. Other methods exist for colored signals, such as explicitly matching components whose signal envelopes have larger inter-frequency correlations [13–15].
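To make the permutation ambiguity concrete, the NumPy sketch below mixes random complex STFT-domain sources with a hypothetical mixing matrix $\mathbf{A}^b$ per bin, unmixes them with $\mathbf{W}^b = (\mathbf{A}^b)^{-1}$, and then swaps the rows of $\mathbf{W}^b$ in every other bin. Each swapped matrix is still a valid per-bin solution, yet the outputs now combine frequency bins of different sources. The data are synthetic, and the array layout (bins, sources, frames) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_bins, num_sources, num_frames = 4, 2, 200

# Hypothetical STFT-domain sources s^b[k] and mixing matrices A^b, one per bin b
S = (rng.standard_normal((num_bins, num_sources, num_frames))
     + 1j * rng.standard_normal((num_bins, num_sources, num_frames)))
A = (rng.standard_normal((num_bins, num_sources, num_sources))
     + 1j * rng.standard_normal((num_bins, num_sources, num_sources)))
X = A @ S  # x^b[k] = A^b s^b[k], applied bin by bin

# Ideal unmixing: W^b = (A^b)^{-1} gives y^b[k] = s^b[k] in every bin
W = np.linalg.inv(A)
Y = W @ X

# Per-bin ICA determines W^b only up to row permutation (and scaling).
# Swapping the rows in odd bins is an equally valid per-bin solution, but it
# routes bin b of source 1 to output 0, scattering each source across outputs.
P = np.array([[0, 1], [1, 0]])
W_perm = W.copy()
W_perm[1::2] = P @ W[1::2]
Y_perm = W_perm @ X
```

Per-bin independence cannot distinguish `W` from `W_perm`, which is why an additional grouping step, or a joint model across bins, is needed.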
Recently, a method called independent vector analysis (IVA) has been developed to overcome the permutation problem by embedding statistical dependency across different frequency components [16–19]. The joint dependency model assumes that the frequency bins of the acoustic sources have radially symmetric distributions [20]. Because speech signals are known to be spherically invariant random processes in the frequency domain [21], this assumption seems valid and yields good separation results. However, compared with frequency-domain ICA followed by perfect permutation correction, the separation performance of IVA using spherically symmetric joint densities is slightly inferior [19]. This suggests that such source priors do not exactly match the distribution of speech signals and that IVA performance for speech separation can be improved by finding better dependency models [22, 23].
We propose a new dependency model for IVA. The single, fully connected clique of IVA is decomposed into many smaller cliques. A new objective function is derived to account for strong dependency within individual cliques and weak dependency across cliques by retaining a considerable amount of overlap between adjacent cliques. The clique sizes are either fixed or determined by a mel-scale with its frequency index reversed; in simulated 2 × 2 speech separation experiments, the latter proved more robust to an increased number of cliques.
This article is organized as follows. Section 2 explains conventional IVA; Section 3 details the proposed method and contrasts it with IVA. Section 4 presents the results of the simulated speech separation experiments, and Section 5 summarizes the proposed method and discusses future extensions.
2 IVA
The key idea behind IVA is that all frequency components of a single source are regarded as a single vector whose components are mutually dependent. Independence between source vectors is approximated using a multivariate joint probability density function (pdf) of the components of each source vector, and the likelihood under this joint pdf is maximized instead of the independence of each frequency bin separately. The IVA model consists of a set of basic ICA models in which the univariate sources across different dimensions have some dependency, so that they can be grouped and aligned as a multidimensional variable.
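As an illustration of a joint pdf over a source vector, the sketch below assumes the multivariate Laplacian (spherical) prior commonly used in the IVA literature, $\log p(\mathbf{y}_i) \propto -\sqrt{\sum_b |y_i^b|^2}$, and compares it with a prior that treats every bin as an independent Laplacian. The article's own density may differ in its details, so this form is an assumption; the function names are hypothetical.

```python
import numpy as np

def spherical_log_prior(y):
    """Unnormalized log p(y_i) = -sqrt(sum_b |y_i^b|^2) for one source vector.

    y: array of shape (B,) holding the B frequency components of source i
    in one frame (complex or real).
    """
    return -np.sqrt(np.sum(np.abs(y) ** 2))

def independent_log_prior(y):
    """Unnormalized log prior if every bin were an independent Laplacian."""
    return -np.sum(np.abs(y))

def spherical_score(y):
    """Nonlinearity phi^b(y) = y^b / sqrt(sum_b' |y^b'|^2).

    Every bin is divided by the same norm, so each bin's value depends
    on all other bins of the same source.
    """
    return y / np.sqrt(np.sum(np.abs(y) ** 2))
```

For `y = [3, 4]` the spherical log-prior is -5 while the independent one is -7, and the score is `[0.6, 0.8]`. Because the score of each bin is normalized by the norm over all bins, the bins of one source are updated jointly, which tends to keep them aligned to a single output.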
The mixing of the multivariate sources is dimensionally constrained so that a linear mixture model is formulated in each layer. Instantaneous ICA is thus extended to multidimensional variables, or vectors, with the mixing process constrained to sources on the same horizontal layer, i.e., in the same dimension. The dependency among the components of each source vector is modeled by a multidimensional pdf; hence, the correct permutation is achieved.
The goal of IVA is to optimize $\{\mathbf{W}^1, \mathbf{W}^2, \ldots, \mathbf{W}^B\}$ so as to maximize the independence among the separated sources $\{\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_M\}$, where independence is approximated by the sum of the log likelihoods of the given data computed by Equation (10). The detailed learning algorithm can be found in [19, 20].
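A minimal sketch of one learning step, assuming the natural-gradient IVA rule with a spherical prior as described in the IVA literature (e.g., [19]): $\Delta\mathbf{W}^b = \eta\,(\mathbf{I} - E_k[\boldsymbol{\varphi}^b(\mathbf{y}[k])\,\mathbf{y}^b[k]^H])\,\mathbf{W}^b$, with $\varphi^b(y_i) = y_i^b / \sqrt{\sum_{b'} |y_i^{b'}|^2}$. The learning rate `lr` and the small constant `eps` are hypothetical choices, and convergence checks and scaling correction are omitted.

```python
import numpy as np

def iva_natural_gradient_step(W, X, lr=0.1, eps=1e-12):
    """One natural-gradient IVA update, assuming a spherical (multivariate Laplacian) prior.

    W: (B, M, M) complex unmixing matrices W^b
    X: (B, M, K) complex mixtures x^b[k] over K frames
    Returns the updated (B, M, M) matrices.
    """
    B, M, K = X.shape
    Y = W @ X                                          # y^b[k] = W^b x^b[k]
    # Length of each source vector across all bins, per frame: shape (M, K)
    norms = np.sqrt(np.sum(np.abs(Y) ** 2, axis=0)) + eps
    Phi = Y / norms                                    # phi^b(y_i[k]) couples all bins of source i
    eye = np.eye(M)
    W_new = np.empty_like(W)
    for b in range(B):
        # Sample estimate of E_k[phi^b(y[k]) y^b[k]^H]
        corr = Phi[b] @ Y[b].conj().T / K
        W_new[b] = W[b] + lr * (eye - corr) @ W[b]
    return W_new
```

In practice the step is repeated until the matrices stop changing, and each output is then rescaled to resolve the scaling ambiguity; one option is the minimal distortion principle [25].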
3 Proposed dependency models for IVA
For real-world sound sources, it is unreasonable to assign the same dependency to neighboring and distant frequency components, because neighboring components are much more strongly dependent than distant ones. This section describes the proposed dependency models, in which the single, fully connected statistical dependency of IVA is decomposed into several cliques whose sizes are either fixed or mel-scaled.
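One way to lay out overlapped cliques of a fixed size is sketched below. Each clique $c$ is centered at bin $b_c$, adjacent centers are evenly spaced, and the half-width equals the spacing, so neighbors share roughly half of their bins; `max` and `min` clip the edge cliques to the valid bin range. This layout, and the clique log-prior that assumes one spherical term $-\sqrt{\sum_{b \in c} |y^b|^2}$ per clique, are illustrative assumptions rather than the article's exact formulas; the function names are hypothetical.

```python
import math

def fixed_size_cliques(num_bins, num_cliques):
    """Hypothetical layout: evenly spaced centers, half-width equal to the spacing.

    Returns a list of (start, stop) bin ranges, 0-indexed with stop exclusive.
    """
    spacing = num_bins / num_cliques
    half = int(round(spacing))                     # neighbors overlap by about 50%
    cliques = []
    for c in range(num_cliques):
        center = int((c + 0.5) * spacing)          # center bin b_c
        start = max(0, center - half)              # clip to the first bin
        stop = min(num_bins, center + half + 1)    # clip to the last bin
        cliques.append((start, stop))
    return cliques

def clique_log_prior(y, cliques):
    """Unnormalized log prior with one spherical term per clique (assumed form)."""
    return -sum(math.sqrt(sum(abs(v) ** 2 for v in y[s:e])) for s, e in cliques)
```

With 16 bins and 4 cliques, `fixed_size_cliques(16, 4)` returns `[(0, 7), (2, 11), (6, 15), (10, 16)]`. The bins shared by consecutive cliques carry dependency from one clique to the next, so a consistent permutation can propagate along the chain even though distant bins never appear in the same clique. With a single clique spanning all bins, `clique_log_prior` reduces to the spherical IVA prior.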
3.1 Overlapped cliques of a fixed size
3.2 Overlapped cliques of variable sizes
where b_{ c } is the center-bin number of clique c. The max and min operators ensure that the computed bin numbers are within a valid range.
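The sketch below illustrates one hypothetical way to obtain variable clique widths from a reversed mel scale. Band edges are spaced uniformly on the mel scale $m(f) = 2595 \log_{10}(1 + f/700)$, mapped to bin positions, and then mirrored ($b \mapsto B - b$), so that the widest bands fall at low frequencies. Each clique is centered on its band, at $b_c$, and extended by the band width on each side, with `max` and `min` clipping the result to the valid range. The mel formula is a standard one; the mirroring and overlap rules and the function names are assumptions.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def reversed_mel_cliques(num_bins, num_cliques, f_max):
    """Hypothetical variable-width cliques from a mel scale with reversed frequency index.

    Returns (start, stop) bin ranges, 0-indexed with stop exclusive.
    """
    top = hz_to_mel(f_max)
    # Mel-uniform band edges mapped to fractional bin positions in [0, num_bins]
    edges = [num_bins * mel_to_hz(top * k / num_cliques) / f_max
             for k in range(num_cliques + 1)]
    # Mirror the frequency axis: wide high-frequency mel bands move to low bins
    edges = sorted(num_bins - e for e in edges)
    cliques = []
    for c in range(num_cliques):
        lo, hi = edges[c], edges[c + 1]
        center = (lo + hi) / 2.0                   # center bin b_c
        half = hi - lo                             # extend by one band width per side
        start = max(0, int(math.floor(center - half)))
        stop = min(num_bins, int(math.ceil(center + half)))
        cliques.append((start, stop))
    return cliques
```

For 512 bins, four cliques, and an 8 kHz upper frequency, the clique widths shrink from the lowest band to the highest, so low-frequency bins share a dependency with many more neighbors than high-frequency bins do, while adjacent cliques still overlap.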
4 Experiments
where $q(i)$ is the separated-source index of the $i$th source and $r_{iq(j)}$ is the overall impulse response computed by $r_{iq(j)} = \sum_m w_{im}^b a_{mq(j)}^b$. To represent how close the estimated $\mathbf{W}_i^b$ was to the inverse of the mixing filters $\mathbf{A}_j^b$, SIR values were measured in decibels, because acoustic signal power ratios are expressed on a log scale [26]. The higher the SIR, the closer the result is to perfect separation.
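A common form of the SIR built from these quantities, assumed for this sketch, compares target and interference power of the overall response $\mathbf{R}^b = \mathbf{W}^b\mathbf{A}^b$ summed over bins, $\mathrm{SIR}_i = 10\log_{10}\big(\sum_b |r^b_{iq(i)}|^2 / \sum_b \sum_{j \ne i} |r^b_{iq(j)}|^2\big)$, and averages the result over outputs. The function name `sir_db` and the array layout are hypothetical.

```python
import numpy as np

def sir_db(W, A, q):
    """Average SIR in dB from per-bin unmixing W^b and mixing A^b (assumed definition).

    W, A: (B, M, M) arrays; q[i] is the source index matched to output i.
    """
    R = W @ A                                  # r^b_{ij} = sum_m w^b_{im} a^b_{mj}
    power = np.sum(np.abs(R) ** 2, axis=0)     # (M, M) response power summed over bins
    M = power.shape[0]
    sirs = []
    for i in range(M):
        target = power[i, q[i]]
        interference = sum(power[i, q[j]] for j in range(M) if j != i)
        sirs.append(10.0 * np.log10(target / interference))
    return float(np.mean(sirs))
```

For a single bin with $\mathbf{W} = \mathbf{I}$ and cross-talk of 0.1 in amplitude, the interference power is one hundredth of the target power, giving 20 dB.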
The "CR" number for each clique design in Figure 9 is the ratio of the sum of the correlation coefficients enclosed by the union of all cliques to the sum of all correlation coefficients. It approaches unity as the enclosed region approaches the total area. The correlation map is identical to that in Figure 4A, computed from the speech of female 1, one of the input sources in our experiments. The CR number does not directly measure separation performance, but it roughly indicates how well a clique design models the dependence among frequency bins.
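Computing the CR number from this definition amounts to marking every bin pair $(b, b')$ that falls inside at least one clique's square block of the correlation map and dividing the enclosed sum by the total. The sketch assumes a non-negative correlation map stored as a NumPy array and cliques given as half-open bin ranges; both are illustrative choices, and `coverage_ratio` is a hypothetical name.

```python
import numpy as np

def coverage_ratio(corr, cliques):
    """CR: share of the correlation map enclosed by the union of clique blocks.

    corr: (B, B) array of (assumed non-negative) correlation coefficients
    cliques: list of (start, stop) bin ranges, stop exclusive
    """
    mask = np.zeros(corr.shape, dtype=bool)
    for start, stop in cliques:
        mask[start:stop, start:stop] = True   # all bin pairs inside one clique are enclosed
    return float(corr[mask].sum() / corr.sum())
```

A single clique covering every bin, as in IVA, gives CR = 1, consistent with the IVA row of the table; splitting the bins into more, smaller cliques encloses less of the map and lowers the ratio.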
Separation performance in SIR (dB)
Method | Exp. 1 (A,H) | Exp. 2 (B,G) | Exp. 3 (E,G) | Exp. 4 (H,J) | Exp. 5 (C,D) | Exp. 6 (E,F) | Exp. 7 (H,I) | Avg. SIR | Avg. iter. | CR |
---|---|---|---|---|---|---|---|---|---|---|
IVA | 16.5 | 17.5 | 16.6 | 12.0 | 15.5 | 15.2 | 15.0 | 15.5 | 936 | 1.000 |
LIN2 | 21.5 | 19.6 | 19.3 | 14.2 | 17.2 | 18.7 | 17.1 | 18.2 | 674 | 0.905 |
LIN4 | 22.0 | 19.6 | 19.3 | 14.7 | 17.4 | 19.2 | 18.4 | 18.7 | 397 | 0.677 |
LIN8 | 22.7 | 19.9 | 19.5 | 14.9 | 17.5 | 19.2 | 18.2 | 18.8 | 544 | 0.443 |
LIN12 | 7.3 | 18.8 | 9.0 | 5.8 | 17.6 | 19.6 | 18.8 | 13.9 | 468 | 0.333 |
LIN16 | 11.8 | 1.8 | 10.1 | 8.2 | 16.9 | 17.1 | 18.3 | 12.0 | 493 | 0.272 |
MEL2 | 19.4 | 19.0 | 18.5 | 13.5 | 16.7 | 16.5 | 15.8 | 17.1 | 825 | 0.890 |
MEL4 | 22.0 | 19.8 | 19.4 | 14.5 | 17.6 | 19.2 | 18.2 | 18.7 | 543 | 0.663 |
MEL8 | 22.3 | 19.8 | 19.3 | 14.7 | 17.3 | 19.1 | 18.6 | 18.7 | 408 | 0.466 |
MEL12 | 20.4 | 18.7 | 19.4 | 14.8 | 17.2 | 19.4 | 18.2 | 18.3 | 922 | 0.356 |
MEL16 | 20.9 | 19.0 | 18.6 | 14.9 | 17.5 | 19.2 | 18.2 | 18.3 | 1000 | 0.291 |
5 Conclusions
The fully spherical dependency model of IVA was relaxed into dependency models of chained cliques. The new clique designs are advantageous because the weak dependency among distant frequencies is modeled by indirect dependency propagation, which helps find a better local solution than the original IVA, in which the same amount of dependency is assigned to every pair of frequency bins. In this article, two types of non-spherical models are proposed. The first uses the same number of frequency bins for all cliques, while the second varies the number of frequency bins on a reversed mel-scale based on the measured correlation coefficients between frequency bins. Both dependency models achieved higher source separation performance and, in most configurations, faster convergence to correct solutions owing to more accurate modeling of the statistical dependency. For simulated mixtures of male and female speech signals, both models obtained the highest performance when the number of cliques was set to 4 or 8. When the clique size was fixed, performance degraded drastically for more than eight cliques. However, when the clique size was determined by the mel-scale, the same level of performance was maintained at the expense of convergence rate. This suggests the presence of up to 16 independent units in speech signals along the mel-scale frequency axis. One ongoing research issue is finding more flexible dependency models, such as varying the dependency graph instantaneously based on the correlation coefficients measured from the input signals or on their harmonic structures. Another is finding appropriate dependency models for natural sounds, because the dependency among their frequency components may not be related to the mel-scale.
Declarations
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. 2010-0025642), and by the Ministry of Knowledge Economy, Korea (2008-S-019-02, Development of Portable Korean-English Automatic Speech Translation Technology).
References
1. Stephens RB, Bate AE: Acoustics and Vibrational Physics. Edward Arnold Publishers, London; 1966.
2. Allen JB, Berkley DA: Image method for efficiently simulating small room acoustics. J Acoust Soc Am 1979, 65: 943-950. 10.1121/1.382599
3. Gardner WG: The virtual acoustic room. Master's thesis, MIT; 1992.
4. Bell AJ, Sejnowski TJ: An information maximization approach to blind separation and blind deconvolution. Neural Comput 1995, 7(6): 1129-1159. 10.1162/neco.1995.7.6.1129
5. Yellin D, Weinstein E: Multichannel signal separation: methods and analysis. IEEE Trans Signal Process 1996, 44: 106-118. 10.1109/78.482016
6. Torkkola K: Blind separation of convolved sources based on information maximization. In Proc IEEE Int Workshop on Neural Networks for Signal Processing. Kyoto, Japan; 1996: 423-432.
7. Lambert R: Multichannel blind deconvolution: FIR matrix algebra and separation of multipath mixtures. PhD thesis. University of Southern California; 1996.
8. Lee TW, Bell AJ, Lambert R: Blind separation of delayed and convolved sources. Adv Neural Inf Process Syst 1997, 9: 758-764.
9. Hyvärinen A, Oja E: Independent Component Analysis. John Wiley and Sons, New York; 2002.
10. Smaragdis P: Blind separation of convolved mixtures in the frequency domain. Neurocomputing 1998, 22: 21-34. 10.1016/S0925-2312(98)00047-2
11. Parra L, Spence C: Convolutive blind separation of non-stationary sources. IEEE Trans Speech Audio Process 2000, 8(3): 320-327. 10.1109/89.841214
12. Asano F, Ikeda S, Ogawa M, Asoh H, Kitawaki N: A combined approach of array processing and independent component analysis for blind separation of acoustic signals. In Proc IEEE Int Conf on Acoustics, Speech, and Signal Processing. Volume 5. Salt Lake City, Utah; 2001: 2729-2732.
13. Anemueller J, Kollmeier B: Amplitude modulation decorrelation for convolutive blind source separation. In Proc Int Conf on Independent Component Analysis and Blind Source Separation. Helsinki, Finland; 2000: 215-220.
14. Murata N, Ikeda S, Ziehe A: An approach to blind source separation based on temporal structure of speech signals. Neurocomputing 2001, 41: 1-24. 10.1016/S0925-2312(00)00345-3
15. Anemueller J, Sejnowski TJ, Makeig S: Complex independent component analysis of frequency-domain electroencephalographic data. Neural Netw 2003, 16(9): 1311-1323. 10.1016/j.neunet.2003.08.003
16. Hiroe A: Solution of permutation problem in frequency domain ICA, using multivariate probability density functions. Lecture Notes in Computer Science 2006, 3889: 601-608. 10.1007/11679363_75
17. Lee I, Kim T, Lee TW: Complex FastIVA: a robust maximum likelihood approach of MICA for convolutive BSS. Lecture Notes in Computer Science 2006, 3889: 625-632. 10.1007/11679363_78
18. Lee I, Kim T, Lee TW: Independent Vector Analysis for Convolutive Blind Speech Separation. Chapter 6. Springer, New York; 2007: 169-192.
19. Kim T, Attias H, Lee SY, Lee TW: Blind source separation exploiting higher-order frequency dependencies. IEEE Trans Audio Speech Lang Process 2007, 15: 70-79.
20. Lee I, Lee TW: On the assumption of spherical symmetry and sparseness for the frequency-domain speech model. IEEE Trans Audio Speech Lang Process 2007, 15(5): 1521-1528.
21. Brehm H, Stammler W: Description and generation of spherically invariant speech-model signals. Signal Process 1987, 12(2): 119-141. 10.1016/0165-1684(87)90001-6
22. Lee I, Jang GJ, Lee TW: Independent vector analysis using densities represented by chain-like overlapped cliques in graphical models for separation of convolutedly mixed signals. Electron Lett 2009, 45(13): 710-711. 10.1049/el.2009.0945
23. Jang GJ, Lee IT, Lee TW: Independent vector analysis using non-spherical joint densities for the separation of speech signals. In Proc IEEE Int Conf on Acoustics, Speech, and Signal Processing. Volume 2. Honolulu, Hawaii; 2007: 629-632.
24. Amari SI, Cichocki A, Yang HH: A new learning algorithm for blind signal separation. Adv Neural Inf Process Syst 1996, 8: 757-763.
25. Matsuoka K, Nakashima S: Minimal distortion principle for blind source separation. In Proc Int Conf on Independent Component Analysis and Blind Source Separation. San Diego, California; 2001: 722-727.
26. O'Shaughnessy D: Speech Communication: Human and Machine. Addison-Wesley, New York; 1987.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.