- Research Article
- Open access
Permutation Correction in the Frequency Domain in Blind Separation of Speech Mixtures
EURASIP Journal on Advances in Signal Processing volume 2006, Article number: 075206 (2006)
Abstract
This paper presents a method for blind separation of convolutive mixtures of speech signals, based on the joint diagonalization of the time-varying spectral matrices of the observation records. The main and still largely open problem in a frequency-domain approach is permutation ambiguity. An earlier paper by the authors exploited the continuity of the frequency response of the unmixing filters, but that approach leaves some frequency permutation jumps uncorrected. This paper therefore proposes a new method based on two assumptions. First, the frequency continuity of the unmixing filters is still used in the initialization of the diagonalization algorithm. Second, the time-frequency representations of the sources are assumed to vary smoothly with frequency. This hypothesis of continuity of the time variation of the source energy is exploited over a sliding frequency band, which allows the remaining frequency permutation jumps to be detected. The method is compared with other approaches, and results on real-world recordings demonstrate the superior performance of the proposed algorithm.
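The energy-continuity idea in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a simple greedy alignment written under assumptions of our own. The array layout `energy[f, k, t]` (frequency bin, separated output, time frame), the function name `align_permutations`, and the `band` parameter are all hypothetical. At each bin, the time profile of the log-energy of each output is compared with the average profile over the preceding `band` bins that have already been aligned, and the permutation with the highest total correlation is kept.

```python
import itertools
import numpy as np

def align_permutations(energy, band=5):
    """Greedy permutation alignment across frequency bins (illustrative only).

    energy : array of shape (F, K, T), assumed layout
             (frequency bin, separated output, time frame), strictly positive.
    band   : number of previously aligned bins averaged as the reference.
    Returns an (F, K) integer array; row f is the output ordering chosen at bin f.
    """
    F, K, T = energy.shape
    # Log-energy time profiles, centred and normalised per bin and output,
    # so that a frequency-dependent gain does not affect the comparison.
    prof = np.log(energy + 1e-12)
    prof = prof - prof.mean(axis=2, keepdims=True)
    prof = prof / (np.linalg.norm(prof, axis=2, keepdims=True) + 1e-12)

    aligned = prof.copy()
    perms = np.tile(np.arange(K), (F, 1))
    candidates = [list(p) for p in itertools.permutations(range(K))]

    for f in range(1, F):
        # Reference: mean aligned profile over a sliding band of lower bins.
        ref = aligned[max(0, f - band):f].mean(axis=0)
        # Score each candidate ordering by its total correlation with the reference.
        scores = [np.sum(ref * prof[f][p]) for p in candidates]
        best = candidates[int(np.argmax(scores))]
        aligned[f] = prof[f][best]
        perms[f] = best
    return perms
```

Bin 0 is taken as the reference ordering, so the result is defined only up to a global permutation of the sources. The exhaustive search over orderings is practical only for a small number of sources.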
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Servière, C., Pham, D. Permutation Correction in the Frequency Domain in Blind Separation of Speech Mixtures. EURASIP J. Adv. Signal Process. 2006, 075206 (2006). https://doi.org/10.1155/ASP/2006/75206
DOI: https://doi.org/10.1155/ASP/2006/75206