- Research Article
- Open Access

# Musical-Noise Analysis in Methods of Integrating Microphone Array and Spectral Subtraction Based on Higher-Order Statistics

- Yu Takahashi^{1} (corresponding author)
- Hiroshi Saruwatari^{1}
- Kiyohiro Shikano^{1}
- Kazunobu Kondo^{2}

**2010**:431347

https://doi.org/10.1155/2010/431347

© Yu Takahashi et al. 2010

**Received:** 5 August 2009 **Accepted:** 16 March 2010 **Published:** 26 April 2010

## Abstract

We conduct an objective analysis on musical noise generated by two methods of integrating microphone array signal processing and spectral subtraction. To obtain better noise reduction, methods of integrating microphone array signal processing and nonlinear signal processing have been researched. However, nonlinear signal processing often generates musical noise. Since such musical noise causes discomfort to users, it is desirable that musical noise is mitigated. Moreover, it has been recently reported that higher-order statistics are strongly related to the amount of musical noise generated. This implies that it is possible to optimize the integration method from the viewpoint of not only noise reduction performance but also the amount of musical noise generated. Thus, we analyze the simplest methods of integration, that is, the delay-and-sum beamformer and spectral subtraction, and fully clarify the features of musical noise generated by each method. As a result, it is clarified that a specific structure of integration is preferable from the viewpoint of the amount of generated musical noise. The validity of the analysis is shown via a computer simulation and a subjective evaluation.

## Keywords

- Input Signal
- Independent Component Analysis
- Microphone Array
- Tonal Component
- Spectral Subtraction

## 1. Introduction

There have recently been various studies on microphone array signal processing [1]; in particular, the delay-and-sum (DS) [2–4] array and the adaptive beamformer [5–7] are the most conventionally used microphone arrays for speech enhancement. Moreover, many methods of integrating microphone array signal processing and nonlinear signal processing such as spectral subtraction (SS) [8] have been studied with the aim of achieving better noise reduction [9–15]. It has been well demonstrated that such integration methods can achieve higher noise reduction performance than that obtained using conventional adaptive microphone arrays [13] such as the Griffiths-Jim array [6]. However, a serious problem exists in such methods: artificial distortion (so-called *musical noise* [16]) due to nonlinear signal processing. Since the artificial distortion causes discomfort to users, it is desirable that musical noise is controlled through signal processing. However, in almost all nonlinear noise reduction methods, the strength parameter to mitigate musical noise in nonlinear signal processing is determined heuristically. Although there have been some studies on reducing musical noise [16] and on nonlinear signal processing with less musical noise [17], evaluations have mainly depended on subjective tests by humans, and no objective evaluations have been performed to the best of our knowledge.

In our recent study, it was reported that the amount of generated musical noise is strongly related to the difference between higher-order statistics (HOS) before and after nonlinear signal processing [18]. This fact makes it possible to analyze the amount of musical noise arising through nonlinear signal processing. Therefore, on the basis of HOS, we can establish a mathematical metric for the amount of musical noise generated in an objective manner. One of the authors has analyzed single-channel nonlinear signal processing based on the objective metric and clarified the features of the amount of musical noise generated [18, 19]. In addition, this objective metric suggests the possibility that methods of integrating microphone array signal processing and nonlinear signal processing can be optimized from the viewpoint of not only noise reduction performance but also the sound quality according to human hearing. As a first step toward achieving this goal, in this study we analyze the simplest case of the integration of microphone array signal processing and nonlinear signal processing by considering the integration of DS and SS. As a result of the analysis, we clarify the musical-noise generation features of two methods of integrating microphone array signal processing and SS: SS after beamforming (BF+SS) and channelwise SS before beamforming (chSS+BF). The main findings are as follows.

- (i) The amount of musical noise generated strongly depends on not only the oversubtraction parameter of SS but also *the statistical characteristics of the input signal*.
- (ii) Except for the specific condition that the input signal is Gaussian, the noise reduction performances of the two methods are not equivalent even if we set the same SS parameters.
- (iii) Under equivalent noise reduction performance conditions, chSS+BF generates less musical noise than BF+SS for almost all practical cases.

The most important contribution of this paper is that these findings are mathematically proved. In particular, the amount of musical noise generated and the noise reduction performance resulting from the integration of microphone array signal processing and SS are analytically formulated on the basis of HOS. Although there have been many studies on optimization methods based on HOS [21], this is the first time they have been used for musical-noise assessment. The validity of the analysis based on HOS is demonstrated via a computer simulation and a subjective evaluation by humans.

The rest of the paper is organized as follows. In Section 2, the two methods of integrating microphone array signal processing and SS are described in detail. In Section 3, the metric based on HOS used for the amount of musical noise generated is described. Next, the musical-noise analysis of SS, microphone array signal processing, and their integration methods are discussed in Section 4. In Section 5, the noise reduction performances of the two integration methods are discussed, and both methods are compared under equivalent noise reduction performance conditions in Section 6. Moreover, the result of a computer simulation and experimental results are given in Section 7. Following a discussion of the results of the experiments, we give our conclusions in Section 8.

## 2. Methods of Integrating Microphone Array Signal Processing and SS

In this section, the formulations of the two methods of integrating microphone array signal processing and SS are described. First, BF+SS, which is a typical method of integration, is formulated. Next, an alternative method of integration, chSS+BF, is introduced.

### 2.1. Sound-Mixing Model

### 2.2. SS after Beamforming

In BF+SS, the single-channel target-speech-enhanced signal is first obtained by beamforming, for example, by DS. Next, single-channel noise estimation is performed by a beamforming technique, for example, null beamformer [22] or adaptive beamforming [1]. Finally, we extract the resultant target-speech-enhanced signal via SS. The full details of signal processing are given below.

In the noise estimation, the filter coefficient vector of the null beamformer [22] steers the null directivity to the speech direction, and a gain adjustment term is determined in a speech break period. Since the null beamformer can remove the speech signal by steering the null directivity to the speech direction, we can estimate the noise signal. Moreover, a method exists in which independent component analysis (ICA) is utilized as a noise estimator instead of the null beamformer [15].

### 2.3. Channelwise SS before Beamforming

In chSS+BF, SS is first applied at each channel to obtain a target-speech-enhanced signal, using the estimated noise signal in that channel. For instance, the multichannel noise can be estimated by single-input multiple-output ICA (SIMO-ICA) [24] or a combination of ICA and the projection back method [25]. These techniques can provide the multichannel estimated noise signal, unlike traditional ICA. SIMO-ICA can separate mixed signals not into monaural source signals but into SIMO-model signals at the microphone. Here SIMO denotes the specific transmission system in which the input signal is a single source signal and the outputs are its transmitted signals observed at multiple microphones. Thus, the output signals of SIMO-ICA maintain the rich spatial qualities of the sound sources [24]. Also, the projection back method provides SIMO-model-separated signals using the inverse of an optimized ICA filter [25].

DS is then applied to the channelwise spectral-subtracted signals, and its output is the final output of chSS+BF.

Such a chSS+BF structure performs DS after (multichannel) SS. Since DS is basically signal processing in which the summation of the multichannel signal is taken, it can be considered that interchannel smoothing is applied to the multichannel spectral-subtracted signal. On the other hand, the resultant output signal of BF+SS remains as it is after SS. That is to say, it is expected that the output signal of chSS+BF is more natural (contains less musical noise) than that of BF+SS. In the following sections, we reveal that chSS+BF can output a signal with less musical noise than BF+SS in almost all cases on the basis of HOS.
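The difference between the two structures can be sketched in a few lines of Python. This is only an illustration under stated assumptions, none of which come from the text: the channels are already time-aligned so that DS reduces to a plain average, SS takes a common power-domain form with an oversubtraction parameter `beta` and a flooring parameter `eta`, and the noise power is supplied directly instead of being estimated by a null beamformer or ICA. All function names are hypothetical.

```python
import numpy as np

def delay_and_sum(X):
    """Average time-aligned channel spectra; X has shape (channels, bins)."""
    return X.mean(axis=0)

def spectral_subtraction(Y, noise_power, beta=2.0, eta=0.0):
    """Assumed power-domain SS: subtract beta * noise_power, else floor with eta."""
    power = np.abs(Y) ** 2
    subtracted = power - beta * noise_power
    magnitude = np.where(subtracted > 0.0,
                         np.sqrt(np.maximum(subtracted, 0.0)),
                         eta * np.abs(Y))
    # The phase of the input is kept, only the magnitude is modified.
    return magnitude * np.exp(1j * np.angle(Y))

def bf_ss(X, noise_power_after_ds, beta=2.0, eta=0.0):
    """BF+SS: DS first, then single-channel SS."""
    return spectral_subtraction(delay_and_sum(X), noise_power_after_ds, beta, eta)

def chss_bf(X, noise_power_per_channel, beta=2.0, eta=0.0):
    """chSS+BF: SS in every channel, then DS over the subtracted channels."""
    enhanced = spectral_subtraction(X, noise_power_per_channel[:, None], beta, eta)
    return delay_and_sum(enhanced)

# Noise-only toy input: unit-power complex Gaussian noise, independent per channel.
rng = np.random.default_rng(0)
n_mics, n_bins = 4, 1000
X = (rng.standard_normal((n_mics, n_bins))
     + 1j * rng.standard_normal((n_mics, n_bins))) / np.sqrt(2)

out_bf_ss = bf_ss(X, noise_power_after_ds=1.0 / n_mics)
out_chss_bf = chss_bf(X, noise_power_per_channel=np.ones(n_mics))
```

In this sketch the only difference between the two functions is the order of the nonlinear step and the averaging step; with `beta=0.0` the nonlinear step does nothing and both reduce to plain DS.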

## 3. Kurtosis-Based Musical-Noise Generation Metric

### 3.1. Introduction

It has been reported by the authors that the amount of musical noise generated is strongly related to the difference between the kurtosis of a signal before and after signal processing. Thus, in this paper, we analyze the amount of musical noise generated through BF+SS and chSS+BF on the basis of the change in the measured kurtosis. Hereinafter, we give details of the kurtosis-based musical-noise metric.

### 3.2. Relation between Musical-Noise Generation and Kurtosis

Hence, we introduce kurtosis to quantify the isolated spectral components, and we focus on the changes in kurtosis. Since isolated spectral components are dominant, they are heard as tonal sounds, which results in our perception of musical noise. Therefore, it is expected that obtaining the number of tonal components will enable us to quantify the amount of musical noise. However, such a measurement is extremely complicated; so instead we introduce a simple statistical estimate, that is, kurtosis.

This strategy allows us to obtain the characteristics of tonal components. The adopted kurtosis can be used to evaluate the width of the probability density function (p.d.f.) and the weight of its tails; that is, kurtosis can be used to evaluate the percentage of tonal components among the total components. A larger value indicates a signal with a heavy tail in its p.d.f., meaning that it has a large number of tonal components. Also, kurtosis has the advantageous property that it can be easily calculated in a concise algebraic form.

### 3.3. Kurtosis

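Let $x$ be a random variable. Its kurtosis is defined as

$$\mathrm{kurt}_x = \frac{\mu_4}{\mu_2^2}, \tag{7}$$

where $\mu_m$ is the $m$th-order moment of $x$,

$$\mu_m = \int_{-\infty}^{+\infty} x^m P(x)\,dx,$$

and $P(x)$ denotes the p.d.f. of $x$. Note that this $\mu_m$ is not a central moment *but a raw moment*. Thus, (7) is not kurtosis according to the mathematically strict definition, but a modified version; however, we refer to (7) as kurtosis in this paper.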
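The raw-moment kurtosis can be estimated from samples as follows. This is a minimal sketch, assuming the kurtosis is the 4th-order raw moment divided by the squared 2nd-order raw moment; the function names and the test signals are illustrative only.

```python
import numpy as np

def raw_moment(x, order):
    """Sample estimate of the raw (non-central) moment of the given order."""
    return np.mean(np.asarray(x, dtype=float) ** order)

def kurtosis(x):
    """Raw-moment kurtosis: 4th-order raw moment over squared 2nd-order raw moment."""
    return raw_moment(x, 4) / raw_moment(x, 2) ** 2

rng = np.random.default_rng(0)
gaussian = rng.standard_normal(1_000_000)
laplacian = rng.laplace(size=1_000_000)

kurt_gaussian = kurtosis(gaussian)    # close to 3 for a zero-mean Gaussian
kurt_laplacian = kurtosis(laplacian)  # close to 6: heavier tails, larger kurtosis
```

For zero-mean signals such as these, raw and central moments coincide, so the values agree with the textbook kurtosis; the two definitions differ once the signal has a nonzero mean, as power-domain signals do.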

### 3.4. Kurtosis Ratio

We use the *kurtosis ratio* [18] to measure the kurtosis change:

$$\text{kurtosis ratio} = \frac{\mathrm{kurt}_{\mathrm{proc}}}{\mathrm{kurt}_{\mathrm{org}}},$$

where $\mathrm{kurt}_{\mathrm{proc}}$ is the kurtosis of the processed signal and $\mathrm{kurt}_{\mathrm{org}}$ is the kurtosis of the input signal. A larger kurtosis ratio ($\gg 1$) indicates a marked increase in kurtosis as a result of processing, implying that a larger amount of musical noise is generated. On the other hand, a smaller kurtosis ratio ($\simeq 1$) implies that less musical noise is generated. It has been confirmed that this kurtosis ratio closely matches the amount of musical noise in a subjective evaluation based on human hearing [18].

## 4. Kurtosis-Based Musical-Noise Analysis for Microphone Array Signal Processing and SS

### 4.1. Analysis Flow

- (i) First, an analysis on musical-noise generation in BF+SS and chSS+BF based on kurtosis that does not take noise reduction performance into account is performed in this section.
- (ii) The noise reduction performance is analyzed in Section 5, and we reveal that the noise reduction performances of BF+SS and chSS+BF are not equivalent. Moreover, a flooring parameter designed to align the noise reduction performances of BF+SS and chSS+BF is also derived to ensure the fair comparison of BF+SS and chSS+BF.
- (iii) The kurtosis-based comparison between BF+SS and chSS+BF under the same noise reduction performance conditions is carried out in Section 6.

In the analysis in this section, we first clarify how kurtosis is affected by SS. Next, the same analysis is applied to DS. Finally, we analyze how kurtosis is increased by BF+SS and chSS+BF. Note that our analysis contains no limiting assumptions on the statistical characteristics of noise; thus, all noises including Gaussian and super-Gaussian noise can be considered.

### 4.2. Signal Model Used for Analysis

Musical-noise components generated from the noise-only period are dominant in spectrograms (see Figure 4); hence, we mainly focus our attention on musical-noise components originating from input noise signals.

Here, the power-domain signal is the product of the complex-valued signal and its complex conjugate. The real part and the imaginary part of the complex-valued signal are assumed to be independent of each other and identically distributed (i.i.d.). Thus, the power-domain signal is the sum of two squares of random variables with the same distribution.

Hereinafter, we consider the real and imaginary parts of the signal after DFT analysis at a specific microphone, and we suppose that both have the same statistical properties as the time-domain signal at that microphone. Moreover, we assume the following: the signal is i.i.d. in each channel, its p.d.f. is symmetrical, and its mean is zero. These assumptions mean that the odd-order cumulants and moments are zero except for the first order.

Although the kurtosis is 3 if the time-domain signal is Gaussian, note that the kurtosis of a Gaussian signal in the power spectral domain is 6. This is because a Gaussian signal in the time domain obeys the chi-square distribution with two degrees of freedom in the power spectral domain; for such a chi-square distribution, the ratio of the 4th-order raw moment to the squared 2nd-order raw moment is 6.
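The value 6 can be checked with a short worked calculation. The symbols $p$, $x_{\mathrm{R}}$, $x_{\mathrm{I}}$, and $\sigma^2$ are introduced here for illustration only: $p$ is the power-domain signal, and $x_{\mathrm{R}}$ and $x_{\mathrm{I}}$ are i.i.d. zero-mean Gaussian real and imaginary parts with variance $\sigma^2$. In that case $p$ follows an exponential distribution (a scaled chi-square distribution with two degrees of freedom) with mean $2\sigma^2$, whose raw moments have a closed form:

```latex
\begin{align}
p &= x_{\mathrm{R}}^2 + x_{\mathrm{I}}^2, \qquad
E\left[p^m\right] = m!\,\left(2\sigma^2\right)^m, \\
\frac{E\left[p^4\right]}{E\left[p^2\right]^2}
  &= \frac{4!\,\left(2\sigma^2\right)^4}{\left(2!\,\left(2\sigma^2\right)^2\right)^2}
   = \frac{24}{4} = 6 .
\end{align}
```

The result does not depend on $\sigma^2$, so the power-domain kurtosis of Gaussian noise is 6 regardless of the noise level.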

### 4.3. Resultant Kurtosis after SS

The detailed derivation of (14) is given in Appendix B. Although Uemura et al. have given an approximated form (lower bound) of the kurtosis after SS in [18], (14) involves no approximation throughout its derivation. Furthermore, (14) takes into account *the effect of the flooring technique* unlike [18].

The relation between the theoretical kurtosis ratio and the kurtosis of the original input signal is shown in Figure 6(b). In the figure, the flooring parameter is fixed to 0.0. It is revealed that the kurtosis ratio after SS rapidly decreases as the input kurtosis increases, even with the same oversubtraction parameter. Therefore, the kurtosis ratio after SS, which is related to the amount of musical noise, strongly depends on the statistical characteristics of the input signal. That is to say, SS generates a larger amount of musical noise for a Gaussian input signal than for a super-Gaussian input signal. This fact has been reported in [18].
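This dependence on the input statistics can be reproduced with a small Monte Carlo sketch. It is not the closed-form result (14); it assumes, for illustration, that the power-domain noise follows a gamma distribution (shape 1 for Gaussian noise, shape below 1 for super-Gaussian noise), that SS subtracts `beta` times the mean noise power, and that the floored branch outputs `eta**2` times the input power. The function names and the shape value 0.2 are hypothetical choices.

```python
import numpy as np

def kurtosis(x):
    """Raw-moment kurtosis of a power-domain signal."""
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2

def power_ss(power, beta, eta=0.0):
    """Assumed power-domain SS with oversubtraction beta and flooring eta."""
    subtracted = power - beta * np.mean(power)
    return np.where(subtracted > 0.0, subtracted, eta ** 2 * power)

def kurtosis_ratio(power, beta, eta=0.0):
    """Kurtosis after SS divided by kurtosis before SS."""
    return kurtosis(power_ss(power, beta, eta)) / kurtosis(power)

rng = np.random.default_rng(0)
n = 1_000_000
gaussian_power = rng.gamma(shape=1.0, scale=1.0, size=n)        # input kurtosis near 6
super_gaussian_power = rng.gamma(shape=0.2, scale=1.0, size=n)  # much larger input kurtosis

ratio_gaussian = kurtosis_ratio(gaussian_power, beta=2.0)
ratio_super_gaussian = kurtosis_ratio(super_gaussian_power, beta=2.0)
```

For the shape-1 (exponential) case with zero flooring, the memoryless property gives a kurtosis ratio of $e^{\beta}$, about 7.4 for `beta=2.0`, which the simulation should approach; the heavier-tailed input yields a clearly smaller ratio with the same parameters.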

### 4.4. Resultant Kurtosis after DS

In this section, we analyze the kurtosis after DS, and we reveal that DS can reduce the kurtosis of input signals. Since we assume that the statistical properties of the real or imaginary part of the DFT-domain signal are the same as those of the time-domain signal, the effect of DS on the change in kurtosis can be derived from the cumulants and moments of the time-domain signal.

where is the th-order derivative of .

The detailed derivation of (22) is described in Appendix C.

When input signals involve interchannel correlation, the relation between input kurtosis and output kurtosis after DS approaches that for only one microphone. If all input signals are identical signals, that is, the signals are completely correlated, the output after DS also becomes the same as the input signal. In such a case, the effect of DS on the change in kurtosis corresponds to that for only one microphone. However, the interchannel correlation is not equal to one within all frequency subbands for a diffuse noise field that is a typically considered noise field. It is well known that the intensity of the interchannel correlation is strong in lower-frequency subbands and weak in higher-frequency subbands for the diffuse noise field [1]. Therefore, in lower-frequency subbands, it can be expected that DS does not significantly reduce the kurtosis of the signal.

As it is well known that the interchannel correlation for a diffuse noise field between two measurement locations can be expressed by the sinc function [1], we can state how array signal processing is affected by the interchannel correlation. However, we cannot know exactly how cumulants are changed by the interchannel correlation because (18) only holds when signals are mutually independent. Therefore, we cannot formulate how kurtosis is changed via DS for signals with interchannel correlation. For this reason, we experimentally investigate the effect of interchannel correlation in the following.
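The sinc-shaped interchannel correlation of a diffuse noise field can be evaluated directly. The sketch below assumes the standard form $\sin(2\pi f d / c)/(2\pi f d / c)$ for two microphones at distance $d$; the function name, the speed of sound of 343 m/s, and the example distances are illustrative assumptions.

```python
import numpy as np

def diffuse_coherence(freq_hz, mic_distance_m, sound_speed=343.0):
    """Interchannel correlation of an ideal diffuse field between two microphones."""
    # np.sinc(x) is sin(pi*x)/(pi*x), so x = 2*f*d/c yields sin(2*pi*f*d/c)/(2*pi*f*d/c).
    return np.sinc(2.0 * np.asarray(freq_hz, dtype=float) * mic_distance_m / sound_speed)

freqs = np.array([0.0, 100.0, 500.0, 4000.0])
coherence_5cm = diffuse_coherence(freqs, mic_distance_m=0.05)
coherence_2cm_500hz = diffuse_coherence(500.0, mic_distance_m=0.02)
coherence_10cm_500hz = diffuse_coherence(500.0, mic_distance_m=0.10)
```

The values are close to one at low frequencies and small in magnitude at high frequencies, and at a fixed low frequency the correlation grows as the microphones move closer together.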

If input noise signals contain no interchannel correlation, the distance between microphones does not affect the results. That is to say, the kurtosis change via DS can be well fit to (23). Otherwise, in lower-frequency subbands, it is expected that the mitigation effect of kurtosis by DS degrades with decreasing distance between microphones. This is because the interchannel correlation in lower-frequency subbands increases with decreasing distance between microphones. In higher-frequency subbands, the effect of the distance between microphones is thought to be small.
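The two extreme cases discussed above, no interchannel correlation and complete interchannel correlation, can be illustrated numerically. This is a sketch under assumptions not taken from the text: the real and imaginary parts are i.i.d. Laplacian (super-Gaussian), DS is modeled as a plain average over channels, and kurtosis is the raw-moment kurtosis of the power-domain signal. It does not model a partially correlated diffuse field.

```python
import numpy as np

def power_kurtosis(signal):
    """Raw-moment kurtosis of the power-domain signal |signal|**2."""
    power = np.abs(signal) ** 2
    return np.mean(power ** 4) / np.mean(power ** 2) ** 2

rng = np.random.default_rng(0)
n_frames, max_mics = 300_000, 8
# Assumption: i.i.d. Laplacian real and imaginary parts, independent across channels.
noise = (rng.laplace(size=(max_mics, n_frames))
         + 1j * rng.laplace(size=(max_mics, n_frames)))

# Uncorrelated channels: DS over the first m microphones.
kurt_after_ds = {m: power_kurtosis(noise[:m].mean(axis=0)) for m in (1, 2, 4, 8)}

# Completely correlated channels: every microphone observes the same signal.
identical = np.tile(noise[0], (4, 1))
kurt_identical = power_kurtosis(identical.mean(axis=0))
```

With independent channels the kurtosis falls toward the Gaussian value of 6 as microphones are added, whereas with identical channels DS returns the input and the kurtosis is unchanged, matching the single-microphone case.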

### 4.5. Resultant Kurtosis: BF+SS versus chSS+BF

In the previous subsections, we discussed the resultant kurtosis after SS and DS. In this subsection, we analyze the resultant kurtosis for two types of composite systems, that is, BF+SS and chSS+BF, and compare their effect on musical-noise generation. As described in Section 3, it is expected that a smaller increase in kurtosis leads to a smaller amount of musical noise generated.

where we use (23).

We should compare the resultant kurtosis of BF+SS with that of chSS+BF here. However, one problem still remains: the comparison must be made under equivalent noise reduction performance, and the noise reduction performances of BF+SS and chSS+BF are not equivalent, as described in the next section. Moreover, the design of a flooring parameter so that the noise reduction performances of both methods become equivalent will be discussed in the next section. Therefore, the two resultant kurtosis values will be compared in Section 6 under equivalent noise reduction performance conditions.

## 5. Noise Reduction Performance Analysis

In the previous section, we did not discuss the noise reduction performances of BF+SS and chSS+BF. In this section, a mathematical analysis of the noise reduction performances of BF+SS and chSS+BF is given. As a result of this analysis, it is revealed that the noise reduction performances of BF+SS and chSS+BF are not equivalent even if the same parameters are set in the SS part. We then derive a flooring-parameter design strategy for aligning the noise reduction performances of BF+SS and chSS+BF.

### 5.1. Noise Reduction Performance of SS

Here, the noise reduction performance (NRP) is measured from the power-domain (noise) signal of the input and the power-domain (noise) signal of the output after processing.

This corresponds to the mean of a random variable with a gamma distribution.
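A noise reduction measure of this kind can be sketched as follows. The exact definition is not reproduced here, so the code assumes a common form: ten times the base-10 logarithm of the mean input noise power over the mean output noise power, in dB. The noise is complex Gaussian, DS is a plain average over spatially uncorrelated channels, and SS is a power-domain subtraction with the flooring parameter set to zero; all names are hypothetical.

```python
import numpy as np

def nrp_db(input_power, output_power):
    """Assumed NRP: 10*log10(mean input noise power / mean output noise power)."""
    return 10.0 * np.log10(np.mean(input_power) / np.mean(output_power))

rng = np.random.default_rng(0)
n_frames, n_mics, beta = 200_000, 4, 2.0
noise = (rng.standard_normal((n_mics, n_frames))
         + 1j * rng.standard_normal((n_mics, n_frames)))
input_power = np.abs(noise[0]) ** 2

# DS modeled as plain averaging of spatially uncorrelated channels.
ds_power = np.abs(noise.mean(axis=0)) ** 2
nrp_ds = nrp_db(input_power, ds_power)   # about 10*log10(4), roughly 6 dB

# Power-domain SS with zero flooring: subtract beta times the mean noise power.
ss_power = np.maximum(input_power - beta * np.mean(input_power), 0.0)
nrp_ss = nrp_db(input_power, ss_power)   # about 10*beta*log10(e), roughly 8.7 dB
```

Under these assumptions both values have closed forms: averaging $M$ uncorrelated channels divides the noise power by $M$, and for exponentially distributed power the mean output of SS is $e^{-\beta}$ times the mean input.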

### 5.2. Noise Reduction Performance of DS

### 5.3. Resultant Noise Reduction Performance: BF+SS versus chSS+BF

In the previous subsections, the noise reduction performances of SS and DS were discussed. In this subsection, we derive the resultant noise reduction performances of the composite systems of SS and DS, that is, BF+SS and chSS+BF.

where is defined by (24).

This discussion implies that the noise reduction performances of BF+SS and chSS+BF are not equivalent under some conditions. Thus the kurtosis-based analysis described in Section 4 is biased and requires some adjustment. In the following subsection, we will discuss how to align the noise reduction performances of BF+SS and chSS+BF.

### 5.4. Flooring-Parameter Design in BF+SS for Equivalent Noise Reduction Performance

In this section, we describe the flooring-parameter design in BF+SS so that the noise reduction performances of BF+SS and chSS+BF become equivalent.

The detailed derivation of (38) is given in Appendix E. By replacing the flooring parameter in (3) with this new flooring parameter, we can align the noise reduction performances of BF+SS and chSS+BF to ensure a fair comparison.

## 6. Output Kurtosis Comparison under Equivalent NRP Condition

In this section, using the new flooring parameter for BF+SS, we compare the output kurtosis of BF+SS and chSS+BF.

## 7. Experiments and Results

### 7.1. Computer Simulations

Here, the kurtosis ratio is the kurtosis of the power spectra of the residual noise signal after processing divided by the kurtosis of the power spectra of the original noise signal before processing. This kurtosis ratio indicates the extent to which kurtosis is increased with processing. Thus, a smaller kurtosis ratio is desirable. Moreover, the noise reduction performance is measured using (28).

Table: Speech distortion comparison of chSS+BF and BF+SS on the basis of CD for the four-microphone case.

| Input noise type | chSS+BF | BF+SS |
|---|---|---|
| Gaussian | 6.15 dB | 6.45 dB |
| Super-Gaussian | 6.17 dB | 5.12 dB |

- (i) Although BF+SS can reduce the amount of musical noise by employing a larger flooring parameter, it leads to a deterioration of the noise reduction performance.
- (ii) In contrast, chSS+BF can reduce the kurtosis ratio, which corresponds to the amount of musical noise generated, without degradation of the noise reduction performance.
- (iii) Under the same level of noise reduction performance, the amount of musical noise generated via chSS+BF is less than that generated via BF+SS.
- (iv) Thus, the chSS+BF structure is preferable from the viewpoint of musical-noise generation.
- (v) However, the noise reduction performance of BF+SS is superior to that of chSS+BF for a super-Gaussian signal when the same parameters are set in the SS part for both methods.
- (vi) These results imply a trade-off between the amount of musical noise generated and the noise reduction performance. Thus, we should use an appropriate structure depending on the application.

These results should be applicable under different SNR conditions because our analysis is independent of the noise level. In the case of more reverberation, the observed signal tends to become Gaussian because many reverberant components are mixed. Therefore, the behavior of both methods under more reverberant conditions should be similar to that in the case of a Gaussian signal.

### 7.2. Subjective Evaluation

Next, we conduct a subjective evaluation to confirm that chSS+BF can mitigate musical noise. In the evaluation, we presented two signals processed by BF+SS and by chSS+BF to seven male examinees in random order, who were asked to select which signal they considered to contain less musical noise (the so-called AB method). Moreover, we instructed examinees to evaluate only the musical noise and not to consider the amplitude of the remaining noise. Here, the flooring parameter in BF+SS was automatically determined so that the output SNR of BF+SS and chSS+BF was equivalent. We used the preference score as the index of the evaluation, which is the frequency of the selected signal.

In the experiment, three types of noise, (a) artificial spatially uncorrelated white Gaussian noise, (b) recorded railway-station noise emitted from 36 loudspeakers, and (c) recorded human speech emitted from 36 loudspeakers, were used. Note that noises (b) and (c) were recorded in the actual room shown in Figure 14 and therefore include interchannel correlation because they were recordings of actual noise signals.

Each test sample is a 16-kHz-sampled signal, and the target speech is the original speech convolved with impulse responses recorded in a room with 200-millisecond reverberation (see Figure 14), to which the above-mentioned recorded noise signal is added. Ten pairs of signals per type of noise, that is, a total of 30 pairs of processed signals, were presented to each examinee.

## 8. Conclusion

In this paper, we analyze two methods of integrating microphone array signal processing and SS, that is, BF+SS and chSS+BF, on the basis of HOS. As a result of the analysis, it is revealed that the amount of musical noise generated via SS strongly depends on the statistical characteristics of the input signal. Moreover, it is also clarified that the noise reduction performances of BF+SS and chSS+BF are different except in the case of a Gaussian input signal. As a result of our analysis under equivalent noise reduction performance conditions, it is shown that chSS+BF reduces musical noise more than BF+SS in almost all practical cases. The results of a computer simulation also support the validity of our analysis. Moreover, by carrying out a subjective evaluation, it is confirmed that the output of chSS+BF is considered to contain less musical noise than that of BF+SS. These analytic and experimental results imply the considerable potential of optimization based on HOS to reduce musical noise.

As future work, it remains necessary to carry out signal analysis based on more general distributions. For instance, analysis using a generalized gamma distribution [26, 27] can lead to more general results. Moreover, an exact formulation of how kurtosis is changed through DS under a coherent condition is still an open problem. Furthermore, the robustness of BF+SS and chSS+BF against low-SNR or more reverberant conditions is not discussed in this paper. In the future, the discussion should involve not only noise reduction performance and musical-noise generation but also such robustness.

## Appendices

### A. Derivation of (13)

where the variable is replaced with for convenience.

### B. Derivation of (14)

### C. Derivation of (22)

Using (C.1)–(C.3), the power-domain moments can be expressed in terms of the 4th- and 8th-order moments in the time domain. Therefore, to obtain the kurtosis after DS in the power domain, the moments and cumulants after DS up to the 8th order are needed.

where is the th-order raw moment after DS in the time domain.

where is the th-order cumulant in the square domain.

### D. Derivation of (24)

### E. Derivation of (38)

## Declarations

### Acknowledgment

This work was partly supported by MIC Strategic Information and Communications R&D Promotion Programme in Japan.

## Authors’ Affiliations

## References

1. Brandstein M, Ward D (Eds): *Microphone Arrays: Signal Processing Techniques and Applications*. Springer, Berlin, Germany; 2001.
2. Flanagan JL, Johnston JD, Zahn R, Elko GW: Computer-steered microphone arrays for sound transduction in large rooms. *Journal of the Acoustical Society of America* 1985, 78(5):1508-1518. doi:10.1121/1.392786
3. Omologo M, Matassoni M, Svaizer P, Giuliani D: Microphone array based speech recognition with different talker-array positions. *Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), September 1997, Munich, Germany* 227-230.
4. Silverman HF, Patterson WR: Visualizing the performance of large-aperture microphone arrays. *Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '99), 1999* 962-972.
5. Frost O: An algorithm for linearly constrained adaptive array processing. *Proceedings of the IEEE* 1972, 60: 926-935.
6. Griffiths LJ, Jim CW: An alternative approach to linearly constrained adaptive beamforming. *IEEE Transactions on Antennas and Propagation* 1982, 30(1):27-34. doi:10.1109/TAP.1982.1142739
7. Kaneda Y, Ohga J: Adaptive microphone-array system for noise reduction. *IEEE Transactions on Acoustics, Speech, and Signal Processing* 1986, 34(6):1391-1400. doi:10.1109/TASSP.1986.1164975
8. Boll S: Suppression of acoustic noise in speech using spectral subtraction. *IEEE Transactions on Acoustics, Speech and Signal Processing* 1979, 27(2):113-120. doi:10.1109/TASSP.1979.1163209
9. Meyer J, Simmer K: Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction. *Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), 1997* 1167-1170.
10. Fischer S, Kammeyer KD: Broadband beamforming with adaptive post filtering for speech acquisition in noisy environment. *Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), 1997* 359-362.
11. Mukai R, Araki S, Sawada H, Makino S: Removal of residual cross-talk components in blind source separation using time-delayed spectral subtraction. *Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '02), May 2002, Orlando, Fla, USA* 1789-1792.
12. Cho J, Krishnamurthy A: Speech enhancement using microphone array in moving vehicle environment. *Proceedings of the IEEE Intelligent Vehicles Symposium, April 2003, Graz, Austria* 366-371.
13. Ohashi Y, Nishikawa T, Saruwatari H, Lee A, Shikano K: Noise robust speech recognition based on spatial subtraction array. *Proceedings of the International Workshop on Nonlinear Signal and Image Processing, 2005* 324-327.
14. Even J, Saruwatari H, Shikano K: New architecture combining blind signal extraction and modified spectral subtraction for suppression of background noise. *Proceedings of the International Workshop on Acoustic Echo and Noise Control (IWAENC '08), 2008, Seattle, Wash, USA*
15. Takahashi Y, Takatani T, Osako K, Saruwatari H, Shikano K: Blind spatial subtraction array for speech enhancement in noisy environment. *IEEE Transactions on Audio, Speech and Language Processing* 2009, 17(4):650-664.
16. Jebara SB: A perceptual approach to reduce musical noise phenomenon with Wiener denoising technique. *Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '06), 2006* 3: 49-52.
17. Ephraim Y, Malah D: Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. *IEEE Transactions on Acoustics, Speech, and Signal Processing* 1984, 32(6):1109-1121. doi:10.1109/TASSP.1984.1164453
18. Uemura Y, Takahashi Y, Saruwatari H, Shikano K, Kondo K: Automatic optimization scheme of spectral subtraction based on musical noise assessment via higher-order statistics. *Proceedings of the International Workshop on Acoustic Echo and Noise Control (IWAENC '08), 2008, Seattle, Wash, USA*
19. Uemura Y, Takahashi Y, Saruwatari H, Shikano K, Kondo K: Musical noise generation analysis for noise reduction methods based on spectral subtraction and MMSE STSA estimation. *Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '09), 2009* 4433-4436.
20. Takahashi Y, Uemura Y, Saruwatari H, Shikano K, Kondo K: Musical noise analysis based on higher order statistics for microphone array and nonlinear signal processing. *Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '09), 2009* 229-232.
21. Comon P: Independent component analysis, a new concept? *Signal Processing* 1994, 36: 287-314. doi:10.1016/0165-1684(94)90029-9
22. Saruwatari H, Kurita S, Takeda K, Itakura F, Nishikawa T, Shikano K: Blind source separation combining independent component analysis and beamforming. *EURASIP Journal on Applied Signal Processing* 2003, 2003(11):1135-1146. doi:10.1155/S1110865703305104
23. Mizumachi M, Akagi M: Noise reduction by paired-microphone using spectral subtraction. *Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP '98), 1998* 2: 1001-1004.
24. Takatani T, Nishikawa T, Saruwatari H, Shikano K: High-fidelity blind separation of acoustic signals using SIMO-model-based independent component analysis. *IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences* 2004, E87-A(8):2063-2072.
25. Ikeda S, Murata N: A method of ICA in the frequency domain. *Proceedings of the International Workshop on Independent Component Analysis and Blind Signal Separation, 1999* 365-371.
26. Stacy EW: A generalization of the gamma distribution. *The Annals of Mathematical Statistics* 1962, 1187-1192.
27. Kokkinakis K, Nandi AK: Generalized gamma density-based score functions for fast and flexible ICA. *Signal Processing* 2007, 87(5):1156-1162. doi:10.1016/j.sigpro.2006.09.012
28. Shin JW, Chang J-H, Kim NS: Statistical modeling of speech signals based on generalized gamma distribution. *IEEE Signal Processing Letters* 2005, 12(3):258-261.
29. Rabiner L, Juang B: *Fundamentals of Speech Recognition*. Prentice-Hall PTR; 1993.

## Copyright

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.