# Microphone array power ratio for quality assessment of reverberated speech

- Reuven Berkun^{1}
- Israel Cohen^{1}

**2015**:49

https://doi.org/10.1186/s13634-015-0233-y

© Berkun and Cohen; licensee Springer. 2015

**Received:** 7 January 2015

**Accepted:** 20 May 2015

**Published:** 18 June 2015

## Abstract

Speech signals in enclosed environments are often distorted by reverberation and noise. In speech communication systems with several randomly distributed microphones, a moving speaker, and an unknown source location, it is of great interest to monitor the perceived quality at each microphone and select the signal with the best quality. Most existing approaches for quality estimation require prior information or a clean reference signal, which is seldom available. In this paper, a practical non-intrusive method for quality assessment of reverberated speech signals is proposed. Using a statistical model of the reverberation process, we examine the energies measured by the unidirectional elements of a microphone array. The power ratio between these elements provides a measure of the amount of reverberation in the received acoustic signals. This measure is then used to derive a blind estimate of the direct-to-reverberation energy ratio in the room. The proposed approach yields a simple, reliable, and robust quality measure, as demonstrated by simulation results.


## 1 Introduction

Speech signals in closed-space environments are often distorted by reverberation and noise. In a speech communication application with several distributed microphones, it is often desired to quantify the amount of reverberation of the perceived signal at each sensor, in order to select the channel with the highest quality or with the least reverberation.

Many prior studies have dealt with the problem of measuring the amount of reverberation and assessing the quality of degraded acoustic signals. The most common methods, herein termed system-based, quantify the characteristics of the acoustic system. The best-known measure is the direct-to-reverberation energy ratio (DRR), which quantifies the reverberation using the room impulse response (RIR) [1, 2]. Another popular approach compares the distorted signal with a clean reference version [2–5]. Unfortunately, neither knowledge (or an estimate) of the room characteristics nor a clean reference is normally available, especially in real-time systems. Moreover, some of these methods correlate poorly with subjective quality tests [6] and thus cannot be used as reliable reverberation measures. Recently, various methods have been proposed to estimate the amount of reverberation and its properties from the distorted signal alone [7–11].

Direct approaches for measuring the reverberation are based on the signal power or on signal-to-noise ratio evaluation [11]. However, such approaches are suitable only when the power of the noise or of the late reverberation is uniform, which is not always the case. Many popular methods model the coherence of the direct sound and the reverberation, and estimate the signal-to-diffuse ratio of the signal. Jeub et al. [10] measured the complex spatial coherence between a pair of microphones but assumed that the direct sound arrives from the broadside direction of the array. In [12], Thiergart, Del Galdo, and Habets estimated the signal-to-diffuse ratio from the coherence, using omnidirectional microphones and without direction-of-arrival assumptions. However, with omnidirectional microphones the signals are highly correlated at low frequencies, resulting in a high estimation variance. Later works separated the diffuse and direct parts by using beamforming or directional microphones, and obtained a more robust estimate of the signal-to-diffuse ratio [9, 13]. Yet, they were tested only with artificially simulated diffuse and coherent noise fields, not with real reverberant speech signals. Falk, Zheng, and Chan [7] quantified coloration and reverberation through an analysis in the modulation spectral domain. Their proposed quality measure was tested with speech signals and was reported to outperform several standard quality and intelligibility measurement algorithms. Goetze et al. [14] compared several measures against subjective listening tests for the assessment of dereverberation algorithms. They showed that most signal-based objective measures fail to judge different reverberation distortions, with only one signal-based measure showing high correlation with the subjective ratings. They also argued that measures based on the impulse response (when available), such as the *Clarity* measure (C50) [1], show much higher correlation with the tests.

In this paper, we address the problem of estimating the quality and the reverberation level of distorted reverberant signals, using the microphone signals alone. Using a directional microphone array, we utilize the directivity pattern of the array elements to segregate the reverberation contribution from the direct signal. We measure the ratio between the energies of the unidirectional sensors and derive an objective signal-based measure for the reverberation quantity. Additionally, we expand this method and derive a reliable blind DRR estimator. Our proposed approach attains a reliable measure with high correlation to various reverberation parameters and outperforms state-of-the-art methods for quality estimation.

This paper is organized as follows. In Section 2, we define the problem. In Sections 3 and 4, we describe the general and the directional-array signal model, respectively. Next, in Section 5, we present our proposed reverberation quantity measure, the directional power ratio, together with a blind estimate for the channel DRR parameter. Simulation and real speech performance results are presented in Section 6. Finally, conclusions are given in Section 7.

## 2 Problem formulation

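Consider a speech source signal \(s(t)\), which is convolved with a causal, time-invariant room impulse response (RIR) \(h(t)\). Then, the measured signal is given by

\[ z(t) = h(t) * s(t) + v(t), \tag{1} \]

where \(*\) denotes convolution and \(v(t)\) denotes ambient additive noise, which is assumed to be zero in this part of the discussion. Following [15], the RIR is divided into two segments, \(h_d(t)\) and \(h_r(t)\), such that

\[ h(t) = \begin{cases} h_d(t), & 0 \le t < T_r, \\ h_r(t), & t \ge T_r, \\ 0, & \text{otherwise}, \end{cases} \]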

where \(h_d(t)\) represents the direct-path propagation from the source to the microphone, plus some early reflections of the acoustic wave. These reflections usually arrive up to 50 ms after the direct sound and are therefore not considered reverberation. The late part \(h_r(t)\) represents later, high-order reflections, which are perceived as reverberation. These reflections are incoherent with the direct sound and are the main cause of temporal smearing and quality degradation in reverberant rooms. The parameter \(T_r\) defines the segmentation of the RIR (where \(t = 0\) denotes the arrival time of the direct sound), so that \(h_d(t)\) consists of the direct part and some early reflections, while \(h_r(t)\) consists of the late reverberant part.

The DRR is defined as

\[ \mathrm{DRR} = \frac{E_d}{E_r} = \frac{\int_{0}^{T_d} h^2(t)\,dt}{\int_{T_d}^{\infty} h^2(t)\,dt}, \]

where \(E_d\) and \(E_r\) are the energies of the direct and reverberant parts, respectively, and \(T_d\) is the arrival time of the direct sound at the microphone. For sampled (measured) responses, \(T_d\) is usually chosen 8–16 ms later than the approximate arrival time [15], for higher precision.
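As a concrete illustration of this definition, the following Python sketch computes the DRR of a sampled RIR by summing squared samples on either side of \(T_d\). The sample sums approximate the integrals up to a common factor \(1/f_s\), which cancels in the ratio. The function names, the assumption that the sampled RIR starts at the emission time, and the 12 ms default margin (one value within the 8–16 ms range above) are illustrative choices, not part of the paper.

```python
import numpy as np


def drr_from_rir(h, fs, t_d):
    """Linear DRR of a sampled RIR h (sampling rate fs), split at t_d seconds.

    E_d sums h^2 over [0, t_d) and E_r over [t_d, end); the common 1/fs
    factor of the Riemann sums cancels in the ratio.
    """
    h = np.asarray(h, dtype=float)
    n_d = int(round(t_d * fs))          # boundary sample index for T_d
    e_d = np.sum(h[:n_d] ** 2)          # direct (plus early) energy
    e_r = np.sum(h[n_d:] ** 2)          # late reverberant energy
    return e_d / e_r


def drr_db(h, fs, t_arrival, margin_s=0.012):
    """DRR in dB with T_d placed margin_s seconds after the direct-sound arrival."""
    return 10.0 * np.log10(drr_from_rir(h, fs, t_arrival + margin_s))


# Toy RIR: unit direct path at 5 ms and a block of late energy totalling 1.0.
fs = 8000
h = np.zeros(800)
h[40] = 1.0
h[400:500] = 0.1
print(drr_db(h, fs, t_arrival=0.005))   # close to 0 dB (E_d = E_r = 1)
```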

Accordingly, our objective is to obtain an estimate for the reverberation amount or alternately for the perceived speech quality, based on the received signals alone (without a priori information of the RIR), and to blindly define an objective criterion for the direct-to-reverberation ratio.

## 3 Reverberation signal model

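The direct part is modeled by

\[ h_d(t) = b_d(t)\, e^{-\delta t}, \quad 0 \le t < T_r, \]

where \(b_d(t)\) is a white Gaussian noise process with zero mean and variance \(\sigma_d^2\). The decay rate \(\delta\) is given by [16]

\[ \delta = \frac{3 \ln(10)}{T_{60}}, \]

where \(T_{60}\) denotes the reverberation time, i.e., the time required for the reverberant energy to decay by 60 dB. The late reverberant part is modeled by

\[ h_r(t) = b_r(t)\, e^{-\delta t}, \quad t \ge T_r, \]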

where \(b_r(t)\) is a white Gaussian noise process with zero mean and variance \(\sigma_r^2\). The direct and late parts are uncorrelated, i.e., \(\mathbb{E}\{b_d(t)\, b_r(t+\tau)\} = 0,\ \forall \tau\).

where \({\mathbb {E}}_{h}\{\cdot \}\) denotes expectation over the stochastic process *h*.
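The statistical model can be sampled directly. The sketch below, under the model as stated (independent zero-mean white Gaussian excitations before and after \(T_r\), both shaped by \(e^{-\delta t}\)), draws one RIR realization and also returns the corresponding expected envelope \(\mathbb{E}_h\{h^2(t)\}\), which equals \(\sigma_d^2 e^{-2\delta t}\) before \(T_r\) and \(\sigma_r^2 e^{-2\delta t}\) after it. Function names and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def decay_rate(t60):
    """delta = 3 ln(10) / T60, so that exp(-2 delta T60) = 1e-6 (i.e., -60 dB)."""
    return 3.0 * np.log(10.0) / t60


def model_rir(fs, t60, t_r, sigma_d, sigma_r, length_s, rng):
    """Draw one RIR realization from the two-segment exponential-decay model."""
    n = int(round(length_s * fs))
    t = np.arange(n) / fs
    # Independent zero-mean white Gaussian excitations b_d(t) and b_r(t).
    b_d = rng.normal(0.0, sigma_d, n)
    b_r = rng.normal(0.0, sigma_r, n)
    b = np.where(t < t_r, b_d, b_r)          # b_d before T_r, b_r after
    return b * np.exp(-decay_rate(t60) * t)


def expected_envelope(fs, t60, t_r, sigma_d, sigma_r, length_s):
    """E_h{h^2(t)} for the same model, sampled on the same time grid."""
    n = int(round(length_s * fs))
    t = np.arange(n) / fs
    var = np.where(t < t_r, sigma_d ** 2, sigma_r ** 2)
    return var * np.exp(-2.0 * decay_rate(t60) * t)


rng = np.random.default_rng(0)
h = model_rir(fs=8000, t60=0.5, t_r=0.016, sigma_d=1.0, sigma_r=0.5,
              length_s=0.6, rng=rng)
```

Averaging \(h^2\) over many realizations should approach `expected_envelope`, which is a convenient check that the two segments are scaled as intended.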

\(z(t)\) [18]. Relying on the statistical independence of \(s(t)\) and \(h(t)\), and based on the segmentation described in (3), we get

\(T_{60}\). Accordingly, it is assumed stationary during the measurement period, so that the source autocorrelation can be taken outside the integral in (9), yielding

where \(\lambda _{s}(t) = {\mathbb {E}}_{s} \{s^{2}(t) \}\) denotes the speech energy at time *t*, i.e., the current variance of the stochastic quasi-stationary speech process.
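Under independence and stationarity, one expects the received energy to reduce to the source variance \(\lambda_s\) times the expected RIR energy \(E_d + E_r\). The sketch below is a quick Monte Carlo sanity check of that statement in discrete time. It is our own illustration rather than the paper's experiment: a stationary white Gaussian source stands in for quasi-stationary speech, and all parameter values and names are arbitrary.

```python
import numpy as np


def expected_rir_energy_envelope(fs, t60, t_r, sigma_d2, sigma_r2, length_s):
    """E_h{h^2[n]} for the two-segment exponential-decay model."""
    t = np.arange(int(round(length_s * fs))) / fs
    delta = 3.0 * np.log(10.0) / t60
    return np.where(t < t_r, sigma_d2, sigma_r2) * np.exp(-2.0 * delta * t)


def mc_received_power(lam, env, n_src, n_trials, rng):
    """Monte Carlo estimate of E{z^2} for z = h * s, with random h and white s."""
    powers = []
    for _ in range(n_trials):
        h = np.sqrt(env) * rng.standard_normal(env.size)    # one model RIR
        s = np.sqrt(lam) * rng.standard_normal(n_src)       # source of variance lam
        z = np.convolve(s, h, mode="valid")                 # outputs that see the whole RIR
        powers.append(np.mean(z ** 2))
    return float(np.mean(powers))


rng = np.random.default_rng(1)
env = expected_rir_energy_envelope(8000, 0.4, 0.016, 1.0, 0.3, 0.25)
lam = 2.0
estimate = mc_received_power(lam, env, n_src=8000, n_trials=40, rng=rng)
theory = lam * env.sum()          # lambda_s * (E_d + E_r), discrete-time version
print(estimate / theory)          # should be close to 1
```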

By setting \(T_r = T_d\), based on the generalized statistical model, we can easily deduce the direct and late part energies and express (4) as
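Under the exponential-decay model above, the expected envelope is \(\mathbb{E}_h\{h^2(t)\} = \sigma_d^2 e^{-2\delta t}\) for \(0 \le t < T_r\) and \(\sigma_r^2 e^{-2\delta t}\) for \(t \ge T_r\), so with \(T_r = T_d\) the two energies and their ratio follow in closed form. The worked step below is our own sketch of this derivation; its arrangement may differ from the authors' Eq. (12).

\[
\begin{aligned}
E_d &= \int_{0}^{T_d} \sigma_d^2 e^{-2\delta t}\,dt = \frac{\sigma_d^2}{2\delta}\left(1 - e^{-2\delta T_d}\right), \\
E_r &= \int_{T_d}^{\infty} \sigma_r^2 e^{-2\delta t}\,dt = \frac{\sigma_r^2}{2\delta}\, e^{-2\delta T_d}, \\
\mathrm{DRR} &= \frac{E_d}{E_r} = \frac{\sigma_d^2}{\sigma_r^2}\left(e^{2\delta T_d} - 1\right).
\end{aligned}
\]

For example, with \(\sigma_d^2 = \sigma_r^2\), \(T_{60} = 0.5\) s (so \(\delta \approx 13.8\ \mathrm{s}^{-1}\)), and \(T_d = 16\) ms, this gives \(\mathrm{DRR} \approx 0.56\), i.e., about \(-2.55\) dB.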

## 4 Directional array response

In this section, we extend the RIR model of Section 3 and examine the response as perceived by a unidirectional microphone array.

Let us assume that the reverberant signal impinges on a unidirectional microphone array, rather than on a single omnidirectional microphone. Such an array can be composed of several directional microphone elements or, alternatively, formed by applying beamforming techniques to a few closely spaced omnidirectional microphones [19, 20]. The overall source-to-microphone response can be described as the convolution of the RIR (2) with the response of the corresponding directional microphone. The acoustic response of a directional microphone (or beamformer) is time-invariant and depends only on the frequency and the angle of arrival of the signal.

The element directed toward the speaker (denoted with the \({}^{\mathrm{dir}}\) superscript) will perceive the direct signal plus the reverberant part. On the other hand, the element directed in the opposite direction (denoted with the \({}^{\mathrm{opp}}\) superscript) will not perceive the direct-path signal, since the direct sound arrives from the speaker direction, i.e., from behind this element. It will sense the reverberation alone, which is modeled as diffuse noise; hence, it propagates incoherently in all directions and is perceived similarly by both elements. An example of such a configuration is illustrated in Fig. 1. Let us denote by \(\theta\) the angle of incidence of the direct signal, and by \(g^{\mathrm{dir}}(\theta)\) the directional gain of the microphone at the angle \(\theta\). Then, we can express the energy measured by the direct microphone as
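A sketch of these energies is given below, under the assumptions above plus two of our own: the early reflections in \(h_d(t)\) arrive, like the direct sound, from the angle \(\theta\); and both elements see the diffuse late field with the same angle-averaged power gain \(\bar{g}^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} g^2(\theta')\,d\theta'\). Here \(\lambda_s\) is the speech energy and \(E_d\), \(E_r\) are the RIR energies of Section 3. It is a reconstruction for illustration rather than the authors' exact expression.

\[
\begin{aligned}
P^{\mathrm{dir}} &= \lambda_s \left[ \big(g^{\mathrm{dir}}(\theta)\big)^2 E_d + \bar{g}^2 E_r \right], \\
P^{\mathrm{opp}} &= \lambda_s \left[ \big(g^{\mathrm{opp}}(\theta)\big)^2 E_d + \bar{g}^2 E_r \right],
\end{aligned}
\]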

where \(g^{\mathrm{opp}}(\theta)\) is the angular gain of the opposite microphone at the angle \(\theta\). In light of this discussion, we next derive our proposed approach for measuring the amount of reverberation and the quality of reverberant speech.

## 5 Directional power ratio

where the second transition is immediately inferred from the definition of the DRR (12).
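As a sketch of this step, take energies of the form \(P^{\{\mathrm{dir},\mathrm{opp}\}} = \lambda_s\big[(g^{\{\mathrm{dir},\mathrm{opp}\}}(\theta))^2 E_d + \bar{g}^2 E_r\big]\), with \(\bar{g}^2\) the angle-averaged power gain seen by both elements in a diffuse field, and assume an ideal rear null, \(g^{\mathrm{opp}}(\theta) = 0\). The speech energy \(\lambda_s\) then cancels, and inverting the ratio gives a blind DRR estimate. This is our reconstruction of the argument, not a quotation of Eqs. (18) and (19).

\[
\mathrm{PR} = \frac{P^{\mathrm{dir}}}{P^{\mathrm{opp}}} = \frac{\big(g^{\mathrm{dir}}(\theta)\big)^2 E_d + \bar{g}^2 E_r}{\bar{g}^2 E_r} = 1 + \frac{\big(g^{\mathrm{dir}}(\theta)\big)^2}{\bar{g}^2}\,\mathrm{DRR}, \qquad \widehat{\mathrm{DRR}} = \frac{\bar{g}^2}{\big(g^{\mathrm{dir}}(\theta)\big)^2}\left(\mathrm{PR} - 1\right).
\]

For the cardioid setup of Section 6.1 (\(g^{\mathrm{dir}}(0) = 1\), \(\bar{g}^2 = 3/8\)), this reads \(\widehat{\mathrm{DRR}} = \tfrac{3}{8}(\mathrm{PR} - 1)\).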

\(T\) is around 20–40 ms [21].

\(P^{\{\mathrm{dir},\mathrm{opp}\}}(t)\) denotes the current integrated power as sensed by the direct and opposite microphones, respectively, and \(\mathrm{DRR}(t)\) denotes the DRR at time instance \(t\), for the current speaker and microphone positions. An example of the measured power ratio for a reverberated speech signal is given in Fig. 2.
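A discrete-time sketch of the running power ratio is given below. It assumes that \(P(t)\) is a causal sum of squared samples over the last \(T\) seconds (the text specifies only an integration time of around 20–40 ms, and 32 ms in the simulations), adds a small constant to avoid division by zero in silent frames, and inverts the ratio assuming an ideal rear null as in the cardioid case (\(g^{\mathrm{dir}} = 1\), \(\bar{g}^2 = 3/8\)). The function names are hypothetical.

```python
import numpy as np


def integrated_power(x, fs, t_int=0.032):
    """Causal running energy: P[k] = sum of x[m]^2 for m in (k - N, k], N = t_int * fs.

    The first N - 1 outputs use a shorter (partial) window.
    """
    x = np.asarray(x, dtype=float)
    n = max(1, int(round(t_int * fs)))
    c = np.concatenate(([0.0], np.cumsum(x ** 2)))   # c[k] = sum of x[:k]^2
    k = np.arange(1, x.size + 1)
    return c[k] - c[np.maximum(k - n, 0)]


def power_ratio(z_dir, z_opp, fs, t_int=0.032, eps=1e-12):
    """PR(t) = P_dir(t) / P_opp(t), with eps guarding against silent frames."""
    return integrated_power(z_dir, fs, t_int) / (integrated_power(z_opp, fs, t_int) + eps)


def drr_estimate(pr, g_dir=1.0, g_bar2=3.0 / 8.0):
    """Invert PR = 1 + (g_dir^2 / g_bar2) * DRR, assuming g_opp(theta) = 0."""
    return (np.asarray(pr, dtype=float) - 1.0) * g_bar2 / g_dir ** 2


# Synthetic check: direct power 2, unit diffuse power seen through g_bar2 by both elements.
rng = np.random.default_rng(0)
fs, n, g_bar2, true_drr = 8000, 80000, 3.0 / 8.0, 2.0
direct = np.sqrt(true_drr) * rng.standard_normal(n)
z_dir = direct + np.sqrt(g_bar2) * rng.standard_normal(n)
z_opp = np.sqrt(g_bar2) * rng.standard_normal(n)
est = drr_estimate(power_ratio(z_dir, z_opp, fs))
print(est[256:].mean())   # roughly 2
```

A cumulative sum keeps the running window at one subtraction per sample; any other causal smoother (e.g., recursive averaging) could be substituted, which would change the effective integration time.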

Many popular approaches [8, 10, 22] treat the DRR and the statistical model as frequency-dependent, owing to the frequency dependence of the reflection coefficients and the air absorption coefficient, which results in a frequency-dependent \(T_{60}\) and decay rate \(\delta\). Nevertheless, we adopt a frequency-independent model, mainly for simplicity. In our simulations, a frequency-dependent measure achieved results similar to those of the frequency-independent model described next.

## 6 Experimental results

### 6.1 In-front simulations

\(f_s = 8\) kHz). Two types of tests were performed: varying source-microphone distance with a fixed reverberation time \(T_{60}\), and varying \(T_{60}\) with a fixed source-microphone distance. To obtain consistent results, we repeated each experiment (for a given distance and \(T_{60}\)) with different receiver and source positions, keeping the source-receiver distance and the reverberation time fixed. We then spatially averaged each set of configurations with the same distance and \(T_{60}\), to better approximate the ensemble average [26] and to average out disparities caused by position-dependent (local) effects. We simulated a room of size 5×6×4 m (length × width × height) [18], with different source and receiver positions, as detailed in Fig. 3. In the simulation, the power ratio integration time \(T\) [Eq. (18)] was set to 32 ms. We used four calibrated directional microphones with cardioid directivity, with the microphone array mounted exactly in front of the source. Accordingly, \(g^{\mathrm{dir}}(0)\) was set to 1, and \(\bar{g}^{2} = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[\frac{1+\cos(\theta')}{2}\right]^{2} d\theta' = \frac{3}{8}\). The direct microphone was taken as the sensor that measured the maximum power, and the opposite microphone as the sensor facing the opposite direction (at 180°). Alternatively, the opposite microphone can be chosen as the sensor with the minimum power (or by applying localization algorithms).
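The value \(\bar{g}^2 = 3/8\) and the sensor-selection rule can be checked numerically. The sketch below assumes an ideal first-order cardioid \(g(\theta) = (1+\cos\theta)/2\) and a hypothetical channel layout in which the four elements of a cluster are stored in order of facing angle (0°, 90°, 180°, 270°), so that the element facing 180° away from index \(k\) is \((k+2) \bmod 4\).

```python
import numpy as np


def cardioid_gain(theta):
    """Amplitude response of an ideal first-order cardioid."""
    return 0.5 * (1.0 + np.cos(theta))


def mean_square_gain(gain, n=4096):
    """(1 / 2pi) * integral of gain(theta)^2 over [-pi, pi), on a uniform grid.

    For a low-degree trigonometric polynomial the uniform-grid mean is exact
    up to floating-point error.
    """
    theta = np.linspace(-np.pi, np.pi, n, endpoint=False)
    return float(np.mean(gain(theta) ** 2))


def pick_dir_opp(channels):
    """Direct = element with maximum power; opposite = element facing 180 deg away.

    channels: array of shape (4, N), rows ordered by facing angle 0, 90, 180, 270 deg.
    """
    powers = np.mean(np.asarray(channels, dtype=float) ** 2, axis=1)
    k_dir = int(np.argmax(powers))
    return k_dir, (k_dir + 2) % 4


print(mean_square_gain(cardioid_gain))   # 0.375 (= 3/8) up to rounding
```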

*Clarity* measure C50 [1], which was found to be the system-based measure most correlated with subjective listening tests [14]. In addition, we calculated correlations with the non-intrusive quality standard ITU-T P.563 [28] and the intrusive quality standard ITU-T P.862 (PESQ) [29] (as done in [7]). Each configuration was first tested with a white noise input (of constant temporal variance) [30], and then with reverberated speech signals, in which case the results are averaged over all speech signals. The correlation results are summarized in Table 1, for the varying-distance test (fixed \(T_{60}\), source-microphone distance increasing from 0.25 to 3 m) and the varying-\(T_{60}\) test (fixed distance, \(T_{60}\) increasing from 0.1 to 2 s). Note that for the white noise input, the PESQ and P.563 tests were not performed, since these algorithms operate only above a minimum speech activity level.
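For reference, C50 is the early-to-late energy ratio of the RIR with a 50 ms boundary, \(C_{50} = 10\log_{10}\big(\int_0^{50\,\mathrm{ms}} h^2(t)\,dt \big/ \int_{50\,\mathrm{ms}}^{\infty} h^2(t)\,dt\big)\), with \(t = 0\) at the direct-sound arrival. The Python sketch below computes it from a sampled RIR and forms a correlation across configurations, as reported in the tables. It assumes the Pearson correlation coefficient; whether linear or dB values were correlated is not stated in the text, so that choice is left to the caller, and the function names are hypothetical.

```python
import numpy as np


def clarity_c50(h, fs, t_arrival=0.0):
    """C50 in dB from a sampled RIR; the 50 ms window starts at the direct arrival."""
    h = np.asarray(h, dtype=float)
    n0 = int(round(t_arrival * fs))
    n50 = n0 + int(round(0.050 * fs))
    early = np.sum(h[n0:n50] ** 2)
    late = np.sum(h[n50:] ** 2)
    return 10.0 * np.log10(early / late)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])


# Hypothetical usage: one temporal-mean PR value and one C50 value per configuration.
# r = pearson(pr_mean_per_config, c50_per_config)
```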

**Table 1** Performance comparison: correlation between the (temporal mean) proposed power ratio (PR) (18), SRMR, and EV values and the Clarity (C50), PESQ, and P.563 scores

| Test type | Algorithm | C50 (white noise) | C50 (speech) | PESQ (speech) | P.563 (speech) |
|---|---|---|---|---|---|
| Fixed \(T_{60}\), vs. increasing distance | PR | 0.999 | 0.999 | 0.911 | 0.712 |
| | SRMR | −0.27 | 0.845 | 0.973 | 0.934 |
| | EV | −0.66 | 0.931 | 0.994 | 0.875 |
| Fixed \(T_{60}\), vs. increasing distance | PR | 0.999 | 0.998 | 0.970 | 0.843 |
| | SRMR | 0.454 | 0.967 | 0.991 | 0.921 |
| | EV | −0.54 | 0.982 | 0.991 | 0.885 |
| Distance = 0.5 m, vs. increasing \(T_{60}\) | PR | 0.944 | 0.951 | 0.899 | 0.562 |
| | SRMR | 0.392 | 0.640 | 0.991 | 0.873 |
| | EV | 0.235 | 0.614 | 0.984 | 0.912 |
| Distance = 2 m, vs. increasing \(T_{60}\) | PR | 0.973 | 0.969 | 0.918 | 0.674 |
| | SRMR | 0.787 | 0.808 | 0.998 | 0.892 |
| | EV | −0.33 | 0.700 | 0.987 | 0.958 |

The results indicate that the proposed signal-based power ratio is highly correlated with the objective system-based C50 quality measure. It also shows relatively high correlation with the PESQ and P.563 scores, although the SRMR and the EV obtained even higher correlations with these scores. This may be explained by the fact that the SRMR and the EV are based on gammatone [7] and mel-scale [11] subband filtering, respectively, similar to the Bark-scale analysis used in PESQ and P.563. Moreover, note that the standard PESQ and P.563 scores were not developed to measure quality under reverberation conditions and were found to correlate poorly with subjective tests in that setting [14]. However, since they are sensitive to other perceptually important distortions, we use them as additional measures.

\(T_d\) larger than the arrival time by 12 ms). Similarly, we first tested the performance with a white noise input and then with the same 120 reverberant speech sources. The same varying-distance and varying-\(T_{60}\) experiments were performed. The corresponding results are summarized in Table 2. In addition, the performance of the proposed measure vs. distance is illustrated in Fig. 4. The results show that the proposed blind DRR estimate agrees closely with the theoretical DRR, with correlation coefficients of almost 1. As expected, the estimated DRR decreases as the source-microphone distance and the reverberation time increase.

**Table 2** Performance comparison: correlation between the (temporal mean) proposed power ratio-based DRR estimator (PR-DRR) (19) and Jeub et al.'s CDR-based DRR estimator, and the true DRR

| Test type | Algorithm | DRR (white noise) | DRR (speech) |
|---|---|---|---|
| Fixed \(T_{60}\), vs. increasing distance | PR-DRR | 0.999 | 0.999 |
| | CDR | 0.995 | 0.992 |
| Fixed \(T_{60}\), vs. increasing distance | PR-DRR | 0.999 | 0.999 |
| | CDR | 0.964 | 0.972 |
| Distance = 0.5 m, vs. increasing \(T_{60}\) | PR-DRR | 0.994 | 0.996 |
| | CDR | 0.984 | 0.978 |
| Distance = 2 m, vs. increasing \(T_{60}\) | PR-DRR | 0.999 | 0.999 |
| | CDR | 0.852 | 0.913 |

When additive noise \(v(t) \neq 0\) is present in the measured signal (1), we obtain a multiplicative bias factor \(\beta_{\text{noise}}\) in the proposed DRR estimator (19), such that \(\beta_{\text{noise}} \propto \left[1 + 2\delta e^{2\delta T_{r}} \cdot \text{SNR}^{-1}\right]^{-1}\). Hence, at high signal-to-noise ratio (SNR) the bias is expected to be negligible, whereas at low SNR the estimate is expected to be affected and biased. This behavior was observed in a simulation performed over the same 120 speech signals, with a fixed source-microphone distance, a fixed reverberation time, and SNR levels increasing from 5 to 25 dB, using additive babble noise [30]. The remaining parameters were the same as in the previous simulations. An example of such a simulation is given in Fig. 5, where we measured the absolute difference (AD) between the DRR estimate and the true DRR (in dB) vs. the SNR level [we defined \(\mathrm{AD}(x) = 10\log_{10}(x) - 10\log_{10}(\mathrm{DRR})\)]. As a reference, it was compared with the AD between Jeub et al.'s CDR-based DRR measure and the true DRR. Although the proposed approach is sensitive to noise, it still manages to estimate the DRR level correctly based on the signals alone. For very low SNR scenarios, one can first remove the noise by applying speech enhancement methods (e.g., [31, 32]) in a pre-processing stage, or estimate the noise variance [33] and account for it in the measurement.
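The mechanism behind this bias can be made explicit under simple assumptions of our own: uncorrelated noise of equal power \(\sigma_v^2\) at both elements, an ideal rear null, and energies normalized to unit speech power. The noise then adds to the denominator of the power ratio, and the inverted estimate becomes \(\mathrm{DRR}/\big(1 + \sigma_v^2/(\bar{g}^2 E_r)\big)\). Substituting \(E_r = \sigma_r^2 e^{-2\delta T_r}/(2\delta)\) from the exponential-decay model gives the same form as \(\beta_{\text{noise}}\) above if the SNR is read as \(\bar{g}^2 \sigma_r^2/\sigma_v^2\) (our interpretation). The Python sketch below, with hypothetical names, evaluates this expected estimate and the AD as defined in the text.

```python
import numpy as np


def expected_drr_estimate(drr, e_r, noise_power, g_dir=1.0, g_bar2=3.0 / 8.0):
    """Expected PR-based DRR estimate with equal, uncorrelated noise at both elements.

    Energies are per unit speech power: E_d = drr * e_r; the direct element sees
    g_dir^2 E_d + g_bar2 E_r + noise, the opposite one g_bar2 E_r + noise.
    """
    e_d = drr * e_r
    p_dir = g_dir ** 2 * e_d + g_bar2 * e_r + noise_power
    p_opp = g_bar2 * e_r + noise_power
    pr = p_dir / p_opp
    return (pr - 1.0) * g_bar2 / g_dir ** 2


def ad_db(x, drr):
    """AD(x) = 10 log10(x) - 10 log10(DRR), as defined in the text."""
    return 10.0 * np.log10(x) - 10.0 * np.log10(drr)


for noise in (0.0, 0.01, 0.1, 1.0):           # increasing noise power (falling SNR)
    est = expected_drr_estimate(drr=2.0, e_r=1.0, noise_power=noise)
    print(noise, round(float(ad_db(est, 2.0)), 3))
```

The AD is zero without noise and becomes increasingly negative as the noise power grows, i.e., the estimator underestimates the DRR at low SNR.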

### 6.2 Off main-lobe simulations

In this part, we repeated the experiments above, but instead of varying the array positions in the room with the source exactly in front of the receiver, we changed the source-receiver angle. This gives a more realistic scenario, since the source is usually not located exactly in front of the microphone but at some offset from the main-lobe axis of the directional microphone. To achieve a smaller source-receiver angle (and thus a higher gain of the direct microphone), one can use more directional elements in every array (or create them using beamforming techniques); clearly, however, this would increase the complexity and cost of the system. In this part of the simulations, the receiver was positioned at \((x, y, z) = (1, 2, 1)\) m in the room, and the source position was varied uniformly along an arc of fixed radius (for a given distance), spanning source-receiver angles from −30° to +30°.
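To see what an off-axis source does to the estimate, consider ideal cardioids and the directional energy model described in Section 4 (direct sound from the angle \(\theta\), diffuse late field seen by both elements with \(\bar{g}^2 = 3/8\)). If the estimator still assumes \(g^{\mathrm{dir}} = 1\) and \(g^{\mathrm{opp}} = 0\) while the true gains are \(g(\theta)\) and \(g(\theta + \pi)\), the estimate becomes \(\cos\theta \cdot \mathrm{DRR}/\big(1 + g^2(\theta+\pi)\,\mathrm{DRR}/\bar{g}^2\big)\), since \(g^2(\theta) - g^2(\theta+\pi) = \cos\theta\) for a cardioid. The Python sketch below evaluates this; it is our own analysis of the ±30° range used here, not a result reported in the paper, and the function names are hypothetical.

```python
import numpy as np


def cardioid_gain(theta):
    """Amplitude response of an ideal first-order cardioid."""
    return 0.5 * (1.0 + np.cos(theta))


def off_axis_estimate(drr, theta_deg, g_bar2=3.0 / 8.0):
    """Expected DRR estimate when g_dir = 1 and g_opp = 0 are assumed but the source is off-axis."""
    theta = np.deg2rad(theta_deg)
    g_dir2 = cardioid_gain(theta) ** 2              # true gain of the element facing the array axis
    g_opp2 = cardioid_gain(theta + np.pi) ** 2      # small leakage into the rear-facing element
    pr = (g_dir2 * drr + g_bar2) / (g_opp2 * drr + g_bar2)   # energies per unit E_r
    return g_bar2 * (pr - 1.0)


for angle in (0, 15, 30):
    est = off_axis_estimate(1.0, angle)
    print(angle, round(10 * np.log10(est), 2))   # dB error for a true DRR of 0 dB
```

Within ±30°, the uncorrected estimate stays within about 1 dB of the true DRR for a true DRR of 0 dB in this idealized model, which is one way to read the small impact of the angular offset.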

For the varying-distance experiment, the source-microphone distance ranged from 0.25 to 3 m, and for the varying-reverberation-time experiment, \(T_{60}\) ranged from 0.1 to 1.4 s. Additionally, we repeated the same simulations with swapped source and receiver positions, to enlarge the sample set of the experiment.

Off main-lobe performance comparison: correlation between the (temporal mean) proposed power ratio (PR) (18), SRMR, and EV values and the Clarity (C50), PESQ, and P.563 scores

| Test type | Algorithm | C50 (white noise) | C50 (speech) | PESQ (speech) | P.563 (speech) |
|---|---|---|---|---|---|
| Fixed \(T_{60}\), vs. increasing distance | PR | 0.999 | 0.998 | 0.889 | 0.703 |
| | SRMR | −0.66 | 0.860 | 0.984 | 0.934 |
| | EV | −0.65 | 0.932 | 0.993 | 0.871 |
| Fixed \(T_{60}\), vs. increasing distance | PR | 0.999 | 0.999 | 0.954 | 0.821 |
| | SRMR | 0.399 | 0.958 | 0.992 | 0.938 |
| | EV | −0.50 | 0.981 | 0.989 | 0.891 |
| Distance = 0.5 m, vs. increasing \(T_{60}\) | PR | 0.946 | 0.948 | 0.926 | 0.573 |
| | SRMR | 0.340 | 0.654 | 0.986 | 0.867 |
| | EV | 0.369 | 0.640 | 0.980 | 0.893 |
| Distance = 2 m, vs. increasing \(T_{60}\) | PR | 0.966 | 0.965 | 0.947 | 0.696 |
| | SRMR | 0.714 | 0.809 | 0.998 | 0.903 |
| | EV | −0.11 | 0.713 | 0.976 | 0.959 |

\(T_{60}\) example is shown in Fig. 6. Both the illustration and the correlation results indicate that the off main-lobe scenario also yields promising performance. Moreover, the computed correlation coefficients were as high as in the in-front simulations (Table 2), offering a reliable and practical measure.

| Test type | Algorithm | DRR (white noise) | DRR (speech) |
|---|---|---|---|
| Fixed \(T_{60}\), vs. increasing distance | PR-DRR | 0.999 | 0.999 |
| | CDR | 0.999 | 0.998 |
| Fixed \(T_{60}\), vs. increasing distance | PR-DRR | 0.999 | 0.999 |
| | CDR | 0.952 | 0.934 |
| Distance = 0.5 m, vs. increasing \(T_{60}\) | PR-DRR | 0.992 | 0.993 |
| | CDR | 0.995 | 0.996 |
| Distance = 2 m, vs. increasing \(T_{60}\) | PR-DRR | 0.999 | 0.999 |
| | CDR | 0.838 | 0.745 |

### 6.3 Recorded speech experiment

To examine the proposed approach in a real environment, we performed speech recordings in a 15×10×6 m lecture hall, using six microphone clusters (with 3-m spacing between adjacent clusters), each composed of four unidirectional microphones facing 90° apart. For the purpose of analyzing the performance of the proposed measure, we placed the microphone clusters on a line along the hall. The speaker moved along this line, advancing from the first array toward the sixth. We divided the speech recordings such that a separate speech segment was defined each time the speaker was in front of one array or between two arrays. Then, for every active speech segment, we measured the power ratio (18) and calculated its temporal mean separately. Since the reference DRR (or C50) could not be obtained precisely, we present a qualitative analysis of the results.
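The per-segment averaging described above, and its use for choosing the least reverberant array, can be sketched as follows. The segment boundaries and the optional speech-activity mask are hypothetical inputs (the text does not specify how active speech was detected), and selecting the array with the highest mean PR is our illustration of the channel-selection use case from the Introduction, relying on the power ratio increasing with the DRR.

```python
import numpy as np


def segment_means(pr, segments, active=None):
    """Temporal mean of PR(t) over each (start, stop) sample range.

    If a boolean speech-activity mask is given, only active samples are averaged;
    a segment with no active samples yields NaN.
    """
    pr = np.asarray(pr, dtype=float)
    mask = np.ones(pr.size, dtype=bool) if active is None else np.asarray(active, dtype=bool)
    means = []
    for start, stop in segments:
        seg = pr[start:stop][mask[start:stop]]
        means.append(float(np.mean(seg)) if seg.size else float("nan"))
    return np.array(means)


def best_array(pr_per_array, segment, active=None):
    """Index of the array whose mean PR over one segment is highest."""
    scores = [segment_means(pr, [segment], active)[0] for pr in pr_per_array]
    return int(np.nanargmax(scores))
```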

## 7 Conclusions

We have proposed a new approach for measuring the amount of reverberation, for assessing acoustic signal quality and, mainly, for blindly estimating the direct-to-reverberation ratio of speech signals. Based on a statistical model, we developed a model for reverberated speech as received by directional microphones. Building on this model, we measured the power ratio between two opposite unidirectional sensors and separated the influence of the diffuse field from that of the direct signal. This directional power ratio was shown to properly estimate the ratio between the direct speech and the amount of reverberation, yielding a well-founded signal-based quality measure and a blind DRR estimator. It was compared with various state-of-the-art quality measurement algorithms and DRR measures, and provided reliable results that are highly correlated with the system-based DRR measure. Finally, we tested its performance with real recorded speech and showed that it can be used as a reliable and robust speech quality measure.

Future work will concentrate on analysis of the optimal directional microphone beampattern and its influence, optimizing and adapting the temporal smoothing to the voice activity level, and combination with de-noising algorithms for integration in real-time quality monitoring systems with distributed microphone arrays.

## Declarations

### Acknowledgements

The authors thank Dr. Baruch Berdugo from Phoenix Audio Technologies for his appreciated assistance with the real-data recordings, and the anonymous reviewers for their constructive comments and useful suggestions.

This research was supported by the Israel Science Foundation (grant no. 1130/11).


## References

- H Kuttruff, *Room Acoustics* (Taylor & Francis Press, New York, USA, 2009).
- PA Naylor, EAP Habets, in *Speech Dereverberation*, ed. by PA Naylor, EAP Habets, JY-C Wen, and ND Gaubitch. Models, measurement and evaluation (Springer, London, UK, 2010).
- SR Quackenbush, TP Barnwell, MA Clements, *Objective Measures of Speech Quality* (Yale University Press, Prentice Hall, Englewood Cliffs, NJ, 1988).
- S Wang, A Sekey, A Gersho, An objective measure for predicting subjective quality of speech coders. Selected Areas Commun. IEEE J. 10(5), 819–829 (1992).
- PA Naylor, ND Gaubitch, EAP Habets, Signal-based performance evaluation of dereverberation algorithms. J. Electr. Comput. Eng. 2010, 1–5 (2010).
- ITU-T Recommendation P.800: Methods for Subjective Determination of Transmission Quality (International Telecommunication Union, Geneva, 1996).
- TH Falk, C Zheng, W-Y Chan, A non-intrusive quality and intelligibility measure of reverberant and dereverberated speech. Audio Speech Lang. Process. IEEE Trans. 18(7), 1766–1774 (2010).
- EAP Habets, S Gannot, I Cohen, Late reverberant spectral variance estimation based on a statistical model. Signal Process. Lett. IEEE. 16(9), 770–773 (2009).
- O Thiergart, T Ascherl, EAP Habets, in *Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference On*. Power-based signal-to-diffuse ratio estimation using noisy directional microphones (Florence, Italy, 4–9 May 2014), pp. 7440–7444.
- M Jeub, C Nelke, C Beaugeant, P Vary, in 19th European Signal Processing Conference (EUSIPCO 2011). Blind estimation of the coherent-to-diffuse energy ratio from noisy speech signals (Barcelona, Spain, Aug. 29–Sep. 2, 2011), pp. 1347–1351.
- M Wolf, C Nadeu, Channel selection measures for multi-microphone speech recognition. Speech Comm. 57, 170–180 (2014).
- O Thiergart, G Del Galdo, EAP Habets, in Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference On. Signal-to-reverberant ratio estimation based on the complex spatial coherence between omnidirectional microphones (Kyoto, Japan, 25–30 March 2012), pp. 309–312.
- Y Hioka, K Furuya, K Niwa, Y Haneda, et al., in Acoustic Signal Enhancement, Proceedings of IWAENC 2012, International Workshop On (VDE, 2012). Estimation of direct-to-reverberation energy ratio based on isotropic and homogeneous propagation model (Aachen, Germany, 4–6 Sept. 2012), pp. 1–4.
- S Goetze, E Albertin, M Kallinger, A Mertins, K-D Kammeyer, in Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference On. Quality assessment for listening-room compensation algorithms (Dallas, TX, 14–19 March 2010), pp. 2450–2453.
- EAP Habets, Single- and multi-microphone speech dereverberation using spectral enhancement. PhD thesis, Technische Universiteit Eindhoven (2007).
- J-D Polack, La transmission de l’energie sonore dans les salles. PhD thesis, Universite du Maine, Le Mans, France (1988).
- K Lebart, J-M Boucher, P Denbigh, A new method based on spectral subtraction for speech dereverberation. Acta Acustica united with Acustica. 87(3), 359–366 (2001).
- EAP Habets, in *Speech Dereverberation*, ed. by PA Naylor, ND Gaubitch. Speech dereverberation using statistical reverberation models (Springer, London, UK, 2010).
- S Gannot, I Cohen, in *Springer Handbook of Speech Processing*, ed. by J Benesty, MM Sondhi, and Y Huang. Adaptive beamforming and postfiltering (Springer, Berlin, Germany, 2008), pp. 945–978.
- M Brandstein, D Ward, *Microphone Arrays: Signal Processing Techniques and Applications* (Springer, Berlin, Germany, 2001).
- JR Deller Jr, JG Proakis, JH Hansen, Discrete-Time Processing of Speech Signals (Macmillan Pub. Co, New York, 1993).
- EAP Habets, S Gannot, I Cohen, in Proc. IEEE Convention of Electrical & Electronics Engineers in Israel (IEEEI). Speech dereverberation using backward estimation of the late reverberant spectral variance (Eilat, Israel, 3–5 Dec. 2008), pp. 384–388.
- EAP Habets, Room Impulse Response (RIR) Generator. http://www.audiolabs-erlangen.de/fau/professor/habets/software/rir-generator. Accessed Dec. 2014.
- JB Allen, DA Berkley, Image method for efficiently simulating small-room acoustics. J. Acoust. Soc. Am. 65(4), 943–950 (1979).
- JS Garofolo, *TIMIT: Acoustic-Phonetic Continuous Speech Corpus* (Linguistic Data Consortium, Philadelphia, 1993).
- J-M Jot, L Cerveau, O Warusfel, in Audio Engineering Society Convention 103. Analysis and synthesis of room reverberation based on a statistical time-frequency model (Audio Engineering Society, 1997).
- TH Falk, C Zheng, W-Y Chan, SRMR (speech-to-reverberation modulation energy ratio) Matlab Toolbox. http://musaelab.ca/pdfs/SRMRtoolbox.zip. Accessed Dec. 2014.
- ITU-T Recommendation P.563: Single-Ended Method for Objective Speech Quality Assessment in Narrow-Band Telephony Applications (International Telecommunication Union, Geneva, 2004).
- ITU-T Recommendation P.862: Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-End Speech Quality Assessment of Narrow-Band Telephone Networks and Speech Codecs, vol. 23 (International Telecommunication Union, Geneva, 2001).
- A Varga, HJM Steeneken, Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems. Speech Commun. 12(3), 247–251 (1993).
- I Cohen, B Berdugo, Speech enhancement for non-stationary noise environments. Signal Process. 81(11), 2403–2418 (2001).
- I Cohen, S Gannot, in *Springer Handbook of Speech Processing*, ed. by J Benesty, MM Sondhi, and Y Huang. Spectral enhancement methods (Springer, Berlin, Germany, 2008), pp. 873–902.
- I Cohen, Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging. Speech Audio Process. IEEE Trans. 11(5), 466–475 (2003).

## Copyright

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.