EURASIP Journal on Applied Signal Processing 2002:4, 434–445. © 2002 Hindawi Publishing Corporation

SVD-Based Optimal Filtering Technique for Noise Reduction in Hearing Aids Using Two Microphones

Abstract

We introduce a new strategy for noise reduction in hearing aids based on the singular value decomposition (SVD). This technique is evaluated for noise reduction in a behind-the-ear (BTE) hearing aid where two omnidirectional microphones are mounted in an endfire configuration. The behaviour of the SVD-based technique is compared to a two-stage adaptive beamformer for hearing aids developed by Vanden Berghe and Wouters (1998). The evaluation and comparison are done with a performance metric based on the speech intelligibility index (SII). The speech and noise signals are recorded in reverberant conditions with a signal-to-noise ratio of 0 dB, and the spectrum of the noise signals is similar to the spectrum of the speech signal. Unlike the two-stage adaptive beamformer, the SVD-based technique works without initialization or assumptions about a look direction. Still, for different noise scenarios, the SVD-based technique performs as well as the two-stage adaptive beamformer, for a similar filter length and adaptation time for the filter coefficients. In a diffuse noise scenario, the SVD-based technique performs better than the two-stage adaptive beamformer and hence provides a more flexible and robust solution under speaker position variations and reverberant conditions.


INTRODUCTION
A major problem for hearing impaired listeners is the understanding of speech in noise. Indeed, their speech reception threshold (SRT: defined as the sound-pressure level of speech at which 50% of the speech is correctly understood by the listener) in noise, interfering noises or competing speakers, is higher than for normal hearing subjects [1]. To compensate for this difference, several noise reduction strategies with one or multiple microphones have been developed.
As for single microphone approaches, a noise reduction system in hearing aids is typically based on a hardware directional microphone. Some studies have shown that the directional microphone may give an SRT improvement of about 3 dB in difficult listening conditions [2,3]. With other methods, such as spectral subtraction or Wiener filtering, an improvement in physical signal-to-noise ratio (SNR) has been found, but unfortunately a similar improvement of the speech reception thresholds has not been observed [4].
As for multiple microphone approaches, beamforming is by far the most developed method. There are fixed and adaptive beamformers. Fixed beamformers focus on a target direction independent of the interfering signals. This has given significant improvement of the SRT in anechoic conditions. Soede et al. [5] obtained an SRT improvement of 7 dB with an endfire array configuration (an endfire array has its microphones colinear with the target direction), where five cardioid microphones were spaced on a 10 cm-long array, as compared with an omnidirectional microphone system. Similar approaches to the Soede strategy have been investigated [6,7] and comparisons between these different methods have shown improvements of speech intelligibility in noise [8,9].
Widrow and Stearns [10] introduced adaptive noise cancellation (ANC), where the aim is to null out the interfering noise source(s). Griffiths and Jim gave an extension of the ANC, commonly known as "Griffiths-Jim beamforming," and for this approach, several studies have shown improvements in speech intelligibility for directional jammers [11,12]. These studies were carried out with different microphone arrays and with monaural or binaural conditions. Adaptive systems are known to give a significant improvement in speech intelligibility when the noise is localized, but not when it is diffuse [12,13]. Generally, these studies tested systems with a microphone array larger than the behind-the-ear (BTE) device (which is a practical impediment) and without any limitations on the available computing power.
Based on the work by Van Compernolle [14], Vanden Berghe and Wouters [15] have developed a two-stage adaptive beamformer for BTE devices. This technique was implemented on a wearable digital signal processor (DSP) platform and tested with a hearing aid that contained a hardware directional microphone and an omnidirectional microphone spaced by 1.5 cm. A significant SRT improvement of about 11 dB was obtained between a hardware directional microphone and the output of the two-stage adaptive beamformer for a directional jammer [16]. Unfortunately, this performance is dependent on the microphone configuration used.
Here, we study the behaviour of a new multiple microphone signal enhancement technique based on SVD [17]. Signals coming from a normal-sized BTE device are used to evaluate offline the speech enhancement using a performance metric based on the so-called SII [18]. The hearing aid contains two omnidirectional microphones, spaced by 2 cm, and a dual microphone technique is implemented to create a directional microphone (see Figure 1). The signals of the software directional microphone and the rear omnidirectional microphone are used as input of the noise reduction algorithms. The SVD-based technique is compared to the two-stage adaptive beamformer of Vanden Berghe in different noise conditions. These noises have the same spectrum as the speech signals and the SNR of the recorded signals was 0 dB. In addition, a comparison has been performed for these algorithms in reverberant and diffuse noise conditions. Also, different parameters of the SVD-based technique were optimized, to improve the speech intelligibility.

Basic principle
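Assume that the microphone samples processed at time $k$ are stacked into an $N \times 1$ vector $u_k$, and that $u_k = s_k + n_k$, where $s_k$ is the speech part and $n_k$ is the noise part. To extract the speech part $s_k$ from $u_k$, McAulay and Malpass [19] used the well-known Wiener filter, which minimizes the mean-square error (MSE). The estimate $\hat{s}_k$ of the speech takes the form

$$\hat{s}_k = W^T u_k, \tag{1}$$

with $W \in \mathbb{R}^{N \times N}$ the optimal filter (with $N$ the size of the filter per channel), $u_k \in \mathbb{R}^N$ the filter input vector at time $k$, and $\hat{s}_k \in \mathbb{R}^N$ the filter output vector. The $i$th column of $W$ is then an optimal filter for the $i$th component of $s_k$. The estimation error at time $k$ is defined by $e_k = s_k - \hat{s}_k$, and the MSE cost function for optimal filtering is

$$J_{\mathrm{MSE}}(W) = \mathcal{E}\{\|e_k\|^2\} = \mathcal{E}\{\|s_k - W^T u_k\|^2\}. \tag{2}$$

The optimal filter is found by setting the derivative $\partial J_{\mathrm{MSE}} / \partial W$ equal to zero,

$$\frac{\partial J_{\mathrm{MSE}}}{\partial W} = 2\,\mathcal{E}\{u_k u_k^T\}\,W - 2\,\mathcal{E}\{u_k s_k^T\} = 0. \tag{3}$$

Hence, the optimal filter $W_{\mathrm{WF}}$ is

$$W_{\mathrm{WF}} = \mathcal{E}\{u_k u_k^T\}^{-1}\,\mathcal{E}\{u_k s_k^T\}. \tag{4}$$

However, obtaining an estimate for $\mathcal{E}\{u_k s_k^T\}$ is not straightforward, as $s_k$ is obviously unknown. If we use a robust speech/noise detection algorithm, so that noise-only observations can be made during speech pauses at time $k'$ ($u_{k'} = 0 + n_{k'}$), then we can use such observations to estimate

$$\mathcal{E}\{n_k n_k^T\} = \mathcal{E}\{n_{k'} n_{k'}^T\} = \mathcal{E}\{u_{k'} u_{k'}^T\}, \tag{5}$$

which means that we are able to estimate $\mathcal{E}\{n_k n_k^T\}$. The leftmost term, $\mathcal{E}\{n_k n_k^T\}$, indicates that short-term noise stationarity is assumed. During speech activity (at time $k$), we observe both the signal of interest and the noise signal, which gives $\mathcal{E}\{u_k u_k^T\}$. If we assume that $s_k$ and $n_k$ are statistically independent of each other ($\mathcal{E}\{s_k n_k^T\} = 0$), then

$$\mathcal{E}\{u_k u_k^T\} = \mathcal{E}\{s_k s_k^T\} + \mathcal{E}\{n_k n_k^T\}. \tag{6}$$

Given $\mathcal{E}\{u_k u_k^T\}$ and $\mathcal{E}\{n_k n_k^T\}$, we can thus compute $\mathcal{E}\{s_k s_k^T\}$. Finally, from the assumed independence of $s_k$ and $n_k$, it also follows that

$$\mathcal{E}\{u_k s_k^T\} = \mathcal{E}\{s_k s_k^T\}, \tag{7}$$

so that the optimal filter $W_{\mathrm{WF}}$ is finally given by

$$W_{\mathrm{WF}} = \mathcal{E}\{u_k u_k^T\}^{-1}\left(\mathcal{E}\{u_k u_k^T\} - \mathcal{E}\{n_k n_k^T\}\right). \tag{8}$$

Also, an approach often used to approximate the Wiener filter in the frequency domain is (see [20])

$$W(f) = \frac{\mathcal{E}\{|S(f)|^2\}}{\mathcal{E}\{|S(f)|^2\} + \mathcal{E}\{|N(f)|^2\}}. \tag{9}$$

The function $\mathcal{E}\{|N(f)|^2\}$ is obtained by averaging many frames during silence intervals in which the statistics of the background noise can be assumed stationary.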
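To estimate $\mathcal{E}\{|S(f)|^2\}$, there are different possibilities [21,22].

Doclo and Moonen [17] use an interesting and useful simplification in formula (8), for which more theoretical justification can be found in [23]. The matrix $W_{\mathrm{WF}}$ is derived from the joint diagonalization (generalized eigenvalue decomposition) of the symmetric matrices $\mathcal{E}\{u_k u_k^T\}$ and $\mathcal{E}\{n_k n_k^T\}$,

$$\mathcal{E}\{u_k u_k^T\} = X\,\mathrm{diag}\{\sigma_i^2\}\,X^T, \qquad \mathcal{E}\{n_k n_k^T\} = X\,\mathrm{diag}\{\eta_i^2\}\,X^T, \tag{10}$$

where $X$ is an invertible (but not necessarily orthogonal) matrix. Note that $\mathrm{diag}\{\sigma_i^2\}$ represents a diagonal matrix with diagonal elements $\sigma_i^2$, $i = 1, \ldots, N$, and that $\mathrm{diag}\{\eta_i^2\}$ is similarly defined. In practice, $X$, $\sigma_i^2$, and $\eta_i^2$ are computed by means of a generalized singular value decomposition of the data matrices $U_k \in \mathbb{R}^{p \times N}$ and $N_k \in \mathbb{R}^{q \times N}$ (with $p$ and $q$ typically larger than $N$), such that

$$\mathcal{E}\{u_k u_k^T\} \approx \frac{U_k^T U_k}{p}, \qquad \mathcal{E}\{n_k n_k^T\} \approx \frac{N_k^T N_k}{q}. \tag{11}$$

The generalized singular value decomposition of the matrices $U_k$ and $N_k$ is defined as

$$U_k = Y\,\mathrm{diag}\{\sigma_i\}\,X^T, \qquad N_k = Z\,\mathrm{diag}\{\eta_i\}\,X^T, \tag{12}$$

where $Y \in \mathbb{R}^{p \times N}$ and $Z \in \mathbb{R}^{q \times N}$ are orthogonal matrices, $X \in \mathbb{R}^{N \times N}$ is an invertible matrix, and $\sigma_i / \eta_i$ are the generalized singular values. With the normalization of (11), the diagonal entries in (10) are estimated by $\sigma_i^2 / p$ and $\eta_i^2 / q$. By substituting the above formulas in (8), we obtain

$$W_{\mathrm{WF}} = X^{-T}\,\mathrm{diag}\left\{1 - \frac{p}{q}\,\frac{\eta_i^2}{\sigma_i^2}\right\}\,X^T. \tag{13}$$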
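As a minimal numerical sketch of the joint-diagonalization idea described above, the following NumPy snippet compares the direct form of formula (8) with a filter built from a simultaneous diagonalization of the two estimated correlation matrices. It does not call a generalized SVD routine; instead it uses a Cholesky factor and a symmetric eigendecomposition, which is a swapped-in technique that yields an equivalent factorization. The synthetic data, the sizes, and all variable names are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, q = 4, 5000, 15000   # illustrative filter length and frame sizes

# Hypothetical data: correlated "speech" plus white "noise"; rows are observation vectors.
speech = rng.standard_normal((p, N)) @ rng.standard_normal((N, N))
U = speech + 0.5 * rng.standard_normal((p, N))   # speech-and-noise data matrix
Nk = 0.5 * rng.standard_normal((q, N))           # noise-only data matrix

Ru = U.T @ U / p      # estimate of E{u u^T}
Rn = Nk.T @ Nk / q    # estimate of E{n n^T}

# Direct form: W = Ru^{-1} (Ru - Rn)
W_direct = np.linalg.solve(Ru, Ru - Rn)

# Joint diagonalization through a Cholesky factor Ru = L L^T
L = np.linalg.cholesky(Ru)
C = np.linalg.solve(L, np.linalg.solve(L, Rn).T).T   # L^{-1} Rn L^{-T}
lam, Q = np.linalg.eigh(C)
V = np.linalg.solve(L.T, Q)    # V^T Ru V = I and V^T Rn V = diag(lam)
X = np.linalg.inv(V).T         # Ru = X X^T and Rn = X diag(lam) X^T

# Filter expressed in the jointly diagonalizing basis
W_joint = np.linalg.inv(X.T) @ np.diag(1.0 - lam) @ X.T

print(np.allclose(W_direct, W_joint))
```

In this sketch each `lam[i]` is the ratio of estimated noise power to estimated speech-plus-noise power along one joint direction, so `1 - lam[i]` acts as a per-direction Wiener gain.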

Single microphone applications
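In single microphone speech processing, the vector $u_k$ is generally taken from a time series $u(k)$, that is,

$$u_k = \begin{bmatrix} u(k) & u(k-1) & \cdots & u(k-N+1) \end{bmatrix}^T, \tag{14}$$

and similarly

$$n_k = \begin{bmatrix} n(k) & n(k-1) & \cdots & n(k-N+1) \end{bmatrix}^T. \tag{15}$$

The data matrices $U_k \in \mathbb{R}^{p \times N}$ and $N_k \in \mathbb{R}^{q \times N}$, as defined in (11), are then Toeplitz matrices, for example,

$$U_k = \begin{bmatrix} u_{k-p+1}^T \\ \vdots \\ u_{k-1}^T \\ u_k^T \end{bmatrix} = \begin{bmatrix} u(k-p+1) & u(k-p) & \cdots & u(k-p-N+2) \\ \vdots & \vdots & & \vdots \\ u(k-1) & u(k-2) & \cdots & u(k-N) \\ u(k) & u(k-1) & \cdots & u(k-N+1) \end{bmatrix}. \tag{16}$$

The optimal filtering approach of (1) then provides an optimal filtered estimate $\hat{s}(k)$ at time $k$, but in addition optimal smoothed estimates $\hat{s}(k-1), \hat{s}(k-2), \ldots, \hat{s}(k-N+1)$ (all at time $k$). Conversely, optimal estimates for $s(k)$ are obtained at times $k, k+1, \ldots, k+N-1$, corresponding to different columns of $W_{\mathrm{WF}}$, applied to, respectively, $u_k, u_{k+1}, \ldots, u_{k+N-1}$. The question then arises which estimate should be picked.

A first solution consists in computing the error covariance matrix. The estimation error $e_k$ is defined as

$$e_k = s_k - \hat{s}_k = s_k - W_{\mathrm{WF}}^T u_k. \tag{17}$$

It is easily shown that the error covariance matrix is given as

$$\mathcal{E}\{e_k e_k^T\} = \mathcal{E}\{n_k n_k^T\}\,W_{\mathrm{WF}}. \tag{18}$$

In particular, we are interested in the diagonal elements of the error covariance matrix, $\{\mathcal{E}\{n_k n_k^T\}\,W_{\mathrm{WF}}\}_{ii}$. Indeed, the smallest element on the main diagonal of this error covariance matrix corresponds to the best estimator $w_1 = W_{\mathrm{WF}}(:, i^*)$, where $i^* = \arg\min_i \left(\mathcal{E}\{e_k e_k^T\}\right)_{ii}$.

A second solution consists in computing an average over all available estimates. This technique is often applied to rank truncation based estimation [24,25]. Since column $i$ of $W_{\mathrm{WF}}$, applied to $u_{k+i-1}$, weights the sample $u(k+i-j)$ with $w(j,i)$, the corresponding filter has $2N-1$ taps, and the tap that multiplies $u(k+m)$ is then given as

$$w_2(m) = \frac{1}{N} \sum_{\substack{1 \le i,j \le N \\ i-j=m}} w(j,i), \qquad m = -(N-1), \ldots, N-1, \tag{19}$$

with $w(i,j)$ denoting the $(i,j)$-element of $W_{\mathrm{WF}}$.

A third solution is to simply take an arbitrary column of $W_{\mathrm{WF}}$ and use that as a noise reduction filter, $w_3 = W_{\mathrm{WF}}(:, i)$ for random $i \in \{1, \ldots, N\}$. A comparison between these three different strategies will be given in Section 3.1.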
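The first selection strategy can be sketched in a few lines of NumPy. The snippet below builds Toeplitz data matrices from a time series, forms the optimal filter matrix from the two estimated correlation matrices, and picks the column with the smallest diagonal entry of the estimated error covariance. The helper `data_matrix`, the synthetic signals, and the chosen sizes are hypothetical and serve only to illustrate the procedure.

```python
import numpy as np

def data_matrix(x, k, rows, N):
    """Stack u_t^T for t = k-rows+1 .. k, with u_t = [x(t), x(t-1), ..., x(t-N+1)]."""
    t = np.arange(k - rows + 1, k + 1)[:, None]
    return x[t - np.arange(N)[None, :]]

rng = np.random.default_rng(1)
N, p, q = 8, 4000, 4000
noise = rng.standard_normal(10000)
# Hypothetical "speech": white noise smoothed by a short moving average
speech = np.convolve(rng.standard_normal(10000), np.ones(4) / 4, mode="same")
mix = speech + noise

U = data_matrix(mix, 9000, p, N)       # speech-and-noise frame
Nk = data_matrix(noise, 4500, q, N)    # earlier, noise-only frame

Ru = U.T @ U / p
Rn = Nk.T @ Nk / q
W = np.linalg.solve(Ru, Ru - Rn)       # optimal filter matrix

err_diag = np.diag(Rn @ W)             # diagonal of the estimated error covariance
i_best = int(np.argmin(err_diag))      # column with the smallest estimated error
w1 = W[:, i_best]

print(i_best, err_diag[i_best])
```

Because the correlation matrices are estimated from finite frames, the selected index can vary with the frame position; the sketch only shows how the selection is computed.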

Multiple microphones applications
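In a multiple microphone application, the vector $u_k \in \mathbb{R}^{MN}$ takes the form

$$u_k = \begin{bmatrix} u_{1k}^T & u_{2k}^T & \cdots & u_{Mk}^T \end{bmatrix}^T, \tag{20}$$

with

$$u_{jk} = \begin{bmatrix} u_j(k) & u_j(k-1) & \cdots & u_j(k-N+1) \end{bmatrix}^T, \tag{21}$$

where $j$ refers to the $j$th microphone and $M$ is the number of microphone inputs. In our case we have two microphones, so $j = 1, 2$. The vector $n_k$ is similarly defined. The data matrices $U_k \in \mathbb{R}^{p \times MN}$ and $N_k \in \mathbb{R}^{q \times MN}$ as defined in (11) then take the form

$$U_k = \begin{bmatrix} U_{1k} & U_{2k} & \cdots & U_{Mk} \end{bmatrix}, \qquad N_k = \begin{bmatrix} N_{1k} & N_{2k} & \cdots & N_{Mk} \end{bmatrix}, \tag{22}$$

where each block is the single-channel Toeplitz data matrix of the corresponding microphone. Using similar formulas as in the one-channel case, the optimal filter $W_{\mathrm{WF}}$ and then the $2N$-taps estimator $w$ can be computed, so that the estimated signal $\hat{s}_k$ is given as

$$\hat{s}_k = w^T u_k, \tag{23}$$

where $\hat{s}_k$ is an estimate for the (delayed version of the) speech part of either microphone signal 1 or microphone signal 2, depending on the choice for $w$. This filter can be considered as a two-channel filter (see Figure 2), where each microphone is filtered with an $N$-taps filter $A_j$,

$$w = \begin{bmatrix} A_1^T & A_2^T \end{bmatrix}^T, \qquad \hat{s}_k = A_1^T u_{1k} + A_2^T u_{2k}. \tag{24}$$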
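The two-channel structure can be illustrated with a short NumPy sketch: the per-microphone Toeplitz blocks are placed side by side, one column of the resulting filter matrix is taken as the estimator, and that column is split into the two per-microphone filters. The signals, sizes, the column choice, and the helper names `data_matrix` and `stacked` are assumptions for illustration, not values from the recordings.

```python
import numpy as np

def data_matrix(x, k, rows, N):
    """Stack u_t^T for t = k-rows+1 .. k, with u_t = [x(t), ..., x(t-N+1)]."""
    t = np.arange(k - rows + 1, k + 1)[:, None]
    return x[t - np.arange(N)[None, :]]

def stacked(channels, k, rows, N):
    """Place the per-microphone Toeplitz blocks side by side."""
    return np.hstack([data_matrix(x, k, rows, N) for x in channels])

rng = np.random.default_rng(2)
N, p, q, T = 4, 3000, 3000, 8000
s = np.convolve(rng.standard_normal(T), np.ones(3) / 3, mode="same")  # hypothetical speech
n1, n2 = rng.standard_normal(T), rng.standard_normal(T)               # hypothetical noise
mics_mix = [s + n1, np.roll(s, 1) + n2]   # second microphone sees a delayed copy (wrap-around ignored)
mics_noise = [n1, n2]

U = stacked(mics_mix, 7000, p, N)      # p x 2N speech-and-noise matrix
Nk = stacked(mics_noise, 3500, q, N)   # q x 2N noise-only matrix

Ru = U.T @ U / p
Rn = Nk.T @ Nk / q
W = np.linalg.solve(Ru, Ru - Rn)       # 2N x 2N filter matrix

w = W[:, N // 2]                       # one column used as the 2N-tap estimator (arbitrary choice here)
A1, A2 = w[:N], w[N:]                  # per-microphone N-tap filters

u_k = U[-1]                            # stacked input vector at the last time index
s_hat = w @ u_k
print(np.isclose(s_hat, A1 @ u_k[:N] + A2 @ u_k[N:]))
```

Splitting `w` this way makes explicit that the estimate is the sum of two ordinary FIR filter outputs, one per microphone.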

Two-stage adaptive beamformer
The behaviour of the SVD-based technique is compared with another noise reduction algorithm. This algorithm is a two-stage adaptive beamformer which has been developed by Vanden Berghe and Wouters [15]. The complete noise reduction strategy is depicted in Figure 3. The two-stage adaptive beamformer, which is based on adaptive noise cancellation (ANC), attempts to model noise during noise periods (where only the noise signal is present), and subtracts noise from speech plus noise when speech is present. A speech detection algorithm is implemented to decide whether the signal contains speech plus noise or noise only. The sum and subtraction (middle part of Figure 3), based on the scheme of Griffiths and Jim [26], improve the noise reference of the ANC. Van Compernolle [14] used a second adaptive filter instead of the fixed filter, which adapts when desired speech is present and compensates for the different transfer functions between the speaker and the microphone array. It has been shown that this procedure experiences robustness problems. For example, it can converge to a wrong direction when different speakers are around the listener or if loud noise is wrongly detected as speech by the speech detection algorithm. For this reason, the first filter will be kept fixed, under the assumption that the speaker is always in front of the listener. In fact, we gave a specific look direction to the two-stage adaptive beamformer, namely the direction of the desired signal, at 0°. To determine the coefficients of the first filter, we used a stationary ICRA-signal as described below. In anechoic conditions, this signal was presented by a loudspeaker 1 meter in front of the dummy head with the hearing aid. The first filter's coefficients were adapted by means of a normalized least mean squares (NLMS) procedure [27] for a few seconds, and then the adaptation was stopped. The obtained coefficients are used from then on to create the fixed first filter.
Thus, a front cardioid and a back cardioid are obtained, respectively, at the speech reference and the noise reference. The numbers of coefficients are 10 and 30, respectively, for the first and the second filter. The additional delays (Figure 3) make it possible to realize noncausal filters, and their values are set to half of the size of the filters (5 and 15). The second filter is an adaptive filter, also adapted by an NLMS procedure.
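The NLMS adaptation mentioned above can be sketched as follows. This is a generic textbook NLMS update used for noise-path identification, not the beamformer implementation of [15]; the step size `mu`, the regularization `eps`, the synthetic path `h`, and the signals are assumptions. Only the filter length of 30 taps and the adaptation over 10000 samples echo values quoted in the text.

```python
import numpy as np

def nlms(x, d, taps, mu=0.5, eps=1e-8):
    """Adapt an FIR filter so that its output on x tracks d; returns weights and error."""
    w = np.zeros(taps)
    buf = np.zeros(taps)              # most recent input samples, newest first
    e = np.zeros(len(x))
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e[n] = d[n] - w @ buf         # a priori error
        w = w + mu * e[n] * buf / (eps + buf @ buf)   # normalized update
    return w, e

rng = np.random.default_rng(3)
h = rng.standard_normal(30) * np.exp(-0.2 * np.arange(30))  # hypothetical acoustic path
x = rng.standard_normal(10000)       # noise reference
d = np.convolve(x, h)[:len(x)]       # the same noise as seen in the primary channel

w, e = nlms(x, d, taps=30)
print(np.max(np.abs(w - h)))
```

In a noise canceller the error signal `e` is the output: once the filter has converged during a noise-only period, the coefficients can be frozen, as described for the evaluation in this paper.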

Performance metric
To evaluate the improvement of the speech intelligibility of noise reduction algorithms, different performance metrics have been developed, which are mostly based on an averaged intelligibility gain. The basic idea was introduced by Peterson [11]. Greenberg et al. [28] developed an SNR measure where the output and input signals were decomposed in third octave bands and, for each frequency band, weights were applied as defined in the articulation index calculation [29]. Hoffman et al. [30] introduced an improvement by limiting the speech peak-to-noise ratios to the range of 0 to 30 dB. Unfortunately, these performance metrics do not take into account the masking and the reverberation effect; Kates and Weiss [8] and Saunders and Kates [9], respectively, gave solutions that also consider these two effects. Since 1997, an extension of the articulation index calculation has been suggested. This extension is known as the SII [18], and comprises a variety of adverse listening conditions, such as noise masking, filtering, distortion, and low reverberation. Thus, we define a performance metric based on the SII,

$$\mathrm{SNR}_{\mathrm{weighted}} = \sum_i I_i\,A_i\,\mathrm{SNR}_i, \tag{25}$$

where $\mathrm{SNR}_i$ is the signal-to-noise ratio measured (in dB-SPL) in the $i$th third octave band; $I_i$ and $A_i$ are the weights for the importance of the band and the audibility function, respectively, as described by [18]. Thus, $\mathrm{SNR}_{\mathrm{weighted}}$ is a weighted SNR, where the weights reflect speech intelligibility. Following Greenberg et al. [28], we estimate the speech reception threshold (SRT) improvement between the input, the omnidirectional microphone in our case, and the output of the two-stage adaptive beamformer or the SVD-based technique,

$$\mathrm{SNR}_{\mathrm{SII}} = \mathrm{SNR}_{\mathrm{weighted,output}} - \mathrm{SNR}_{\mathrm{weighted,input}}. \tag{26}$$

Maj et al. [13] obtained a mean of 0.63 dB for the difference between the SRT values obtained by performing listening tests with subjects and the computed $\mathrm{SNR}_{\mathrm{SII}}$. For this evaluation to be valid, a linear relation has to be assumed between the input and the output of the system.
This means that the hearing aid must have a linear amplification and the noise reduction system, the two-stage adaptive beamformer or the SVD-based technique, must implement a linear transformation of the input signals. Also, to measure the performance, the system must not be in transient conditions. As for the linearity conditions, we know that the hearing aid has a linear amplification, and that the noise reduction systems introduce a linear transformation of the input signals. As for the no transient conditions, the evaluation is done with stationary speech weighted noise and we took care that the different algorithms were given enough time to converge.
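The weighted-SNR metric and the resulting improvement can be illustrated with a small worked example. The band values and weights below are placeholders chosen for easy arithmetic over three bands; they are not the band-importance or audibility values of [18], and the function name `weighted_snr` is hypothetical.

```python
def weighted_snr(snr_db, importance, audibility):
    """Sum of I_i * A_i * SNR_i over the frequency bands."""
    return sum(i * a * s for i, a, s in zip(importance, audibility, snr_db))

# Placeholder values for three bands only; NOT the tables of the SII standard.
importance = [0.2, 0.5, 0.3]
audibility = [1.0, 1.0, 1.0]
snr_in = [0.0, 0.0, 0.0]     # dB per band at the reference (omnidirectional) microphone
snr_out = [4.0, 6.0, 8.0]    # dB per band after noise reduction

improvement = (weighted_snr(snr_out, importance, audibility)
               - weighted_snr(snr_in, importance, audibility))
print(round(improvement, 2))  # 0.2*4 + 0.5*6 + 0.3*8 = 6.2 dB
```

With uniform audibility, the improvement reduces to an importance-weighted average of the per-band SNR gains, which is why bands with larger importance weights dominate the result.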

Hearing aid
To study the behaviour of the SVD-based technique on speech intelligibility improvement, we have applied it to signals which were recorded with a behind-the-ear hearing aid. The hearing aid is a Danavox-163D hearing aid housing where two omnidirectional microphones (Knowles Electronics-EM4368) are mounted in an endfire array configuration spaced 2 cm apart. Furthermore, in the first signal processing stage, a software directional microphone [31] is created with the omnidirectional microphones (Figure 1). The directional microphone signal is computed as the difference between the signal from the front microphone and the delayed-weighted signal of the rear microphone, resulting in a response comparable to a hardware directional microphone. The microphone parameters are the interport distance $d$, the internal delay $\tau$, and the weight factor for the back port, $\beta(f) = a \cdot e^{-j 2 \pi f \tau}$. The delay $\tau$ and the weight $a$ have been chosen to give a hypercardioid spatial characteristic in anechoic conditions. The hearing aid has a linear amplification and does not have systems for compression or feedback control.
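The delay-and-subtract principle can be sketched for an idealized free-field plane wave. The snippet below evaluates the response of the front signal minus the weighted, delayed rear signal as a function of the angle of incidence. The speed of sound, the weight `a = 1`, and the delay `tau = d / (3c)` are assumptions: the text does not give the actual values, and this delay is simply the textbook choice that places the null of an ideal first-order pattern near 109.5°, as for a hypercardioid. Head diffraction and reverberation are ignored.

```python
import cmath
import math

C_SOUND = 343.0   # assumed speed of sound in m/s
D = 0.02          # interport distance of 2 cm, as stated in the text

def response(f, theta_deg, a, tau, d=D, c=C_SOUND):
    """Front minus weighted, delayed rear signal for a plane wave from angle theta (0 = front)."""
    travel = d * math.cos(math.radians(theta_deg)) / c    # extra travel time to the rear port
    beta = a * cmath.exp(-2j * math.pi * f * tau)         # weight factor of the back port
    return 1 - beta * cmath.exp(-2j * math.pi * f * travel)

tau = D / (3 * C_SOUND)                       # assumed internal delay
null_deg = math.degrees(math.acos(-1 / 3))    # about 109.47 degrees

for angle in (0.0, 90.0, null_deg, 180.0):
    print(round(angle, 1), round(abs(response(1000.0, angle, 1.0, tau)), 4))
```

The printed magnitudes are largest at the front, vanish at the null angle, and show a smaller rear lobe, which is the qualitative shape expected from a hypercardioid characteristic.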

Signals
Two neighboring rooms were used for the experiments. The first room simulates the acoustics of a living room situation with a volume of 63 m³ and a reverberation time of 0.76 s. The estimated direct-to-reverberant ratio is 5.20 dB at 1 m [32]. In this room, the hearing aid was positioned on a dummy head and loudspeakers (Yamaha CBX-S3) were situated at 1 m from the dummy head (see Figures 4 and 5).
The signals of the two microphones of the hearing aid were amplified and connected to the second room where further signal processing took place. The signals of the microphones were amplified with a two-channel Larson Davis 2200 C amplifier and were digitized, at a sampling frequency of 16000 Hz, with a PC-platform using a Texas Instruments TMS320C40 digital signal processor (DSP) and two input channels with 16-bit analog-to-digital conversion (ADC). The amplifier of the hearing aid and the different devices have been checked to ensure that they were not saturating.
The speech and noise signals, which were sent to the loudspeakers, were stationary speech-weighted noises, namely ICRA, BLU, NVA, and BRUGSE. The ICRA-signal (International Collegium of Rehabilitative Audiology) is based on the multilanguage long-term average speech spectrum (LTASS). It has been produced by the Hearing Aid Clinical Test Environment Standardization Work Group, and is a white noise signal filtered in close accordance with the LTASS [33] and ANSI S3.79 [18]. The signal is unmodulated random Gaussian noise representative of a male-weighted idealized speech spectrum with a normal effort [34]. The BLU-signal, NVA-signal, and BRUGSE-signal are described and available on the compact disc of Wouters [35]. They are also stationary speech-weighted noises, but they do not have the same spectrum as the ICRA-signal.

Optimization of the SVD-based technique
In these experiments, one loudspeaker ($L_0$) was situated in front of the dummy head (at 0°) and presented the speech material, and a second ($L_{90}$) at 90° (on the side of the hearing aid) presented noise material (see Figure 4).
We used an ICRA-signal at 0° for speech and a similar ICRA-signal at 90° for noise, implying that the noise and speech signals have the same spectrum. These two signals were recorded separately, that is, the speech was recorded when only the loudspeaker in front of the dummy head was active, and the noise was recorded when only the loudspeaker on the side of the dummy head was active. This provides additional experimental flexibility. The two loudspeakers have the same frequency response and were matched to get a level of 70 dB SPL at the center of the dummy head. We recorded the signals for 10 seconds, which gives two signals of 160000 samples each, one for speech and one for noise. From these signals, a noise frame was created from the noise signal only, and a speech-and-noise frame by the addition of the speech and the noise signals. We define a point inside the original 160000-sample frames referred to as "start speech." "Start speech" can be interpreted as a perfect "speech detection" and means that before the point "start speech" we have only noise (we use the noise frame) and after this point, we have speech and noise (we use the speech-and-noise frame).
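The construction of the two frames around "start speech" can be sketched with simple array slicing. Random signals stand in for the separately recorded speech and noise; the function name `make_frames` and the frame lengths used in the call are illustrative assumptions. Only the sampling frequency and the 10-second duration come from the text.

```python
import numpy as np

FS = 16000                                # sampling frequency from the text (Hz)
rng = np.random.default_rng(4)
speech = rng.standard_normal(10 * FS)     # stand-in for the recorded speech signal
noise = rng.standard_normal(10 * FS)      # stand-in for the recorded noise signal

def make_frames(speech, noise, start_speech, p, q):
    """Noise-only frame before 'start speech', speech-and-noise frame after it."""
    noise_frame = noise[start_speech - q:start_speech]
    stop = start_speech + p
    speech_noise_frame = speech[start_speech:stop] + noise[start_speech:stop]
    return speech_noise_frame, noise_frame

sn_frame, n_frame = make_frames(speech, noise, start_speech=80000, p=5000, q=15000)
print(len(speech), sn_frame.shape, n_frame.shape)
```

Because speech and noise are stored separately, the same slicing can be repeated for any "start speech" position without new recordings, which is the experimental flexibility referred to above.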
With the SVD-based algorithm, the optimal filter $w_{\mathrm{WF}}$ can be viewed as a two-channel filter, where each microphone is filtered with an $N$-taps filter $A_j$, see formula (24).
The resulting estimated signal is computed by filtering the microphone signals $s_k$ and $n_k$ with the filters $A_j$ and summing over the channels (see Figure 2). The $\mathrm{SNR}_{\mathrm{SII}}$ computation uses these two filtered signals (speech and noise) separately to evaluate the performance of the algorithm. For the two-stage adaptive beamformer, the coefficients of the second filter were adapted on the noise signal until convergence and then kept fixed. Then, the speech and noise signals were filtered separately with the fixed coefficients, and the $\mathrm{SNR}_{\mathrm{SII}}$ performance was measured in the same fashion as for the SVD-based technique.
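Filtering speech and noise separately with fixed coefficients relies on the linearity of the filter-and-sum operation. The sketch below checks this numerically and computes a broadband output SNR. It is a simplification: the evaluation in this paper is band-wise and SII-weighted, whereas the snippet uses a single broadband ratio, and the signals and filters are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(5)
T, N = 4000, 20
s = [rng.standard_normal(T), rng.standard_normal(T)]   # speech part at each microphone (stand-in)
n = [rng.standard_normal(T), rng.standard_normal(T)]   # noise part at each microphone (stand-in)
A = [rng.standard_normal(N), rng.standard_normal(N)]   # fixed N-tap filters (hypothetical values)

def filter_and_sum(channels, filters):
    """Filter each channel with its own FIR filter and add the results."""
    return sum(np.convolve(x, a)[:len(x)] for x, a in zip(channels, filters))

out_s = filter_and_sum(s, A)                                # filtered speech only
out_n = filter_and_sum(n, A)                                # filtered noise only
out_mix = filter_and_sum([s[0] + n[0], s[1] + n[1]], A)     # filtered mixture

snr_out_db = 10 * np.log10(np.sum(out_s ** 2) / np.sum(out_n ** 2))
print(np.allclose(out_mix, out_s + out_n), round(float(snr_out_db), 2))
```

Since the mixture output equals the sum of the two separately filtered parts, the output SNR of the real system can be measured from the separate outputs once the coefficients are frozen.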

Optimal estimate
As already mentioned, $W_{\mathrm{WF}}$ (formula (13)) contains a collection of $2N$-taps filters. We need to make a choice for $w_{\mathrm{WF}}$ to obtain an estimate for $s(k)$. In Section 2.1.2, we have described three different methods: an error-covariance-matrix-based method, an averaging over all available estimates, and an arbitrary choice between the columns of the filter matrix $W_{\mathrm{WF}}$. The SVD-based algorithm was applied with a speech-and-noise frame and a noise frame of 2000 samples each, and the filter length $N$ was 20 per channel.

Influence of the design parameters
The SVD-based algorithm depends on three parameters: the size of the noise frame ($q$), the size of the speech-and-noise frame ($p$), and the length of the filter for each channel ($N$). In the next paragraph, we describe the experiments to study the influence of these parameters on the speech intelligibility performance through the $\mathrm{SNR}_{\mathrm{SII}}$.
The signals $u_k$ and $n_k$ are stationary, which implies that for different values of "start speech," we can expect to obtain the same $\mathrm{SNR}_{\mathrm{SII}}$ improvement. We applied the SVD-based algorithm with the strategy described in Section 2.3 for fourteen different values of "start speech." With the beamformer, the second stage adapted on the 10000 samples of the noise frame before the "start speech." We calculated the standard deviation and the mean of the improvement with the fourteen different values for "start speech" for both algorithms. Also, with the SVD-based technique, we varied the length of the speech-and-noise frame and the noise frame, while the filter size was kept fixed ($N = 20$).
Next, we investigate the influence of the filter size $N$ for each channel. A size of 5000 samples for the speech-and-noise frame and a size of 15000 samples for the noise frame are chosen. The "start speech" is taken in the middle of the file, at sample 80000 (= 5 s).
It is instructive to also investigate the improvement that can be obtained with a single microphone configuration. The method is the same as for the two-microphone configuration. We have chosen the middle column of the matrix $W_{\mathrm{WF}}$ as the optimal filter, and we have varied the lengths of the speech-and-noise frame and the noise frame, and the size of the filter.
Finally, it is important to know if the algorithms distort or have an influence on the speech signal itself. To investigate this effect, we analyze the transfer function between the system input and output. For this and the next experiments, the "start speech" is taken in the middle of the file. The adaptation of the beamformer second filter was performed during 10000 samples of the noise period. With the SVD-based technique, we have calculated the filter with a length of 5000 samples for the speech-and-noise frame and 15000 samples for the noise frame. The length of the filter for each channel was 20.

One noise source
In this experiment, the recordings have been made with the same "ICRA" signals and under the same conditions as described in Section 2.6. The signals were recorded for different locations of the noise source, corresponding to angles between 0° and 345° in steps of 15° (90° is the side of the hearing aid). The desired speech source is always located at 0° (see Figure 5).

Multiple noise sources
For scenarios with multiple and different noise sources, we used four loudspeakers (see Figure 4), one for the speech signal ($L_0$) and three for the noise signals ($L_{90}$, $L_{180}$, and $L_{270}$). For the speech signal, we have again used "ICRA" signals, but for the noises we have utilized the ICRA-signal, BLU-signal, NVA-signal, and BRUGSE-signal. The sum of the signals has been recorded with the same level (70 dB SPL) at the middle of the dummy head.

Diffuse noise
Until now, we have focused on localized noise sources. However, the sound field in a large room or in a car may appear as diffuse noise. A diffuse noise does not have a well-specified direction. To create the diffuse noise, we have made a setup as defined by Veit and Sander [36].
The reverberation time of the room was $T_{60} = 0.88$ s for a speech-weighted noise. The estimated direct-to-reverberant ratio is 4.55 dB at 1 m. The point M is in the middle of eight loudspeakers. The level of each loudspeaker has been adjusted to give the same sound level at the point M (61 dB), and when the eight loudspeakers work together we obtain a sound level of 70 dB at M. For the measurements, we have placed the center of the dummy head at the point M. To create the speech signal, a loudspeaker was put at 1 m from the dummy head, at different angles between −45° and +45° in steps of 15°. The two signal contributions, speech and diffuse noise, were recorded separately.
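The two quoted levels are consistent with each other if the eight loudspeaker signals are assumed to be uncorrelated, so that their powers add. This assumption is not stated in the text; the short check below only illustrates the arithmetic, and the function name is hypothetical.

```python
import math

def combine_levels_db(levels_db):
    """Total level of uncorrelated sources: add powers, then convert back to dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))

total = combine_levels_db([61.0] * 8)
print(round(total, 2))   # 61 + 10*log10(8), approximately 70.03 dB
```

Eight equal uncorrelated sources raise the level by $10 \log_{10} 8 \approx 9$ dB, which matches the step from 61 dB per loudspeaker to about 70 dB at the point M.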

Optimal estimate
The results are shown in Figure 6. The error covariance matrix method (ECM) clearly performs better than the averaging method and selects the optimal estimator (column 10). The improvement for each column of $W_{\mathrm{WF}}$ shows two different parts, namely columns 1 to 20 and columns 21 to 40. In fact, in the first part we have the data of the directional microphone, and in the second part we have the data of the omnidirectional microphone. In these two parts, the maximum improvement is reached when we take the column exactly in the middle, for the two kinds of microphones (omnidirectional and directional). Obviously, we have a better improvement with the directional part because the data of the directional microphone contains less noise than the data of the omnidirectional microphone when the noise source is located at 90° (remember that $\mathrm{SNR}_{\mathrm{weighted,input}}$ in formula (26) corresponds to the front omnidirectional microphone). The ECM method shows that column 10 gives the best result. Hence, from now on (and for the sake of simplicity) we will always take column 10 of $W_{\mathrm{WF}}$.

Influence of the design parameters
Figures 7 and 8 show the effect of the size of the noise frame and the speech-and-noise frame on the mean and the standard deviation of the $\mathrm{SNR}_{\mathrm{SII}}$ improvement. The values of the beamformer are taken as references, that is, 6.32 dB for the mean and 0.12 dB for the standard deviation. In the case of a large standard deviation, it is found that the signal has fluctuations which can correspond to unnatural sounds. The two-stage adaptive beamformer with this standard deviation (0.12 dB) is known to give satisfactory results in this respect. Up to 5000 samples, if we increase the size of the speech-and-noise frame, we increase the $\mathrm{SNR}_{\mathrm{SII}}$ improvement and the standard deviation drops. After 5000 samples, there is only a marginal improvement. For the noise frame, when we increase the length, we have the same effect as with the speech-and-noise frame: the $\mathrm{SNR}_{\mathrm{SII}}$ improvement increases and the standard deviation decreases. So the longer the noise frame, the larger the $\mathrm{SNR}_{\mathrm{SII}}$ improvement. To obtain the same performance as the beamformer (with adaptation of the second filter on 10000 samples) for the standard deviation, we need to take a noise frame longer than 10000 samples (15000 samples). Furthermore, for the frame sizes that we used, the SVD-based algorithm performs slightly worse (0.4 dB) than the beamformer. Figure 9 shows the influence of the filter size $N$ for each channel. For a length $N > 20$, the $\mathrm{SNR}_{\mathrm{SII}}$ improvement stays roughly the same, while for $N < 20$ the $\mathrm{SNR}_{\mathrm{SII}}$ is seen to drop significantly.

Figure 9: Influence of the filter size for each channel on the $\mathrm{SNR}_{\mathrm{SII}}$ improvement (horizontal axis: size of filter $N$ per channel; vertical axis: $\mathrm{SNR}_{\mathrm{SII}}$ improvement in dB). The length of the speech frame is 5000, the length of the noise frame is 15000.
In the case of a single microphone configuration, if the noise frame is equal to 15000 samples, the filter length is 20, and the length of the speech-and-noise frame is varied from 5000 samples to 20000 samples in steps of 5000 samples, an $\mathrm{SNR}_{\mathrm{SII}}$ improvement < 0.1 dB is found. If the speech-and-noise frame is equal to 5000 samples, the filter length is 20, and the length of the noise frame is varied from 5000 samples to 20000 samples in steps of 5000 samples, again an $\mathrm{SNR}_{\mathrm{SII}}$ improvement < 0.1 dB is found. Finally, when the speech-and-noise frame is equal to 5000 samples, the noise frame is equal to 15000 samples, and the length of the filter is varied from 10 to 50 taps, no significant $\mathrm{SNR}_{\mathrm{SII}}$ improvement is measured. The results of these different configurations show that the SVD-based algorithm for a single microphone configuration does not significantly improve the $\mathrm{SNR}_{\mathrm{SII}}$, which is a good predictor for the speech intelligibility. Figure 10 shows the transfer function between the directional microphone and the output of the two algorithms. The SVD-based technique performs as well as the beamformer, and the transfer function for speech is around 0 dB for the frequencies between 500 Hz and 6500 Hz. Thus, we can conclude that neither algorithm distorts the speech signal.
From these experiments, we can conclude that the SVD-based technique performs roughly as well as the two-stage adaptive beamformer. Indeed, with a filter size of 2 × 20 coefficients for the SVD-based technique and 10 + 30 coefficients for the beamformer, the speech intelligibility improvement is only 0.4 dB higher for the beamformer. Furthermore, this improvement can be obtained with a similar adaptation time for the coefficients (10000 samples) for both algorithms. However, to obtain a similar standard deviation in the $\mathrm{SNR}_{\mathrm{SII}}$, we prefer to take the adaptation time a bit longer for the SVD-based technique.

One noise source

Figure 11: The curves show the $\mathrm{SNR}_{\mathrm{SII}}$ (in dB) as a function of the angle (in degrees) relative to the direction of the speech source, at the omnidirectional microphone (-), at the directional microphone (· · ·), at the output of the two-stage adaptive beamformer (−−), and at the output of the SVD-based technique (−·).

Figure 11 shows the behaviour of the omnidirectional microphone, the software directional microphone, the output of the two-stage adaptive beamformer, and the output of the SVD-based technique. Ideally, the omnidirectional microphone has the same sensitivity for all angles (around 0 dB in our case). However, the sensitivity is seen to be a function of the angle, mainly due to the effect of the dummy head. Figure 11 shows that the beamformer and the SVD-based procedure have a very similar behaviour, and so we can say that the improvement is the same for the two algorithms. These algorithms always give a larger improvement than the omnidirectional and the directional microphone. Remarkably, the directional microphone can perform worse than the omnidirectional microphone, namely between angles +285° and +345°.

Multiple noise sources
For scenarios with multiple and different noise sources, the results are shown in Table 1. Evidently, with two and three noise sources, both the beamformer and the SVD-based algorithm perform worse than in the one noise source case. These two strategies give better results than the directional microphone and have roughly the same $\text{SNR}_{\text{SII}}$ improvement for the different situations. As in the one noise source configuration, the head diffraction contributes to the improvement. Indeed, the $\text{SNR}_{\text{SII}}$ improvement is larger when we have a noise source at 270° instead of 180°. Also, we can observe a better $\text{SNR}_{\text{SII}}$ improvement when the noise source does not have an ICRA-signal spectrum.

Diffuse noise
In this section, we worked with strong reverberation in the room and hence the $\text{SNR}_{\text{SII}}$ procedure cannot be applied properly [18]. For this reason, we modified the $\text{SNR}_{\text{weighted}}$, as introduced by Greenberg et al. [28] and defined by (27), so that only the weights for the importance of each band for speech intelligibility are applied. The resulting measure is denoted $\text{SNR}_{\text{AI}}$, and the $\text{SNR}_{\text{AI}}$ improvement is evaluated between the signal of the front omnidirectional microphone (input) and the signal of a noise reduction algorithm (output).

Figure 12: In a diffuse noise case, the $\text{SNR}_{\text{AI}}$ improvement between the omnidirectional microphone and the directional microphone (· · ·), the two-stage adaptive beamformer (−·) and the SVD-based technique (-).

The $\text{SNR}_{\text{AI}}$ improvement of the SVD-based technique is always better than that of the directional microphone and the two-stage adaptive beamformer (see Figure 12). For angles above 0° the two-stage adaptive beamformer performs worse than the directional microphone and the omnidirectional microphone, in the case where the speaker is at +45°. The difference between the directional microphone and the SVD-based technique is small but still important for hearing-aid users, because in critical listening conditions (close to 50 per cent of speech understood by the listener) an improvement of 1 dB in SNR corresponds to an increase in speech understanding of about 15% in everyday speech communication [37].
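As a rough illustration of the measure used in this section, the following minimal sketch computes an importance-weighted SNR and the improvement between an input and an output signal. It assumes that the measure is a weighted sum of per-band SNRs in dB, in the spirit of the intelligibility-weighted SNR of Greenberg et al. [28]; the function names, the four-band layout, and the weight values are hypothetical and are not taken from the paper.

```python
import math

def weighted_snr_db(speech_power, noise_power, importance):
    """Importance-weighted SNR in dB.

    Assumption: the measure is sum_i I_i * SNR_i, where SNR_i is the
    SNR of band i in dB and I_i is the band importance weight
    (normalised here so that the weights sum to 1).
    """
    total = sum(importance)
    return sum(
        (w / total) * 10.0 * math.log10(s / n)
        for s, n, w in zip(speech_power, noise_power, importance)
    )

def snr_improvement_db(snr_out_db, snr_in_db):
    """Improvement of the output over the input, in dB."""
    return snr_out_db - snr_in_db

# Hypothetical band importance weights and band powers (illustration only).
weights = [0.1, 0.3, 0.4, 0.2]
snr_in = weighted_snr_db([1.0] * 4, [1.0] * 4, weights)    # front omni microphone
snr_out = weighted_snr_db([1.0] * 4, [0.1] * 4, weights)   # after noise reduction
improvement = snr_improvement_db(snr_out, snr_in)
```

With these made-up values the noise power drops by a factor of ten in every band, so the weighted improvement is 10 dB regardless of the weights; with real signals the weights determine how much each band contributes.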

COMPUTATIONAL COMPLEXITY
The computational complexity of the two-stage adaptive beamformer equals $(2 N_1 + 7 N_2) \times f_s$ operations (multiplications or additions) per second (ops/s), where $N_1$ is the size of the fixed filter, $N_2$ the size of the adaptive filter and $f_s$ the sampling frequency. In our case, $N_1 = 10$ taps and $N_2 = 30$ taps, so the two-stage adaptive beamformer has a computational complexity of 3.7 Mops/s at a sampling frequency of 16000 Hz.
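The beamformer figure can be checked by substituting the stated values directly; the symbol $C_{\text{BF}}$ is introduced here only as a label for the beamformer cost and does not appear in the paper.

```latex
\begin{align*}
C_{\text{BF}} &= (2 N_1 + 7 N_2) \times f_s \\
              &= (2 \cdot 10 + 7 \cdot 30) \times 16000 \\
              &= 230 \times 16000 \\
              &= 3.68 \times 10^{6}\ \text{ops/s} \approx 3.7\ \text{Mops/s}
\end{align*}
```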
Doclo and Moonen [23] suggest an implementation using a recursive and approximate SVD-based technique. The cost of calculation of this technique equals $(17.5\, N^2/dg + 4\, N^2/df) \times f_s$ ops/s, where $N$ represents the size of the filters per channel, and $dg$ and $df$ indicate the number of samples between two GSVD updates and two filter updates, respectively; they trade off convergence speed against cost of calculation. An optimal size of $N = 20$ taps has been found (see Section 2.6.2) and if $dg = df = 1$ the cost of calculation equals 550 Mops/s. An interesting implementation of the SVD-based technique has also been proposed by Spriet [38], which is based on Doclo's implementation. The difference between the two is that Doclo presents a full-band implementation and Spriet a subband implementation, with which an important reduction of the computational complexity can be obtained.
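The cost formula above is easy to evaluate for different update intervals. The sketch below is only an illustration: the function name is hypothetical, the quadratic dependence on the filter size is read from the formula, and the update intervals of 10 samples are example values that the paper does not report. Evaluating it suggests that the quoted 550 Mops/s corresponds to a filter size of 40, that is, the 2 × 20 coefficients stacked over both channels, rather than 20; this reading is an inference from the arithmetic, not something stated in the text.

```python
def svd_cost_ops_per_s(n, fs, dg=1, df=1):
    """Cost of the recursive SVD-based technique in ops/s.

    n  : filter size entering the formula (taps)
    fs : sampling frequency (Hz)
    dg : samples between two GSVD updates
    df : samples between two filter updates
    """
    return (17.5 * n**2 / dg + 4.0 * n**2 / df) * fs

FS = 16000  # sampling frequency used in the paper (Hz)

# n = 20 (taps per channel) versus n = 40 (2 x 20 stacked coefficients).
cost_n20 = svd_cost_ops_per_s(20, FS)   # 137.6 Mops/s
cost_n40 = svd_cost_ops_per_s(40, FS)   # 550.4 Mops/s, matching the quoted figure

# Hypothetical update intervals: updating every 10 samples cuts the cost tenfold.
cost_n40_sparse = svd_cost_ops_per_s(40, FS, dg=10, df=10)  # 55.04 Mops/s

for label, cost in [("n=20", cost_n20), ("n=40", cost_n40),
                    ("n=40, dg=df=10", cost_n40_sparse)]:
    print(f"{label}: {cost / 1e6:.2f} Mops/s")
```

Even with sparser updates the cost remains well above the 3.7 Mops/s of the beamformer, which is consistent with the paper's remark that computational cost is the main disadvantage of the technique.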

CONCLUSION
In this paper, we have assessed the performance of an SVD-based multimicrophone enhancement procedure in the context of two-microphone hearing aids. Through the different evaluations, the SVD-based noise reduction technique of [17] is seen to achieve roughly the same improvement in speech intelligibility as the two-stage adaptive beamformer of [15] in the case of localized noise sources. However, in the diffuse noise case, the SVD-based technique always performs better than the two-stage adaptive beamformer and the directional microphone. Furthermore, the SVD-based algorithm works without initialization or assumptions about a look direction, unlike the two-stage adaptive beamformer. Indeed, with the beamformer, we have to give a look direction to the algorithm, which is not a simple task, and then assume that the speaker is always in front of the listener. This beamformer initialization is a function of the microphone characteristics and displacement and should be adapted to each and every hearing aid.
An important result is that the SVD-based technique does not distort speech during noise suppression. Indeed, the transfer function between the unprocessed and the processed speech signals is flat (0 dB) and we get roughly the same behaviour as for the beamformer.
The main disadvantage of the SVD-based technique is the cost of calculation, but even this can be reduced to an acceptable level [23]. In the future, the technique will be implemented in real time on a PC platform, and perceptual tests will be carried out with normal hearing and hearing impaired listeners.