Dual-Channel Speech Enhancement by Superdirective Beamforming

In this contribution, a dual-channel input-output speech enhancement system is introduced. The proposed algorithm adapts the well-known superdirective beamformer, including postfiltering, to the binaural application. In contrast to conventional beamformer processing, the proposed system outputs enhanced stereo signals while preserving the important interaural amplitude and phase differences of the original signal. Instrumental performance evaluations in a real environment with multiple speech sources indicate that the proposed computationally efficient spectral weighting system can achieve significant attenuation of speech interferers while maintaining a high speech quality of the target signal.


INTRODUCTION
Speech enhancement by beamforming exploits the spatial diversity of desired speech and interfering speech or noise sources by combining multiple noisy input signals. Typical beamformer applications are hands-free telephony, speech recognition, teleconferencing, and hearing aids. Beamformer realizations can be classified into fixed and adaptive.
A fixed beamformer combines the noisy signals of multiple microphones by a time-invariant filter-and-sum operation. The combining filters can be designed to achieve constructive superposition towards a desired direction (delay-and-sum beamformer) or to maximize the SNR improvement (superdirective beamformer), see, for example, [1]. As practical problems such as self-noise and amplitude or phase errors of the microphones limit the use of optimal beamformers, constrained solutions have been introduced that limit the directivity to the benefit of reduced susceptibility [2][3][4]. Most fixed beamformer design algorithms assume the desired source to be positioned in the far field, that is, the distance between the microphone array and the source is much greater than the dimension of the array. Near-field superdirectivity [5] additionally exploits amplitude differences between the microphone signals. Adaptive beamformers commonly consist of a fixed beamformer steered towards a desired direction and a time-varying branch, which adaptively steers beamformer spatial nulls towards interfering sources. Among various adaptive beamformers, the Griffiths-Jim beamformer [6], or extensions thereof, for example, in [7, 8], is the most widely known. Adaptive beamformers can be considered less robust against distortions of the desired signal than fixed beamformers.
Beamforming for binaural input signals, that is, signals recorded by single microphones at the left and right ears, has received significantly less attention than beamforming for (linear) microphone arrays. An important application is the enhancement of speech in a difficult multitalker situation using binaural hearing aids.
Current hearing aids achieve a speech intelligibility improvement in difficult acoustic conditions by the use of independent small endfire arrays, often integrated into behind-the-ear devices with small microphone distances of around 1-2 cm. When hearing aids are used in combination with eyeglasses, larger arrays are feasible, which can also form a binaurally enhanced signal [9].
Binaural noise reduction techniques receive attention when space limitations forbid the use of multiple microphones in one device, or when the enhancement benefits of two independent endfire arrays are to be combined with the benefit of binaural processing. In contrast to an endfire array, a binaural speech enhancement system must work with a dual-channel input-output signal, ideally without modification of the interaural amplitude and phase differences, in order not to disturb the original spatial impression.
Enhancement by exploiting coherence properties [10] of the desired source and the noise [3, 11] can reduce diffuse noise to a high degree, but fails in suppressing sound from directional interferers, especially unwanted speech. Also, due to the adaptive estimation of the instantaneous coherence in frequency bands, musical tones can occur. In [12, 13], a noise reduction system has been proposed that applies a binaural processing model of the human ear. To suppress lateral noise sources, the interaural level and phase differences are compared to reference values for the frontal direction. Frequency components are attenuated according to their deviation from the reference patterns. However, the system suffers severely from susceptibility to reverberation. In [14], the Griffiths-Jim adaptive beamformer [6] has been applied to binaural noise reduction in subbands, and listening tests have shown a performance gain in terms of speech intelligibility. However, the subband Griffiths-Jim approach requires a voice activity detection (VAD) for the filter adaptation, which can cause cancellation of the desired speech when the VAD fails frequently, especially at low signal-to-noise ratios.
In [15], a two-microphone adaptive system is presented with a modified Griffiths-Jim beamformer at its core. By lowband-highband separation, a tradeoff between array-processing benefit and binaural benefit is provided through the choice of the cutoff frequency. In the lower band, the binaural signal is passed to the respective ear. The directional filter is only applied to the high-frequency regions, whose influence on sound localization and lateralization is considered less significant. Both adaptive algorithms from [14, 15] have the ability to adaptively cancel out an interfering source. However, the beamformer adaptation procedure needs to be coupled to a voice activity detection (VAD) or a correlation-based measure to counteract possible target cancellation.
In this contribution, a full-band binaural input-output array is presented that applies a binaural signal model and the well-known superdirective beamformer as its core [16]. The dual-channel system thus combines the advantages of a fixed beamformer, that is, low risk of target cancellation and computational simplicity.
To deliver an enhanced stereo signal instead of a mono output, an efficient adaptive spectral weight calculation is introduced, in which the desired signal is passed unfiltered and which does not modify the perceptually important interaural amplitude and phase differences of the target and residual noise signals. To further increase the performance, a well-known Wiener postfilter is also adapted for the binaural application under consideration of the same requirements.
The rest of the paper is organized as follows. In Section 2, the binaural signal model is introduced as a basis for the beamformer algorithm. Section 3 presents the proposed superdirective beamformer with dual-channel input and output as well as the adaptive postfilter. Finally, in Section 4, performance results are given for a real environment.

BINAURAL SIGNAL MODEL
For the derivation of binaural beamformers, an appropriate signal model is required. The microphone signals at the left and right ears do not only differ in a time difference depending on the position of the source relative to the head. In addition, the shadowing effect of the head causes significant intensity differences between the left- and right-ear microphone signals. Both effects are described by the head-related transfer functions (HRTFs) [17].
Figure 1(a) shows a time signal s arriving at the microphones from the angle θ_S in the horizontal plane. The time signals at the left and right microphones are denoted by y_l, y_r. The microphone signal spectra can be expressed by the HRTFs towards the left and right ears, D_l(ω), D_r(ω). As the beamformer will be realized in the DFT domain, a DFT representation of the spectra is chosen. At discrete DFT frequencies ω_k with frequency index k, the left- and right-ear signal spectra are given by

Y_l(k) = D_l(k) S(k),    Y_r(k) = D_r(k) S(k).

Here, S(ω_k) denotes the spectrum of the original signal s. For brevity, the frequency index k is used instead of ω_k.
The acoustic transfer functions are illustrated in Figure 1. The shadowing effect of the head is described by multiplication of each spectral coefficient of the input spectrum S(k) with angle- and frequency-dependent physical amplitude factors α_phy_l(θ, k), α_phy_r(θ, k); the different travel times are described by the delays τ_phy_l(θ), τ_phy_r(θ). Two approaches are pursued in the following to determine these binaural cues. First, a database of measured head-related impulse responses is analyzed to extract the transfer vectors for a number of relevant spatial directions. Secondly, a binaural model is applied to approximate the transfer vectors.

HRTF database
The first approach to extract interaural time differences and amplitude differences is to use a database of head-related impulse responses, for example, [18]. This database comprises recordings of head-related impulse responses d_l(θ_n, i), d_r(θ_n, i) with time index i for several spatial directions, made with in-the-ear microphones using a Knowles Electronics Manikin for Auditory Research (KEMAR) head. For a given resolution of the azimuths, for example, 5 degrees, the values of α_phy_l, α_phy_r and τ_phy_l, τ_phy_r can be extracted as illustrated in Figure 2. The left- and right-ear delays can then be calculated using (3). For the extraction of the amplitude factors α_phy_l, α_phy_r, a frequency analysis is performed. Here, the same analysis should be applied as in the frequency-domain realization of the beamformer.
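The time-difference extraction described above (and illustrated in Figure 2) can be sketched in a few lines: white noise is filtered with the left- and right-ear impulse responses, and the lag of the cross-correlation maximum gives the relative delay. The helper name and the toy pure-delay HRIRs below are illustrative, not from the paper.

```python
import numpy as np

def relative_delay(d_l, d_r, fs):
    """Estimate the interaural time difference between two head-related
    impulse responses by locating the maximum of the cross-correlation
    of white noise filtered with each response."""
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(8192)
    y_l = np.convolve(noise, d_l)
    y_r = np.convolve(noise, d_r)
    # full cross-correlation; lag 0 sits at index len(y_r) - 1
    xc = np.correlate(y_l, y_r, mode="full")
    lag = np.argmax(xc) - (len(y_r) - 1)
    return lag / fs  # delay of left channel relative to right, in seconds

# toy HRIRs: pure delays of 3 and 7 samples at fs = 20 kHz
d_l = np.zeros(16); d_l[3] = 1.0
d_r = np.zeros(16); d_r[7] = 1.0
tau = relative_delay(d_l, d_r, fs=20000)  # left leads by 4 samples -> -0.2 ms
```

A negative value indicates that the left-ear signal arrives earlier, as expected for a source on the left side.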

HRTF model
Using binaural cues extracted from a database delivers fixed HRTFs. The real HRTFs will, however, vary greatly between persons and also on a daily basis depending on the position of the hearing aids. An adjustment of the beamformer to the user without the need to measure the customer's HRTFs is desirable. This can be achieved by using a parametric binaural model.
In [19], binaural sound synthesis is performed using two filter blocks that approximate the interaural time differences (ITDs) and the interaural intensity differences (IIDs), respectively, of a spherical head. Useful results have been obtained by cascading a delay element with a single-pole and single-zero head-shadow filter according to

H_mod(ω, θ) = (1 + jγ_mod(θ) ω/(2ω_0)) / (1 + jω/(2ω_0)),

with ω_0 = c/a, where c is the speed of sound and a is the radius of the head. The model is determined by the angle-dependent parameters γ_mod and τ_mod given in [19]. The parameters of the model are set to β_min = 0.1, θ_min = 150°, which produces a fairly good approximation to the ideal frequency response of a rigid sphere (see [19]). The transfer vector D = [D_l, D_r]^T can then be extracted from (4). The model provides the radius of the spherical head a as a parameter. It is set to 0.0875 m, which is commonly considered the average radius of an adult human head.
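A minimal sketch of the head-shadow magnitude response is given below. The exact parameterization of γ(θ) follows one common form of the spherical-head model in the literature and is an assumption here; the constants match the paper's values (a = 0.0875 m, β_min = 0.1, θ_min = 150°).

```python
import numpy as np

C = 343.0      # speed of sound in m/s
A = 0.0875     # head radius in m (paper's value)
BETA_MIN, THETA_MIN = 0.1, 150.0   # model parameters from the paper

def head_shadow(f, theta_deg):
    """Single-pole/single-zero head-shadow magnitude response of a
    spherical head; gamma(theta) parameterization is an assumption."""
    w = 2 * np.pi * f
    w0 = C / A
    gamma = (1 + BETA_MIN / 2) + (1 - BETA_MIN / 2) * np.cos(
        np.deg2rad(theta_deg * 180.0 / THETA_MIN))
    h = (1 + 1j * gamma * w / (2 * w0)) / (1 + 1j * w / (2 * w0))
    return np.abs(h)

# the shadowed side (theta near THETA_MIN) is attenuated at high
# frequencies, while the ipsilateral side (theta = 0) is boosted
assert head_shadow(6000.0, 150.0) < 1.0 < head_shadow(6000.0, 0.0)
```

The qualitative behavior matters more than the exact numbers: high frequencies are shadowed on the far side of the head and slightly boosted on the near side, while low frequencies pass almost unchanged.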

Comparison of HRTF extraction methods
Figure 3 shows the normalized time differences τ_norm_l as a function of the azimuth angle, extracted from the HRTF database and by applying the binaural model, respectively. While the model-based approach delivers smaller absolute values, the time differences are very similar.
Figure 4 plots the normalized amplitude factors α_norm_l over frequency for different azimuths using the HRTF database, while Figure 5 shows the normalized amplitude factors obtained from the binaural model. Due to the high variance between persons, measurements of the target person's HRTFs should ideally be provided to a binaural speech enhancement algorithm. However, a strenuous and time-consuming measurement for several angles is not feasible for many application scenarios, for example, during the hearing aid fitting process. If the target person's HRTFs are unknown to the binaural algorithm, the fine structure of a specific HRTF cannot be exploited. Therefore, we prefer the model-based approach, which can be customized to some extent with little effort by choosing a different head radius, for example, during the hearing aid fitting process. In the following, the dual-channel input-output beamformer design will be illustrated with the model-based HRTF only.

SUPERDIRECTIVE BINAURAL BEAMFORMER
In this section, the superdirective beamformer with Wiener postfilter is adapted for the binaural application. The proposed fixed beamformer uses superdirective filter design techniques in combination with the signal model to optimally enhance signals from a given desired spatial direction compared to all other directions. The enhancement of the beamformer and postfilter is then exploited to calculate spectral weights for the left- and right-ear spectral coefficients under the constraint of preserving the interaural amplitude and phase differences.

Superdirective beamformer design in the DFT domain
The input signals are segmented into overlapping frames of N samples, windowed, and transformed by an N-point DFT, yielding the spectra of frame λ. For the computation of the next DFT, the window is shifted by R samples. These parameters are chosen as N = 256 and R = 112 at a sampling frequency of f_s = 20 kHz. For the sake of brevity, the frame index λ is omitted in the following.
In the DFT domain, the beamformer is realized as a multiplication of the noisy input DFT coefficients Y_m, m ∈ {1, . . ., M}, with complex factors W_m. The output spectral coefficient is given as

Z(k) = W(k)^H Y(k) = Σ_{m=1..M} W_m*(k) Y_m(k).

The objective of the superdirective design of the weight vector W is to maximize the output SNR. This can be achieved by minimizing the output energy under the constraint of an unfiltered signal from the desired direction. The minimum variance distortionless response (MVDR) approach can be written as (see [1][2][3])

min_W W^H Φ_MM W   subject to   W^H D = 1.

Here, Φ_MM denotes the cross-spectral-density matrix of the microphone signals. If a homogeneous isotropic noise field is assumed, the elements of the normalized Φ_MM are determined only by the distance d_mn between microphones m and n [10]:

Γ_mn(ω) = sinc(ω d_mn / c) = sin(ω d_mn / c) / (ω d_mn / c).

The vector of coefficients can then be determined by gradient calculation or using Lagrangian multipliers to

W = Φ_MM^{-1} D / (D^H Φ_MM^{-1} D).

If a design should be performed with limited superdirectivity to avoid the loss of directivity by microphone mismatch, the design rule can be modified by inserting a tradeoff factor μ_s [3],

W = (Φ_MM + μ_s I)^{-1} D / (D^H (Φ_MM + μ_s I)^{-1} D).

As μ_s → ∞, a delay-and-sum beamformer results from the design rule. A more general approach to control the tradeoff between directivity and robustness is presented in [4].
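The regularized MVDR design rule above can be sketched per frequency bin as follows; the function name and the toy two-microphone diffuse-field example are illustrative, not from the paper.

```python
import numpy as np

def superdirective_weights(D, Phi, mu=0.0):
    """Superdirective/MVDR weight vector with tradeoff factor mu:
        W = (Phi + mu*I)^-1 D / (D^H (Phi + mu*I)^-1 D)
    D:   M transfer coefficients for the desired direction (complex)
    Phi: M x M normalized noise cross-spectral-density matrix
    """
    M = len(D)
    Phi_inv_D = np.linalg.solve(Phi + mu * np.eye(M), D)
    return Phi_inv_D / (np.conj(D) @ Phi_inv_D)

# two-microphone diffuse-field example at one frequency bin
d, c, f = 0.175, 343.0, 500.0   # ear distance in m, speed of sound, Hz
# sinc coherence of a diffuse field; np.sinc(x) = sin(pi*x)/(pi*x),
# so the argument is 2*f*d/c rather than omega*d/c
coh = np.sinc(2 * f * d / c)
Phi = np.array([[1.0, coh], [coh, 1.0]], dtype=complex)
D = np.array([1.0, np.exp(-2j * np.pi * f * d / c)])  # free-field steering
W = superdirective_weights(D, Phi, mu=0.1)
# distortionless constraint: W^H D = 1
assert np.isclose(np.conj(W) @ D, 1.0)
```

Raising `mu` trades directivity for robustness exactly as described in the text: for large `mu` the matrix inverse approaches a scaled identity and the design degenerates to delay-and-sum.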
The directivity of the superdirective beamformer strongly depends on the position of the microphone array towards the desired direction.If the axis of the microphone array is the same as the direction of arrival, an endfire array with higher directivity than for a broadside array, where the axis is orthogonal to the direction of arrival, is obtained.

Binaural superdirective coefficients
In the binaural application, M = 2 microphones are used; the spectral coefficients are indexed by l and r to denote the left and right sides of the head. The superdirective design rule according to (13) requires the transfer vector D(θ_S, k) for the desired direction and the matrix of cross-power-spectral densities Φ_22 as inputs for each frequency bin k. The transfer vector can be extracted from (4) according to (6). The 2×2 cross-power-spectral density matrix Φ_22(k), on the other hand, can be calculated using the head-related coherence function. After normalization by √(Φ_ll(k) Φ_rr(k)), where Φ_ll(k) = Φ_rr(k), the matrix is

Φ_22(k) = [ 1, Γ_lr(k); Γ_lr*(k), 1 ],

with the coherence function Γ_lr(k).

Figure 6: Superdirective binaural input-output beamformer.
The head-related coherence is much lower than the value that would be expected from (11) when only the microphone distance between the left and right ears is taken into account [3]. It can be calculated by averaging over a number N of equidistant HRTFs across the horizontal plane, 0 ≤ θ < 2π,

Γ_lr(k) = Σ_n D_l(θ_n, k) D_r*(θ_n, k) / √( Σ_n |D_l(θ_n, k)|² · Σ_n |D_r(θ_n, k)|² ).

In this work, an angular resolution of 5 degrees in the horizontal plane is used, that is, N = 72.
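The averaging over equidistant HRTFs can be sketched as below. The normalization (cross-spectrum over the geometric mean of the auto-spectra) is the standard coherence definition and is assumed here; the function name and the random toy data are illustrative.

```python
import numpy as np

def head_related_coherence(D_l, D_r):
    """Coherence between left- and right-ear channels, estimated by
    averaging N equidistant HRTFs over the horizontal plane.
    D_l, D_r: arrays of shape (N, K) -- N azimuths, K frequency bins."""
    cross = np.mean(D_l * np.conj(D_r), axis=0)
    auto_l = np.mean(np.abs(D_l) ** 2, axis=0)
    auto_r = np.mean(np.abs(D_r) ** 2, axis=0)
    return cross / np.sqrt(auto_l * auto_r)

# toy check with N = 72 azimuths (5-degree resolution, as in the paper):
# identical channels give coherence 1 in every bin
rng = np.random.default_rng(1)
D = rng.standard_normal((72, 129)) + 1j * rng.standard_normal((72, 129))
gamma = head_related_coherence(D, D)
assert np.allclose(gamma, 1.0)
```

By the Cauchy-Schwarz inequality the magnitude of the result never exceeds one, which is what makes it usable as an off-diagonal entry of the normalized matrix Φ_22.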

Dual-channel input-output beamformer
A beamformer that outputs a monaural signal would be unacceptable, because the benefit in terms of noise reduction is consumed by the loss of spatial hearing. We therefore propose to utilize the beamformer output for the calculation of spectral weights. Figure 6 shows a block diagram of the proposed superdirective stereo input-output beamformer in the frequency domain.
In analogy to (8), the input DFT coefficients are summed after complex multiplication with the superdirective coefficients,

Z(k) = W_l*(k) Y_l(k) + W_r*(k) Y_r(k).

The enhanced Fourier coefficients Z can then serve as reference for the calculation of weight factors G (as defined in the following), which produce the binaurally enhanced spectra S_l, S_r via multiplication with the input spectra Y_l, Y_r. Afterwards, the enhanced dual-channel time signal is synthesized via IDFT and overlap-add.
Regarding the weight calculation method, it is advantageous to determine a single real-valued gain for both left- and right-ear spectral coefficients. By doing so, the interaural time and amplitude differences will be preserved in the enhanced signal. Consequently, distortions of the spatial impression will be minimized in the output signal. Real-valued weight factors G_super(k) are desirable in order to minimize distortions from the frequency-domain filter. In addition, a distortionless response for the desired direction should be guaranteed, that is, G_super(θ_s, k) = 1. To fulfil the demand of just one weight for both the left- and right-ear sides, the weights are calculated by comparing the spectral amplitude of the beamformer output to the sum of both input spectral amplitudes,

G_super(k) = |Z(k)| / ( |Y_l(k)| + |Y_r(k)| ).

To avoid amplification, the weight factor is upper-limited to one afterwards. To fulfil the distortionless response for the desired signal with (18), the MVDR design rule according to (13) has to be modified with a correction factor corr_super, which is determined in the following. Assume that a desired signal s arrives from θ_s, that is, Y_l(k) = D_l(θ_s, k) S(k) and Y_r(k) = D_r(θ_s, k) S(k), and that the coefficient vector W has been designed for this angle θ_s. Then, after insertion of (17) into (18), we obtain

G_super(k) = |W^H D(θ_s, k)| / ( α_phy_l(θ_s, k) + α_phy_r(θ_s, k) ).

The demand G_super = 1 for a signal from θ_S yields

corr_super(θ_s, k) = α_phy_l(θ_s, k) + α_phy_r(θ_s, k).

The design of the superdirective coefficient vector W(θ_s, k) for frequency bin k and desired angle θ_s with tradeoff factor μ_s is therefore

W(θ_s, k) = corr_super(θ_s, k) · (Φ_22 + μ_s I)^{-1} D(θ_s, k) / ( D(θ_s, k)^H (Φ_22 + μ_s I)^{-1} D(θ_s, k) ).
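The spectral weight computation with the upper limit can be sketched as follows; the small regularizer `eps`, the function name, and the toy one-bin example are implementation assumptions, not from the paper.

```python
import numpy as np

def g_super(Z, Y_l, Y_r, eps=1e-12):
    """Single real-valued weight per bin for both ears: beamformer
    output magnitude relative to the sum of the input magnitudes,
    upper-limited to one to avoid amplification."""
    g = np.abs(Z) / (np.abs(Y_l) + np.abs(Y_r) + eps)
    return np.minimum(g, 1.0)

# a bin dominated by the desired source: with the corrected design
# constraint W^H D = alpha_l + alpha_r, the beamformer output is
# Z = (alpha_l + alpha_r) * S, so the weight approaches one
alpha_l, alpha_r, S = 1.2, 0.4, 0.7 + 0.3j
Y_l, Y_r = alpha_l * S, alpha_r * S        # phase delays drop out of |.|
Z = (alpha_l + alpha_r) * S
g = g_super(np.array([Z]), np.array([Y_l]), np.array([Y_r]))
assert np.isclose(g[0], 1.0)
```

Because the same real weight multiplies both ear channels, the interaural amplitude and phase differences of whatever remains in the output are untouched, which is the core design constraint of the system.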

Directivity evaluation
Now, the performance of the beamformer is evaluated in terms of spatial directivity and directivity gain plots. The directivity pattern Ψ(θ_s, θ, k) is defined as the squared transfer function for a signal that arrives from a certain spatial direction θ if the beamformer is designed for the angle θ_s. As a reference, Figure 7 plots the directivity pattern of a typical hearing aid first-order delay-and-subtract beamformer integrated, for example, in a single behind-the-ear device. In the example, the rear microphone signal is delayed by 2/3 of the time that a source from θ_S = 0° needs to travel from the front to the rear microphone, and is subtracted from the front microphone signal. The approach is limited to small microphone distances, typically below 2 cm, to avoid spectral notches caused by spatial aliasing. Also, the lower-frequency region needs to be excluded because of its low signal-to-microphone-noise ratio caused by the subtract operation.
The behind-the-ear endfire beamformer can greatly attenuate signals from behind the hearing-impaired subject but cannot differentiate between the left- and right-ear sides. The dual-channel input-output beamformer behaves in the opposite way. Due to the binaural microphone positions, its directivity shows a front-rear ambiguity.
In the case of the stereo input-output binaural beamformer, the directivity pattern is determined by the squared weight factors G²_super, according to (18), that are applied to the spectral coefficients. Figure 8 shows the beam pattern for the desired direction θ_s = 0°. In this case, the superdirective design leads to the special case of a simple delay-and-sum beamformer, that is, a broadside array with two elements. Thus, the achieved directivity is low at low frequencies. At higher frequencies, the phase difference generated by a lateral source becomes significant and causes a narrow main lobe along with sidelobes due to spatial aliasing. However, the sidelobes are of lower magnitude due to the different amplitude transfer functions.
Figure 9 shows the directivity pattern for the desired angle θ_s = −60°. The design parameter was set to μ_s = 10, that is, a low degree of superdirectivity. Hence, approximately a delay-and-sum beamformer with amplitude modification is obtained. Because of the significant interaural differences, the directivity is much higher compared to that for the frontal desired direction; especially signals from the opposite side will be highly attenuated. The main lobe is comparably large at all plotted frequencies.
Figure 10 shows the directivity if the design parameter is adjusted for a maximum degree of superdirectivity, that is, μ_s = 0. As expected, the directivity further increases, especially at low frequencies, and the main lobe becomes narrower.
To measure the directivity of the dual-channel input-output system in a more compact way, the overall gain can be considered. It is defined as the ratio of the directivity towards the desired direction θ_s and the average directivity. As only the horizontal plane is considered, the average directivity can be obtained by averaging over 0 ≤ θ < 2π with equidistant angles at a resolution of 5 degrees, that is, N = 72. The directivity gain DG is given as

DG(θ_s, k) = Ψ(θ_s, θ_s, k) / ( (1/N) Σ_n Ψ(θ_s, θ_n, k) ).

Figure 11 depicts the directivity gain as a function of frequency for different desired directions with a low degree of superdirectivity. The gain increases from 0 dB up to 4-5.5 dB below 1 kHz, depending on the desired direction. Since the microphone distance between the ears is comparably large at 17.5 cm, phase ambiguity causes oscillations in the frequency plot. Towards higher frequencies, the interaural amplitude differences gain more influence on the directivity gain. For θ_S = 0°, unbalanced amplitudes of the spectral coefficients of the left- and right-ear sides decrease the gain in (18) towards high frequencies due to the simple addition of the coefficients in the numerator, while the denominator is dominated by one input spectral amplitude for a lateral signal. For lateral desired directions, however, the interaural amplitude differences are exploited in the numerator of (18), resulting in directivity gain values of up to 5 dB.
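The directivity gain can be evaluated numerically from beam-pattern samples; a minimal sketch is given below. The cardioid-like test pattern is purely illustrative, not the beamformer's actual pattern.

```python
import numpy as np

def directivity_gain_db(psi, idx_s):
    """Directivity gain in dB: directivity towards the desired
    direction over the average over N equidistant azimuths.
    psi:   beam-pattern samples Psi(theta_s, theta_n, k) for one bin
    idx_s: index of the desired direction within the azimuth grid"""
    return 10 * np.log10(psi[idx_s] / np.mean(psi))

# N = 72 azimuths at 5-degree resolution, as in the paper
theta = np.deg2rad(np.arange(0, 360, 5))
psi = 0.25 * (1 + np.cos(theta)) ** 2   # illustrative cardioid-like power pattern
dg = directivity_gain_db(psi, idx_s=0)  # -> 10*log10(8/3), about 4.26 dB
```

For this pattern the angular average is exactly 3/8 (the sum of cos θ over a full equidistant grid vanishes and the mean of cos² θ is 1/2), so the gain is 10·log10(8/3) ≈ 4.26 dB.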
Figure 12 shows the directivity for the case that the coefficients are designed with a high degree of superdirectivity. Now, even at low frequencies, a gain of up to nearly 6 dB can be accomplished.

Multichannel postfilter
The superdirective beamformer produces the best possible signal-to-noise ratio for a narrowband input by minimizing the noise power subject to the constraint of a distortionless response for a desired direction [20]. It can be shown [21] that the best possible estimate in the MMSE sense is the multichannel Wiener filter, which can be factorized into the superdirective beamformer followed by a single-channel Wiener postfilter; the optimum weight vector is thus W_opt(k) = W(k) H_post(k). Possible realizations of the Wiener postfilter are based on the observation that the noise correlation between the microphone signals is low [22, 23]. An improved algorithm is presented in [21], where the transfer function H_post of the postfilter is estimated by the ratio of the output power spectral density Φ_zz and the average input power spectral density Φ_yy of the beamformer,

H_post(k) = Φ_zz(k) / Φ_yy(k).

Adaptation to dual-channel input-output beamformer
In the following, the dual-channel input-output beamformer is extended by also adapting the formulation of the postfilter according to (27) into the spectral weighting framework. The goal is to find spectral weights with similar requirements as for the beamformer gains. Again, only one postfilter weight is to be determined for both left- and right-ear spectral coefficients in order not to disturb the original spatial impression, that is, the interaural amplitude and phase differences. Secondly, a source from a desired direction θ_S should pass unfiltered, that is, the spectral postfilter weight for a signal from that direction should be one. In analogy to the optimal MMSE estimate according to (26), the postfilter weights G_post are multiplicatively combined with the beamformer weights G_super according to (18) to yield the resulting weights G(k):

G(k) = G_super(k) · G_post(k).

To realize the postfilter according to (27) in the spectral weighting framework, the weights are calculated with

G_post(k) = ( Φ_zz(k) / Φ_yy(k) ) · corr_post(θ_s, k).

The angle- and frequency-dependent correction factor corr_post guarantees a distortionless response for a signal from the desired direction θ_S.
Since the beamformer coefficients have been designed with respect to W(θ_S, k)^H D(θ_S, k) = α_phy_l(θ_S, k) + α_phy_r(θ_S, k), the spectral weights can be reformulated accordingly. Consequently, after insertion of (32) into (29), the resulting postfilter weight calculation for combination with the dual-channel input-output beamformer according to (18), (22) can finally be written as in (33). Again, to avoid amplification, the postfilter weight is upper-limited to one. Figure 13 shows a block diagram of the resulting system with stereo input-output beamformer plus Wiener postfilter in the DFT domain. After the dual-channel beamformer processing, the postfilter weights are calculated according to (33) and multiplicatively combined with the beamformer gains according to (28). The dual-channel output spectral coefficients S_l(k), S_r(k) are generated by multiplication of the left- and right-side input coefficients Y_l(k), Y_r(k) with the respective weight G(k). Finally, the binaurally enhanced time signals are resynthesized using IDFT and overlap-add.
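The combination of beamformer and postfilter weights can be sketched as below. The recursive PSD smoothing constant, the small regularizer, and the function names are implementation assumptions; the paper itself only specifies the ratio Φ_zz/Φ_yy, the correction factor, and the upper limit of one.

```python
import numpy as np

def smooth_psd(prev, X, alpha=0.9):
    """First-order recursive periodogram smoothing per DFT frame
    (alpha is an assumed smoothing constant)."""
    return alpha * prev + (1 - alpha) * np.abs(X) ** 2

def combined_weights(G_super, Phi_zz, Phi_yy, corr_post, eps=1e-12):
    """Combine beamformer and Wiener-postfilter weights:
    G = G_super * G_post, with the postfilter realized as the ratio of
    output to average input PSD, scaled by corr_post and limited to one."""
    G_post = np.minimum(corr_post * Phi_zz / (Phi_yy + eps), 1.0)
    return G_super * G_post

# if output and input PSDs agree (desired source only) and corr_post
# compensates the design gain, the postfilter is transparent
G = combined_weights(np.array([0.5]), np.array([2.0]), np.array([2.0]), 1.0)
assert np.isclose(G[0], 0.5)
```

The important property carries over from the beamformer stage: a single real weight is applied to both ear channels, so the postfilter cannot disturb the interaural cues either.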

PERFORMANCE EVALUATION
In this section, the performance of the dual-channel input-output beamformer with postfilter is evaluated in a multitalker situation in a real environment.

Figure 13: Superdirective input-output beamformer with postfiltering.

The performance of the system depends on various parameters of the real environment in which it is applied. First of all, the unknown HRTFs of the target person, for example, a hearing-impaired person, will deviate from the binaural model or from a pre-evaluated HRTF database. The noise reduction performance of the system, which relies on the erroneous database, will thus decrease. Secondly, reverberation will degrade the performance.
In order to evaluate the performance of the beamformer in a realistic environment, recordings of speech sources were made in a conference room (reverberation time T_0 ≈ 800 ms) with two source-target distances, as depicted in Figure 14. All recordings were performed using a head measurement system (HMS) II dummy head with binaural hearing aids attached above the ears, without taking special precautions to match exact positions. In the first scenario, the speech sources were located at a short distance of 0.75 m from the head. Also, the head was located at least 2.2 m away from the nearest wall. In the second scenario, the loudspeakers were moved 2 m away from the dummy head. Thus, the recordings from the two scenarios differ significantly in the direct-to-reverberation ratio. In the experiments, a desired speech source s_1 arrives from the angle θ_S1, towards which the beamformer is steered, and an interfering speech signal s_2 arrives from the angle θ_S2. The superdirectivity tradeoff factor was set to μ_s = 0.5.
Firstly, the spectral attenuation of the desired and unwanted speech for one source-interferer configuration, θ_S1 = −60°, θ_S2 = 30°, at a distance of 0.75 m from the head is illustrated. The theoretical behavior of the beamformer without postfilter for that specific scenario is indicated by Figure 12. The desired source should pass unfiltered, while the interferer from θ_S2 = 30° should be attenuated in a frequency-dependent manner. A lower degree of attenuation is expected at f = 1000 Hz due to spatial aliasing.
Figure 15 plots the measured results in the real environment. The attenuation of the interfering speech source varies mainly between 2-7 dB, while the desired source is also attenuated by 1-2 dB, more or less constant over frequency. At frequencies below 700 Hz, the superdirectivity already allows a significant attenuation of the interferer. Due to spatial aliasing, the attenuation difference is very low around 1200 Hz. At high frequencies, the attenuation difference rises again because the beamformer can exploit the significant interaural amplitude differences here.

Intelligibility-weighted improvement
To judge the benefit of the frequency-dependent noise reduction gain shown exemplarily in Figure 15, a speech intelligibility-weighted noise reduction gain is applied in the following.
For its calculation, a spectral noise reduction gain is determined as the power spectral density attenuation of the undesired source minus that of the desired source, that is,

AG(ω) = 10 log10( Φ_s2s2(ω) / Φ_s̃2s̃2(ω) ) − 10 log10( Φ_s1s1(ω) / Φ_s̃1s̃1(ω) ).

Here, Φ_s1s1(ω) is the power spectral density of the desired speech and Φ_s2s2(ω) that of the unwanted speech. Φ_s̃1s̃1 is the power spectral density of the remaining components of s_1 after beamformer processing according to Figure 6 or Figure 13, and Φ_s̃2s̃2 is that of the remaining components of s_2.
To judge the intelligibility improvement achieved by the frequency-dependent reduction gain, AG(ω) is grouped into critical bands AG(ω_b) and weighted according to ANSI's speech intelligibility index standard [24]. The intelligibility-weighted gain can then be calculated as

AG_SI = Σ_b a_b AG(ω_b),

where a_b are the weights according to [24] in the respective critical frequency band b. After evaluation of (34) and (35) for the left- and right-microphone signals, the intelligibility-weighted gains of the left and right ears are averaged. Figure 16 plots the performance of the superdirective binaural input-output beamformer in terms of the speech intelligibility-weighted gain for a desired speech source from 0° and speech interferers from variable directions. The two plots in Figure 16 show the gain when all sources were located 0.75 m and 2 m away from the dummy head, respectively.
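The band-importance weighting itself is a simple weighted sum; a sketch follows. The three-band weights in the example are purely illustrative, not the ANSI SII band-importance table.

```python
import numpy as np

def intelligibility_weighted_gain(ag_band, a_band):
    """Weight per-band noise-reduction gains AG(omega_b) by band
    importance a_b and sum. The weights are normalized to sum to one,
    as importance functions conventionally are."""
    a = np.asarray(a_band, dtype=float)
    a = a / a.sum()
    return float(np.sum(a * np.asarray(ag_band, dtype=float)))

# three illustrative critical bands with gains in dB
ag = [1.0, 3.0, 2.0]
a = [0.2, 0.5, 0.3]
gain = intelligibility_weighted_gain(ag, a)   # 0.2*1 + 0.5*3 + 0.3*2 = 2.3 dB
```

In the paper's evaluation this scalar is computed per ear and the two values are averaged; with real SII weights, bands around 1-4 kHz dominate the result.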
The binaural input-output superdirective beamformer delivers only about 0.6-1.2 dB intelligibility-weighted improvement because of its comparably low directivity towards the frontal direction. Due to the decreased direct-to-reverberation ratio, the overall performance is consistently lower for the 2 m distance from the dummy head.
Figure 17 plots the performance of the superdirective binaural input-output beamformer when the desired signal arrives from θ = −60°. Results for desired and interfering sources at distances of 0.75 m and 2 m from the dummy head are shown.
For small angular differences from the desired direction, that is, θ = −90° and θ = −30°, the gain mostly stays below 1 dB. When the interfering speech source is located on the other side, the superdirective beamformer achieves the highest intelligibility-weighted gain, of nearly 3 dB.
Due to the decreased direct-to-reverberation ratio at the distance of 2 m, the gain remains below 2 dB.

Now, the influence of the additional binaural postfilter on the superdirective input-output beamformer is examined. Figures 18 and 19 show the intelligibility-weighted noise reduction gain of the superdirective stereo input-output beamformer with and without postfilter for the desired directions θ_S = 0° and θ_S = −60°.
The postfilter provides an additional intelligibility-weighted gain which is proportional to that obtained by the superdirective binaural beamformer. For a desired signal from a lateral direction, the absolute improvement is much larger than for the frontal direction. Figure 20 plots the intelligibility-weighted gains independently for both ear sides when applying the beamformer with postfiltering. The left part of Figure 20 shows the performance for a lateral desired direction, θ_S1 = −60°, while the right part plots that for the frontal direction, θ_S1 = 0°. Due to the formulation of the beamformer with postfilter as a single spectral weight for both left- and right-ear spectral coefficients, the proposed algorithm delivers almost the same improvement for the good- and bad-ear sides. The improvement at the ear side with the lower SNR is only slightly higher.

Speech quality of target source
To measure the speech quality of the target signal after processing, the segmental SNR is measured. Again, the target speech was mixed with interferers from other directions. The speech quality was then determined by applying the resulting filter to the target signal alone and calculating the segmental speech SNR between input and filtered output. Figure 21 plots the segmental speech SNR for the two considered desired angles, θ_S1 = −60° and θ_S1 = 0°. The speech quality of the target source is somewhat degraded due to its attenuation caused by imperfect knowledge of the HRTFs, as also depicted in Figure 15; however, the speech SNR is always high at 15-25 dB. For the lateral desired direction, the target attenuation is always higher than for the frontal direction.
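The segmental speech SNR used here can be sketched as follows. The frame length and the absence of a silence-frame exclusion are simplifying assumptions; the paper does not specify these details.

```python
import numpy as np

def segmental_snr_db(ref, proc, frame=256, eps=1e-12):
    """Segmental SNR between the clean target at the input (ref) and
    the filtered target at the output (proc), averaged over frames."""
    n = min(len(ref), len(proc)) // frame * frame
    r = ref[:n].reshape(-1, frame)
    e = (ref[:n] - proc[:n]).reshape(-1, frame)
    snr = 10 * np.log10((np.sum(r**2, axis=1) + eps) /
                        (np.sum(e**2, axis=1) + eps))
    return float(np.mean(snr))

# identical signals give a very high segmental SNR; stronger target
# attenuation (here a plain scaling) lowers it
rng = np.random.default_rng(2)
s = rng.standard_normal(4096)
assert segmental_snr_db(s, s) > 100.0
assert segmental_snr_db(s, 0.9 * s) < segmental_snr_db(s, 0.99 * s)
```

For a pure gain error of factor g the per-frame SNR is 10·log10(1/(1−g)²), so the 1-2 dB target attenuation reported above is consistent with the measured 15-25 dB range.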

CONCLUSION
We have presented a dual-channel input-output algorithm for binaural speech enhancement, which consists of a superdirective beamformer and a postfilter with an underlying binaural signal model, realized as a simple spectral weighting scheme. The system preserves the interaural amplitude and phase differences and, in principle, passes a source from a desired direction unfiltered.
Instrumental measurements and informal listening tests indicate a significant attenuation of speech interferers in a real environment, while the target source is only slightly degraded. The amount of interference rejection depends on the spatial positions of the desired and unwanted sources; a high rejection of interfering noise or speech can particularly be expected for lateral desired directions. The proposed algorithm can be efficiently realized in real time.

Figure 1: Acoustic transfer of a source from θ S towards the left and right ears.

Figure 2: Generation of physical binaural transfer cues α phy l , α phy r , τ phy l , τ phy r using a database of head-related impulse responses.

2. White noise is filtered with the impulse responses d_l(θ_n), d_r(θ_n) for the left and right ears. A maximum search of the cross-correlation function of the output signals delivers the relative time differences τ_phy_l, τ_phy_r.

Figure 3: Normalized time differences of left ear τ norm l (θ) using the HRTF database and the binaural model, respectively.

Figure 5: Normalized amplitude factors α norm l (θ, k) for different azimuth angles extracted from the binaural model.

Figure 16: Intelligibility-weighted gain according to (35) of superdirective stereo input-output beamformer for speech from θ S1 = 0 • and interfering speech from other directions (distance to dummy head 0.75 m and 2 m, resp.).


Figure 21: Segmental speech SNR of target signal for two different desired directions.