Microphone Array Speech Processing
© Sven Nordholm et al. 2010
Received: 21 July 2010
Accepted: 21 July 2010
Published: 16 September 2010
Significant knowledge about microphone arrays has been gained from years of intense research and product development. Numerous applications have been suggested, ranging from large arrays (on the order of 100 elements) for use in auditoriums to small arrays with only 2 or 3 elements for hearing aids and mobile telephones. In addition, microphone array technology has been widely applied in speech recognition, surveillance, and warfare. Traditional microphone array techniques include fixed spatial filters, such as frequency invariant beamformers, as well as optimal and adaptive beamformers. These techniques require knowledge of either an array model or a calibration signal, together with localization information, for their design. They therefore usually combine some form of localisation and tracking with the beamforming. More recently, techniques based on blind signal separation (BSS) and time-frequency masking have attracted significant attention. These techniques rely less on the array model and on localization, and more on statistical properties of speech signals such as sparseness, non-Gaussianity, and non-stationarity. From a theoretical perspective, the main advantage that multiple microphones add is spatial diversity, which is an effective tool for combating interference, reverberation, and noise. The underpinning physical feature exploited is the difference in coherence between the target field (the speech signal) and the noise field. Viewing the processing in this way also explains why enhancing highly reverberant speech is difficult: only the received microphone signals can be observed, and reverberation makes the received speech more diffuse, reducing its coherence contrast with the noise field.
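The coherence argument can be made concrete for a two-microphone pair. The sketch below assumes free-field propagation, omnidirectional microphones, and a spherically isotropic (diffuse) noise field, for which the standard coherence model is sin(kd)/(kd) with wavenumber k = 2πf/c and spacing d. A single far-field source, by contrast, reaches the two microphones as delayed copies, so the magnitude of its coherence is 1 at every frequency. The 343 m/s speed of sound and the 5 cm spacing are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def diffuse_coherence(freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """Coherence between two omnidirectional microphones spaced spacing_m
    apart in a spherically isotropic (diffuse) noise field: sin(kd)/(kd)."""
    kd = 2.0 * math.pi * freq_hz * spacing_m / c
    return 1.0 if kd == 0.0 else math.sin(kd) / kd


if __name__ == "__main__":
    d = 0.05  # hypothetical 5 cm spacing, typical of a small array
    for f in (250.0, 1000.0, 3430.0):
        # A single far-field point source would give coherence magnitude 1 here.
        print(f"{f:6.0f} Hz  diffuse-noise coherence: {diffuse_coherence(f, d):+.3f}")
```

For a 5 cm pair the diffuse-noise coherence is still about 0.99 at 250 Hz but falls to zero at 3430 Hz, so the coherence contrast between target and noise that the processing relies on is weak at low frequencies.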
This special issue contains contributions to traditional areas of research such as frequency invariant beamforming [1], hands-free operation of microphone arrays in cars [2], and source localisation [3]. The contributions show new ways to study these traditional problems and give new insights into them. Small arrays have always attracted considerable interest for mobile terminals, hearing aids, and close-up microphones [4]; here, a novel representation of small arrays provides the capability to suppress multiple interferers. Processing artifacts in noise and speech are largely unavoidable, and nonlinear processing often changes the character of the signals significantly, particularly that of the residual noise. It is thus important to gain new insights into these phenomena, particularly the so-called musical noise [5]. Finally, new and unusual uses of microphone arrays are always interesting to see. Distributed microphone arrays in a sensor network [6] provide a novel approach to locating snipers. This type of processing has good prospects of attracting growing interest for new and improved applications.
The contributions found in this special issue can be categorized into three main aspects of microphone array processing: (i) microphone array design based on eigenmode decomposition [1, 4]; (ii) multichannel processing methods [2, 5]; and (iii) source localisation [3, 6].
The paper by Zhang et al., "Selective frequency invariant uniform circular broadband beamformer" [1], describes a design method for Frequency-Invariant (FI) beamforming. FI beamforming is a well-known array signal processing technique used in many applications such as speech acquisition, acoustic imaging, and communications. However, many existing FI beamformers are designed to have a frequency invariant gain over all angles. This may not be necessary, and if the gain constraint is confined to a specific angle, the FI performance over that selected region (in frequency and angle) can be expected to improve. Inspired by this idea, the proposed algorithm optimizes the frequency invariant beampattern solely for the mainlobe and relaxes the FI requirement on the sidelobes. The sacrifice in performance in the undesired region is traded for better performance in the desired region as well as a reduced number of microphones. The objective function minimizes the overall spatial response of the beamformer, subject to a constraint that the gain be smaller than a predefined threshold across a specific frequency range and at a specific angle. This problem is formulated as a convex optimization problem and solved using Second-Order Cone Programming (SOCP). An analysis of the computational complexity of the proposed algorithm is presented along with its performance, which is evaluated via computer simulation for different numbers of sensors and different threshold values. Simulation results show that the proposed algorithm achieves a smaller mean square error of the spatial response gain in the specific FI region than existing algorithms.
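The "selected region" idea can be illustrated by measuring frequency invariance only over a chosen band and set of angles. The sketch below does not reproduce the SOCP design; it only evaluates an FI error, taken here as the mean of |B(f, θ) - B(f_ref, θ)|² over the region, for a plain delay-and-sum uniform circular array, which is not frequency invariant. The array size, radius, band, reference frequency, and ±20° mainlobe region are all hypothetical choices.

```python
import cmath
import math

C = 343.0  # speed of sound in m/s (assumed)


def steering(freq, theta, num_mics, radius):
    """Far-field, in-plane steering vector of a uniform circular array."""
    return [cmath.exp(2j * math.pi * freq * radius
                      * math.cos(theta - 2.0 * math.pi * m / num_mics) / C)
            for m in range(num_mics)]


def delay_and_sum(freq, theta0, num_mics, radius):
    """Baseline weights (not FI): delay-and-sum steered to theta0."""
    return [a / num_mics for a in steering(freq, theta0, num_mics, radius)]


def response(weights, freq, theta, num_mics, radius):
    """Spatial response B(f, theta) = w^H a(f, theta)."""
    a = steering(freq, theta, num_mics, radius)
    return sum(w.conjugate() * x for w, x in zip(weights, a))


def fi_error(weight_fn, freqs, thetas, f_ref, num_mics, radius):
    """Mean of |B(f, theta) - B(f_ref, theta)|^2 over the selected
    frequencies and angles; 0 means perfectly frequency invariant there."""
    w_ref = weight_fn(f_ref)
    refs = [response(w_ref, f_ref, th, num_mics, radius) for th in thetas]
    total = 0.0
    for f in freqs:
        w = weight_fn(f)
        for th, ref in zip(thetas, refs):
            total += abs(response(w, f, th, num_mics, radius) - ref) ** 2
    return total / (len(freqs) * len(thetas))


M, R = 8, 0.05                                   # hypothetical 8-mic, 5 cm UCA
FREQS = [500.0 + 250.0 * i for i in range(15)]   # 500 Hz ... 4 kHz
MAINLOBE = [math.radians(a) for a in range(-20, 21, 5)]


def ds(f):
    """Delay-and-sum weights steered to 0 rad at frequency f."""
    return delay_and_sum(f, 0.0, M, R)


err_look = fi_error(ds, FREQS, [0.0], 1000.0, M, R)
err_main = fi_error(ds, FREQS, MAINLOBE, 1000.0, M, R)
print(f"FI error at look direction only: {err_look:.2e}")
print(f"FI error over +/-20 deg mainlobe: {err_main:.2e}")
```

Delay-and-sum is trivially invariant at the look direction but not across the mainlobe, because its beamwidth narrows with frequency; a selective FI design of the kind described would aim to reduce the mainlobe error while leaving the sidelobes unconstrained.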
The paper by Derkx, "First-order adaptive azimuthal null-steering for the suppression of two directional interferers" [4], shows that an azimuth-steerable first-order superdirectional microphone response can be constructed as a linear combination of three eigenbeams: a monopole and two orthogonal dipoles. Although a (rotationally symmetric) first-order response can exhibit only a single null, the paper studies a slice through this beampattern lying in the azimuthal plane. In this way, up to two nulls can be defined in the azimuthal plane. These nulls are symmetric with respect to the main-lobe axis. By placing the two nulls on up to two directional sources to be rejected and compensating for the drop in level in the desired direction, these directional sources can be effectively rejected without attenuating the desired source. An adaptive null-steering scheme for adjusting the beampattern, which enables automatic source suppression, is presented. Closed-form expressions for this optimal null-steering are derived, enabling the computation of the azimuthal angles of the interferers. It is shown that the proposed technique achieves a good directivity index when the angular difference between the desired source and each directional interferer is at least 90 degrees.
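The eigenbeam combination can be sketched under the assumption of ideal, unit-normalized eigenbeams in the azimuthal plane: a monopole 1 and dipoles cos φ and sin φ. Any combination w0 + wx cos φ + wy sin φ equals w0 + r cos(φ - θ) with r = sqrt(wx² + wy²), so its nulls, where cos(φ - θ) = -w0/r, form a pair symmetric about the main-lobe axis θ. The sketch solves three linear conditions (nulls at φ1 and φ2, unit gain in a desired direction φd) with Cramer's rule; it is a static solution, not the paper's adaptive scheme or its closed-form expressions.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def null_steering_weights(phi1, phi2, phi_d=0.0):
    """Weights (w0, wx, wy) for monopole, x-dipole, and y-dipole so that the
    azimuthal response has nulls at phi1, phi2 and unit gain at phi_d."""
    a = [[1.0, math.cos(phi1), math.sin(phi1)],
         [1.0, math.cos(phi2), math.sin(phi2)],
         [1.0, math.cos(phi_d), math.sin(phi_d)]]
    b = [0.0, 0.0, 1.0]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("degenerate geometry (e.g. coincident directions)")
    weights = []
    for col in range(3):  # Cramer's rule: replace one column by b
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        weights.append(det3(m) / d)
    return tuple(weights)


def azimuthal_response(w, phi):
    """First-order azimuthal response w0 + wx*cos(phi) + wy*sin(phi)."""
    return w[0] + w[1] * math.cos(phi) + w[2] * math.sin(phi)


w = null_steering_weights(math.radians(120), math.radians(-120))
print("weights:", [round(v, 4) for v in w])
print("gain at 0 deg:", round(azimuthal_response(w, 0.0), 6))
print("gain at 120 deg:", round(azimuthal_response(w, math.radians(120)), 6))
```

With interferers at ±120° and the desired source at 0°, the solution is w = (1/3, 2/3, 0), i.e. the response 1/3 + (2/3) cos φ, whose two nulls lie at cos φ = -1/2.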
In the paper by Takahashi et al., "Musical-noise analysis in methods of integrating microphone array and spectral subtraction based on higher-order statistics" [5], an objective analysis of musical noise is conducted. To obtain better noise reduction, methods of integrating microphone array signal processing with nonlinear signal processing have been researched. However, nonlinear signal processing often generates musical noise, and since such musical noise causes discomfort to users, it is desirable to mitigate it. Moreover, it has recently been reported that higher-order statistics are strongly related to the amount of musical noise generated. This implies that the integration method can be optimized from the viewpoint of not only noise reduction performance but also the amount of musical noise generated. The paper therefore analyses the musical noise generated by the two simplest methods of integrating a delay-and-sum beamformer with spectral subtraction, and clarifies the features of the musical noise generated by each. As a result, a specific structure of integration is shown to be preferable from the viewpoint of the amount of generated musical noise. The validity of the analysis is shown via computer simulation and subjective evaluation.
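The link between higher-order statistics and musical noise can be illustrated with a kurtosis ratio: the kurtosis of the power spectrum after processing divided by that before. The sketch assumes noise-only, complex Gaussian spectral bins (so the power is exponentially distributed), an oracle noise power estimate, power spectral subtraction with oversubtraction β = 2 and flooring η = 0, and kurtosis taken here as μ4/μ2² with raw moments. It covers single-channel spectral subtraction only, not the two integration structures analysed in the paper.

```python
import random


def kurtosis(values):
    """Kurtosis taken here as mu4 / mu2**2 with raw (non-central) moments."""
    n = len(values)
    m2 = sum(v ** 2 for v in values) / n
    m4 = sum(v ** 4 for v in values) / n
    return m4 / m2 ** 2


def spectral_subtraction(power, noise_power, beta=2.0, eta=0.0):
    """Power spectral subtraction: subtract beta * noise_power and floor the
    result at eta**2 times the input power."""
    out = []
    for p in power:
        sub = p - beta * noise_power
        out.append(sub if sub > eta ** 2 * p else eta ** 2 * p)
    return out


rng = random.Random(0)
# Noise-only bins: real and imaginary parts N(0, 1/2), so |X|^2 ~ Exp(1).
power = [rng.gauss(0.0, 0.5 ** 0.5) ** 2 + rng.gauss(0.0, 0.5 ** 0.5) ** 2
         for _ in range(100_000)]
noise_estimate = sum(power) / len(power)  # oracle noise power estimate

k_in = kurtosis(power)
k_out = kurtosis(spectral_subtraction(power, noise_estimate))
ratio = k_out / k_in
print(f"kurtosis in {k_in:.2f}, out {k_out:.2f}, ratio {ratio:.2f}")
```

The input kurtosis should come out near 6, the value for exponentially distributed power. After subtraction most bins are zeroed and only isolated peaks survive, which is the sparse, tonal structure heard as musical noise; for β = 2 and η = 0 the ratio should be near e² ≈ 7.4.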
The paper by Freudenberger et al., "Microphone diversity combining for in-car applications" [2], proposes a frequency domain diversity approach for two or more microphone signals, with in-car applications as an example. The microphones should be positioned apart from each other to ensure diverse signal conditions and incoherent recording of the noise. This enables a better compromise in microphone placement with respect to different speaker sizes and noise sources. The work proposes a two-stage approach: in the first stage, the microphone signals are weighted according to their signal-to-noise ratio (SNR) and then summed, similar to maximum-ratio combining. The combined signal is then used as a reference for a frequency domain least-mean-squares (LMS) filter for each input signal. The output SNR is significantly improved compared to coherence-based noise reduction systems, even if one microphone is heavily corrupted by noise.
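The first, SNR-weighted combining stage can be sketched for a single frequency bin. The weighting rule w_i = sqrt(γ_i / Σ_j γ_j) used here is an assumption: it coincides with maximum-ratio combining when the inputs are phase-aligned and have equal noise power, and the paper's exact rule may differ. The LMS second stage is not shown.

```python
import math


def snr_weights(snrs):
    """Per-bin weights sqrt(snr_i / sum_j snr_j); assumes phase-aligned inputs."""
    total = sum(snrs)
    return [math.sqrt(g / total) for g in snrs]


def combined_snr(weights, amplitudes, noise_var):
    """Output SNR of sum_i w_i * (a_i * s + n_i) for a unit-power signal s and
    independent noises n_i of equal variance noise_var."""
    signal = sum(w * a for w, a in zip(weights, amplitudes)) ** 2
    noise = noise_var * sum(w ** 2 for w in weights)
    return signal / noise


# One frequency bin, two microphones: 10 dB and 0 dB input SNR (linear 10 and 1).
snrs = [10.0, 1.0]
amps = [math.sqrt(g) for g in snrs]      # signal amplitudes for unit noise variance
w = snr_weights(snrs)
out = combined_snr(w, amps, 1.0)
equal = combined_snr([1.0, 1.0], amps, 1.0)  # plain summation for comparison
print(f"SNR-weighted: {10 * math.log10(out):.2f} dB, "
      f"plain sum: {10 * math.log10(equal):.2f} dB")
```

With these weights the output SNR is the sum of the input SNRs (11, about 10.4 dB), never worse than the better microphone, whereas plain summation gives about 8.66 (9.4 dB), below the better input, because the noisy channel is not down-weighted.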
The paper by Ichikawa et al., "DOA estimation with local-peak-weighted CSP" [3], proposes a novel weighting algorithm for Cross-power Spectrum Phase (CSP) analysis to improve the accuracy of direction of arrival (DOA) estimation for beamforming in a noisy environment. The target source is a human speaker and the noise source is broadband automobile noise. The harmonic structures in the human speech spectrum can be used for weighting the CSP analysis, because harmonic bins are expected to contain more speech power than the other bins and thus provide more reliable information. However, most conventional methods that exploit harmonic structures require pitch estimation with voiced-unvoiced classification, which is not sufficiently accurate in noisy environments. The proposed approach instead uses the observed power spectrum, which is converted directly into weights for the CSP analysis by retaining only the local peaks considered to originate from a harmonic structure. The presented results show that the proposed approach significantly reduces localization errors, with further improvement when it is combined with other weighting algorithms.
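The weighting idea can be sketched with a phase-only cross-spectrum (CSP) whose bins are multiplied by weights derived from the observed power spectrum. The weighting used here is a hypothetical simplification, not the paper's procedure: each bin keeps its average power if it is a local maximum of the spectrum and is zeroed otherwise, normalized to a maximum of 1. The synthetic voiced frame, noise level, circular delay, and the ±15-sample lag search range (standing in for the limit set by microphone spacing) are all assumptions.

```python
import cmath
import math
import random

N = 256        # frame length (hypothetical)
MAX_LAG = 15   # physically possible lag range, e.g. set by mic spacing (assumed)


def dft(x):
    """Naive DFT, adequate for a short illustrative frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def local_peak_weights(power):
    """Hypothetical simplified weighting: keep the power of bins that are
    local maxima of the observed spectrum, zero the rest, normalize to 1."""
    n = len(power)
    w = [power[k] if power[k] > power[k - 1] and power[k] > power[(k + 1) % n]
         else 0.0 for k in range(n)]
    top = max(w)
    return [v / top for v in w]


def csp_delay(x1, x2, weights=None):
    """Lag (samples) of x2 relative to x1 that maximizes the (weighted) CSP."""
    X1, X2 = dft(x1), dft(x2)
    n = len(x1)
    phase = []
    for a, b in zip(X1, X2):
        c = b * a.conjugate()
        phase.append(c / max(abs(c), 1e-12))  # keep the phase only
    if weights is not None:
        phase = [w * p for w, p in zip(weights, phase)]
    best_lag, best_val = 0, -math.inf
    for lag in range(-MAX_LAG, MAX_LAG + 1):
        val = sum(p * cmath.exp(2j * math.pi * k * lag / n)
                  for k, p in enumerate(phase)).real
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


rng = random.Random(1)
# Hypothetical voiced frame: 10 harmonics of a fundamental at DFT bin 8.
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(10)]
clean = [sum(math.cos(2.0 * math.pi * 8 * h * t / N + phases[h - 1])
             for h in range(1, 11)) for t in range(N)]
TRUE_DELAY = 5  # samples; channel 2 lags channel 1 (circular shift)
x1 = [s + rng.gauss(0.0, 0.3) for s in clean]
x2 = [clean[(t - TRUE_DELAY) % N] + rng.gauss(0.0, 0.3) for t in range(N)]

power = [(abs(a) ** 2 + abs(b) ** 2) / 2 for a, b in zip(dft(x1), dft(x2))]
est_plain = csp_delay(x1, x2)
est_weighted = csp_delay(x1, x2, local_peak_weights(power))
print("plain CSP:", est_plain, " local-peak-weighted CSP:", est_weighted)
```

In plain CSP every bin contributes equally after phase normalization, so noise-dominated bins count as much as harmonic ones; the peak weighting concentrates the sum on the bins carrying the harmonic structure.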
The paper by Lindgren et al., "Shooter localization in wireless microphone networks" [6], is an interesting combination of microphone array technology with distributed communications. By detecting both the muzzle blast and the ballistic shock wave, the microphone array algorithm is able to locate the shooter when the sensors are synchronized. However, in the distributed sensor case, synchronization is either not achievable or very expensive to achieve, which calls the accuracy of the localization into question. Field trials are described to support the algorithmic development.
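Why synchronization matters can be seen in a simple time-of-arrival sketch that uses only the muzzle blast (the shock-wave model is omitted). It assumes synchronized sensors at hypothetical positions, a known speed of sound, and an unknown firing time that is eliminated in closed form for each candidate position of a grid search; it is an illustration, not the paper's algorithm. `math.dist` requires Python 3.8 or later.

```python
import math

C = 343.0  # speed of sound in m/s (assumed)
SENSORS = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]  # hypothetical


def arrival_times(source, t_fire):
    """Muzzle-blast arrival times at each sensor for a shot fired at t_fire."""
    return [t_fire + math.dist(source, s) / C for s in SENSORS]


def localize(times, step=1.0, extent=100.0):
    """Grid search over shooter position. For each candidate, the least-squares
    firing time is the mean of (arrival time - propagation time)."""
    best, best_cost = None, math.inf
    n = int(extent / step)
    for i in range(n + 1):
        for j in range(n + 1):
            p = (i * step, j * step)
            prop = [math.dist(p, s) / C for s in SENSORS]
            t0 = sum(t - d for t, d in zip(times, prop)) / len(times)
            cost = sum((t - t0 - d) ** 2 for t, d in zip(times, prop))
            if cost < best_cost:
                best, best_cost = p, cost
    return best


shooter = (30.0, 40.0)
times = arrival_times(shooter, t_fire=0.5)
est_sync = localize(times)
print("synchronized sensors:", est_sync)

# A clock offset at one sensor (10 ms, hypothetical) makes the data inconsistent
# and can shift the estimate away from the true position.
skewed = list(times)
skewed[2] += 0.010
print("one sensor 10 ms off:", localize(skewed))
```

A 10 ms clock error corresponds to about 3.4 m of apparent range at 343 m/s, which is why unsynchronized timestamps cannot simply be fed into a synchronized-sensor algorithm.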
Thushara D. Abhayapala
Patrick A. Naylor
- Zhang X, Ser W, Zhang Z, Krishna AK: Selective frequency invariant uniform circular broadband beamformer. EURASIP Journal on Advances in Signal Processing 2010, 2010:-11.
- Freudenberger J, Stenzel S, Venditti B: Microphone diversity combining for In-car applications. EURASIP Journal on Advances in Signal Processing 2010, 2010:-13.
- Ichikawa O, Fukuda T, Nishimura M: DOA estimation with local-peak-weighted CSP. EURASIP Journal on Advances in Signal Processing 2010, 2010:-9.
- Derkx RMM: First-order adaptive azimuthal null-steering for the suppression of two directional interferers. EURASIP Journal on Advances in Signal Processing 2010, 2010:-16.
- Takahashi Yu, Saruwatari H, Shikano K, Kondo K: Musical-noise analysis in methods of integrating microphone array and spectral subtraction based on higher-order statistics. EURASIP Journal on Advances in Signal Processing 2010, 2010:-25.
- Lindgren D, Wilsson O, Gustafsson F, Habberstad H: Shooter localization in wireless sensor networks. Proceedings of the 12th International Conference on Information Fusion (FUSION '09), July 2009, 404-411.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.