Geometry descriptors of irregular microphone arrays related to beamforming performance

Yu, Jingjing; Donohue, Kevin D

doi:10.1186/1687-6180-2012-249

Research
Open access
Published: 27 November 2012


EURASIP Journal on Advances in Signal Processing volume 2012, Article number: 249 (2012)


Abstract

Performance analysis for microphone arrays with irregular geometries typically requires direct computation of beamforming gains over the spatial and frequency ranges of interest. However, these computations can be very time consuming and limit synthesis methods for applications that require rapid answers, as in the case of surveillance and mobile platforms. A better understanding of microphone arrangements and their impact on performance can result in more efficient objective functions for optimizing array performance. This article, therefore, analyzes the relationship between irregular microphone geometries and spatial filtering performance with Monte Carlo simulations. Novel geometry descriptors are developed to capture the properties of irregular microphone distributions and show their impact on array performance. Performance metrics are computed from three-dimensional beam patterns through a delay and sum beamformer with a fixed number of microphones for irregular arrays and comparable regular arrays. Statistical analysis and Multi-way Analysis of Variance establish relationships between key performance metrics and the proposed geometry descriptors. It is demonstrated that, in conjunction with array centroid offset and dispersion, statistics of the microphone differential path distance can explain variations of performance metrics when steering at targets for immersive or near-field microphone applications.

1 Introduction

Microphone arrays use spatial diversity of element positions to capture acoustic signals and reduce degradation brought on by reverberation and noise. They are widely applied in speech enhancement, teleconferencing, talker tracking, hands-free human-machine interfaces, and acoustic surveillance systems [1, 2]. Because most of these applications involve separating desired signals from noise and estimating acoustic parameters, array performance is usually assessed by its ability to locate, track, and separate sound sources in the field of view (FOV) [1]. Critical factors affecting performance include acoustic environment, source spectral content, processing algorithm, and microphone geometry. For a fixed number of microphones it has been demonstrated that the array geometry is the dominant factor for performance [3–5]. However, previous studies have largely focused on regular geometries in the far field. These results are not as useful for immersive geometries that typically occur for surveillance and smart room applications. This article, therefore, focuses on the relationship between microphone distribution properties and spatial filtering performance that is more suited for cases when the focal point is close to the arrays and the arrays have irregular placements. Classes of irregular geometries for immersive environments are statistically analyzed through Monte Carlo simulations to identify key geometric characteristics related to array performance.

Regular arrays (elements arranged under a regular spacing constraint) have been considered in previous research, such as uniformly spaced linear, planar, and circular arrays [2]. Due to the regularity of element arrangements, their geometries are specified by a small parameter set, such as aperture and number of elements or their spacings, which are directly related to aspects of performance [2, 6]. In general, most of these analyses have been done for narrow-band far-field cases where spatial aliasing is directly related to microphone spacing and resolution to aperture. Irregular arrays, which diversify microphone positions, can potentially achieve better performance, as demonstrated in [3, 4, 7]. Instead of limited optimal range of signal frequency for regular arrays, irregular arrays can result in a more consistent performance over a broader range of frequencies, such as those associated with speech [6].

Although special arrays that deviate from simple Cartesian arrangements have been studied for better performance, such as spherical arrays to capture and render sound fields [8, 9], and minimum redundancy arrays to achieve maximum spatial resolution with a fixed number of microphones [10–12], they still retain a certain regularity of element placements and are restricted by the limitations of regular arrays. The study in this article constrains the microphone geometries to a plane, but allows for any arrangement of elements and compares geometries with similar relationships to the focal point. For example, Figure 1 shows three planar arrays with the same centroid and dispersion focused on a point 0.2 m below the array centroid. (Dispersion is analogous to aperture.) Array gains over the FOV were computed via simulation by moving a colored noise source of unit power with speech-like frequency distribution over the grid points of the FOV and then computing the received power from the beamformed focal point as described in [7, 13]. The microphone positions are superimposed over their array gains, showing the irregular array in Figure 1(b) having larger gains at the non-focal points than the regular array in Figure 1(a), while the irregular array in Figure 1(c) shows lower gains at non-focal points. These performance differences cannot be explained by previous analyses of regular geometries. Geometric descriptions for classes of irregular geometries have not been considered for arrays lacking a regular structure. The studies [14, 15] introduced optimization approaches for irregular geometries by minimizing the residues between the desired gain pattern and the actual pattern computed from each microphone position. However, it is still not clear what geometric properties are crucial for the superior beamforming performance of irregular arrays. Therefore, this article proposes novel geometry descriptors with a relationship to performance that are useful for explaining the performance differences between irregular arrays (as shown in Figure 1), and provides guidelines and insight for irregular microphone cluster design.

To relate the geometric descriptions to performance, experiments are performed using Monte Carlo simulations to analyze three-dimensional beam patterns of microphones uniformly distributed over a planar design space. Since the main applications considered for the irregular arrays involve speech (as in the case of surveillance in a cocktail party environment), the excitation of the arrays needed to compute the performance metrics is colored noise with the same spectral distribution as the band importance function used in the speech intelligibility index (SII) [5]. This provides a compact summary statistic that is relevant for applications where speech intelligibility is important. Results show the primary geometry factor that explains array performance is the differential path distance (DPD) distribution between microphone pairs to the target/noise locations. Statistics are derived to assess the performance and array geometry parameters with a fixed number of microphones and a constant target source location. Delay and sum beamforming (DSB) using an inverse distance weighting is applied to generate the array gains.

This article is organized as follows: Section 2 presents the formulations for computing array gains based on DSB and indicates its relationship with microphone distribution. Section 3 introduces geometry descriptors to characterize array geometries and derives statistics related to the DPD distribution of microphone array. Section 4 analyzes the relationship between proposed geometry descriptors and key performance metrics based on Monte Carlo experiments, and demonstrates a strong correlation between these descriptors and array performance. Finally, “Conclusions” section summarizes the results and presents conclusions.

2 Problem formulation

In order to reveal the impact of microphone distributions on beamforming performance, this section presents the formulations to compute the three-dimensional array gains for microphone arrays relative to a given focal point. Parametric performance metrics are directly computed from these gain patterns.

Consider microphones and sound sources distributed in a three dimensional space. Let u(t;r_s) be the sound source located at position r_s, where r_s is a vector denoting the x, y, and z coordinates. The waveform received by the p^th microphone can be expressed as:

v (t; r_{s}, r_{p}) = \int_{- \infty}^{\infty} u (τ; r_{s}) h (t - τ; r_{s}, r_{p}) d τ,

(1)

where h(.) represents the impulse response of propagation path from r_s to r_p. For a reverberant room, the impulse response can be given by:

h (t; r_{s}, r_{p}) = a_{s p 0} (t - τ_{s p 0}) + \sum_{n = 1}^{\infty} a_{spn} (t - τ_{spn}),

(2)

where a_spn(t) is the response related to the n^th propagation path, τ_spn is the corresponding time delay, and n = 0 represents the direct path from source to microphone. In frequency domain the received signal of Equation (1) can now be expressed as:

\begin{matrix} \hat{V} (ω; r_{s}, r_{p}) = \hat{U} (ω; r_{s}) {\hat{A}}_{sp0} (ω) exp (- j ω τ_{s p 0}) \\ + \hat{U} (ω; r_{s}) \sum_{n = 1}^{\infty} {\hat{A}}_{spn} (ω) exp (- j ω τ_{spn}), \end{matrix}

(3)

where the hat notation expresses the Fourier transform of corresponding time-domain quantity. Denote the desired focal point as r_i and express the DSB output as:

\hat{G} (r_{i}, r_{s}) = \sum_{p = 1}^{P} B_{ip} \hat{V} (ω; r_{s}, r_{p}) exp (j ω τ_{ip});

(4)

where P is the total number of microphones, B_ip is a scalar representing the filter coefficient related to focal point r_i and microphone position r_p, and τ_ip is the corresponding time delay. For results in this article the coefficient was set to the inverse distance to the focal point as B_ip = 1/d_ip, where d_ip denotes the distance from r_i to r_p. The total output power of this filtered sum is computed by:

\begin{array}{l} S (r_{i}, r_{s}) = \int \sum_{p = 1}^{P} \sum_{q = 1}^{P} \\ B_{ip} B_{iq} \hat{V} (ω; r_{s}, r_{p}) {\hat{V}}^{*} (ω; r_{s}, r_{q}) exp (j ω (τ_{ip} - τ_{iq})) d ω . \end{array}

(5)

In order to obtain the simplified formulation that is useful for analysis and understanding the geometric relationship, consider only the direct paths in Equation (3). With the assumption that the beamformer coefficients and propagation attenuation product factors are uncorrelated with the path differentials, S(r_i,r_s) can be rewritten as:

\begin{matrix} S (r_{i}, r_{s}) = P^{2} \int ∣ \hat{U} (ω; r_{s}) ∣^{2} E [B_{ip} B_{iq} {\hat{A}}_{sp} (ω) {\hat{A}}_{sq}^{*} (ω)] \\ E [exp (j ω ((τ_{sq} - τ_{sp}) + (τ_{ip} - τ_{iq})))] d ω, \end{matrix}

(6)

where E[·] denotes the expected value operator over all microphone pairs generated by the double summation of Equation (5), and S(r_i,r_s) is the output power of beamformer targeting r_i with actual sound source at r_s. For multi-source applications, the total output power of beamformer can be obtained from the superposition of S(r_i,r_s) from each source. To investigate the beamforming performance in relation to array geometry, the time delays are expressed in terms of spatial distances and signal wavelengths:

\begin{matrix} S (r_{i}, r_{s}) = P^{2} \int ∣ \hat{U} (ω; r_{s}) ∣^{2} E [B_{ip} B_{iq} {\hat{A}}_{sp} (ω) {\hat{A}}_{sq}^{*} (ω)] \\ E [exp (j 2 π (\frac{d_{sq} - d_{sp}}{λ} + \frac{d_{ip} - d_{iq}}{λ}))] d ω, \end{matrix}

(7)

where d_sp denotes the distance from sound source r_s to microphone position r_p, and d_ip denotes the distance from focal point r_i to microphone position r_p. Therefore, the formulation of beamforming power for sources in the FOV is separated into three parts: the source power, the propagation effects and beamforming weights, and the microphone distribution. For arrays with a fixed number of microphones and constant beamformer coefficients, S(r_i,r_s) only depends on the exponential terms averaged over all microphone pairs, which are directly related to the microphone positions and source signal frequencies. For the case where a signal source is located at the beamformer focal point, r_s = r_i, the arguments of the exponents are all 0, and the signal is enhanced by the coherent addition of complex exponential terms. Sources not located at the focal point, r_s ≠ r_i, will have reduced power due to the incoherent phases of the exponential terms. The objective in selecting a microphone distribution is to minimize the average value of the exponential terms in Equation (7) when r_s ≠ r_i while maximizing the average when r_s = r_i for all possible target and noise positions in the FOV. Since the summations will always be maximized when r_s = r_i (exponential arguments are all zero), the optimization strategy can be reduced to minimizing the maximum value of S(r_i,r_s) when r_s ≠ r_i for all r_s and r_i in the FOV. A metric based on this notion is the Mainlobe-to-peak-sidelobe Ratio (MPSR), which is used in later simulations to assess performance.
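As a minimal sketch of the direct-path model in Equation (7), the snippet below evaluates the DSB output power at a single wavelength for a unit-amplitude source. The function name `dsb_power`, the 1/d spreading loss used for the propagation attenuation, and the example coordinates are illustrative assumptions and are not taken from the article.

```python
import numpy as np

def dsb_power(mics, focal, source, wavelength):
    """Direct-path DSB output power at one wavelength (sketch of Equation (7))."""
    d_i = np.linalg.norm(mics - focal, axis=1)    # d_ip: focal point to microphone p
    d_s = np.linalg.norm(mics - source, axis=1)   # d_sp: source to microphone p
    weights = 1.0 / d_i                           # B_ip = 1/d_ip, as stated in the article
    atten = 1.0 / d_s                             # ASSUMPTION: 1/d spreading loss for A_sp
    phase = 2.0 * np.pi * (d_i - d_s) / wavelength
    # |sum over p|^2 equals the double sum over (p, q) pairs in Equation (5).
    return np.abs(np.sum(weights * atten * np.exp(1j * phase))) ** 2

# Hypothetical 4-microphone ceiling array (z = 2 m) focused 1 m below the ceiling.
mics = np.array([[0.5, 0.2, 2.0], [-0.4, 0.6, 2.0],
                 [0.1, -0.7, 2.0], [-0.6, -0.3, 2.0]])
focal = np.array([0.0, 0.0, 1.0])
on_focus = dsb_power(mics, focal, focal, wavelength=0.4)
off_focus = dsb_power(mics, focal, np.array([1.5, 0.0, 1.0]), wavelength=0.4)
```

When the source sits at the focal point, every phase term is zero and the sum is fully coherent; for the off-focus source in this example the phases differ across microphones and the returned power is lower.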

Equation (7) identifies the phase terms responsible for minimizing the power gain when r_s ≠ r_i, and is related to the source wavelength and the DPD distribution over all (p,q) microphone pairs, given by:

Δ_{pq} (r_{i}, r_{s}) = (d_{sq} - d_{sp}) + (d_{ip} - d_{iq}),

(8)

where r_i is the focal point of the beamformer (target position), and r_s is the interfering source position. Note that Δ_pq(r_i,r_s) is the exponential argument in Equation (7) without the wavelength scaling. Taking Array 1 in Figure 2 as an example, the DPD from the right microphone pair to the sources is defined as (d₁-d₂) + (d₃-d₄).
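Equation (8) can be evaluated for every microphone pair at once. The sketch below is one possible way to do so; the function name `dpd_matrix` and the two-microphone example are hypothetical. It uses the observation that Δ_pq equals g_p − g_q with g_p = d_ip − d_sp.

```python
import numpy as np

def dpd_matrix(mics, focal, source):
    """Matrix of differential path distances, entry [p, q] = Δ_pq of Equation (8)."""
    d_i = np.linalg.norm(mics - focal, axis=1)    # d_ip
    d_s = np.linalg.norm(mics - source, axis=1)   # d_sp
    g = d_i - d_s                                 # per-microphone path difference
    # (d_sq - d_sp) + (d_ip - d_iq) = g_p - g_q
    return g[:, None] - g[None, :]

# Hypothetical example: two microphones on the x-axis, coordinates in metres.
mics = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
focal = np.array([0.0, 4.0, 0.0])     # d_i = [4, 5]
source = np.array([3.0, 4.0, 0.0])    # d_s = [5, 4]
delta = dpd_matrix(mics, focal, source)
```

The matrix is antisymmetric with a zero diagonal, and it vanishes entirely when the source coincides with the focal point.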

Ideally, if the DPDs of a given microphone geometry and wavelength result in complex exponential arguments distributed uniformly from −π to π over all microphone pairs, the expected power is zero when targeting r_i [16]. That is to say, in order to minimize gains for the interference/noise sources (r_s ≠ r_i), the corresponding DPDs should be distributed as widely as possible relative to the source wavelength (incoherence). For the case of beamforming at the source, all the phase terms in Equation (7) will be close to zero (coherent), and result in a maximum power gain at the target position. Even if sound source localization errors result in small dislocations between the true target source position and the beamformer focal point, $r_{s} = r_{i} + Δ r_{error} \approx r_{i}$, as long as the DPD variance is much smaller than the wavelengths of significant speech signal frequencies, the phases of the exponential arguments are still limited to a small range and result in significant coherent sums. Therefore, Equation (7) demonstrates the impact of the DPD distribution over all microphone pairs on the array's ability to enhance target signals and suppress noise signals. The optimal microphone geometry should provide widely spread and evenly distributed DPDs relative to the source wavelengths for the noise source positions, in order to decorrelate the noise from target signals. Statistics assessing the uniformity of DPD distributions are proposed in the next section as novel geometry descriptors to explain the variations of array beamforming performance, especially for irregular arrays.

3 Proposed geometry descriptors

The analysis in the previous section suggests a correlation between array beamforming gains and DPD distributions. This section proposes several geometric characterizations applicable to irregular arrays and related to array performance. In addition, descriptors for regular arrays, such as the aperture size and microphone spacings, will be generalized for irregular array geometries.

The array centroid offset is defined as the distance between array focal point r_i = (x_i,y_i,z_i) and the centroid of array elements given by:

L = \sqrt{{(x_{0} - x_{i})}^{2} + {(y_{0} - y_{i})}^{2} + {(z_{0} - z_{i})}^{2}},

(9)

where r₀ = (x₀,y₀,z₀) denotes array centroid:

r_{0} = (x_{0}, y_{0}, z_{0}) = (\frac{1}{P} \sum_{p = 1}^{P} x_{p}, \frac{1}{P} \sum_{p = 1}^{P} y_{p}, \frac{1}{P} \sum_{p = 1}^{P} z_{p}),

(10)

where P is the number of microphones and r_p = (x_p,y_p,z_p) denotes the position of the p^th microphone. Array dispersion, analogous to the aperture size, is a measure of average microphone spread about the centroid, computed by:

a = \sqrt{\frac{1}{P} \sum_{p = 1}^{P} [{(x_{p} - x_{0})}^{2} + {(y_{p} - y_{0})}^{2} + {(z_{p} - z_{0})}^{2}]}

(11)

Note that L and a can be applied to characterize both regular and irregular geometries, as shown in Figure 1. For regular arrays a directly impacts resolution (mainlobe width MLW), and determines the microphone spacing in conjunction with P, which affects sidelobe behavior. The distance L indicates whether sound sources are effectively located in the near-field (small L for immersive application), or far-field (large L), where the terms small and large are used relative to the source wavelengths. However, as illustrated by the examples in the introduction these descriptors are limited in their ability to explain the beamforming behavior when additional degrees of freedom are allowed as in the case of irregular arrays. Therefore, additional descriptors involving DPD distribution for all microphone pairs to points in the FOV are proposed as metrics.
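The two descriptors of Equations (9) to (11) reduce to a few array operations. The following sketch is illustrative only; the function name and the square test array are assumptions introduced here.

```python
import numpy as np

def centroid_offset_and_dispersion(mics, focal):
    """Return (L, a): centroid offset of Equation (9) and dispersion of Equation (11)."""
    centroid = mics.mean(axis=0)                          # r_0, Equation (10)
    offset = np.linalg.norm(centroid - focal)             # L
    dispersion = np.sqrt(np.mean(np.sum((mics - centroid) ** 2, axis=1)))  # a
    return offset, dispersion

# Hypothetical square array on the plane z = 2 m, focal point at the origin.
mics = np.array([[1.0, 1.0, 2.0], [1.0, -1.0, 2.0],
                 [-1.0, 1.0, 2.0], [-1.0, -1.0, 2.0]])
L, a = centroid_offset_and_dispersion(mics, np.array([0.0, 0.0, 0.0]))
```

For this square, the centroid is (0, 0, 2), so L = 2 m, and every microphone lies √2 m from the centroid, so a = √2 m.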

From the analysis of Section 2, a limited DPD distribution increases the likelihood of unexpected coherence at non-target locations, especially when DPDs are less than a quarter wavelength at significant signal frequencies. DPD distributions can be examined via histograms or characterized with various statistics. One potentially useful statistic is the standard deviation of the DPDs over all microphone pairs. In [16, 17] closed form expressions were presented for the expected value of the exponential terms in Equation (7). For a normal DPD distribution over all microphone pairs, the expected value of the exponential term is given by:

\begin{matrix} E [exp (j 2 π (\frac{Δ_{pq} (r_{i}, r_{s})}{λ}))] \\ = exp (- 2 {(π \frac{σ_{Δ} (r_{i}, r_{s})}{λ})}^{2}), \end{matrix}

(12)

where σ_Δ represents the DPD standard deviation. If the DPDs are uniformly distributed, the expected value becomes

\begin{matrix} E [exp (j 2 π (\frac{Δ_{pq} (r_{i}, r_{s})}{λ}))] \\ = \mathrm{sinc} (π \frac{\sqrt{12} σ_{Δ} (r_{i}, r_{s})}{λ}) . \end{matrix}

(13)
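The closed forms in Equations (12) and (13) can be checked numerically. The sketch below draws zero-mean DPD samples with a chosen standard deviation and compares the sample mean of the complex exponential with the two expressions. The sample size, wavelength, and standard deviation are arbitrary illustrative values, and sinc is taken here in its unnormalized form sin(x)/x, which is an assumption about the notation of Equation (13).

```python
import numpy as np

rng = np.random.default_rng(0)
wavelength = 0.4     # metres, illustrative value
sigma = 0.1          # DPD standard deviation in metres, illustrative value
n = 200_000          # number of simulated DPD samples

# Zero-mean normal DPDs, compared with Equation (12).
normal_dpd = rng.normal(0.0, sigma, n)
emp_normal = np.mean(np.exp(2j * np.pi * normal_dpd / wavelength)).real
pred_normal = np.exp(-2.0 * (np.pi * sigma / wavelength) ** 2)

# Zero-mean uniform DPDs with the same standard deviation (half-width sqrt(3)*sigma),
# compared with Equation (13) using the unnormalized sinc, sin(x)/x.
half_width = np.sqrt(3.0) * sigma
uniform_dpd = rng.uniform(-half_width, half_width, n)
emp_uniform = np.mean(np.exp(2j * np.pi * uniform_dpd / wavelength)).real
x = np.pi * np.sqrt(12.0) * sigma / wavelength
pred_uniform = np.sin(x) / x
```

Note that `np.sinc` implements the normalized form sin(πx)/(πx), which is why the unnormalized ratio is written out explicitly here.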

In both cases the expected value of the exponential terms approaches zero for increasing σ_Δ. When r_i = r_s, the DPDs are zero for all microphone pairs, resulting in a DPD variance of 0. Thus, the scaling provided by the DPD exponential factor of Equation (7) is at a maximum of 1, which is desired when the source and focal point are identical. Consistent with the previous conclusion, the more widely spread the DPDs (large σ_Δ), the better the ability of the array to extract the target signal at r_i and decorrelate signals from a noise source at r_s. Therefore, the standard deviation is applied as an effective measure that can describe the performance of irregular arrays. For particular focal and noise source locations, the DPD standard deviation is computed as:

σ_{Δ} (r_{i}, r_{s}) = \sqrt{\frac{1}{P^{2}} \sum_{p = 1}^{P} \sum_{q = 1}^{P} {(Δ_{pq} (r_{i}, r_{s}))}^{2}},

(14)

In addition, with the same standard deviation, the expected value of the exponential terms approaches zero for decreasing λ, representing better noise suppression for signals in high-frequency bands. A wider spread of DPDs is needed to decorrelate low-frequency signal sources, such as male voices. In this article, in order to focus on the impact of DPDs derived from array geometry, colored noise generated according to the SII band importance function is applied as the excitation of the simulations to compute performance metrics.
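A minimal sketch of Equation (14) follows, with a hypothetical function name and example geometry. Because Δ_pq = g_p − g_q with g_p = d_ip − d_sp, the mean of Δ_pq over all pairs is zero, so the root mean square over the P² pairs is the standard deviation.

```python
import numpy as np

def dpd_std(mics, focal, source):
    """DPD standard deviation over all P^2 microphone pairs (Equation (14))."""
    d_i = np.linalg.norm(mics - focal, axis=1)    # d_ip
    d_s = np.linalg.norm(mics - source, axis=1)   # d_sp
    g = d_i - d_s
    delta = g[:, None] - g[None, :]               # Δ_pq for every (p, q)
    return np.sqrt(np.mean(delta ** 2))           # mean over P^2 entries

# Hypothetical two-microphone example, coordinates in metres.
mics = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
focal = np.array([0.0, 4.0, 0.0])
source = np.array([3.0, 4.0, 0.0])
sigma = dpd_std(mics, focal, source)
```

In this example the DPD matrix has entries 0 and ±2 m, giving σ_Δ = √2 m. The same value equals √2 times the population standard deviation of the per-microphone differences g_p, which avoids forming the full matrix for large P.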

From Equations (12) and (13), different DPD distributions can also impact the incoherence level of beamforming. Figure 2 provides a real-case example of linear arrays. Figure 2(a) shows two linear arrays in a planar FOV with microphone positions denoted by O markers. Two sound sources represented by X markers are located in the FOV, where one source is considered as the target (focal point of the beamformer) and the other is the noise source. Colored noise from each source is recorded separately, and the received signals of the microphones are normalized by the average rms value over all channels and superimposed. The Signal-to-noise Ratio (SNR) is computed as the power ratio of the beamformed signal from the target source over that from the noise source. The DPD histograms of both arrays are shown in Figure 2(b), (c), respectively. The beamforming SNR results are provided in Table 1. An analogous simulation of the array recording was also performed and is presented in Table 1. For both the real and simulated recordings it can be seen that although these two arrays have the same σ_Δ, array 2 shows a 2–3 dB SNR improvement over array 1 for both targets because array 2 provides a more uniform DPD distribution over the source spectrum, thus demonstrating a need for another statistic related to DPD diversity. In this article Pielou's evenness index [18], which is a normalized Shannon entropy, is introduced to numerically assess the diversity of the DPD distribution as:

\begin{array}{l} J (r_{i}, r_{s}) = \frac{H (r_{i}, r_{s})}{H_{\max} (r_{i}, r_{s})} = \frac{- \sum_{k = 1}^{K} (p_{k} ln p_{k})}{- \sum_{k = 1}^{K} (\frac{1}{K} ln (\frac{1}{K}))} \\ = \frac{- \sum_{k = 1}^{K} (p_{k} ln p_{k})}{ln K}, \end{array}

(15)

where K is the total number of DPD bins for the histogram estimate, p_k is the fraction of DPDs within the k^th bin, H(r_i,r_s) is the Shannon entropy, and H_max(r_i,r_s) is the maximum possible entropy for the given number of bins, which represents an ideal uniform distribution of DPDs. This normalization avoids the variations from different ranges of DPD distributions and different numbers of microphones. Note that the DPD range is binned by constant intervals whose size should be associated with the quarter wavelengths of significant signal frequencies, to result in reasonably smooth histograms of DPDs related to the incoherence level of the phase terms of the beamforming gain. For the results in this article, the bin size is set to 0.1 m, which is less than a quarter wavelength of the important frequency band around 800 Hz for male voice intelligibility [19].
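One way to compute the index of Equation (15) from a set of DPD values is sketched below. The function name, the alignment of bin edges to multiples of the bin size, and the convention of returning 0 when only one bin is occupied by the whole range (ln K = 0) are assumptions made for this illustration; the article specifies only constant-width bins of 0.1 m.

```python
import numpy as np

def pielou_evenness(dpd_values, bin_size=0.1):
    """Pielou's evenness index J of Equation (15) from a histogram of DPD values."""
    values = np.asarray(dpd_values, dtype=float).ravel()
    # ASSUMPTION: bin edges aligned to multiples of bin_size and spanning the data range.
    first = np.floor(values.min() / bin_size)
    last = np.floor(values.max() / bin_size)
    edges = np.arange(first, last + 2) * bin_size
    counts, _ = np.histogram(values, bins=edges)
    n_bins = len(edges) - 1                       # K
    if n_bins == 1:
        return 0.0                                # ASSUMPTION: define J = 0 when ln K = 0
    p = counts[counts > 0] / counts.sum()         # empty bins contribute 0 to the entropy
    return float(-np.sum(p * np.log(p)) / np.log(n_bins))

# Illustrative values with a 1 m bin size (not the article's setting) to keep numbers simple.
even = pielou_evenness([0.5, 1.5, 2.5, 3.5], bin_size=1.0)     # one value per bin
clumped = pielou_evenness([0.5, 0.6, 0.7, 3.5], bin_size=1.0)  # three values share a bin
```

A perfectly even histogram gives J = 1, while concentrating the DPDs in a few bins pulls J toward 0.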

Table 1 SNR results of linear arrays


Therefore, four geometry descriptors {L, a, σ, J} are proposed to characterize both regular and irregular microphone distributions and study their impact on array beamforming performance. As summarized in Table 2, these descriptors depend on various geometric aspects of the application environment. Descriptors {L, a} are related to microphone coordinates or beamforming focal point. They are usually applied together as a basis for comparing similar arrays. The descriptors {σ, J} can vary with each array geometry instance and also depend on the characteristics of possible target and noise source distributions. This dependency brings the expectation of stronger correlation with array performance based on different acoustic scenes. In the next section, these proposed geometry descriptors are applied together to characterize different stochastic array geometries, and their relationships with key performance metrics of three-dimensional beam pattern are analyzed with Monte Carlo simulations.

Table 2 Dependencies of geometry descriptors


4 Numerical simulations

4.1 Experimental setup

This section applies Monte Carlo experiments to evaluate the performance of irregular arrays based on the proposed geometry descriptors. The simulation flow chart is shown in Figure 3. The FOV is a 10 × 10 × 2 m room and microphone positions are randomly generated with a uniform distribution on the ceiling plane. Then the microphone coordinates are shifted and scaled to obtain the desired array centroid and dispersion. Array centroid values range from the center of the ceiling to the edge at 1 m intervals along the x-axis, while five levels of dispersion are applied with each centroid. For each combination of centroid offset and dispersion level, 300 independent array distributions are generated by Monte Carlo experiments. The 3D beam pattern of each array is obtained by moving a sound source with constant power over all spatial points in the FOV, while the focal point is fixed at the center of the room. The DSB output power is computed for the source at each spatial point to form the 3D beam pattern.
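The array generation step described above can be sketched as follows. The article does not give the details of the shift-and-scale operation, so the function below is only one plausible reading: draw uniform positions, remove the sample centroid, rescale to the requested dispersion, and translate. The function name, the ceiling height of 2 m, and the absence of any check that scaled microphones remain inside the 10 × 10 m ceiling are assumptions.

```python
import numpy as np

def random_planar_array(n_mics, centroid_xy, dispersion, z, rng):
    """Random planar array with a prescribed centroid and dispersion (illustrative sketch)."""
    xy = rng.uniform(0.0, 10.0, size=(n_mics, 2))     # uniform draw over a 10 x 10 m plane
    xy = xy - xy.mean(axis=0)                         # shift sample centroid to the origin
    current = np.sqrt(np.mean(np.sum(xy ** 2, axis=1)))
    xy = xy * (dispersion / current)                  # scale to the requested dispersion
    xy = xy + np.asarray(centroid_xy, dtype=float)    # shift to the requested centroid
    # ASSUMPTION: no clipping, so microphones may fall outside the ceiling for large dispersion.
    return np.column_stack([xy, np.full(n_mics, z)])

rng = np.random.default_rng(1)
array = random_planar_array(16, (5.0, 5.0), 1.5, 2.0, rng)
```

Because all microphones share the same height, the planar dispersion computed here equals the three-dimensional dispersion of Equation (11).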

As shown in Figure 3, two metrics are applied to assess array performance: the MLW, associated with resolution, and the MPSR, associated with noise suppression ability. In this article, the size of the mainlobe is characterized by the dimensions of the surface consisting of spatial points with gains 3 dB below that at the focal point (maximum gain). Let x_δ, y_δ, and z_δ denote the projections of the 3 dB mainlobe contour onto the x, y, and z axes, respectively. The MLW can then be expressed as:

B_{3 dB} = \sqrt{(x_{δ}^{2} + y_{δ}^{2} + z_{δ}^{2})} .

(16)

Let S(r_i,r_s) denote the power gain of the beamformer focused on r_i with a unit power source at r_s. The MPSR for a beamformer focused on r_i can be expressed as:

Γ_{i} = \frac{S (r_{i}, r_{i})}{max_{r_{s} \notin ML} [S_{o} (r_{i}, r_{s})]},

(17)

where S_o(r_i,r_s) denotes the local maxima of S(r_i,r_s) outside the 3 dB mainlobe region (ML) in the FOV. This metric represents the worst-case leakage. In the Monte Carlo experiments, the maximum sidelobe level (denominator) is measured from the maximum local peak of the gain pattern outside the ML. Because there is normally a tradeoff between B_3dB and Γ_i, the common criterion to decide the optimal array beam pattern is to limit the MLW to a tolerable spatial resolution and maximize the MPSR in the FOV. For a given number of microphones, increasing dispersion about the array centroid tends to result in higher sidelobe levels while sharpening the mainlobe. However, for a given class of randomly generated arrays with fixed centroid and dispersion, the sidelobe levels will also vary based on the DPD statistics.
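To illustrate the MPSR definition of Equation (17), the sketch below works on a one-dimensional scan of power gains through the focal point rather than the full three-dimensional pattern used in the article. The function name, the contiguous-region definition of the mainlobe, and the peak test are simplifying assumptions.

```python
import numpy as np

def mpsr_db_1d(power, focal_index):
    """MPSR in dB for a 1D scan of power gains (simplified sketch of Equation (17))."""
    power = np.asarray(power, dtype=float)
    threshold = power[focal_index] / 2.0          # 3 dB below the focal-point gain
    # Grow the mainlobe outward from the focal point while gains stay above threshold.
    left = focal_index
    while left > 0 and power[left - 1] >= threshold:
        left -= 1
    right = focal_index
    while right < len(power) - 1 and power[right + 1] >= threshold:
        right += 1
    # Local maxima among interior samples that lie outside the mainlobe.
    idx = np.arange(1, len(power) - 1)
    is_peak = (power[idx] > power[idx - 1]) & (power[idx] >= power[idx + 1])
    outside = (idx < left) | (idx > right)
    peaks = power[idx[is_peak & outside]]
    if peaks.size == 0:
        return np.inf                             # no sidelobe peak inside the scan
    return 10.0 * np.log10(power[focal_index] / peaks.max())

# Made-up gain values: mainlobe spans indices 3..5, sidelobe peaks 0.25 and 0.2.
scan = [0.1, 0.25, 0.1, 0.6, 1.0, 0.6, 0.1, 0.2, 0.05]
ratio_db = mpsr_db_1d(scan, focal_index=4)
```

Restricting the denominator to local peaks matters: the largest sample outside the mainlobe is usually on the mainlobe skirt just below the 3 dB threshold, which would not represent a sidelobe. An infinite return value corresponds to the case where no sidelobe peak falls inside the scanned region.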

In addition to array geometry, the signal frequency and the number of microphones are critical factors impacting performance. To make results more reflective of performance where the primary sources are speech, the sound sources consist of colored noise with a spectrum equivalent to the band importance function from the SII, which emphasizes the frequency bands most important to human understanding of speech [5]. Because the impact of each geometry descriptor also depends on microphone number, irregular arrays with 16, 25, 36, 49 and 64 microphones are examined with comparable regular arrays and logarithmic arrays. The logarithmic array consists of three superimposed regular subarrays used for octaves from 800 Hz to 3200 Hz to generate a relatively uniform frequency response over the important frequency bands. Statistical analyses of simulation results are presented in the next section to assess the impacts of the proposed geometry descriptors and demonstrate their relationship with performance metrics in immersive or near-field applications.

4.2 Results and discussion

Plots from Monte Carlo simulations are presented to reveal relationships between each geometry descriptor and the performance metrics. Figures 4, 5, 6 and 7 present the geometry descriptors versus MLW and MPSR, where the error bars span ± 1 standard deviation about the mean. For comparison, a regular planar array and a logarithmically spaced array with the same geometry descriptors are also marked in the figures.

Figure 4 indicates the impact of centroid offset on array performance. From Figures 4(a)(b), it can be seen that for fixed array dispersion, increasing the centroid offset increases the MLW and reduces MPSR, representing degradation of array performance. The standard deviation of MLW increases with growing centroid offset, while a ±1 dB variance of MPSR is observed for each centroid offset value with fixed dispersion. Logarithmic arrays show much larger increases in MLW than regular and irregular arrays because the microphone density is high near the array centroid, causing a longer mainlobe in the direction of the offset. Although better MPSR can be observed for logarithmic arrays with large centroid offset, it does not necessarily represent superior ability to suppress non-target sources. The lower sidelobe levels are primarily the result of the FOV being included in a huge mainlobe. Therefore, the logarithmic array has a major limitation on target space, and cannot adjust well to focal points away from the array centroid. Figures 4(c)(d) show variations of performance metrics along centroid offset when dispersion is fixed at a small value. For centroid offset values below 2.5 m, the trends of MLW and MPSR over centroid offset levels are as expected, with more sensitivity for arrays with smaller dispersion (microphones closer together on average) when compared to Figures 4(a)(b). For centroid offset values beyond 2.5 m (exceeding five times the dispersion), the MLW becomes very large relative to the size of the FOV. The apparent improvement in the MPSR beyond this point is artifactual because the mainlobe dominates the FOV, pushing the significant sidelobes outside the FOV. The observed high MPSR values, therefore, cannot be associated with superior beamforming performance when the centroid offset is large relative to the dispersion.
In every case there is a significant portion of randomly generated arrays that perform better than the logarithmic and regular arrays as seen by their marker positions relative to the standard deviation range of the irregular arrays.

Figure 5 presents the impact of array dispersion for a fixed centroid at the center of the ceiling. It can be noted that small dispersions (closer average spacings between microphones) result in better MPSR for all geometries; however, most of the irregular arrays perform better than either the regular or logarithmic arrays. With the centroid offset fixed, when array dispersion increases in the horizontal microphone plane, the MLW decreases along the horizontal direction; however, the MLW along the vertical direction grows. This phenomenon is illustrated in Figure 6. When moving microphones away from the array centroid/target, the differential distances from each microphone to the target point and the nearby locations reduce, resulting in higher coherent power for these points in the Z-direction, thus extending the mainlobe. The sensitivity of these variations to dispersion is inversely related to the centroid offset. As the centroid offset becomes large relative to the dispersion, beamforming on a focal point is not practical (no longer an immersive environment). The array takes on more characteristics of a far-field array where the vertical direction MLW is so large that one only considers the angle or look direction instead of a focal point. In summary, for a fixed number of microphones there is a tradeoff between MLW and MPSR that is dependent on the dispersion, as would be expected given the similarities between dispersion and aperture. In addition, by inspecting the standard deviation of error bars along each level of dispersion when the array centroid is fixed, it can be seen that the variance of MLW increases with growing dispersion. An MPSR variance of ±1 to 1.5 dB is observed for each dispersion level with fixed centroid. Therefore, additional geometry descriptors based on the DPD distribution are expected to explain part of these variations of array performance.

Results in Figures 4 and 5 demonstrate the impact of geometry descriptors related to aperture and array distance from focal points, which are largely consistent with the expectations. In all cases a portion of the randomly generated irregular arrays was superior to the regular arrays. In order to resolve between classes of irregular arrays, the following paragraphs analyze geometry descriptors based on DPD statistics with fixed centroids and dispersions, and demonstrate their ability to identify classes of irregular geometries with similar performance properties.

For a fixed centroid offset and dispersion, Figure 7 shows the relationship between the DPD statistics of the array geometry and performance. Figure 7(a)(b) presents the results for arrays with similar centroid offset and dispersion values, while Figure 7(c)(d) presents arrays with small centroid offset and large dispersion. The results for the regular and logarithmic arrays are also plotted for reference. Figure 7(a)(b) demonstrates that larger DPD standard deviations and Pielou's evenness indices result in improved MPSR. These results are consistent with theoretical analysis indicating that wider and more evenly distributed DPDs create more incoherence in the phase terms of Equation (7) and suppress noise better. The MPSR is more sensitive to Pielou's evenness index than to the standard deviation, primarily because the standard deviation has a limited range when the dispersion is fixed. Note that the logarithmic array in Figure 7(a) has a very high standard deviation, yet its MPSR is not consistent with the trend of the irregular arrays, whereas for Pielou's index the MPSR of both the regular and logarithmic arrays is more consistent with the irregular array performance.

When the array dispersion becomes much larger than the centroid offset, as in Figure 7(c)(d), the improvements in MPSR with increasing standard deviation or Pielou's index are less dramatic. This is because arrays with large dispersion and small centroid offset typically generate a wide DPD distribution (demonstrated by the increasing range of DPD standard deviation in Figure 7(c)) that extends over many wavelengths in the useful frequency range. In these cases, Pielou's evenness index does not correlate as well with the beamforming gain as in Figure 7(b) because of the 2π periodicity of the exponential argument. For a frequency of interest, the DPDs scaled by the wavelength are mapped to the [−π, π] range by the modulo operation. The evenness index can then be computed after this operation to give frequency-specific measures related to beamforming gains. In addition, the results in Figure 7(c)(d) show that almost any irregular distribution will perform better than the regular geometry, and approximately 50% will perform better than the logarithmic array. Also, the relative performance of the regular and logarithmic arrays is more consistent with the trends of the irregular arrays according to Pielou's evenness index than according to the standard deviation.
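The frequency-specific evenness measure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used for the reported results: the function name `wrapped_evenness`, the speed of sound of 343 m/s, and the choice of 16 equal-width phase bins are all hypothetical, and the DPDs are assumed to be supplied in meters. Pielou's index is taken here in its usual form, the Shannon entropy of the bin proportions divided by the logarithm of the number of bins.

```python
import math

def wrapped_evenness(dpds_m, freq_hz, c=343.0, n_bins=16):
    """Pielou's evenness of DPD phases wrapped to [-pi, pi) at one frequency.

    dpds_m  : iterable of differential path distances in meters (assumed input)
    freq_hz : frequency of interest in Hz
    c       : speed of sound in m/s (assumed value)
    n_bins  : number of equal-width phase bins (assumed, must be >= 2)
    """
    dpds_m = list(dpds_m)
    if not dpds_m or n_bins < 2:
        raise ValueError("need at least one DPD and at least two bins")
    two_pi = 2.0 * math.pi
    counts = [0] * n_bins
    for d in dpds_m:
        phase = two_pi * freq_hz * d / c                 # phase of a path difference d
        wrapped = (phase + math.pi) % two_pi - math.pi   # modulo map into [-pi, pi)
        idx = int((wrapped + math.pi) / two_pi * n_bins)
        counts[min(idx, n_bins - 1)] += 1                # clamp guards float rounding
    total = len(dpds_m)
    # Shannon entropy of the occupied bins (empty bins contribute nothing)
    entropy = sum(-(k / total) * math.log(k / total) for k in counts if k)
    return entropy / math.log(n_bins)                    # 0 = concentrated, 1 = even
```

With this wrapping, DPDs that differ by whole wavelengths fall into the same phase bin and give an index of 0, whereas DPDs spread evenly over one wavelength give an index of 1, which is the distinction the unwrapped index loses when the DPD spread covers many wavelengths.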

When the centroid offset becomes larger than three times the dispersion, the array takes on more characteristics of a far-field application. These cases fall outside the primary focus of this analysis, which is immersive environments. The DPD variations over the FOV are limited and negligible relative to the signal wavelength and the large centroid offset (indicated by the observed drop in the range of Pielou's evenness indices). Variations in the microphone distribution then have little impact on performance, unlike in near-field applications. Centroid offset becomes the dominant factor affecting beamforming performance, and the behavior of the microphone array approaches that of a single element in these far-field cases.

The results analyzed above demonstrate the impact of the DPD distribution on array beamforming performance. Geometry descriptors based on the statistics of the DPD distribution show a correlation with array performance when the focal points and microphone distributions are typical for immersive or near-field applications. These DPD statistics explain variations in performance when array centroid offset and dispersion are fixed. For a fixed number of microphones, increases in dispersion improved resolution but degraded noise suppression, while increases in centroid offset degraded both of these performance metrics. However, as shown in Figure 7, with fixed centroid and dispersion, variations of about ±0.5 to 1 dB in the performance metrics are observed for each bin of DPD statistics. Although these variations partly result from the quantization errors of the DPD statistics, other geometry parameters may exist that can further reduce them.

To further investigate the significance of the proposed geometry descriptors' impact on performance, Analysis of Variance (ANOVA) is applied, which is useful for investigating the effect of independent factors on observations [20]. The performance metric variation is partitioned into portions attributed to the effect of the independent factors (between-group variation) and portions attributed to random error (within-group variation). An F statistic is computed as the ratio between these variances and tested for significance. Tables 3 and 4 show the three-way ANOVA results for MLW and MPSR values, respectively. Centroid offset, dispersion, DPD statistics, and their interactions are considered as the independent factors impacting the performance metrics. The p values for these three geometry descriptors and their interactions are all highly significant (all less than 0.01) for both MLW and MPSR. In addition, the high R² values indicate that 99.7% of the variation in the MLW data and 82% of the variation in the MPSR data can be accounted for by these independent factors. Therefore, it is demonstrated that the proposed geometry descriptors, including centroid offset, dispersion and DPD statistics, have strong correlations with array performance.
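The variance partition behind the F statistic can be illustrated with a simplified one-way case. The sketch below is only an illustration of the between-group and within-group decomposition for a single factor; it is not the three-way analysis with interactions reported in Tables 3 and 4, and the function name `one_way_anova` and the sample values are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, R^2) for a single factor with one list of observations per level.

    F   = between-group mean square / within-group mean square
    R^2 = between-group sum of squares / total sum of squares
    """
    k = len(groups)                                   # number of factor levels
    n = sum(len(g) for g in groups)                   # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least two levels and more observations than levels")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by the factor (between groups)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation attributed to random error (within groups)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared

# Made-up MPSR-like values (dB) at three levels of one descriptor, not from the article
f_stat, r_squared = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

A large F relative to the F distribution with (k − 1, n − k) degrees of freedom yields a small p value; obtaining that p value requires the F distribution's tail probability, which is omitted here to keep the sketch free of external dependencies.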

Table 3 Three-way ANOVA results of MLW


Table 4 Three-way ANOVA results of MPSR


Finally, statistical analysis and ANOVA establish and demonstrate the relationships between the proposed geometry descriptors and array performance. However, because the number of microphones determines the number of DPDs, the impact of each geometry descriptor varies with the number of microphones. To analyze these differences, data collected from Monte Carlo experiments on irregular arrays with 16, 25, 36, 49 and 64 microphones are compared. All the experiments were performed in immersive environments with comparable values of centroid offset and dispersion. Table 5 provides the R² results obtained by least-squares fitting of a general linear model (GLM) of selected geometry descriptors to MPSR. Even with this simplest regression model, over 50% of the variation in MPSR can be accounted for by GLM{a, L, σ, J}. This percentage increases to between 70% and 90% when higher-order fitting functions of the geometry descriptors (nonlinear regression models) are applied. Better R² values are obtained as the number of microphones increases.

Table 5 R² results for GLMs of geometry descriptors on MPSR


Comparing GLM{a, L}, which is derived from array aperture and position, with GLM{a, L, σ, J}, which also accounts for the impact of the DPD distribution, shows improvements of at least 10% in the R² values. In particular, for arrays with microphone density larger than 0.5 mic/m², the impact of {a, L} is greatly reduced because of the growing number of possible microphone arrangements with fixed centroid and dispersion, while the DPD statistics show a stronger correlation with array performance. Furthermore, comparing the trends of the R² values of GLM{a, L, σ} and GLM{a, L, J} with increasing microphone number shows that the DPD standard deviation, which assesses the spread of the DPD distribution, has a slightly stronger correlation with MPSR for arrays with microphone density less than 0.2 mic/m², while Pielou's evenness index, which assesses the diversity of the DPD distribution, has a greater impact on MPSR for arrays with density larger than 0.2 mic/m². The reason is that a low microphone density cannot provide enough DPD samples to measure the entropy (Pielou's evenness index) reliably, so the DPD standard deviation, which represents the average spread of DPDs about zero, better reflects the characteristics of the DPD distribution related to the beamforming gain.
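The comparison of nested linear models by their R² values can be sketched as below. This is an illustrative ordinary least-squares fit with an intercept, solved through the normal equations in plain Python; the function name `r_squared` and the column layout of the descriptor matrix are assumptions, and no data from the simulations are reproduced. Each row of `X` would hold the descriptor values for one array (for example two columns for GLM{a, L} or four for GLM{a, L, σ, J}) and `y` the corresponding MPSR values.

```python
def r_squared(X, y):
    """R^2 of an ordinary least-squares fit y ~ b0 + sum_j b_j * X[i][j]."""
    rows = [[1.0] + [float(v) for v in r] for r in X]   # prepend intercept column
    p = len(rows[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("descriptor columns are linearly dependent")
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution for the coefficients b
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (M[i][p] - sum(M[i][j] * b[j] for j in range(i + 1, p))) / M[i][i]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
                 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def drop_columns(X, keep):
    """Select a subset of descriptor columns to form a reduced (nested) model."""
    return [[row[j] for j in keep] for row in X]
```

Calling `r_squared(drop_columns(X, [0, 1]), y)` and `r_squared(X, y)` on the same data gives the R² of the reduced and full models; for least-squares fits with an intercept the full model's R² is never lower, so the size of the gain indicates how much the added descriptors contribute.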

5 Conclusions

This article analyzes and identifies important characteristics of irregular microphone arrays that are directly related to beamforming performance. Combined with descriptors analogous to traditional geometry parameters for regular arrays (i.e., array centroid and dispersion), novel geometry descriptors involving DPD statistics describe both regular and irregular arrays. Simulations demonstrated that irregular microphone geometries typically exceed the performance of regular geometries, and that arrays with high DPD entropy and wide DPD spread have better noise suppression ability. These results are primarily applicable to microphone arrays in near-field applications, such as in immersive environments.

The relationships between geometry descriptors and beamforming performance developed in this article can be applied directly as the objective functions in optimization procedures to find appropriate microphone distributions for given acoustic environments [7]. The results of this article were based on Monte Carlo experiments with planar microphone distributions, which are more applicable for indoor applications, such as audio surveillance systems. So far, the DPD statistics do not have simple geometric interpretations and must be computed based on all the microphone positions and desired focal points. While these statistics can easily be computed once a geometry is proposed, they cannot directly be used in closed-form analysis and optimizations. Other more direct geometric metrics as they relate to good values of proposed DPD statistics will be needed to guide ad-hoc microphone placements. Future work involving related closed-form relationships between geometry descriptors and key performance metrics could provide a simple and feasible solution for the optimization problems of microphone arrays.

References

1. Rabinkin D, Renomeron R, French J: Optimum sensor placement for array sound capture. Proceedings of SPIE 1997, 3162: 227-239.
2. Benesty J, Chen J, Huang Y: Microphone Array Signal Processing. 1st edition. Edited by: Benesty J, Kellermann W. Springer, Berlin Heidelberg; 2008:39.
3. Shanan S, Pomalaza-Raez C: The use of nonuniform element spacing in array processing algorithms. J. Acoust. Soc. Am 1989, 86: 1416-1418.
4. Schjaer-Jacobsen H, Madsen K: Synthesis of nonuniformly spaced arrays using a general nonlinear minmax optimization method. IEEE Trans. Antennas Propagat 1976, AP-24: 501-506.
5. Townsend P: Enhancements to the generalized sidelobe canceler for audio beamforming in an immersive environment, MS thesis. Department of Electrical Engineering, University of Kentucky; 2009.
6. Brandstein M, Ward D: Microphone Arrays Signal Processing Techniques and Applications. Edited by: Lacroix A, Venetsanopoulos A, Brandstein M, Ward D. Springer, Berlin Heidelberg New York; 2001:3.
7. Yu J, Donohue K: Performance for randomly described arrays. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics; 2011.
8. Meyer J, Elko G: A highly scalable spherical microphone array based on an orthonormal decomposition of the sound field. Paper presented at the IEEE ICASSP-02, Orlando, Florida, USA; 2002.
9. Daniel J, Nicol R, Moreau S: Further investigations of high order ambisonics and wavefield synthesis for holophonic sound imaging. 114th Convention of Audio Engineering Society; 2003.
10. Moffett A: Minimum-redundancy linear arrays. IEEE Trans. Antennas Propagat 1968, AP-16: 172-175.
11. Chen W, Bar-Ness Y: Minimum redundancy array structure for interference cancellation. Antennas and Propagation Society International Symposium 1991, 1: 121-124.
12. Pillai S, Bar-Ness Y, Haber F: A new approach to array geometry for improved spatial spectrum estimation. Proc Antennas and Propagation Society International Symposium IEEE 1985, 73: 1522-1524.
13. Townsend P, Donohue K: Beamfield analysis for statistically described planar microphone arrays. In IEEE Southeastcon; 2009.
14. Li ZB, Yiu KFC, Feng ZG: A hybrid descent method with genetic algorithm for microphone array placement design. Applied Soft Computing; 2012.
15. Feng ZG, Yiu KFC, Nordholm SE: Placement design of microphone arrays in near-field broadband beamformers. IEEE Trans. on Signal Processing 2012, 60(3):1195-1204.
16. Donohue K, McReynolds K, Ramamurthy A: Sound source detection threshold estimation using negative coherent power. In IEEE Southeastcon; 2008.
17. Donohue K, SaghanianNejadEsfahani S, Yu J: Constant false alarm rate sound source detection with distributed microphones. EURASIP J. Adv. Signal Process 2011. 10.1155/2011/656494
18. Pielou E: The measurement of diversity in different types of biological collections. J. Theoret. Biol 1966, 13: 131-144.
19. NTI Audio: Introduce to speech intelligibility. 2008. http://www.nti-audio.com/Portals/0/data/en/NTi-Audio-AppNote-AL1-Introducing-STIPA.pdf . Accessed 2011
20. SAS Institute Inc: SAS/STAT(R) 9.22 user's guide. 2010. http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_anova_sect022.htm . Accessed 2011


Acknowledgments

This study was supported in part through funding from the FBI Academy.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY, 40506, USA
Jingjing Yu & Kevin D Donohue


Corresponding author

Correspondence to Jingjing Yu.

Additional information

Competing interests

The authors declare that they have no competing interests.


Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Yu, J., Donohue, K.D. Geometry descriptors of irregular microphone arrays related to beamforming performance. EURASIP J. Adv. Signal Process. 2012, 249 (2012). https://doi.org/10.1186/1687-6180-2012-249


Received: 09 March 2012
Accepted: 10 September 2012
Published: 27 November 2012
DOI: https://doi.org/10.1186/1687-6180-2012-249
