Acoustic source localization in mixed field using spherical microphone arrays
EURASIP Journal on Advances in Signal Processing volume 2014, Article number: 90 (2014)
Abstract
Spherical microphone arrays have recently been used for source localization in three-dimensional space. In this paper, a two-stage algorithm is developed to localize mixed far-field and near-field acoustic sources in a free-field environment. In the first stage, an array signal model is constructed in the spherical harmonics domain. The recurrent relation of spherical harmonics is independent of the far-field and near-field mode strengths. Therefore, it is used to develop a spherical ESPRIT-like (estimation of signal parameters via rotational invariance techniques) approach that estimates the directions of arrival (DOAs) of both far-field and near-field sources. In the second stage, based on the estimated DOAs, a simple one-dimensional MUSIC spectrum is exploited to distinguish far-field from near-field sources and to estimate the ranges of the near-field sources. The proposed algorithm avoids multidimensional search and parameter pairing. Simulation results demonstrate good performance in localizing far-field sources, near-field sources, and mixtures of the two.
1 Introduction
Acoustic source localization using microphone arrays has many applications, such as video conferencing, intelligent systems, and robotics. It has received great attention for almost four decades [1, 2]. In most array signal processing applications, the wavefront is assumed to be planar, that is, all the sources are located in the far field (FF) of an array. In this case, the parameter that characterizes a source location is its direction of arrival (DOA) [2]. In the near field (NF) of an array, range information must be integrated into the array signal model to characterize sources accurately [3]. Although the plane-wave assumption simplifies modeling and processing, it does not hold in near-field applications and leads to analysis errors. Moreover, in some practical applications, the signals collected by microphone arrays are often a mixture of far-field and near-field sources; each source may be located in either the near field or the far field of the array [4–9]. Localization methods for mixed field sources should first discriminate far-field from near-field sources, then estimate only DOA information for the far-field sources and both DOA and range information for the near-field sources.
If an acoustic source is located in three-dimensional (3D) space, its position is jointly described by range and bearing (azimuth and elevation). The geometric structure of a microphone array strongly affects localization performance. Most existing localization techniques use a uniform linear array (ULA) or a uniform circular array (UCA) [1, 2, 5–7, 9–11] to estimate source positions. Planar arrays, such as the cross array and the uniform rectangular array (URA), are straightforward extensions of the ULA and can estimate both azimuth and elevation [8, 12, 13]. ULAs suffer from a 180° ambiguity in azimuth estimation. UCAs provide 360° azimuthal coverage due to their circular symmetry in the azimuth plane. The main drawback of planar arrays, including UCAs, is that they provide a smaller aperture in the elevation plane than in the azimuth plane, resulting in poor estimation of elevation angles [10]. Some arbitrary array configurations have also been investigated for source localization [14–17]. These configurations depart from the uniform geometries that traditional localization approaches require; the array structure is instead chosen to suit a specific practical application. Spherical microphone arrays have a 3D symmetrical geometry and can capture higher-order sound field information. This 3D structure facilitates more accurate sound source localization. Moreover, they can be analyzed within the mathematical framework of the spherical Fourier transform (SFT), which greatly simplifies processing in the space domain. Therefore, they have received considerable attention and are widely applied in source localization, beamforming, and acoustic analysis [18–20]. In this paper, we aim to develop a novel algorithm that accurately estimates the locations of mixed field sources using spherical arrays.
Many techniques have been proposed to estimate the DOAs of multiple acoustic sources. Multiple signal classification (MUSIC) and estimation of signal parameters via rotational invariance techniques (ESPRIT) are two subspace techniques [21, 22]; the latter avoids multidimensional search in the parameter space. Goossens and Rogier proposed a unitary spherical ESPRIT algorithm based on spherical phase-mode excitation that yields accurate estimates with low computational complexity [23]. The eigenbeam (EB)-ESPRIT algorithm for spherical arrays was presented in [24], along with a performance analysis for robust localization in reverberant environments; it exploits only the relation within a fixed order of spherical harmonics. Many approaches are based on beamforming [22]. Argentieri and Danes proposed an online beamspace MUSIC method with a beamforming scheme to localize sound sources in robotics [25]. Sun et al. proposed several steered-beamformer-based and subspace-based localization techniques in the spherical EB domain [26] and used them to localize early reflections in room acoustic environments. Wu et al. used sparse recovery to localize sources and formulated super-resolution beamforming in the spherical harmonics domain [27]. However, when a source is close to the spherical array, the array signal model based on the far-field assumption is no longer valid. Independent component analysis (ICA) has also been used to estimate source locations; it employs higher-order statistics and directly identifies basis vectors containing the source location information, and it has been applied in both the near field and the far field [28]. ICA-based methods have been used to estimate DOAs with spherical microphone arrays [29, 30], but they fail to localize sources that are not statistically independent. Source localization can further be considered an overcomplete basis representation problem over a grid of spatial locations, and many sparse recovery methods have been used to estimate source DOAs [31, 32]. If the sources lie in 3D space, however, the number of basis vectors is large and the computational complexity is high. Some approaches assume that one source is dominant over the others in certain time-frequency zones [11, 33]; they extend a single-source DOA algorithm over these zones to estimate multiple source locations, relying on the sparse representation of the observed signals in the time-frequency domain.
In many practical applications, the observations collected by an array may be mixed far-field and near-field signals, multiple far-field signals, or multiple near-field signals. Most of the above techniques localize sources in either the far field or the near field alone. In recent years, source localization in mixed near-field and far-field conditions has been developed using the MUSIC algorithm [5–7], an ESPRIT-like technique [8], or a sparse signal reconstruction method [4] based on a linear array. Jiang et al. proposed a 3D source localization algorithm with a cross array [9]: first, the elevation angles are obtained with the generalized ESPRIT method; then, similarly to the root-MUSIC method, the range parameters are estimated from the elevation estimates; finally, a MUSIC pseudospectrum function yields the azimuth angles from the elevation and range estimates. Due to their 3D symmetrical structure, spherical arrays have been widely used for far-field source localization [23, 24]. A spherical microphone array was used in the near field in [34], where a new close-talking microphone array was proposed that can adaptively compensate for the distance and orientation of a near-field source. Fisher and Rafaely presented a near-field spherical microphone array and defined the near-field criterion in terms of the array order and radius [35]. They analyzed spherical microphone array capabilities in the near field and designed a radial filter that discriminates the distances of sources incident from the same direction [36]. Although this prior work considers near-field processing with spherical microphone arrays, near-field or mixed field source localization with a spherical microphone array has not yet been studied. Based on the recurrent relation of the spherical harmonics, only the DOAs of mixed field sources were estimated simultaneously in [37]; how to distinguish near-field from far-field sources and how to estimate the ranges of the near-field sources were not considered.
The aim of our work is to develop a new method that localizes mixed far-field and near-field sources simultaneously using spherical arrays in a free-field environment, avoiding the parameter pairing problem and complex multidimensional search. A three-dimensional MUSIC method would scan the azimuth, elevation, and range parameter space, incurring very high computational complexity, so it is not practical for direct source localization in 3D space. First, we construct the mixed near-field and far-field array signal model in the spherical harmonics domain. The mixed steering matrix in this domain contains only the source DOA and range information, and the DOA and range information is decoupled. Exploiting the recurrent relation between spherical harmonics, we extend the spherical ESPRIT method to simultaneously estimate the directions of multiple far-field and near-field sources. This avoids a two-dimensional parameter space search and the pairing of azimuth and elevation. Based on the estimated DOAs, the ranges of the near-field sources are then easily obtained using the one-dimensional MUSIC algorithm with high resolution.
The remainder of this paper is organized as follows: Section 2 introduces the mixed field array signal model in the spherical harmonics domain. A two-stage source localization algorithm with spherical arrays is developed in Section 3. Simulation results are presented in Section 4 to demonstrate the performance of the proposed algorithm. Conclusions are given in Section 5.
2 Array signal model for mixed field sources
To clarify the notations, scalars are denoted as italic letters (a, b, A, B, …), column vectors as lowercase boldface letters (a, b, …), and matrices as boldface capitals (A, B, …). The superscripts T, ∗, and H denote transpose, complex conjugation, and conjugate transpose, respectively. diag(⋅) defines a diagonal matrix and arg(⋅) calculates the phase.
The spherical coordinate system used to describe the positions of sensors and source signals in 3D space is shown in Figure 1. A total of L identical, isotropic sensors are mounted on a rigid spherical surface with radius R. Each sensor element is unambiguously defined by its elevation θ_{ l } and azimuth ϕ_{ l } (l = 1, 2, …, L), measured from the positive z-axis and x-axis, respectively. Thus, R_{ l } = (R, θ_{ l }, ϕ_{ l }) describes the sensor position. Consider a point source located at r_{ d } = (r_{ d }, ϑ_{ d }, φ_{ d }), where r_{ d } is the distance measured from the center of the spherical array, and ϑ_{ d } and φ_{ d } are the elevation and azimuth of the source, respectively. For a spherical microphone array of order N, the near-field extent is suggested as [35, 36]
where k is the wavenumber. The maximal wavenumber is
By combining (1) and (2), the criterion for r_{ d } to be in the near field can be written as
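Equations (1) to (3) are displayed in the original and not reproduced here; as a rough numerical check, the order-N criterion kr_{ d } < N from [35, 36] gives the near-field extent sketched below. This is a minimal Python sketch; the 4 kHz upper frequency is an illustrative assumption, not from the text.

```python
import numpy as np

def nearfield_extent(N, f_max, c=343.0):
    """Upper range limit of the near field of an order-N spherical array.

    Assumes the criterion k * r_d < N, with the maximal wavenumber
    k_max = 2 * pi * f_max / c.
    """
    lam_min = c / f_max          # shortest wavelength of interest
    k_max = 2 * np.pi / lam_min  # maximal wavenumber
    return N / k_max             # r_N = N / k_max = N * lam_min / (2 * pi)

# For an order N = 4 array this gives r_N ~= 0.64 * lam_min, matching the
# extent (R, 0.64*lambda) quoted in Section 4.
print(nearfield_extent(4, f_max=4000.0))  # ~0.055 m at 4 kHz
```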
Assume D source signals impinge on the spherical array. The first D_{1} sources are assumed to be far-field signals, while the remaining D_{2} = D − D_{1} sources are located within the near-field extent of the array. In the presence of additive noise, the model in the space-frequency domain is represented as
where x(k) = [x_{1}(k), x_{2}(k), …, x_{L}(k)]^{T} denotes an observation vector composed of the pressure samples at each sensor at the frequency corresponding to the wavenumber k, v(k) = [v_{1}(k), v_{2}(k), …, v_{L}(k)]^{T} is a noise vector, and s_{F}(k) = [s_{1}(k), s_{2}(k), …, s_{D_{1}}(k)]^{T} and s_{N}(k) = [s_{D_{1}+1}(k), s_{D_{1}+2}(k), …, s_{D}(k)]^{T} are the far-field and near-field source signals, respectively. They are assumed to be statistically independent and well separated. A_{F} ∈ C^{L×D_{1}} and A_{N} ∈ C^{L×D_{2}} are the corresponding physical steering matrices, assuming the array is in free field. The D sources are incident from DOAs Φ_{ d } = (ϑ_{ d }, φ_{ d }), where d = 1, 2, …, D. Range information r_{d_{2}} matters only for the near-field sources, where d_{2} = D_{1} + 1, D_{1} + 2, …, D. The objective is to estimate the azimuths and elevations of the far-field sources and the joint azimuth-elevation-ranges of the near-field sources.
The localization algorithm is developed under the following assumptions:
1. The number of sources is known.
2. The incident source signals are statistically independent.
3. The noise is zero-mean, complex circular Gaussian, spatially uniform white, and statistically independent of all the signals [6].
2.1 Spherical harmonic representation for spherical array processing
One advantage of spherical arrays is that they can be analyzed within the mathematical framework of the spherical Fourier transform, which greatly simplifies processing in the space domain. Any square-integrable function g(θ, ϕ) on a sphere can be represented by coefficients g_{ nm } through the following SFT:
where the integral covers the entire surface of the unit sphere S^{2}, and Y_{n}^{m}(θ, ϕ) is the spherical harmonic of order n and degree m, defined as
where P_{n}^{m}(cos θ) is the associated Legendre polynomial [38]. The corresponding inverse Fourier transform is
The spherical harmonics are orthonormal, i.e., [19],
where δ_{nn′} = 1 for n = n′, and δ_{nn′} = 0 otherwise. Equation 8 expresses the orthonormality of the continuous spherical harmonics. However, spherical microphone arrays perform spatial sampling of continuous functions defined on a sphere. Spatial sampling, like time-domain sampling, requires a limited spatial bandwidth (limited harmonic order) to avoid aliasing [39].
Assume the highest order of the spherical microphone array is N. Let Ω_{ l } = (θ_{ l }, ϕ_{ l }) denote the elevation and azimuth of the l th sensor. Y(Ω) is defined as an L × (N + 1)^{2} spherical harmonic matrix as follows:
According to the inverse transform truncated up to order N in (7), the observations of the spherical microphone array can be expressed in the following form:
where x_{ nm }(k) is a (N + 1)^{2} × 1 transform coefficient vector in the spherical harmonics domain. In the same way, the noise can be expressed as
where v_{ nm }(k) is a (N + 1)^{2} × 1 transform coefficient vector in the spherical harmonics domain.
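To make the transform concrete, the following sketch builds the L × (N + 1)^{2} matrix Y(Ω) of (9) with SciPy. The column ordering n^{2} + n + m is an assumption consistent with the (N + 1)^{2} count used throughout; note that scipy.special.sph_harm takes the azimuth argument before the colatitude.

```python
import numpy as np
from scipy.special import sph_harm  # sph_harm(m, n, azimuth, colatitude)

def sh_matrix(theta, phi, N):
    """L x (N+1)**2 spherical harmonic matrix Y(Omega).

    theta, phi : length-L arrays of sensor elevations (from +z) and azimuths.
    Column n**2 + n + m holds Y_n^m evaluated at every sensor, an ordering
    assumed here so that orders n = 0..N, degrees m = -n..n fill (N+1)**2
    columns.
    """
    L = len(theta)
    Y = np.zeros((L, (N + 1) ** 2), dtype=complex)
    for n in range(N + 1):
        for m in range(-n, n + 1):
            Y[:, n * n + n + m] = sph_harm(m, n, phi, theta)
    return Y
```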
According to the spherical harmonic representation of the sound field [40, 41], when sources are in the far field of an array, the element a_{F}(l, d_{1}, k) of the far-field steering matrix A_{F} is independent of the source distance and can be expressed using spherical harmonics as
where b_{ n }(kR) is the normalized far-field mode strength, which depends on the sphere boundary [8], l = 1, 2, …, L, and d_{1} = 1, 2, …, D_{1}. The term a_{F}(l, d_{1}, k) describes the transfer characteristics from the d_{1}th far-field source to the l th sensor. For concise representation, define Φ_{ d } = (ϑ_{ d }, φ_{ d }) to denote the elevation and azimuth of the d th source, and define a (N + 1)^{2} × D_{1} spherical harmonic matrix containing the DOA information of the far-field sources as
B_{F} is defined as a (N + 1)^{2} × (N + 1)^{2} diagonal matrix consisting of the far-field mode strengths b_{ n }(kR), i.e.,
Therefore, by combining (9), (12), (13), and (14), the far-field steering matrix can be represented as
If near-field sources impinge on the spherical array, the element a_{N}(l, d_{2}, k) of the near-field steering matrix A_{N} can be expressed using spherical harmonics as
where b_{n}^{d_{2}}(kR, kr_{d_{2}}) is the normalized near-field mode strength, which depends on the sphere boundary and the source distance r_{d_{2}}. Similarly, a_{N}(l, d_{2}, k) represents the transfer characteristics from the d_{2}th near-field source to the l th sensor. The relation between the near-field and far-field mode strengths is
where h_{n}^{(2)}(kr_{ d }) denotes the spherical Hankel function of the second kind [35, 36]. Similarly, Y(Φ_{N}) is defined as the (N + 1)^{2} × D_{2} matrix made up of the spherical harmonics of the near-field sources, i.e.,
B_{N} is a (N + 1)^{2} × D_{2} near-field mode strength matrix:
The near-field steering matrix is expressed by combining (9), (16), (18), and (19) as
where ⊙ is the Hadamard product.
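The exact normalization in (17) is not reproduced here, but its key ingredient, the spherical Hankel function of the second kind, is easy to evaluate. A minimal sketch, assuming h_{n}^{(2)}(x) = j_{n}(x) − i·y_{n}(x); the wavenumber and ranges below are illustrative assumptions:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def spherical_hankel2(n, x):
    """Spherical Hankel function of the second kind: h_n^(2)(x) = j_n(x) - 1j*y_n(x)."""
    return spherical_jn(n, x) - 1j * spherical_yn(n, x)

# The near-field mode strength b_n^d(kR, k r_d) of (17) scales with
# h_n^(2)(k r_d), so its radial decay illustrates why the near-field
# steering vector depends on the source range r_d.
k = 2 * np.pi / 0.1           # wavenumber for a 10 cm wavelength (illustrative)
for r_d in (0.05, 0.1, 0.2):  # source ranges in meters (illustrative)
    print(r_d, abs(spherical_hankel2(3, k * r_d)))
```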
2.2 Array signal model in the spherical harmonics domain
By substituting (10), (11), (15), and (20) into (4), the array signal model is written as
According to the least squares criterion, the array signal model is constructed in the spherical harmonics domain as
where Y_{F}(Φ) = B_{F}Y(Φ_{F}) and Y_{N}(Φ) = B_{N} ⊙ Y(Φ_{N}) are the new steering matrices of the far field and near field in the spherical harmonics domain, respectively. Equation 22 can be expressed in the compact form
where Y_{FN}(Φ) = [Y_{F}(Φ)  Y_{N}(Φ)] ∈ C^{(N+1)^{2}×D} is the new mixed steering matrix in the spherical harmonics domain and s_{FN}(k) = [s_{F}^{T}(k)  s_{N}^{T}(k)]^{T}. In the spherical harmonics domain, the mixed far-field and near-field steering matrix is independent of the element positions of the sampling array. The localization algorithm in Section 3 is developed based on the array signal model in (23).
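A minimal sketch of the least-squares step behind (22), assuming x holds one snapshot of the L pressure samples and Y is the matrix sketched in Section 2.1:

```python
import numpy as np

def to_sh_domain(x, Y):
    """Least-squares spherical Fourier transform of one snapshot.

    x : (L,) complex pressure samples at the sensors.
    Y : (L, (N+1)**2) spherical harmonic matrix Y(Omega).
    Returns the (N+1)**2 coefficient vector x_nm = pinv(Y) @ x.
    """
    return np.linalg.pinv(Y) @ x
```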
3 Acoustic source localization algorithm
Mixed field source localization aims to estimate the 2D parameters {ϑ_{ d }, φ_{ d }} of the far-field sources and the 3D parameters {r_{ d }, ϑ_{ d }, φ_{ d }} of the near-field sources from the array observations x(k). In the spherical harmonic model (23), the far-field and near-field sources share a common characteristic: within the mixed steering matrix, only the spherical harmonics carry the DOA information, so the mixed steering matrix contains the DOAs of all sources. The difference between the two types of sources is whether the mode strength depends on the source distance. Therefore, the DOAs can be estimated using the recursive relationship of the spherical harmonics, and based on the estimated DOAs, the near-field ranges can then be computed easily with the conventional MUSIC algorithm.
3.1 DOA estimation
To exploit the recurrent relation between spherical harmonics and avoid a complex search in the 3D parameter space, we develop a spherical ESPRIT-like algorithm that automatically estimates paired azimuth and elevation angles for multiple mixed source signals. Define μ = tan ϑ e^{iφ}, which contains only the DOA information, where ϑ and φ are the elevation and azimuth angles of a source, respectively. According to the recursive relation of the associated Legendre polynomials over three adjacent degrees (m − 1, m, and m + 1) [38] and the spherical harmonics definition in (6), we obtain the following relation on which the DOA estimation depends:
where λ_{nm}^{±} = √((n ∓ m)(n ± m + 1)) [42]. Whether a source is in the far field or the near field, the recurrent relationship in (24) is independent of the corresponding mode strength b_{ n }(kR) or b_{ n }^{d}(kR, kr_{ d }). That is, the relation still holds as
or
For a fixed order n, we choose from Y_{FN}(Φ) all rows consisting of the elements Y_{ n }^{m} with m = −n, −n + 1, …, n − 2 to construct a (2n − 1) × D matrix B_{ n }^{(−1)}, the rows with m = −n + 1, −n + 2, …, n − 1 to construct a second (2n − 1) × D matrix B_{ n }^{(0)}, and the rows with m = −n + 2, −n + 3, …, n to construct a third (2n − 1) × D matrix B_{ n }^{(1)}. Letting the order n vary from 1 to N gives Y_{FN}^{(−1)}(Φ) = [(B_{1}^{(−1)})^{T} (B_{2}^{(−1)})^{T} ⋯ (B_{N}^{(−1)})^{T}]^{T} ∈ C^{N^{2}×D}, the second N^{2} × D submatrix Y_{FN}^{(0)}(Φ) = [(B_{1}^{(0)})^{T} (B_{2}^{(0)})^{T} ⋯ (B_{N}^{(0)})^{T}]^{T}, and the third submatrix Y_{FN}^{(1)}(Φ) = [(B_{1}^{(1)})^{T} (B_{2}^{(1)})^{T} ⋯ (B_{N}^{(1)})^{T}]^{T}. To exploit the recursive relationship of the three submatrices Y_{FN}^{(−1)}(Φ), Y_{FN}^{(0)}(Φ), and Y_{FN}^{(1)}(Φ), which include all spherical harmonics up to order N, we define four diagonal matrices as follows:
where Θ contains the DOA information of all incident sources, and Γ, Λ^{+}, and Λ^{−} are the three N^{2} × N^{2} diagonal matrices. The recurrent relationship of the three submatrices Y_{FN}^{(q)}(Φ) (q = −1, 0, 1) is described as
We cannot solve (30) directly to estimate the DOAs because Y_{FN}^{(q)}(Φ) is unknown; the available data are the sensor observations. The covariance matrix R_{ nm }(k) of the transform coefficient vector x_{ nm }(k) can be constructed as
where R_{s}(k) = E[s(k)s^{H}(k)]. It can be estimated by the sample covariance matrix from the sensor samples. The eigenvalue decomposition (EVD) of R_{ nm } results in two orthogonal subspaces:
where U_{s} ∈ C^{(N+1)^{2}×D} contains the D eigenvectors spanning the signal subspace of R_{ nm }, and the diagonal matrix Σ_{s} contains the corresponding eigenvalues. Similarly, U_{v} denotes the noise subspace, and Σ_{v} is built from the remaining (N + 1)^{2} − D eigenvalues of R_{ nm }.
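The covariance construction and subspace split of (31) and (32) can be sketched as follows; storing the Q coefficient snapshots column-wise in X_nm is an assumed data layout:

```python
import numpy as np

def signal_noise_subspaces(X_nm, D):
    """EVD-based subspace split of the SH-domain covariance.

    X_nm : ((N+1)**2, Q) matrix of coefficient snapshots x_nm(k).
    Returns (U_s, U_v): the signal subspace (D eigenvectors with the largest
    eigenvalues) and the noise subspace (the remaining eigenvectors).
    """
    Q = X_nm.shape[1]
    R_nm = X_nm @ X_nm.conj().T / Q   # sample covariance matrix
    w, U = np.linalg.eigh(R_nm)       # eigh returns ascending eigenvalues
    idx = np.argsort(w)[::-1]         # reorder descending
    U = U[:, idx]
    return U[:, :D], U[:, D:]
```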
According to (31) and (32), U_{s} spans the same range space as the mixed steering matrix Y_{FN}(Φ). Therefore, the signal subspace can be written as U_{s} = Y_{FN}(Φ)T, where T is a unique nonsingular D × D matrix called the similarity transform matrix. Three submatrices U_{s}^{(q)} (q = −1, 0, 1) chosen from the signal subspace U_{s} satisfy the same recurrent relationship:
where ψ = T^{−1}ΘT. We can rewrite this equation in block matrix form:
where E = [Λ^{−}U_{s}^{(−1)} ⋮ Λ^{+}U_{s}^{(1)}] ∈ C^{N^{2}×2D} and ψ̲ = [ψ^{T} ⋮ ψ^{H}]^{T} ∈ C^{2D×D} has a block conjugate structure. Equation 34 has the following solution:
The elevation angle ϑ̂_{ d } and the azimuth angle φ̂_{ d } are easily estimated from the eigenvalues μ̂_{1}, μ̂_{2}, …, μ̂_{D} of either the top or the bottom D × D subblock of ψ̲ as follows:
where d = 1, 2, …, D. When N^{2} < 2D, this procedure fails; hence, the maximum number of sources that can be accurately estimated by this algorithm is D = ⌊N^{2}/2⌋, where ⌊⋅⌋ denotes the flooring operation. The three classical spatial sampling schemes for a spherical microphone array are the equiangular, Gaussian, and nearly uniform sampling schemes. For a spherical microphone array of a given order N, the equiangular sampling scheme requires 4(N + 1)^{2} sensors and the Gaussian sampling scheme 2(N + 1)^{2} sensors, while the uniform sampling scheme needs only (N + 1)^{2} sensors [43]. Therefore, when L sensors collect the information of the acoustic sources, the dimension of the spherical harmonics space satisfies (N + 1)^{2} ≤ L; that is, the dimension of localization in the spherical harmonics domain is lower than that of the element space.
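Since the displayed equations (24) to (37) are not reproduced above, the following is only a structural sketch of the first stage: the row selection described in the text, and the final angle extraction implied by μ = tan ϑ e^{iφ}. The least-squares solve for ψ̲ in (35) is omitted because the entries of Γ are not shown here, and elevations are assumed to lie in (0°, 90°) so that |μ| = tan ϑ.

```python
import numpy as np

def subblock_rows(N, q):
    """Row indices of Y_FN^(q) within the (N+1)**2 SH rows (ordering n*n + n + m).

    For each order n = 1..N, the degree m runs from -n+1+q to n-1+q, giving
    2n - 1 rows per order and N**2 rows in total (q = -1, 0, 1), as in the text.
    """
    return np.array([n * n + n + m
                     for n in range(1, N + 1)
                     for m in range(-n + 1 + q, n + q)])

def doa_from_psi(psi):
    """Paired (elevation, azimuth) estimates from the D x D matrix psi of (35).

    psi is similar to Theta = diag(mu_1, ..., mu_D) with mu = tan(theta)e^{i*phi},
    so its eigenvalues yield automatically paired angle estimates, as in (36)-(37).
    """
    mu = np.linalg.eigvals(psi)
    elevation = np.arctan(np.abs(mu))  # assumes elevations in (0, pi/2)
    azimuth = np.angle(mu)
    return elevation, azimuth
```

With this indexing, the submatrices used in (33) are simply U_s[subblock_rows(N, q), :] for q = −1, 0, 1.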
3.2 Range estimation
The DOA estimator above provides azimuth and elevation estimates for both far-field and near-field sources, but it cannot discriminate between them [37]. Range information must be estimated only for the near-field sources. When a source is in the near field, its steering vector in the spherical harmonics domain depends on the range:
Therefore, the near-field MUSIC spectrum [44] is
The near-field search extent is r_{ d } ∈ (R, r_{N}). The range estimate of the d th source is obtained by
r̂_{ d } = arg max_{r_{ d }} p(r_{ d }, Φ̂_{ d }).  (39)
The far-field steering vector in the spherical harmonics domain is independent of the source range:
The far-field MUSIC spectrum [44] is
With the DOA estimates Φ̂_{ d } = (ϑ̂_{ d }, φ̂_{ d }) (d = 1, 2, …, D) from (36), the range estimator calculates the MUSIC spectra according to (38) and (41). For each DOA estimate Φ̂_{ d }, we compare p(Φ̂_{ d }) with the peak of p(r_{ d }, Φ̂_{ d }); if the former is larger, the source is a far-field one. Otherwise, the source is a near-field one, and the estimate r̂_{ d } in (39) automatically pairs with the DOA estimate Φ̂_{ d }.
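A sketch of this second stage, assuming helper functions steer_far(Phi) and steer_near(r, Phi) that evaluate the SH-domain steering vectors of (40) and the near-field counterpart at a candidate location; these helpers, and the grid r_grid over (R, r_N), are assumptions, not given in the text:

```python
import numpy as np

def music_spectrum(a, U_v):
    """1D MUSIC pseudospectrum value for steering vector a and noise subspace U_v."""
    proj = U_v.conj().T @ a
    return 1.0 / np.real(proj.conj().T @ proj)

def classify_and_range(Phi_hat, U_v, steer_far, steer_near, r_grid):
    """Far-field / near-field decision and range estimate for one DOA estimate.

    Compares the far-field spectrum (41) with the peak of the near-field
    spectrum (38) over the search grid r_grid in (R, r_N), as in Section 3.2.
    """
    p_far = music_spectrum(steer_far(Phi_hat), U_v)
    p_near = np.array([music_spectrum(steer_near(r, Phi_hat), U_v)
                       for r in r_grid])
    i_max = int(np.argmax(p_near))
    if p_far > p_near[i_max]:
        return "far-field", None
    return "near-field", r_grid[i_max]  # range automatically pairs with Phi_hat
```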
3.3 Algorithm summarization
The proposed two-stage algorithm can be summarized as follows:
Step 1. Array signal modeling: apply the spherical harmonic representation and construct the mixed field array signal model (23) in the spherical harmonics domain.
Step 2. DOA estimation: perform the EVD of R_{ nm } in (32) and choose the three submatrices from the signal subspace. Construct the recurrent relation (33) of these submatrices and estimate the DOA information of all sources via (36).
Step 3. Range estimation: based on the estimated DOAs, compute the far-field MUSIC spectrum (41) and search the near-field MUSIC spectrum (38) to discriminate far-field from near-field sources and obtain the paired ranges of the near-field sources.
Remarks
1. For computational complexity, we mainly consider the implementation of the EVD and the one-dimensional (1D) MUSIC spectral search. In the spherical harmonics domain, the dimension of the covariance matrix R_{ nm } is (N + 1)^{2} × (N + 1)^{2}. The computational complexity of the proposed localization algorithm comprises (a) the eigendecomposition of R_{ nm }, of order O((N + 1)^{6}), and (b) the 1D spectral search, of order O((N + 1)^{4}g_{ r }), where g_{ r } is the number of search points along the range axis [45] (worked numbers for N = 4 are sketched after these remarks).
2. Whether the incident sources are far-field, near-field, or a mixture of the two, the spherical ESPRIT-like algorithm in the first stage estimates all DOAs. When all incident sources are far-field, the DOA information alone suffices. If all sources are located in the near field, the 1D MUSIC spectral search finds the paired range parameters. When the sources are a mixture of far-field and near-field, the MUSIC spectra in (38) and (41) are computed to decide whether each source is far-field or near-field.
3. The proposed algorithm localizes mixed sources without parameter pairing or multidimensional search. In the first stage, it estimates the DOAs of the mixed far-field and near-field sources. In the second stage, the DOA estimates are used to compute the 1D MUSIC spectra according to (38) and (41), and the spectral peaks give the paired ranges. Therefore, parameter pairing is avoided in the proposed method.
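For the order N = 4 array used in Section 4, the two complexity terms of remark 1 work out as follows; the grid size g_r = 100 is an illustrative assumption, not from the text:

```python
N, g_r = 4, 100        # array order; range grid size (g_r is illustrative)
dim = (N + 1) ** 2     # SH-domain dimension: 25
print(dim ** 3)        # EVD cost term (N+1)^6        -> 15625
print(dim ** 2 * g_r)  # 1D search term (N+1)^4 * g_r -> 62500
```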
4 Simulations
In this section, we conduct simulations in a free-field environment to evaluate the proposed localization algorithm for narrowband and wideband sources. A 32-element spherical microphone array with uniform sampling [43] is used to estimate the source locations; its radius is 10 cm. The highest spherical harmonics order is N = 4, so the near-field extent of the array is (R, 0.64λ), where λ is the wavelength. The maximum number of sources that can be detected by the algorithm is D = N^{2}/2 = 8. The DOA (azimuth and elevation) and range estimates are given in degrees and in wavelengths (or meters), respectively. Localization performance is measured by the root-mean-square error (RMSE) over 1,000 independent Monte Carlo trials. In addition, the Cramér-Rao bound (CRB) provides a lower bound on the variance of any estimated parameter and defines the ultimate accuracy. The CRB analysis in [46] assumed that all sources were in the far field, and that in [47] assumed they were all in the near field; the CRB analysis for coexisting far-field and near-field sources was provided in [6].
4.1 Narrowband source localization
The first simulation demonstrates the performance of the proposed algorithm in localizing far-field and near-field sources. One far-field source and one near-field source are located at (r_{1}, Φ_{1}) = (∞, 45°, 68°) and (r_{2}, Φ_{2}) = (0.29λ, 60°, 122°), respectively. The number of snapshots and the SNR are fixed at 128 and 15 dB, respectively. First, the azimuths and elevations of the two sources are estimated; the RMSEs of the azimuth and elevation estimates are 0.07, 0.22, 0.03, and 0.10, respectively. Based on the DOA estimates and supposing both sources to be located in the near field, the MUSIC spectra computed with (38) are shown in Figure 2. Supposing instead that both sources are located in the far field, the MUSIC spectra calculated by (41) are 12.39 and 4.73 dB, respectively. From the maxima of the MUSIC spectra, we can identify which source is near-field and which is far-field, and the range of the near-field source can be estimated from its MUSIC spectrum. Therefore, the proposed algorithm distinguishes near-field from far-field sources and performs well in localizing them.
1. RMSE versus SNR: The number of snapshots is set to 128. The two sources are localized in free field and in a rectangular room with a floor area of 71 m^{2}, a ceiling height of 3 m, and a reverberation time of 0.7 s [48]; the array is placed at the center of the room. When the SNR varies from 0 to 30 dB, the RMSEs of the azimuth, elevation, and range estimates are shown in Figure 3; they decrease as the SNR increases. The localization performance degrades in the reverberant environment compared with the free field. When the two sources are incident from the same direction (45°, 122°), the RMSEs of the azimuth, elevation, and range estimates versus SNR in free field are shown in Figure 4. The figure shows that the proposed algorithm can localize a far-field source and a near-field source incident from the same direction, and that it attains comparable azimuth and elevation accuracy for the two. The elevation estimates are more accurate than the azimuth estimates because spherical arrays provide a larger aperture in the elevation plane than in the azimuth plane.
2. RMSE versus snapshots: For comparison, a cross array placed in the XZ plane is used; each ULA branch consists of 15 (M = 7) uniformly spaced omnidirectional sensors with inter-sensor spacing R/M. First, the elevation angles are obtained with the generalized ESPRIT method; then, the range parameters are estimated using the root-MUSIC method based on the elevation estimates [9]; finally, a 1D MUSIC pseudospectrum is searched for the azimuth angles using the elevation and range estimates. The SNR is fixed at 15 dB. When the number of snapshots varies from 100 to 1,100, the average performance over 1,000 Monte Carlo runs is shown in Figure 5. The RMSEs of the azimuth, elevation, and range estimates decrease as the number of snapshots increases. The estimation performance of the spherical array is better than that of the cross array, likely due to the symmetrical structure of the spherical array in 3D space and the simultaneous estimation of azimuth and elevation by the proposed method. Both our algorithm and the three-step method in [9] rely on the eigendecomposition of the array covariance matrix, so the covariance estimate affects localization performance. Vershynin showed that a sample size Q ∼ (N + 1)^{2} suffices to estimate the covariance by the sample covariance matrix [49]. Therefore, the localization performance of both methods becomes more stable as the number of snapshots grows. The CRBs decrease in inverse proportion to the number of snapshots and likewise become more stable.
3. RMSE versus angular gap: The number of snapshots is set to 128 and the SNR is fixed at 15 dB. With the direction of the far-field source fixed, the azimuth and elevation of the near-field source both vary from 5° to 30°. The RMSEs of the direction and range estimates are shown in Figure 6. The DOA estimate of the far-field source and the range estimate of the near-field source are insensitive to the angular gap. The azimuth estimation performance for the near-field source changes as the angular gap increases, and the RMSE of its elevation estimate becomes slightly smaller.
4. RMSE versus range: Let the number of snapshots and the SNR be 128 and 10 dB, respectively. When the range of the near-field source varies from 0.22λ to 0.58λ, the RMSEs of the DOA and range estimates are shown in Figure 7. The results show that both the DOA and range estimates of the near-field source are very sensitive to the varying range: the RMSEs of the azimuth, elevation, and range estimates are smaller when the near-field source is closer to the spherical array than when it is farther away. The location estimate of the far-field source, however, is insensitive to the varying range of the near-field source.
The maximum number of sources that can be uniquely estimated by the proposed algorithm is 8. Four near-field sources are located at (r_{1}, Φ_{1}) = (0.16λ, 102°, 45°), (r_{2}, Φ_{2}) = (0.29λ, 122°, 60°), (r_{3}, Φ_{3}) = (0.44λ, 60°, 10°), and (r_{4}, Φ_{4}) = (0.51λ, 155°, 75°), and four far-field sources at (r_{5}, Φ_{5}) = (∞, 40°, 15°), (r_{6}, Φ_{6}) = (∞, 168°, 40°), (r_{7}, Φ_{7}) = (∞, 10°, 4°), and (r_{8}, Φ_{8}) = (∞, 140°, 70°). At an SNR of 40 dB, the RMSEs for the eight sources are listed in Table 1; all are below 1° except the azimuth of source 7.
In the second simulation, the proposed algorithm is used to localize pure far-field sources. Two sources are located at (r_{1}, Φ_{1}) = (∞, 60°, 122°) and (r_{2}, Φ_{2}) = (∞, 45°, 68°), respectively. The number of snapshots is set to 128. When the SNR varies from 0 to 30 dB, the RMSEs of the azimuths and elevations are shown in Figure 8; the estimates improve as the SNR increases.
We next consider two sources located at (r_{1}, Φ_{1}) = (∞, 45°, 22°) and (r_{2}, Φ_{2}) = (∞, 45°, 22° + Δφ), with Δφ varying from 0° to 90°. When the SNR varies from 0 to 40 dB, the RMSEs of the azimuths and elevations are shown in Figure 9. At low SNR, the angular RMSEs are larger for all azimuth differences than at high SNR, and the azimuth difference between the two sources has a greater effect: when the two sources are close, the algorithm estimates azimuth and elevation with larger RMSEs. At high SNR, the azimuth difference has only a slight effect on the localization performance, so the algorithm can accurately estimate two closely spaced sources. When the second source shares the azimuth of the first, its elevation is 45° + Δϑ, with Δϑ varying from 0° to 25°; the localization performance is shown in Figure 10. Across SNRs, the behavior with varying elevation is similar to that with varying azimuth. However, when the elevation of the second source approaches 90° at low SNR, the estimation errors grow because the elevation estimate is determined through the tangent function.
In the third simulation, the proposed method is used to localize pure near-field sources. Two near-field sources are located at (r_{1}, Φ_{1}) = (0.22λ, 45°, 68°) and (r_{2}, Φ_{2}) = (0.44λ, 60°, 122°), respectively. With both ranges within the near-field extent of the array, the RMSEs of the azimuths, elevations, and ranges are shown in Figure 11; the RMSEs for the first source, which is closer to the spherical array, are smaller than those for the second. When the range of the second source varies from 0.2λ to 0.5λ, the angular and range estimates are shown in Figure 12. Again, the estimation errors of the source closer to the array are smaller than those of the other source, and as the range of the second source approaches the boundary of the near-field extent, its azimuth, elevation, and range errors grow for almost all SNRs from 0 to 30 dB.
4.2 Wideband source localization
The proposed algorithm can also localize multiple wideband mixed field sources. A female speech signal and a male speech signal, randomly chosen from the TIMIT database [50], are incident from the directions (45°, 68°) and (60°, 122°), respectively. The male source is located in the far field of the array, while the female source is placed in the near field at range r = 0.2 m. The sampling frequency is 16 kHz with 16 bits per sample. The observed data are decomposed into frequency bins using a short-time Fourier transform (STFT) of length 512 with a rectangular window. For each frequency bin, the sample covariance matrix R_{ nm }(k) is estimated from Q frequency snapshots as follows:
where x̂_{ nm }(k, q) is the q th snapshot of x_{ nm }(k). The operating frequency bandwidth is limited by aliasing at high frequencies and by measurement errors at low frequencies [43]. Therefore, the frequency bins used are restricted to roughly 2 ≤ kr ≤ N. The SNR is 10 dB. When kr lies within (2, 4), the RMSEs of azimuth and elevation for the two sources are smaller; we therefore choose the frequency bins satisfying 2 ≤ kr ≤ N to localize the wideband sources.
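A sketch of the per-bin sample covariance (42) and the band selection rule 2 ≤ kr ≤ N, assuming the SH-domain STFT coefficients are stored in an array of shape (Q, F, (N + 1)^{2}):

```python
import numpy as np

def per_bin_covariance(X_nm_stft):
    """Sample covariance R_nm(k) per frequency bin, as in (42).

    X_nm_stft : (Q, F, (N+1)**2) SH-domain STFT coefficients
                (Q snapshots, F frequency bins).
    """
    Q = X_nm_stft.shape[0]
    # einsum averages x_nm(k, q) x_nm(k, q)^H over the Q snapshots per bin
    return np.einsum('qfa,qfb->fab', X_nm_stft, X_nm_stft.conj()) / Q

def valid_bins(freqs, r, N, c=343.0):
    """Indices of bins satisfying 2 <= k*r <= N, the band used in Section 4.2."""
    k = 2 * np.pi * freqs / c
    return np.where((k * r >= 2) & (k * r <= N))[0]
```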
The wideband source localization results for different SNRs are shown in Figure 13. The angular estimates of the near-field source are more accurate than those of the far-field source. The length of the observed signals affects the computation of the sample covariance matrix in (42) [51]: when little data is available, the number of snapshots Q per frequency bin is small and the covariance matrix is harder to estimate correctly, whereas a large signal size yields an accurate sample covariance estimate and the better localization performance shown in Figure 14 (at an SNR of 10 dB). The RMSE trends of the near-field range estimate follow those of the azimuth and elevation estimates, because the range estimate is based on the angular estimates.
5 Conclusions
In this paper, we developed a two-stage source localization algorithm that jointly estimates the elevation, azimuth, and range of mixed far-field and near-field sources using a spherical array. In the first stage, the 3D localization algorithm estimates the DOAs of all mixed sources. In the second stage, the 1D MUSIC method distinguishes far-field from near-field sources and provides the ranges of the near-field sources based on the estimated DOAs. The algorithm performs well in azimuth, elevation, and range estimation, and its computational cost is low because it avoids multidimensional search and requires no parameter pairing procedure. The estimation performance for the far-field sources is insensitive to the varying range of the near-field sources, while the RMSEs of the azimuth, elevation, and range of a near-field source are smaller when it is closer to the spherical array. The spherical array performs better in elevation estimation than in azimuth estimation owing to its larger aperture in the elevation plane. In future work, we will develop a range estimation algorithm without 1D search and incorporate a reverberant signal model to localize multiple sources in reverberant environments.
References
Krim H, Viberg M: Two decades of array signal processing research: the parametric approach. IEEE Signal Process Mag 1996, 13(4):67-94. 10.1109/79.526899
Chen JC, Yao K, Hudson RE: Acoustic source localization and beamforming: theory and practice. EURASIP J Appl Signal Process 2003, 4:359-370.
Liang L, Liu D: Passive localization of near-field sources using cumulant. IEEE Sensors J 2009, 9(8):953-960.
Wang B, Liu J, Sun X: Mixed sources localization based on sparse signal reconstruction. IEEE Signal Process Lett 2012, 19(8):487-490.
Liang L, Liu D: Passive localization of mixed near-field and far-field sources using two-stage MUSIC algorithm. IEEE Trans Signal Process 2010, 58(1):108-120.
He J, Swamy MNS, Ahmad MO: Efficient application of MUSIC algorithm under the coexistence of far-field and near-field sources. IEEE Trans Signal Process 2012, 60(4):2066-2070.
Wang B, Zhao Y, Liu J: Mixed-order MUSIC algorithm for localization of far-field and near-field sources. IEEE Signal Process Lett 2013, 20(4):311-314.
Jiang J, Duan F, Chen J, Li Y, Hua X: Mixed near-field and far-field sources localization using the uniform linear sensor array. IEEE Sensors J 2013, 13(8):3136-3143.
Jiang J, Duan F, Chen J: Three-dimensional localization algorithm for mixed near-field and far-field sources based on ESPRIT and MUSIC method. Prog Electromagnetics Res 2013, 136:435-456.
Mathews CP, Zoltowski MD: Eigenstructure techniques for 2-D angle estimation with uniform circular arrays. IEEE Trans Signal Process 1994, 42(9):2395-2407. 10.1109/78.317861
Pavlidi D, Griffin A, Puigt M, Mouchtaris A: Real-time multiple sound source localization and counting using a circular microphone array. IEEE Trans Audio Speech Lang Process 2013, 21(10):2193-2206.
Sommerkorn G, Hampicke D, Klukas R, Richter A, Schneider A, Thomä R: Uniform rectangular antenna array design and calibration issues for 2-D ESPRIT application. In The 4th European Personal Mobile Communications Conference, Vienna. Morgan Kaufman, San Francisco; 2001:1-8.
Ioannides P, Balanis CA: Uniform circular and rectangular arrays for adaptive beamforming applications. IEEE Antennas Wireless Propagation Lett 2005, 4:351-354. 10.1109/LAWP.2005.857039
Smaragdis P, Boufounos P: Position and trajectory learning for microphone arrays. IEEE Trans Audio Speech Lang Process 2007, 15(1):358-368.
Costa M, Koivunen V, Richter A: Low complexity azimuth and elevation estimation for arbitrary array configurations. In IEEE Int. Conf. Acoust., Speech, Signal Process, Taipei. IEEE, Piscataway; 2009:2185-2188.
Belloni F, Richter A, Koivunen V: DOA estimation via manifold separation for arbitrary array structures. IEEE Trans Signal Process 2007, 55(10):4800-4810.
Filik T, Tuncer TE: A fast and automatically paired 2-dimensional direction-of-arrival estimation using arbitrary array geometry. In IEEE 17th Signal Process and Communications Applications Conference, Antalya. IEEE, Piscataway; 2009:556-559.
Park M, Rafaely B: Sound-field analysis by plane-wave decomposition using spherical microphone array. J Acoust Soc Am 2005, 118(5):3094-3103. 10.1121/1.2063108
Yan S, Sun H, Svensson UP, Ma X, Hovem JM: Optimal modal beamforming for spherical microphone arrays. IEEE Trans Audio Speech Lang Process 2012, 19(2):361-371.
Khaykin D, Rafaely B: Acoustic analysis by spherical microphone array processing of room impulse response. J Acoust Soc Am 2012, 132(1):261-270. 10.1121/1.4726012
Teutsch H: Modal Array Signal Processing: Principles and Applications of Acoustic Wavefield Decomposition. Springer-Verlag, Heidelberg; 2007:134-146.
Cohen I, Benesty J, Gannot S: Speech Processing in Modern Communication. Springer-Verlag, Heidelberg; 2010:281.
Goossens R, Rogier H: Unitary spherical ESPRIT: 2-D angle estimation with spherical arrays for scalar fields. IET Signal Process 2009, 3(2):221-231.
Sun H, Kellermann W, Mabande E: Robust localization of multiple sources in reverberant environments using EB-ESPRIT with spherical microphone arrays. In IEEE Int. Conf. Acoust., Speech, Signal Process, Prague. IEEE, Piscataway; 2011:117-120.
Argentieri S, Danes P: Broadband variations of the MUSIC high-resolution method for sound source localization in robotics. In IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego. IEEE, Piscataway; 2007:2009-2014.
Sun H, Mabande E, Kowalczyk K, Kellermann W: Localization of distinct reflections in rooms using spherical microphone array eigenbeam processing. J Acoust Soc Am 2012, 131(4):2828-2840. 10.1121/1.3688476
Wu PKT, Epain N, Jin C: A super-resolution beamforming algorithm for spherical microphone arrays using a compressive sensing approach. In IEEE Int. Conf. Acoust., Speech, Signal Process, Vancouver. IEEE, Piscataway; 2013:649-653.
Sawada H, Mukai R, Araki S, Makino S: Multiple source localization using independent component analysis. In IEEE Antennas and Propagation Society International Symposium, Washington DC. IEEE, Piscataway; 2005:81-84.
Epain N, Jin C: Independent component analysis using spherical microphone arrays. Acta Acustica United Acustica 2012, 98(1):91-102. 10.3813/AAA.918495
Noohi T, Epain N, Jin C: Direction of arrival estimation for spherical microphone arrays by combination of independent component analysis and sparse recovery. In IEEE Int. Conf. Acoust., Speech, Signal Process, Vancouver. IEEE, Piscataway; 2013:346-349.
Malioutov D, Cetin M, Willsky AS: A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans Signal Process 2005, 53(8):3010-3022.
Wei X, Yuan Y, Ling Q: DOA estimation using a greedy block coordinate descent algorithm. IEEE Trans Signal Process 2012, 60(12):6382-6394.
Swartling M, Sallberg B, Grbic N: Source localization for multiple speech sources using low complexity non-parametric source separation and clustering. Signal Process 2011, 91:1781-1788. 10.1016/j.sigpro.2011.02.002
Meyer J, Elko GW: Position independent close-talking microphone. Signal Process 2006, 86(6):1254-1259. 10.1016/j.sigpro.2005.05.036
Fisher E, Rafaely B: The nearfield spherical microphone array. In IEEE Int. Conf. Acoust., Speech, Signal Process, Las Vegas. IEEE, Piscataway; 2008:5272-5275.
Fisher E, Rafaely B: Near-field spherical microphone array processing with radial filtering. IEEE Trans Audio Speech Lang Process 2011, 19(2):256-265.
Huang Q, Song T: DOA estimation of mixed near-field and far-field sources using spherical array. In The 11th Int. Conf. on Signal Process, Beijing. IEEE, Piscataway; 2012:382-385.
Williams EG: Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic, New York; 1999:183.
Rafaely B, Weiss B, Bachmat E: Spatial aliasing in spherical microphone arrays. IEEE Trans Signal Process 2007, 55(3):1003-1010.
Meyer J, Elko G: A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield. In IEEE Int. Conf. Acoust., Speech, Signal Process, Orlando. IEEE, Piscataway; 2002:II-1781-II-1784.
Abhayapala TD, Ward DB: Theory and design of high order sound field microphones using spherical microphone array. In IEEE Int. Conf. Acoust., Speech, Signal Process, Orlando. IEEE, Piscataway; 2002:II-1949-II-1952.
Teutsch H, Kellermann W: Detection and localization of multiple wideband acoustic sources based on wavefield decomposition using spherical apertures. In IEEE Int. Conf. Acoust., Speech, Signal Process, Las Vegas. IEEE, Piscataway; 2008:5276-5279.
Rafaely B: Analysis and design of spherical microphone arrays. IEEE Trans Speech Audio Process 2005, 13(1):135-143.
Schmidt RO: Multiple emitter location and signal parameter estimation. IEEE Trans Antennas Propag 1986, AP-34:276-280.
Wang Y, Chen J, Fang W: TST-MUSIC for joint DOA-delay estimation. IEEE Trans Signal Process 2001, 49(4):721-729. 10.1109/78.912916
Stoica P, Larsson EG, Gershman AB: The stochastic CRB for array processing: a textbook derivation. IEEE Signal Process Lett 2001, 8(5):148-150.
El Korso MN, Boyer R, Renaux A, Marcos S: Conditional and unconditional Cramér-Rao bounds for near-field source localization. IEEE Trans Signal Process 2010, 58(5):2901-2907.
Gardner B: A realtime multichannel room simulator. J Acoust Soc Am 1992, 92(4):2395.
Vershynin R: How close is the sample covariance matrix to the actual covariance matrix? J Theor Probabil 2012, 25:655-686. 10.1007/s10959-010-0338-z
Lamel LF, Kassel RH, Seneff S: Speech database development: design and analysis of the acoustic-phonetic corpus. In Proc. of the DARPA Speech Recognition Workshop. IET, Glasgow; 1986:100-109.
Puigt M, Vincent E, Deville Y: Validity of the independence assumption for the separation of instantaneous and convolutive mixtures of speech and music sources. In Int. Conf. ICA, LNCS, Brazil. Springer, New York; 2009:613-620.
Acknowledgements
The authors would like to thank the editor and anonymous reviewers for their valuable comments. The work was supported by the National Natural Science Foundation (61001160), Innovation Program of Shanghai Municipal Education Commission (12YZ023), and Visiting Scholar Funding of Shanghai Municipal Education Commission of China.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Huang, Q., Wang, T. Acoustic source localization in mixed field using spherical microphone arrays. EURASIP J. Adv. Signal Process. 2014, 90 (2014). https://doi.org/10.1186/1687-6180-2014-90