Acoustic source localization in mixed field using spherical microphone arrays
EURASIP Journal on Advances in Signal Processing volume 2014, Article number: 90 (2014)
Abstract
Spherical microphone arrays have recently been used for source localization in three-dimensional space. In this paper, a two-stage algorithm is developed to localize mixed far-field and near-field acoustic sources in a free-field environment. In the first stage, an array signal model is constructed in the spherical harmonics domain. The recurrent relation of spherical harmonics is independent of the far-field and near-field mode strengths. Therefore, it is used to develop a spherical ESPRIT-like (estimation of signal parameters via rotational invariance techniques) approach that estimates the directions of arrival (DOAs) of both far-field and near-field sources. In the second stage, based on the estimated DOAs, a simple one-dimensional MUSIC spectrum is exploited to distinguish far-field from near-field sources and to estimate the ranges of the near-field sources. The proposed algorithm avoids multidimensional search and parameter pairing. Simulation results demonstrate good performance in localizing far-field sources, near-field sources, and mixtures of the two.
1 Introduction
Acoustic source localization using microphone arrays has many applications, such as video conferencing, intelligent systems, and robotics. It has received great attention for almost four decades [1, 2]. In most array signal processing applications, the wavefront is assumed to be planar, that is, all the sources are located in the far field (FF) of an array. In this case, the parameter that characterizes a source location is its direction of arrival (DOA) [2]. In the near field (NF) of an array, range information must be integrated into the array signal model to characterize sources accurately [3]. Although the plane-wave assumption simplifies modeling and processing, it does not hold in near-field applications and leads to analysis errors. Moreover, in some practical applications, the signals collected by microphone arrays are often a mixture of far-field and near-field sources; each source may be located in either the near field or the far field of the array [4–9]. Localization methods for mixed field sources should first discriminate far-field from near-field sources, then estimate only DOA information for the far-field sources and both DOA and range information for the near-field sources.
If an acoustic source is located in three-dimensional (3D) space, its position is jointly described by range and bearing (azimuth and elevation). The geometric structure of a microphone array strongly affects localization performance. Most existing localization techniques use a uniform linear array (ULA) or a uniform circular array (UCA) [1, 2, 5–7, 9–11] to estimate source positions. Planar arrays, such as the cross array and the uniform rectangular array (URA), are straightforward extensions of the ULA and can estimate both azimuth and elevation [8, 12, 13]. ULAs suffer from a 180° ambiguity in azimuth estimation. UCAs provide 360° azimuthal coverage due to their circular symmetry in the azimuth plane. The main drawback of planar arrays, including UCAs, is that they provide a smaller aperture in the elevation plane than in the azimuth plane, resulting in poor estimation of elevation angles [10]. Some arbitrary array configurations have also been investigated for source localization [14–17]. These configurations depart from the uniform geometries that traditional localization approaches require; the array structure is instead chosen to suit a specific practical application. Spherical microphone arrays have a 3D symmetrical geometry and can capture higher-order sound field information. This 3D structure facilitates more accurate sound source localization. Moreover, they can be analyzed within the mathematical framework of the spherical Fourier transform (SFT), which greatly simplifies processing in the space domain. Therefore, they have received considerable attention and are widely applied in source localization, beamforming, and acoustic analysis [18–20]. In this paper, we aim to develop a novel algorithm that accurately estimates the locations of mixed field sources using spherical arrays.
Many techniques have been proposed to estimate the DOAs of multiple acoustic sources. Multiple signal classification (MUSIC) and estimation of signal parameters via rotational invariance techniques (ESPRIT) are two subspace techniques [21, 22]; the latter avoids multidimensional search in the parameter space. Goossens and Rogier proposed a unitary spherical ESPRIT algorithm based on spherical phase-mode excitation that yields accurate estimates with low computational complexity [23]. The eigenbeam (EB)-ESPRIT algorithm for spherical arrays was presented in [24], along with a performance analysis for robust localization in reverberant environments; it exploits only the relation within a fixed order of spherical harmonics. Many approaches are based on beamforming [22]. Argentieri and Danes proposed an online beamspace MUSIC method with a beamforming scheme to localize sound sources in robotics [25]. Sun et al. proposed several steered-beamformer-based and subspace-based localization techniques in the spherical EB domain [26] and used them to localize early reflections in room acoustic environments. Wu et al. used sparse recovery to localize sources and formulated super-resolution beamforming in the spherical harmonics domain [27]. However, when a source is close to the spherical array, the array signal model based on the far-field assumption is no longer valid. Independent component analysis (ICA) has also been used to estimate source locations; it employs higher-order statistics and directly identifies basis vectors containing the source location information, and it has been applied in both the near field and the far field [28]. ICA-based methods have been used to estimate DOAs with spherical microphone arrays [29, 30], but they fail to localize sources that are not statistically independent. Source localization can further be considered an overcomplete basis representation problem over a grid of spatial locations, and many sparse recovery methods have been used to estimate source DOAs [31, 32]. If the sources lie in 3D space, however, the number of basis vectors is large and the computational complexity is high. Some approaches assume that one source is dominant over the others in certain time-frequency zones [11, 33]; they extend a single-source DOA algorithm over these zones to estimate multiple source locations, relying on the sparse representation of the observed signals in the time-frequency domain.
In many practical applications, the observations collected by an array may be mixed far-field and near-field signals, multiple far-field signals, or multiple near-field signals. Most of the above techniques localize sources in either the far field or the near field alone. In recent years, source localization in mixed near-field and far-field conditions has been developed using the MUSIC algorithm [5–7], an ESPRIT-like technique [8], or a sparse signal reconstruction method [4] based on a linear array. Jiang et al. proposed a 3D source localization algorithm with a cross array [9]: first, the elevation angles are obtained with the generalized ESPRIT method; then, similarly to the root-MUSIC method, the range parameters are estimated from the elevation estimates; finally, a MUSIC pseudospectrum function yields the azimuth angles from the elevation and range estimates. Due to their 3D symmetrical structure, spherical arrays have been widely used for far-field source localization [23, 24]. A spherical microphone array was used in the near field in [34], where a new close-talking microphone array was proposed that can adaptively compensate for the distance and orientation of a near-field source. Fisher and Rafaely presented a near-field spherical microphone array and defined the near-field criterion in terms of the array order and radius [35]. They analyzed spherical microphone array capabilities in the near field and designed a radial filter that discriminates the distances of sources incident from the same direction [36]. Although this prior work considers near-field processing with spherical microphone arrays, near-field or mixed field source localization with a spherical microphone array has not yet been studied. Based on the recurrent relation of the spherical harmonics, only the DOAs of mixed field sources were estimated simultaneously in [37]; how to distinguish near-field from far-field sources and how to estimate the ranges of the near-field sources were not considered.
The aim of our work is to develop a new method that localizes mixed far-field and near-field sources simultaneously using spherical arrays in a free-field environment, avoiding the parameter pairing problem and complex multidimensional search. A three-dimensional MUSIC method would scan the azimuth, elevation, and range parameter space, incurring very high computational complexity, so it is not practical for direct source localization in 3D space. First, we construct the mixed near-field and far-field array signal model in the spherical harmonics domain. The mixed steering matrix in this domain contains only the source DOA and range information, and the DOA and range information is decoupled. Exploiting the recurrent relation between spherical harmonics, we extend the spherical ESPRIT method to simultaneously estimate the directions of multiple far-field and near-field sources. This avoids a two-dimensional parameter space search and the pairing of azimuth and elevation. Based on the estimated DOAs, the ranges of the near-field sources are then easily obtained using the one-dimensional MUSIC algorithm with high resolution.
The remainder of this paper is organized as follows: Section 2 introduces the mixed field array signal model in the spherical harmonics domain. A two-stage source localization algorithm with spherical arrays is developed in Section 3. Simulation results are presented in Section 4 to demonstrate the performance of the proposed algorithm. Conclusions are given in Section 5.
2 Array signal model for mixed field sources
To clarify the notations, scalars are denoted as italic letters (a, b, A, B, …), column vectors as lowercase boldface letters (a, b, …), and matrices as boldface capitals (A, B, …). The superscripts T, ∗, and H denote transpose, complex conjugation, and conjugate transpose, respectively. diag(⋅) defines a diagonal matrix and arg(⋅) calculates the phase.
The spherical coordinate system used to describe the positions of sensors and source signals in 3D space is shown in Figure 1. A total of L identical, isotropic sensors are mounted on a rigid spherical surface with radius R. Each sensor element is unambiguously defined by its elevation θ_{ l } and azimuth ϕ_{ l } (l = 1, 2, …, L), measured from the positive z-axis and x-axis, respectively. Thus, R_{ l } = (R, θ_{ l }, ϕ_{ l }) describes the sensor position. Consider a point source located at r_{ d } = (r_{ d }, ϑ_{ d }, φ_{ d }), where r_{ d } is the distance measured from the center of the spherical array, and ϑ_{ d } and φ_{ d } are the elevation and azimuth of the source, respectively. For a spherical microphone array of order N, the near-field extent is suggested as [35, 36]
where k is the wavenumber. The maximal wavenumber is
By combining (1) and (2), the criterion for r_{ d } to be in the near field can be written as
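Equations (1) to (3) are displayed in the original and not reproduced here; as a rough numerical check, the order-N criterion kr_{ d } < N from [35, 36] gives the near-field extent sketched below. This is a minimal Python sketch; the 4 kHz upper frequency is an illustrative assumption, not from the text.

```python
import numpy as np

def nearfield_extent(N, f_max, c=343.0):
    """Upper range limit of the near field of an order-N spherical array.

    Assumes the criterion k * r_d < N, with the maximal wavenumber
    k_max = 2 * pi * f_max / c.
    """
    lam_min = c / f_max          # shortest wavelength of interest
    k_max = 2 * np.pi / lam_min  # maximal wavenumber
    return N / k_max             # r_N = N / k_max = N * lam_min / (2 * pi)

# For an order N = 4 array this gives r_N ~= 0.64 * lam_min, matching the
# extent (R, 0.64*lambda) quoted in Section 4.
print(nearfield_extent(4, f_max=4000.0))  # ~0.055 m at 4 kHz
```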
Assume D source signals impinge on the spherical array. The first D_{1} sources are assumed to be far-field signals, while the remaining D_{2} = D − D_{1} sources are located within the near-field extent of the array. In the presence of additive noise, the model in the space-frequency domain is represented as
where x(k) = [x_{1}(k), x_{2}(k), …, x_{L}(k)]^{T} denotes an observation vector composed of the pressure samples at each sensor at the frequency corresponding to the wavenumber k, v(k) = [v_{1}(k), v_{2}(k), …, v_{L}(k)]^{T} is a noise vector, and s_{F}(k) = [s_{1}(k), s_{2}(k), …, s_{D_{1}}(k)]^{T} and s_{N}(k) = [s_{D_{1}+1}(k), s_{D_{1}+2}(k), …, s_{D}(k)]^{T} are the far-field and near-field source signals, respectively. They are assumed to be statistically independent and well separated. A_{F} ∈ C^{L×D_{1}} and A_{N} ∈ C^{L×D_{2}} are the corresponding physical steering matrices, assuming the array is in free field. The D sources are incident from DOAs Φ_{ d } = (ϑ_{ d }, φ_{ d }), where d = 1, 2, …, D. Range information r_{d_{2}} matters only for the near-field sources, where d_{2} = D_{1} + 1, D_{1} + 2, …, D. The objective is to estimate the azimuths and elevations of the far-field sources and the joint azimuth-elevation-ranges of the near-field sources.
The localization algorithm is developed under the following assumptions:
1. The number of sources is known.
2. The incident source signals are statistically independent.
3. The noise is zero-mean, complex circular Gaussian, spatially uniform white, and statistically independent of all the signals [6].
2.1 Spherical harmonic representation for spherical array processing
One advantage of spherical arrays is that they can be analyzed within the mathematical framework of the spherical Fourier transform, which greatly simplifies processing in the space domain. Any square-integrable function g(θ, ϕ) on a sphere can be represented by coefficients g_{ nm } through the following SFT:
where the integral covers the entire surface of the unit sphere S^{2}, and Y_{n}^{m}(θ, ϕ) is the spherical harmonic of order n and degree m, defined as
where P_{n}^{m}(cos θ) is the associated Legendre polynomial [38]. The corresponding inverse Fourier transform is
The spherical harmonics are orthonormal, i.e., [19],
where δ_{nn′} = 1 for n = n′, and δ_{nn′} = 0 otherwise. Equation 8 expresses the orthonormality of the continuous spherical harmonics. However, spherical microphone arrays perform spatial sampling of continuous functions defined on a sphere. Spatial sampling, like time-domain sampling, requires a limited spatial bandwidth (limited harmonic order) to avoid aliasing [39].
Assume the highest order of the spherical microphone array is N. Let Ω_{ l } = (θ_{ l }, ϕ_{ l }) denote the elevation and azimuth of the l th sensor. Y(Ω) is defined as an L × (N + 1)^{2} spherical harmonic matrix as follows:
According to the inverse transform truncated up to order N in (7), the observations of the spherical microphone array can be expressed in the following form:
where x_{ nm }(k) is a (N + 1)^{2} × 1 transform coefficient vector in the spherical harmonics domain. In the same way, the noise can be expressed as
where v_{ nm }(k) is a (N + 1)^{2} × 1 transform coefficient vector in the spherical harmonics domain.
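To make the transform concrete, the following sketch builds the L × (N + 1)^{2} matrix Y(Ω) of (9) with SciPy. The column ordering n^{2} + n + m is an assumption consistent with the (N + 1)^{2} count used throughout; note that scipy.special.sph_harm takes the azimuth argument before the colatitude.

```python
import numpy as np
from scipy.special import sph_harm  # sph_harm(m, n, azimuth, colatitude)

def sh_matrix(theta, phi, N):
    """L x (N+1)**2 spherical harmonic matrix Y(Omega).

    theta, phi : length-L arrays of sensor elevations (from +z) and azimuths.
    Column n**2 + n + m holds Y_n^m evaluated at every sensor, an ordering
    assumed here so that orders n = 0..N, degrees m = -n..n fill (N+1)**2
    columns.
    """
    L = len(theta)
    Y = np.zeros((L, (N + 1) ** 2), dtype=complex)
    for n in range(N + 1):
        for m in range(-n, n + 1):
            Y[:, n * n + n + m] = sph_harm(m, n, phi, theta)
    return Y
```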
According to the spherical harmonic representation of the sound field [40, 41], when sources are in the far field of an array, the element a_{F}(l, d_{1}, k) of the far-field steering matrix A_{F} is independent of the source distance and can be expressed using spherical harmonics as
where b_{ n }(kR) is the normalized far-field mode strength, which depends on the sphere boundary [8], l = 1, 2, …, L, and d_{1} = 1, 2, …, D_{1}. The term a_{F}(l, d_{1}, k) describes the transfer characteristics from the d_{1}th far-field source to the l th sensor. For concise representation, define Φ_{ d } = (ϑ_{ d }, φ_{ d }) to denote the elevation and azimuth of the d th source, and define a (N + 1)^{2} × D_{1} spherical harmonic matrix containing the DOA information of the far-field sources as
B_{F} is defined as a (N + 1)^{2} × (N + 1)^{2} diagonal matrix consisting of the far-field mode strengths b_{ n }(kR), i.e.,
Therefore, by combining (9), (12), (13), and (14), the far-field steering matrix can be represented as
If near-field sources impinge on the spherical array, the element a_{N}(l, d_{2}, k) of the near-field steering matrix A_{N} can be expressed using spherical harmonics as
where b_{n}^{d_{2}}(kR, kr_{d_{2}}) is the normalized near-field mode strength, which depends on the sphere boundary and the source distance r_{d_{2}}. Similarly, a_{N}(l, d_{2}, k) represents the transfer characteristics from the d_{2}th near-field source to the l th sensor. The relation between the near-field and far-field mode strengths is
where h_{n}^{(2)}(kr_{ d }) denotes the spherical Hankel function of the second kind [35, 36]. Similarly, Y(Φ_{N}) is defined as the (N + 1)^{2} × D_{2} matrix made up of the spherical harmonics of the near-field sources, i.e.,
B_{N} is a (N + 1)^{2} × D_{2} near-field mode strength matrix:
The near-field steering matrix is expressed by combining (9), (16), (18), and (19) as
where ⊙ is the Hadamard product.
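The exact normalization in (17) is not reproduced here, but its key ingredient, the spherical Hankel function of the second kind, is easy to evaluate. A minimal sketch, assuming h_{n}^{(2)}(x) = j_{n}(x) − i·y_{n}(x); the wavenumber and ranges below are illustrative assumptions:

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn

def spherical_hankel2(n, x):
    """Spherical Hankel function of the second kind: h_n^(2)(x) = j_n(x) - 1j*y_n(x)."""
    return spherical_jn(n, x) - 1j * spherical_yn(n, x)

# The near-field mode strength b_n^d(kR, k r_d) of (17) scales with
# h_n^(2)(k r_d), so its radial decay illustrates why the near-field
# steering vector depends on the source range r_d.
k = 2 * np.pi / 0.1           # wavenumber for a 10 cm wavelength (illustrative)
for r_d in (0.05, 0.1, 0.2):  # source ranges in meters (illustrative)
    print(r_d, abs(spherical_hankel2(3, k * r_d)))
```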
2.2 Array signal model in the spherical harmonics domain
By substituting (10), (11), (15), and (20) into (4), the array signal model is written as
According to the least squares criterion, the array signal model is constructed in the spherical harmonics domain as
where Y_{F}(Φ) = B_{F}Y(Φ_{F}) and Y_{N}(Φ) = B_{N} ⊙ Y(Φ_{N}) are the new steering matrices of the far field and near field in the spherical harmonics domain, respectively. Equation 22 can be expressed in the compact form
where Y_{FN}(Φ) = [Y_{F}(Φ)  Y_{N}(Φ)] ∈ C^{(N+1)^{2}×D} is the new mixed steering matrix in the spherical harmonics domain and s_{FN}(k) = [s_{F}^{T}(k)  s_{N}^{T}(k)]^{T}. In the spherical harmonics domain, the mixed far-field and near-field steering matrix is independent of the element positions of the sampling array. The localization algorithm in Section 3 is developed based on the array signal model in (23).
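A minimal sketch of the least-squares step behind (22), assuming x holds one snapshot of the L pressure samples and Y is the matrix sketched in Section 2.1:

```python
import numpy as np

def to_sh_domain(x, Y):
    """Least-squares spherical Fourier transform of one snapshot.

    x : (L,) complex pressure samples at the sensors.
    Y : (L, (N+1)**2) spherical harmonic matrix Y(Omega).
    Returns the (N+1)**2 coefficient vector x_nm = pinv(Y) @ x.
    """
    return np.linalg.pinv(Y) @ x
```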
3 Acoustic source localization algorithm
Mixed field source localization aims to estimate the 2D parameters {ϑ_{ d }, φ_{ d }} of the far-field sources and the 3D parameters {r_{ d }, ϑ_{ d }, φ_{ d }} of the near-field sources from the array observations x(k). In the spherical harmonic model (23), the far-field and near-field sources share a common characteristic: within the mixed steering matrix, only the spherical harmonics carry the DOA information, so the mixed steering matrix contains the DOAs of all sources. The difference between the two types of sources is whether the mode strength depends on the source distance. Therefore, the DOAs can be estimated using the recursive relationship of the spherical harmonics, and based on the estimated DOAs, the near-field ranges can then be computed easily with the conventional MUSIC algorithm.
3.1 DOA estimation
To exploit the recurrent relation between spherical harmonics and avoid a complex search in the 3D parameter space, we develop a spherical ESPRIT-like algorithm that automatically estimates paired azimuth and elevation angles for multiple mixed source signals. Define μ = tan ϑ e^{iφ}, which contains only the DOA information, where ϑ and φ are the elevation and azimuth angles of a source, respectively. According to the recursive relation of the associated Legendre polynomials over three adjacent degrees (m − 1, m, and m + 1) [38] and the spherical harmonics definition in (6), we obtain the following relation on which the DOA estimation depends:
where λ_{nm}^{±} = √((n ∓ m)(n ± m + 1)) [42]. Whether a source is in the far field or the near field, the recurrent relationship in (24) is independent of the corresponding mode strength b_{ n }(kR) or b_{ n }^{d}(kR, kr_{ d }). That is, the relation still holds as
or
For a fixed order n, we choose from Y_{FN}(Φ) all rows consisting of the elements Y_{ n }^{m} with m = −n, −n + 1, …, n − 2 to construct a (2n − 1) × D matrix B_{ n }^{(−1)}, the rows with m = −n + 1, −n + 2, …, n − 1 to construct a second (2n − 1) × D matrix B_{ n }^{(0)}, and the rows with m = −n + 2, −n + 3, …, n to construct a third (2n − 1) × D matrix B_{ n }^{(1)}. Letting the order n vary from 1 to N gives Y_{FN}^{(−1)}(Φ) = [(B_{1}^{(−1)})^{T} (B_{2}^{(−1)})^{T} ⋯ (B_{N}^{(−1)})^{T}]^{T} ∈ C^{N^{2}×D}, the second N^{2} × D submatrix Y_{FN}^{(0)}(Φ) = [(B_{1}^{(0)})^{T} (B_{2}^{(0)})^{T} ⋯ (B_{N}^{(0)})^{T}]^{T}, and the third submatrix Y_{FN}^{(1)}(Φ) = [(B_{1}^{(1)})^{T} (B_{2}^{(1)})^{T} ⋯ (B_{N}^{(1)})^{T}]^{T}. To exploit the recursive relationship of the three submatrices Y_{FN}^{(−1)}(Φ), Y_{FN}^{(0)}(Φ), and Y_{FN}^{(1)}(Φ), which include all spherical harmonics up to order N, we define four diagonal matrices as follows:
where Θ contains the DOA information of all incident sources, and Γ, Λ^{+}, and Λ^{−} are the three N^{2} × N^{2} diagonal matrices. The recurrent relationship of the three submatrices Y_{FN}^{(q)}(Φ) (q = −1, 0, 1) is described as
We cannot solve (30) directly to estimate the DOAs because Y_{FN}^{(q)}(Φ) is unknown; the available data are the sensor observations. The covariance matrix R_{ nm }(k) of the transform coefficient vector x_{ nm }(k) can be constructed as
where R_{s}(k) = E[s(k)s^{H}(k)]. It can be estimated by the sample covariance matrix from the sensor samples. The eigenvalue decomposition (EVD) of R_{ nm } results in two orthogonal subspaces:
where U_{s} ∈ C^{(N+1)^{2}×D} contains the D eigenvectors spanning the signal subspace of R_{ nm }, and the diagonal matrix Σ_{s} contains the corresponding eigenvalues. Similarly, U_{v} denotes the noise subspace, and Σ_{v} is built from the remaining (N + 1)^{2} − D eigenvalues of R_{ nm }.
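The covariance construction and subspace split of (31) and (32) can be sketched as follows; storing the Q coefficient snapshots column-wise in X_nm is an assumed data layout:

```python
import numpy as np

def signal_noise_subspaces(X_nm, D):
    """EVD-based subspace split of the SH-domain covariance.

    X_nm : ((N+1)**2, Q) matrix of coefficient snapshots x_nm(k).
    Returns (U_s, U_v): the signal subspace (D eigenvectors with the largest
    eigenvalues) and the noise subspace (the remaining eigenvectors).
    """
    Q = X_nm.shape[1]
    R_nm = X_nm @ X_nm.conj().T / Q   # sample covariance matrix
    w, U = np.linalg.eigh(R_nm)       # eigh returns ascending eigenvalues
    idx = np.argsort(w)[::-1]         # reorder descending
    U = U[:, idx]
    return U[:, :D], U[:, D:]
```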
According to (31) and (32), U_{s} spans the same range space as the mixed steering matrix Y_{FN}(Φ). Therefore, the signal subspace can be written as U_{s} = Y_{FN}(Φ)T, where T is a unique nonsingular D × D matrix called the similarity transform matrix. Three submatrices U_{s}^{(q)} (q = −1, 0, 1) chosen from the signal subspace U_{s} satisfy the same recurrent relationship:
where ψ = T^{−1}ΘT. We can rewrite this equation in block matrix form:
where E = [Λ^{−}U_{s}^{(−1)} ⋮ Λ^{+}U_{s}^{(1)}] ∈ C^{N^{2}×2D} and ψ̲ = [ψ^{T} ⋮ ψ^{H}]^{T} ∈ C^{2D×D} has a block conjugate structure. Equation 34 has the following solution:
The elevation angle ϑ̂_{ d } and the azimuth angle φ̂_{ d } are easily estimated from the eigenvalues μ̂_{1}, μ̂_{2}, …, μ̂_{D} of either the top or the bottom D × D subblock of ψ̲ as follows:
where d = 1, 2, …, D. When N^{2} < 2D, this procedure fails; hence, the maximum number of sources that can be accurately estimated by this algorithm is D = ⌊N^{2}/2⌋, where ⌊⋅⌋ denotes the flooring operation. The three classical spatial sampling schemes for a spherical microphone array are the equiangular, Gaussian, and nearly uniform sampling schemes. For a spherical microphone array of a given order N, the equiangular sampling scheme requires 4(N + 1)^{2} sensors and the Gaussian sampling scheme 2(N + 1)^{2} sensors, while the uniform sampling scheme needs only (N + 1)^{2} sensors [43]. Therefore, when L sensors collect the information of the acoustic sources, the dimension of the spherical harmonics space satisfies (N + 1)^{2} ≤ L; that is, the dimension of localization in the spherical harmonics domain is lower than that of the element space.
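Since the displayed equations (24) to (37) are not reproduced above, the following is only a structural sketch of the first stage: the row selection described in the text, and the final angle extraction implied by μ = tan ϑ e^{iφ}. The least-squares solve for ψ̲ in (35) is omitted because the entries of Γ are not shown here, and elevations are assumed to lie in (0°, 90°) so that |μ| = tan ϑ.

```python
import numpy as np

def subblock_rows(N, q):
    """Row indices of Y_FN^(q) within the (N+1)**2 SH rows (ordering n*n + n + m).

    For each order n = 1..N, the degree m runs from -n+1+q to n-1+q, giving
    2n - 1 rows per order and N**2 rows in total (q = -1, 0, 1), as in the text.
    """
    return np.array([n * n + n + m
                     for n in range(1, N + 1)
                     for m in range(-n + 1 + q, n + q)])

def doa_from_psi(psi):
    """Paired (elevation, azimuth) estimates from the D x D matrix psi of (35).

    psi is similar to Theta = diag(mu_1, ..., mu_D) with mu = tan(theta)e^{i*phi},
    so its eigenvalues yield automatically paired angle estimates, as in (36)-(37).
    """
    mu = np.linalg.eigvals(psi)
    elevation = np.arctan(np.abs(mu))  # assumes elevations in (0, pi/2)
    azimuth = np.angle(mu)
    return elevation, azimuth
```

With this indexing, the submatrices used in (33) are simply U_s[subblock_rows(N, q), :] for q = −1, 0, 1.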
3.2 Range estimation
The DOA estimator above provides azimuth and elevation estimates for both far-field and near-field sources, but it cannot discriminate between them [37]. Range information must be estimated only for the near-field sources. When a source is in the near field, its steering vector in the spherical harmonics domain depends on the range:
Therefore, the near-field MUSIC spectrum [44] is
The near-field search extent is r_{ d } ∈ (R, r_{N}). The range estimate of the d th source is obtained by
r̂_{ d } = arg max_{r_{ d }} p(r_{ d }, Φ̂_{ d }).  (39)
The far-field steering vector in the spherical harmonics domain is independent of the source range:
The far-field MUSIC spectrum [44] is
With the DOA estimates Φ̂_{ d } = (ϑ̂_{ d }, φ̂_{ d }) (d = 1, 2, …, D) from (36), the range estimator calculates the MUSIC spectra according to (38) and (41). For each DOA estimate Φ̂_{ d }, we compare p(Φ̂_{ d }) with the peak of p(r_{ d }, Φ̂_{ d }); if the former is larger, the source is a far-field one. Otherwise, the source is a near-field one, and the estimate r̂_{ d } in (39) automatically pairs with the DOA estimate Φ̂_{ d }.
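A sketch of this second stage, assuming helper functions steer_far(Phi) and steer_near(r, Phi) that evaluate the SH-domain steering vectors of (40) and the near-field counterpart at a candidate location; these helpers, and the grid r_grid over (R, r_N), are assumptions, not given in the text:

```python
import numpy as np

def music_spectrum(a, U_v):
    """1D MUSIC pseudospectrum value for steering vector a and noise subspace U_v."""
    proj = U_v.conj().T @ a
    return 1.0 / np.real(proj.conj().T @ proj)

def classify_and_range(Phi_hat, U_v, steer_far, steer_near, r_grid):
    """Far-field / near-field decision and range estimate for one DOA estimate.

    Compares the far-field spectrum (41) with the peak of the near-field
    spectrum (38) over the search grid r_grid in (R, r_N), as in Section 3.2.
    """
    p_far = music_spectrum(steer_far(Phi_hat), U_v)
    p_near = np.array([music_spectrum(steer_near(r, Phi_hat), U_v)
                       for r in r_grid])
    i_max = int(np.argmax(p_near))
    if p_far > p_near[i_max]:
        return "far-field", None
    return "near-field", r_grid[i_max]  # range automatically pairs with Phi_hat
```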
3.3 Algorithm summarization
The proposed two-stage algorithm can be summarized as follows:
Step 1. Array signal modeling: apply the spherical harmonic representation and construct the mixed field array signal model (23) in the spherical harmonics domain.
Step 2. DOA estimation: perform the EVD of R_{ nm } in (32) and choose the three submatrices from the signal subspace. Construct the recurrent relation (33) of these submatrices and estimate the DOA information of all sources via (36).
Step 3. Range estimation: based on the estimated DOAs, compute the far-field MUSIC spectrum (41) and search the near-field MUSIC spectrum (38) to discriminate far-field from near-field sources and obtain the paired ranges of the near-field sources.
Remarks
1. For computational complexity, we mainly consider the implementation of the EVD and the one-dimensional (1D) MUSIC spectral search. In the spherical harmonics domain, the dimension of the covariance matrix R_{ nm } is (N + 1)^{2} × (N + 1)^{2}. The computational complexity of the proposed localization algorithm comprises (a) the eigendecomposition of R_{ nm }, of order O((N + 1)^{6}), and (b) the 1D spectral search, of order O((N + 1)^{4}g_{ r }), where g_{ r } is the number of search points along the range axis [45] (worked numbers for N = 4 are sketched after these remarks).
2. Whether the incident sources are far-field, near-field, or a mixture of the two, the spherical ESPRIT-like algorithm in the first stage estimates all DOAs. When all incident sources are far-field, the DOA information alone suffices. If all sources are located in the near field, the 1D MUSIC spectral search finds the paired range parameters. When the sources are a mixture of far-field and near-field, the MUSIC spectra in (38) and (41) are computed to decide whether each source is far-field or near-field.
3. The proposed algorithm localizes mixed sources without parameter pairing or multidimensional search. In the first stage, it estimates the DOAs of the mixed far-field and near-field sources. In the second stage, the DOA estimates are used to compute the 1D MUSIC spectra according to (38) and (41), and the spectral peaks give the paired ranges. Therefore, parameter pairing is avoided in the proposed method.
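For the order N = 4 array used in Section 4, the two complexity terms of remark 1 work out as follows; the grid size g_r = 100 is an illustrative assumption, not from the text:

```python
N, g_r = 4, 100        # array order; range grid size (g_r is illustrative)
dim = (N + 1) ** 2     # SH-domain dimension: 25
print(dim ** 3)        # EVD cost term (N+1)^6        -> 15625
print(dim ** 2 * g_r)  # 1D search term (N+1)^4 * g_r -> 62500
```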
4 Simulations
In this section, we conduct simulations in a free-field environment to evaluate the proposed localization algorithm for narrowband and wideband sources. A 32-element spherical microphone array with uniform sampling [43] is used to estimate the source locations; its radius is 10 cm. The highest spherical harmonics order is N = 4, so the near-field extent of the array is (R, 0.64λ), where λ is the wavelength. The maximum number of sources that can be detected by the algorithm is D = N^{2}/2 = 8. The DOA (azimuth and elevation) and range estimates are given in degrees and in wavelengths (or meters), respectively. Localization performance is measured by the root-mean-square error (RMSE) over 1,000 independent Monte Carlo trials. In addition, the Cramér-Rao bound (CRB) provides a lower bound on the variance of any estimated parameter and defines the ultimate accuracy. The CRB analysis in [46] assumed that all sources were in the far field, and that in [47] assumed they were all in the near field; the CRB analysis for coexisting far-field and near-field sources was provided in [6].
4.1 Narrowband source localization
The first simulation demonstrates the performance of the proposed algorithm in localizing far-field and near-field sources. One far-field source and one near-field source are located at (r_{1}, Φ_{1}) = (∞, 45°, 68°) and (r_{2}, Φ_{2}) = (0.29λ, 60°, 122°), respectively. The number of snapshots and the SNR are fixed at 128 and 15 dB, respectively. First, the azimuths and elevations of the two sources are estimated; the RMSEs of the azimuth and elevation estimates are 0.07, 0.22, 0.03, and 0.10, respectively. Based on the DOA estimates and supposing both sources to be located in the near field, the MUSIC spectra computed with (38) are shown in Figure 2. Supposing instead that both sources are located in the far field, the MUSIC spectra calculated by (41) are 12.39 and 4.73 dB, respectively. From the maxima of the MUSIC spectra, we can identify which source is near-field and which is far-field, and the range of the near-field source can be estimated from its MUSIC spectrum. Therefore, the proposed algorithm distinguishes near-field from far-field sources and performs well in localizing them.
1. RMSE versus SNR: The number of snapshots is set to 128. The two sources are localized in free field and in a rectangular room with a floor area of 71 m^{2}, a ceiling height of 3 m, and a reverberation time of 0.7 s [48]; the array is placed at the center of the room. When the SNR varies from 0 to 30 dB, the RMSEs of the azimuth, elevation, and range estimates are shown in Figure 3; they decrease as the SNR increases. The localization performance degrades in the reverberant environment compared with the free field. When the two sources are incident from the same direction (45°, 122°), the RMSEs of the azimuth, elevation, and range estimates versus SNR in free field are shown in Figure 4. The figure shows that the proposed algorithm can localize a far-field source and a near-field source incident from the same direction, and that it attains comparable azimuth and elevation accuracy for the two. The elevation estimates are more accurate than the azimuth estimates because spherical arrays provide a larger aperture in the elevation plane than in the azimuth plane.
2. RMSE versus snapshots: For comparison, a cross array placed in the XZ plane is used; each ULA branch consists of 15 (M = 7) uniformly spaced omnidirectional sensors with inter-sensor spacing R/M. First, the elevation angles are obtained with the generalized ESPRIT method; then, the range parameters are estimated using the root-MUSIC method based on the elevation estimates [9]; finally, a 1D MUSIC pseudospectrum is searched for the azimuth angles using the elevation and range estimates. The SNR is fixed at 15 dB. When the number of snapshots varies from 100 to 1,100, the average performance over 1,000 Monte Carlo runs is shown in Figure 5. The RMSEs of the azimuth, elevation, and range estimates decrease as the number of snapshots increases. The estimation performance of the spherical array is better than that of the cross array, likely due to the symmetrical structure of the spherical array in 3D space and the simultaneous estimation of azimuth and elevation by the proposed method. Both our algorithm and the three-step method in [9] rely on the eigendecomposition of the array covariance matrix, so the covariance estimate affects localization performance. Vershynin showed that a sample size Q ∼ (N + 1)^{2} suffices to estimate the covariance by the sample covariance matrix [49]. Therefore, the localization performance of both methods becomes more stable as the number of snapshots grows. The CRBs decrease in inverse proportion to the number of snapshots and likewise become more stable.
3. RMSE versus angular gap: The number of snapshots is set to 128 and the SNR is fixed at 15 dB. With the direction of the far-field source fixed, the azimuth and elevation of the near-field source both vary from 5° to 30°. The RMSEs of the direction and range estimates are shown in Figure 6. The DOA estimate of the far-field source and the range estimate of the near-field source are insensitive to the angular gap. The azimuth estimation performance for the near-field source changes as the angular gap increases, and the RMSE of its elevation estimate becomes slightly smaller.
4. RMSE versus range: Let the number of snapshots and the SNR be 128 and 10 dB, respectively. When the range of the near-field source varies from 0.22λ to 0.58λ, the RMSEs of the DOA and range estimates are shown in Figure 7. The results show that both the DOA and range estimates of the near-field source are very sensitive to the varying range: the RMSEs of the azimuth, elevation, and range estimates are smaller when the near-field source is closer to the spherical array than when it is farther away. The location estimate of the far-field source, however, is insensitive to the varying range of the near-field source.
The maximum number of sources that can be uniquely estimated by the proposed algorithm is 8. Four near-field sources are located at (r_{1}, Φ_{1}) = (0.16λ, 102°, 45°), (r_{2}, Φ_{2}) = (0.29λ, 122°, 60°), (r_{3}, Φ_{3}) = (0.44λ, 60°, 10°), and (r_{4}, Φ_{4}) = (0.51λ, 155°, 75°), and four far-field sources at (r_{5}, Φ_{5}) = (∞, 40°, 15°), (r_{6}, Φ_{6}) = (∞, 168°, 40°), (r_{7}, Φ_{7}) = (∞, 10°, 4°), and (r_{8}, Φ_{8}) = (∞, 140°, 70°). At an SNR of 40 dB, the RMSEs for the eight sources are listed in Table 1; all are below 1° except the azimuth of source 7.
In the second simulation, the proposed algorithm is used to localize pure far-field sources. Two sources are located at (r_{1}, Φ_{1}) = (∞, 60°, 122°) and (r_{2}, Φ_{2}) = (∞, 45°, 68°), respectively. The number of snapshots is set to 128. When the SNR varies from 0 to 30 dB, the RMSEs of the azimuths and elevations are shown in Figure 8; the estimates improve as the SNR increases.
We next consider two sources located at (r_{1}, Φ_{1}) = (∞, 45°, 22°) and (r_{2}, Φ_{2}) = (∞, 45°, 22° + Δφ), with Δφ varying from 0° to 90°. When the SNR varies from 0 to 40 dB, the RMSEs of the azimuths and elevations are shown in Figure 9. At low SNR, the angular RMSEs are larger for all azimuth differences than at high SNR, and the azimuth difference between the two sources has a greater effect: when the two sources are close, the algorithm estimates azimuth and elevation with larger RMSEs. At high SNR, the azimuth difference has only a slight effect on the localization performance, so the algorithm can accurately estimate two closely spaced sources. When the second source shares the azimuth of the first, its elevation is 45° + Δϑ, with Δϑ varying from 0° to 25°; the localization performance is shown in Figure 10. Across SNRs, the behavior with varying elevation is similar to that with varying azimuth. However, when the elevation of the second source approaches 90° at low SNR, the estimation errors grow because the elevation estimate is determined through the tangent function.
In the third simulation, the proposed method is used to localize pure near-field sources. Two near-field sources are located at (r_{1}, Φ_{1}) = (0.22λ, 45°, 68°) and (r_{2}, Φ_{2}) = (0.44λ, 60°, 122°), respectively. With both ranges within the near-field extent of the array, the RMSEs of the azimuths, elevations, and ranges are shown in Figure 11; the RMSEs for the first source, which is closer to the spherical array, are smaller than those for the second. When the range of the second source varies from 0.2λ to 0.5λ, the angular and range estimates are shown in Figure 12. Again, the estimation errors of the source closer to the array are smaller than those of the other source, and as the range of the second source approaches the boundary of the near-field extent, its azimuth, elevation, and range errors grow for almost all SNRs from 0 to 30 dB.
4.2 Wideband source localization
The proposed algorithm can also localize multiple wideband mixed field sources. A female speech signal and a male speech signal, randomly chosen from the TIMIT database [50], are incident from the directions (45°, 68°) and (60°, 122°), respectively. The male source is located in the far field of the array, while the female source is placed in the near field at range r = 0.2 m. The sampling frequency is 16 kHz with 16 bits per sample. The observed data are decomposed into frequency bins using a short-time Fourier transform (STFT) of length 512 with a rectangular window. For each frequency bin, the sample covariance matrix R_{ nm }(k) is estimated from Q frequency snapshots as follows:
where x̂_{ nm }(k, q) is the q th snapshot of x_{ nm }(k). The operating frequency bandwidth is limited by aliasing at high frequencies and by measurement errors at low frequencies [43]. Therefore, the frequency bins used are restricted to roughly 2 ≤ kr ≤ N. The SNR is 10 dB. When kr lies within (2, 4), the RMSEs of azimuth and elevation for the two sources are smaller; we therefore choose the frequency bins satisfying 2 ≤ kr ≤ N to localize the wideband sources.
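A sketch of the per-bin sample covariance (42) and the band selection rule 2 ≤ kr ≤ N, assuming the SH-domain STFT coefficients are stored in an array of shape (Q, F, (N + 1)^{2}):

```python
import numpy as np

def per_bin_covariance(X_nm_stft):
    """Sample covariance R_nm(k) per frequency bin, as in (42).

    X_nm_stft : (Q, F, (N+1)**2) SH-domain STFT coefficients
                (Q snapshots, F frequency bins).
    """
    Q = X_nm_stft.shape[0]
    # einsum averages x_nm(k, q) x_nm(k, q)^H over the Q snapshots per bin
    return np.einsum('qfa,qfb->fab', X_nm_stft, X_nm_stft.conj()) / Q

def valid_bins(freqs, r, N, c=343.0):
    """Indices of bins satisfying 2 <= k*r <= N, the band used in Section 4.2."""
    k = 2 * np.pi * freqs / c
    return np.where((k * r >= 2) & (k * r <= N))[0]
```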
The wideband source localization results for different SNRs are shown in Figure 13. The angular estimates of the near-field source are more accurate than those of the far-field source. The length of the observed signals affects the computation of the sample covariance matrix in (42) [51]: when little data is available, the number of snapshots Q per frequency bin is small and the covariance matrix is harder to estimate correctly, whereas a large signal size yields an accurate sample covariance estimate and the better localization performance shown in Figure 14 (at an SNR of 10 dB). The RMSE trends of the near-field range estimate follow those of the azimuth and elevation estimates, because the range estimate is based on the angular estimates.
5 Conclusions
In this paper, we developed a two-stage source localization algorithm that jointly estimates the elevation, azimuth, and range of mixed far-field and near-field sources using a spherical array. In the first stage, the 3D localization algorithm estimates the DOAs of all mixed sources. In the second stage, the 1D MUSIC method distinguishes far-field from near-field sources and provides the ranges of the near-field sources based on the estimated DOAs. The algorithm performs well in azimuth, elevation, and range estimation, and its computational cost is low because it avoids multidimensional search and requires no parameter pairing procedure. The estimation performance for the far-field sources is insensitive to the varying range of the near-field sources, while the RMSEs of the azimuth, elevation, and range of a near-field source are smaller when it is closer to the spherical array. The spherical array performs better in elevation estimation than in azimuth estimation owing to its larger aperture in the elevation plane. In future work, we will develop a range estimation algorithm without 1D search and incorporate a reverberant signal model to localize multiple sources in reverberant environments.
References
Krim H, Viberg M: Two decades of array signal processing research: the parametric approach. IEEE Signal Process Mag 1996, 13(4):67-94. 10.1109/79.526899
Chen JC, Yao K, Hudson RE: Acoustic source localization and beamforming: theory and practice. EURASIP J Appl Signal Process 2003, 4:359-370.
Liang L, Liu D: Passive localization of near-field sources using cumulant. IEEE Sensors J 2009, 9(8):953-960.
Wang B, Liu J, Sun X: Mixed sources localization based on sparse signal reconstruction. IEEE Signal Process Lett 2012, 19(8):487-490.
Liang L, Liu D: Passive localization of mixed near-field and far-field sources using two-stage MUSIC algorithm. IEEE Trans Signal Process 2010, 58(1):108-120.
He J, Swamy MNS, Ahmad MO: Efficient application of MUSIC algorithm under the coexistence of far-field and near-field sources. IEEE Trans Signal Process 2012, 60(4):2066-2070.
Wang B, Zhao Y, Liu J: Mixed-order MUSIC algorithm for localization of far-field and near-field sources. IEEE Signal Process Lett 2013, 20(4):311-314.
Jiang J, Duan F, Chen J, Li Y, Hua X: Mixed near-field and far-field sources localization using the uniform linear sensor array. IEEE Sensors J 2013, 13(8):3136-3143.
Jiang J, Duan F, Chen J: Three-dimensional localization algorithm for mixed near-field and far-field sources based on ESPRIT and MUSIC method. Prog Electromagnetics Res 2013, 136:435-456.
Mathews CP, Zoltowski MD: Eigenstructure techniques for 2-D angle estimation with uniform circular arrays. IEEE Trans Signal Process 1994, 42(9):2395-2407. 10.1109/78.317861
Pavlidi D, Griffin A, Puigt M, Mouchtaris A: Real-time multiple sound source localization and counting using a circular microphone array. IEEE Trans Audio Speech Lang Process 2013, 21(10):2193-2206.
Sommerkorn G, Hampicke D, Klukas R, Richter A, Schneider A, Thomä R: Uniform rectangular antenna array design and calibration issues for 2-D ESPRIT application. In The 4th European Personal Mobile Communications Conference, Vienna. Morgan Kaufman, San Francisco; 2001:1-8.
Ioannides P, Balanis CA: Uniform circular and rectangular arrays for adaptive beamforming applications. IEEE Antennas Wireless Propagation Lett 2005, 4:351-354. 10.1109/LAWP.2005.857039
Smaragdis P, Boufounos P: Position and trajectory learning for microphone arrays. IEEE Trans Audio Speech Lang Process 2007, 15(1):358-368.
Costa M, Koivunen V, Richter A: Low complexity azimuth and elevation estimation for arbitrary array configurations. In IEEE Int. Conf. Acoust., Speech, Signal Process, Taipei. IEEE, Piscataway; 2009:2185-2188.
Belloni F, Richter A, Koivunen V: DOA estimation via manifold separation for arbitrary array structures. IEEE Trans Signal Process 2007, 55(10):4800-4810.
Filik T, Tuncer TE: A fast and automatically paired 2-dimensional direction-of-arrival estimation using arbitrary array geometry. In IEEE 17th Signal Process and Communications Applications Conference, Antalya. IEEE, Piscataway; 2009:556-559.
Park M, Rafaely B: Sound-field analysis by plane-wave decomposition using spherical microphone array. J Acoust Soc Am 2005, 118(5):3094-3103. 10.1121/1.2063108
Yan S, Sun H, Svensson UP, Ma X, Hovem JM: Optimal modal beamforming for spherical microphone arrays. IEEE Trans Audio Speech Lang Process 2012, 19(2):361-371.
Khaykin D, Rafaely B: Acoustic analysis by spherical microphone array processing of room impulse response. J Acoust Soc Am 2012, 132(1):261-270. 10.1121/1.4726012
Teutsch H: Modal Array Signal Processing: Principles and Applications of Acoustic Wavefield Decomposition. Springer-Verlag, Heidelberg; 2007:134-146.
Cohen I, Benesty J, Gannot S: Speech Processing in Modern Communication. Springer-Verlag, Heidelberg; 2010:281.
Goossens R, Rogier H: Unitary spherical ESPRIT: 2-D angle estimation with spherical arrays for scalar fields. IET Signal Process 2009, 3(2):221-231.
Sun H, Kellermann W, Mabande E: Robust localization of multiple sources in reverberant environments using EB-ESPRIT with spherical microphone arrays. In IEEE Int. Conf. Acoust., Speech, Signal Process, Prague. IEEE, Piscataway; 2011:117-120.
Argentieri S, Danes P: Broadband variations of the MUSIC high-resolution method for sound source localization in robotics. In IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego. IEEE, Piscataway; 2007:2009-2014.
Sun H, Mabande E, Kowalczyk K, Kellermann W: Localization of distinct reflections in rooms using spherical microphone array eigenbeam processing. J Acoust Soc Am 2012, 131(4):2828-2840. 10.1121/1.3688476
Wu PKT, Epain N, Jin C: A super-resolution beamforming algorithm for spherical microphone arrays using a compressive sensing approach. In IEEE Int. Conf. Acoust., Speech, Signal Process, Vancouver. IEEE, Piscataway; 2013:649-653.
Sawada H, Mukai R, Araki S, Makino S: Multiple source localization using independent component analysis. In IEEE Antennas and Propagation Society International Symposium, Washington DC. IEEE, Piscataway; 2005:81-84.
Epain N, Jin C: Independent component analysis using spherical microphone arrays. Acta Acustica United Acustica 2012, 98(1):91-102. 10.3813/AAA.918495
Noohi T, Epain N, Jin C: Direction of arrival estimation for spherical microphone arrays by combination of independent component analysis and sparse recovery. In IEEE Int. Conf. Acoust., Speech, Signal Process, Vancouver. IEEE, Piscataway; 2013:346-349.
Malioutov D, Cetin M, Willsky AS: A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans Signal Process 2005, 53(8):3010-3022.
Wei X, Yuan Y, Ling Q: DOA estimation using a greedy block coordinate descent algorithm. IEEE Trans Signal Process 2012, 60(12):6382-6394.
Swartling M, Sallberg B, Grbic N: Source localization for multiple speech sources using low complexity non-parametric source separation and clustering. Signal Process 2011, 91:1781-1788. 10.1016/j.sigpro.2011.02.002
Meyer J, Elko GW: Position independent close-talking microphone. Signal Process 2006, 86(6):1254-1259. 10.1016/j.sigpro.2005.05.036
Fisher E, Rafaely B: The nearfield spherical microphone array. In IEEE Int. Conf. Acoust., Speech, Signal Process, Las Vegas. IEEE, Piscataway; 2008:5272-5275.
Fisher E, Rafaely B: Near-field spherical microphone array processing with radial filtering. IEEE Trans Audio Speech Lang Process 2011, 19(2):256-265.
Huang Q, Song T: DOA estimation of mixed near-field and far-field sources using spherical array. In The 11th Int. Conf. on Signal Process, Beijing. IEEE, Piscataway; 2012:382-385.
Williams EG: Fourier Acoustics: Sound Radiation and Nearfield Acoustical Holography. Academic, New York; 1999:183.
Rafaely B, Weiss B, Bachmat E: Spatial aliasing in spherical microphone arrays. IEEE Trans Signal Process 2007, 55(3):1003-1010.
Meyer J, Elko G: A highly scalable spherical microphone array based on an orthonormal decomposition of the soundfield. In IEEE Int. Conf. Acoust., Speech, Signal Process, Orlando. IEEE, Piscataway; 2002:II-1781-II-1784.
Abhayapala TD, Ward DB: Theory and design of high order sound field microphones using spherical microphone array. In IEEE Int. Conf. Acoust., Speech, Signal Process, Orlando. IEEE, Piscataway; 2002:II-1949-II-1952.
Teutsch H, Kellermann W: Detection and localization of multiple wideband acoustic sources based on wavefield decomposition using spherical apertures. In IEEE Int. Conf. Acoust., Speech, Signal Process, Las Vegas. IEEE, Piscataway; 2008:5276-5279.
Rafaely B: Analysis and design of spherical microphone arrays. IEEE Trans Speech Audio Process 2005, 13(1):135-143.
Schmidt RO: Multiple emitter location and signal parameter estimation. IEEE Trans Antennas Propag 1986, AP-34:276-280.
Wang Y, Chen J, Fang W: TST-MUSIC for joint DOA-delay estimation. IEEE Trans Signal Process 2001, 49(4):721-729. 10.1109/78.912916
Stoica P, Larsson EG, Gershman AB: The stochastic CRB for array processing: a textbook derivation. IEEE Signal Process Lett 2001, 8(5):148-150.
El Korso MN, Boyer R, Renaux A, Marcos S: Conditional and unconditional Cramér-Rao bounds for near-field source localization. IEEE Trans Signal Process 2010, 58(5):2901-2907.
Gardner B: A realtime multichannel room simulator. J Acoust Soc Am 1992, 92(4):2395.
Vershynin R: How close is the sample covariance matrix to the actual covariance matrix? J Theor Probabil 2012, 25:655-686. 10.1007/s10959-010-0338-z
Lamel LF, Kassel RH, Seneff S: Speech database development: design and analysis of the acoustic-phonetic corpus. In Proc. of the DARPA Speech Recognition Workshop. IET, Glasgow; 1986:100-109.
Puigt M, Vincent E, Deville Y: Validity of the independence assumption for the separation of instantaneous and convolutive mixtures of speech and music sources. In Int. Conf. ICA, LNCS, Brazil. Springer, New York; 2009:613-620.
Acknowledgements
The authors would like to thank the editor and anonymous reviewers for their valuable comments. The work was supported by the National Natural Science Foundation (61001160), Innovation Program of Shanghai Municipal Education Commission (12YZ023), and Visiting Scholar Funding of Shanghai Municipal Education Commission of China.
Competing interests
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Huang, Q., Wang, T. Acoustic source localization in mixed field using spherical microphone arrays. EURASIP J. Adv. Signal Process. 2014, 90 (2014). https://doi.org/10.1186/1687-6180-2014-90