This paper addresses the problem of tracking a moving source, e.g., a robot, equipped with both receivers and a source, that is tracking its own location and simultaneously estimating the locations of multiple plane reflectors. We assume a noisy knowledge of the robot’s movement. We formulate this problem, which is also known as simultaneous localization and mapping (SLAM), as a hybrid estimation problem. We derive the extended Kalman filter (EKF) for both tracking the robot’s own location and estimating the room geometry. Since the EKF employs linearization at every step, we incorporate a regulated kinematic model, which facilitates a successful tracking. In addition, we consider the echo-labeling problem as solved and beyond the scope of this paper. We then develop the hybrid Cramér-Rao lower bound on the estimation accuracy of both the localization and mapping parameters. The algorithm is evaluated with respect to the bound via simulations, which shows that the EKF approaches the hybrid Cramér-Rao bound (CRB) (HCRB) as the number of observation increases. This result implies that for the examples tested in simulation, the HCRB is an asymptotically tight bound and that the EKF is an optimal estimator. Whether this property is true in general remains an open question.

1 Introduction

Consider a source that moves in an unknown area while simultaneously mapping the environment and tracking its own location, a problem known as SLAM. As a concrete example, we use a moving robot that emits sound in a room enclosure. Nevertheless, the analysis here applies to other similar scenarios, e.g., drone navigation [1], vehicles in urban areas [2], or marine vessels [3, 4] in ports. This problem is difficult since the robot must localize itself with respect to an incomplete map. The general SLAM problem has been the subject of substantial research with many different applications [5].

As implied by its name, SLAM consists of two components, localization and mapping. Source localization is a well-studied problem in acoustic signal processing. Several energy-based localization techniques exist [6, 7]. A common approach utilizes an array of microphones and their TDOA [8, 9]. Another technique exploits both the spatial and temporal information to localize and track a moving speaker [10]. Additional approaches and techniques are reviewed in [11]. In the case of mapping, a general solution for room geometry estimation is based on spatial room-impulse-responses and the image source model [12]. In addition, echo labeling is used to map the shape of polyhedral rooms [13, 14]. The latter treat the acoustic echoes as nodes in a graph.

While the mapping and the localization sub-problems can be considered well understood when separately treated, the combined SLAM problem is much more challenging. The reason is that localization and mapping depend on each other and are executed simultaneously in an intertwined way. The most common philosophy in addressing SLAM is the Bayesian theoretical-based approach [5]. The reader is referred to the review [15] and references therein for further study on the subject. Notable is the work in [16], which employs particle filtering and uses a moving microphone-array. Kalman-based approaches are in wide use for the tracking and mapping. These approaches provide a recursive solution to the localization problem and consistent estimates for mapping landmarks. This is done using statistical models for the speaker motion and the environment’s features [17]. A Fast-SLAM algorithm that integrates the Rao-Blackwell particle filter and the Kalman filter is presented in [18].

We consider a robot that emits an acoustic signal and is equipped with one microphone. While moving in a four-wall room, the robot tracks its own location and simultaneously estimates the room geometry (namely, mapping) using the acoustic echoes. For this problem, some solutions exist. Notable is [19] which estimates the shape of a room as well as determines the location and geometry of an array of microphones using Euclidean distance and matrix completion algorithms. In [20], a Bayesian filtering method is applied for movement tracking and room shape estimation. Most algorithms are based on particle filtering, which is computationally intense and may suffer from convergence problems.

The paper is organized as follows. In Section 2, we summarize the main methodology of the paper and briefly describe the experimental study. The problem is formulated in Section 3, which includes the robot’s kinematic model, the observation model, and the unknown parameters. In Section 4, we present an online SLAM algorithm, and in Section 5, we derive the HCRB (cf. Proposition 1). Simulation results of the proposed algorithm with respect to the derived HCRB are presented in Section 6. Summary and conclusions are given in Section 7.

2 Methods

In this paper, we address the problem of self-localizing a source and, simultaneously, inferring the room geometry using a single microphone and a single receiver that are mounted on a moving platform, e.g., a robot. The robot moves around the room according to given movement commands. The estimation is based on noisy distance measurements, corresponding to the first-order echoes, taken at finite set of points along the trajectory. While the movement commands are known, the overall robot trajectory is unknown. This follows from the fact that room geometry is unknown (and therefore, the robot position with respect to the room). In addition, we assume that the robot movement inaccurate w.r.t. the known movement commands. This can occur due to imperfect execution of the robot commands. In general, if nothing is known about the trajectory, this problem is not identifiable for some important classes of room geometries, i.e., parallelogram room-geometry cannot be uniquely determined from the distances between walls and trajectory points (cf. [21, 22], and for a comprehensive analysis, see [23]). Nevertheless, the robot movement commands provide partial information about the robot trajectory. We use these movement commands, as well as the distance measurements from each wall, to simultaneously map the room and localize the robot with respect to it. In contrast to particle filtering, we adopt a more analytic, computationally efficient approach, the EKF, which simultaneously tracks the robot moment and estimates the room geometry.

We formulate the problem as a hybrid estimation task that includes both random and deterministic unknown parameters. This combines both the classic and the Bayesian estimation philosophies, an approach that is very suitable for the problem at hand in which the room’s features are considered static and deterministic, while the robot movement as a random state-space process. To apply the Bayesian EKF approach to our hybrid tracking-mapping estimation problem, we carefully incorporate randomness into the deterministic room parameters (cf. Section 4).

Moreover, we derive the HCRB, which is an important tool in estimation theory [24–29], for the SLAM problem at hand. We provide a recursive expression for the HCRB (see [30] for an overview of recursive CRB-type bounds) which constitutes, at each time instance, a lower bound on the estimation error of the current location and of the room geometry. The advantage of the HCRB is that it is valid for both random and deterministic parameters. A related bound, the Bayesian CRB (BCRB), was derived for a different SLAM problem in [21]. While the BCRB is a valid bound on the random parameters^{Footnote 1} (e.g., receiver location), it is generally not useful with respect to the deterministic parameters in hybrid setups. To apply the BCRB, one must transform the hybrid estimation problem into a pure Bayesian one. This can be done by incorporating some prior distribution to the deterministic parameters, which makes the BCRB susceptible to mismodelling. It is also possible to treat the deterministic parameters as random with zero variance. However, this will violate the BCRB regularity conditions [30]. A different approach would be to discard the prior distribution on the random parameters, thus transforming the problem into a pure classic one, i.e., an estimation problem where all unknown parameters are deterministic unknown. The drawback is that by discarding the prior distribution, one loses the useful information that is embedded in the prior. Moreover, it may render the estimation problem ill-posed (see, e.g., [24]). In this paper, we consider the HCRB since it is more suitable for hybrid estimation problems (see the discussion about the HCRB and other CRB-type bounds in [25], Sect. I-A and Sect. III).

Finally, we evaluate the performance of the proposed algorithm by simulation results examination and by comparing them with the obtained HCRB. We demonstrate a very good match between the Monte-Carlo simulation and the analytical bound, i.e., we show in simulation that EKF approaches the HCRB.

Notation: Boldface lowercase letters denote vectors. We use boldface uppercase letters, e.g., F,C,J, to denote matrices. All vectors are column vectors and (·)^{⊤} denotes the transpose operation. Let g(x) be scalar function of a vector x, then, ∂g(x)/∂x denotes the gradient operation which is organized as a row vector. In the case where g(x) is a vector, ∂g(x)/∂x is the Jacobian matrix of g(x). Let A be a matrix, then, [A]_{ij} denotes its i,jth entry. The operator spectral norm operation is denoted by ∥·∥. Let \(\mathbf {A}, \mathbf {B}\in \mathbb {R}^{n\times m}\), then blockdiag(A,B) is the block-diagonal matrix whose upper-left and lower-right n×m blocks are equal to A and B, respectively.

3 Problem formulation

Consider a single omni-directional acoustic source and a microphone, both mounted on a robot at the same point on the x-y plane, but at different heights, i.e., with different z-coordinates. A fully-exciting signal that simplifies the subsequent correlation operation would be better than a speech signal for echo-labeling.

The robot moves in a four-wall, closed-room according to a series of movement commands. In this work, we consider 2D environments. An extension to 3D environments is possible, and it involves a characterization of the third dimension. Such an extension can be made by adding characterization equations for additional walls.

Each command instructs the robot to take a single step of a given size and an orientation angle that indicate the direction and length of the movement as depicted in Fig. 1 and further discussed in the sequel. In each step, the robot takes a measurement (a full description is given in Section 3.2) and simultaneously estimates its location and the room geometry. The robot’s location is denoted by

where x_{k} and y_{k} denote the x-coordinate and the y-coordinate, respectively, at step k. The coordinate system is determined by the robot’s first step, i.e., the positive direction of the x-axis coincides with the first step. In this paper, each time instant corresponds to a step; thus, we use the terms time-instant k and step k, interchangeably. The room shape is determined by the parameter vector

The latter denotes the robot’s movement command, where μ_{k} is the step size and ϕ_{k} is the orientation angle with respect to the positive edge of the horizontal axis, i.e., the angle of the robot first step is set as the zero-angle with respect to the x-axis of the coordinate system. The robot movement is depicted in Fig. 1. The noise w_{k} is zero-mean Gaussian signal with covariance matrix \(\mathbf {C}_{w} =\sigma _{w}^{2}\mathbf {I}_{2\times 2}\), which expresses inaccuracies in the execution of the robot’s movement commands. As a result, the exact location of the robot is an unknown parameter, to be estimated as part of the estimation problem at hand.

Few comments are in place regarding the matrix F. If it had not been for mathematical regularity conditions, the robot location would have better been characterized by

In words, the location at time k+1 would have been determined by the previous location and by the step taken at time k. This, however, induces an unstable kinematic model, where the noise variance is accumulated and diverges to infinity as k increases. While such models are generally tractable using the Kalman filter [31, 32], it may be problematic for the EKF [33] due to its embedded linearization. If the robot movement is not regulated, the linear approximation may be too crude and induce an unstable kinematic model that diverges the filter. Therefore, we incorporate the matrix F, whose sole purpose is to regulate the robot movement by enforcing stability on the model. It can be shown that the model is stable for ρ<1. However, to make this model realistic, ρ should be close to “1” and accordingly, F to be close to the identity matrix.

We conclude with defining the evolution of the unknown deterministic parameter θ. Since the room features (wall angles and offsets) are constant, we use the following static propagation model:

The observation at each time instant, k, is a vector z_{k}=[z_{1,k}⋯z_{4,k}]^{T} which measures the distance between the robot and each of the four walls, as shown in Fig. 2. It can be verified by straightforward geometrical considerations that the distance is related to the robot’s location and to the room’s features by the following nonlinear equation:

Note that z_{i,k}∀i∈1,…,4 is an estimate of the distance between the robot and wall i at time k, as we explain in the sequel. Moreover, h(x_{k},θ) is a nonlinear function that expresses the true distance of the robot from each wall in terms of the robot true coordinates. The vector v_{k} is a white Gaussian noise with covariance matrix \(\mathbf {C}_{v}=\sigma _{v}^{2}\mathbf {I}_{4\times 4}\), which depicts the measurement error induced by an imperfect distance estimation. The Gaussian assumption is reasonable in cases where the measurement error is small enough compared to the actual distance [10].

Definition 1

The SLAM problem, considered in this paper, is to jointly estimate θ and x_{k}, for each k, based on the state-space models given in (3), (7) and the measurement model (8).

From a practical point of view, the measurement model in (8) needs some prior processing. The TOA which corresponds to the distance between the robot and a wall [34, 35] must be estimated.

Let τ_{ik} be the signal propagation time from the robot to the wall and back to the robot. The latter is related to the distance by

where c is the speed of sound (approximately 340 m/s in air), and d_{ik} is the distance between the robot and wall i at time k. Furthermore, in addition to estimating τ_{i,k}, it is necessary to identify which echo corresponds to each wall. This problem is known as echo labeling. The reader is referred to [14, 20, 36, 37] for further insights. Because echo labeling is beyond the scope of this paper we assume, for simplicity, that the echoes are perfectly labeled^{Footnote 2}. Finally, we would like to stress that the loudspeaker and the microphone are co-located only in their 2D-projection into the x-y plain. In Section 6.2, we place them in different heights, i.e., the source-receiver location differs only in their z-coordinate, such that they remain co-located in the x-y plane.

4 Simultaneous localization and mapping: algorithm

This section presents a recursive algorithm based on the EKF [5] for the SLAM problem given in Definition 1. Note that this is not a Bayesian estimation problem, but rather a hybrid one. To estimate the deterministic parameter θ_{k} in (7) using the EKF, which is Bayesian, we incorporate randomness into θ_{k}. This, in addition, enables a more “lively” tracking results [38] that cannot be achieved by the algorithm initialization (see, e.g., [5]). Accordingly, we modify (7) as follows

where m_{k} is white Gaussian noise with covariance matrix R_{k}. The introduction of m_{k} in (10), as well as the initialization \(\hat {{{\boldmath {\theta }}}}_{0}\), incorporate an inaccurate prior probabilistic knowledge about θ. To avoid mismodelling, and guarantee that \(\hat {{{\boldmath {\theta }}}}_{0}\) is forgotten rapidly, the matrix R_{k} must be “large” at k=0 and ∥R_{k}∥ should decreases to zero as k increases.

The parameters λ_{1},λ_{2} act as decay coefficients that correspond to \(\{\alpha _{i}\}_{i=1}^{4}\) and \(\{b_{i}\}_{i=1}^{4}\), respectively, and determine the rate at which the corresponding driving noise decreases. After substituting θ with θ_{k} in (7), we are left with the nonlinear state-space model given by (3), (8), and (10), where the nonlinearity is due to (8). Similar to [5], it is possible to develop the EKF, which linearizes the measurement model and transition matrices [31, 39]. The algorithm consists of two steps for each iteration: prediction and update. In the prediction stage, the robot location is predicted according to (3) as follows

As discussed above, C_{w} is the covariance matrix w.r.t. the localization parameters and R_{k} is the randomness inserted into the mapping parameters.

In the update stage, the algorithm estimates the location and the room parameters as

where ∇h denotes the Jacobian of h at \(\hat {\mathbf {x}}_{k|k-1}\) and \(\hat {\boldsymbol {\theta }}_{k|k-1}\). In addition, the covariance matrix is updated considering the EKF extension [5]

The complete EKF algorithm is summarized in Algorithm 1.

5 The hybrid Cramér-Rao lower bound

In this section, we derive the HCRB for the SLAM problem of Definition 1. The objective is to bound the estimation error of x_{k} and θ from the measurements z_{k} for each step k, while treating the previous locations as a nuisance parameter. We denote the overall unknown parameter vector at time k by \({{\boldsymbol {\psi }}_{k}}=\left [\mathbf {x}_{0}^{\top },...,\mathbf {x}_{k}^{\top },\boldsymbol { \theta }^{\top }\right ]^{\top }\) and collect the set of all robot’s locations up to time k into \(\mathbf { x}^{k}=\left [\mathbf {x}_{0}^{\top },\mathbf {x}_{1}^{\top },...,\mathbf {x}_{k}^{\top }\right ]^{\top } \). Similarly, \(\mathbf {z}^{k}=\left [\mathbf {z}_{0}^{\top },\mathbf {z}_{1}^{\top },...,\mathbf { z}_{k}^{\top } \right ]^{\top } \) comprises all the measurements up to time k. Since x_{k} is random (3) while θ is deterministic, the estimation of ψ_{k} is a hybrid estimation problem, i.e., a combination of classic and Bayesian estimation philosophies [26, 30, 40]. The HCRB lower bound [24] asserts that for every unbiased estimator \(\hat {{\boldmath {\psi }}}_{k}(\mathbf {z}_{k}) \), i.e.,

where the expectation \({\mathbb E}\) is taken w.r.t. the joint probability density function (p.d.f.) of (z^{k},x^{k}), i.e. p(z^{k},x^{k};θ), where we used semicolon to distinguish between the deterministic vector θ, and the random vectors z^{k} and x^{k}. Note that \(\hat {{\boldmath {\psi }}}_{k}\left (\mathbf {z}_{k}\right)\) is a function of the measurements z_{k}. We omit the explicit dependency for brevity. The matrix

is known as the hybrid Fisher information matrix (HFIM). In our case, the dimension of the full HFIM, J_{k}, is \({\mathbb {R}}^{k+8\times k+8}\). Since we focus on performance evaluation of online algorithms, for each current time step k, we are interested in \(\left [ \mathbf {J}_{k}^{-1}\right ]^{\mathbf {x}_{k},\boldsymbol {\theta }}\), which is the lower-right 10×10 block of \(\mathbf {J}_{k}^{-1}\) that corresponds to the sub-vector \(\left [{\mathbf {x}_{k}^{\top },\boldsymbol {\theta }^{\top }} \right ]^{\top } \) of ψ_{k}. The following proposition presents a recursive expression for \(\left [ \mathbf {J}_{k}^{-1}\right ]^{\mathbf {x}_{k},\boldsymbol {\theta }}\) as a function of \(\left [ \mathbf {J}_{k-1}^{-1}\right ]^{\mathbf {x}_{k-1},\boldsymbol {\theta }}\).

The following comment about assumption (22) is in place. Consider first the deterministic parameters; we note that in the finite sample regime, there are useful estimators that do not satisfy (22), e.g., the maximum likelihood (ML) in the pure classic case and the joint ML/MAP in the hybrid case (see [25]). Nevertheless, in the large-sample regime, an estimator that does not satisfy (22) concerning the deterministic static parameters is not consistent. Since inconsistent estimators are useless in the large-sample regime for these parameters, the assumption in (22) is not restricting.^{Footnote 3} Now to the random parameters. If there is a bias, it must be a known-function of the deterministic parameters. Thus, if the estimate of the deterministic parameters is consistent, the bias becomes known in the large-sample regime, and therefore, could and should be subtracted from the estimator. This operation yields an asymptotically unbiased estimate.

Before presenting the bound, the following definitions are required. Let \(\left [\mathbf {J}_{k}^{-1}\right ]^{\mathbf {x}_{k},\boldsymbol {\theta }}\) be the lower-right 10×10 block of \(\mathbf {J}_{k}^{-1}\) defined in (24) and let

To calculate the HCRB, we will use the following recursive formula for \(\tilde {\mathbf {J}}_{k}^{\mathbf {x}_{k},\boldsymbol {\theta }}\), that was derived [41] for a general state-space model with additive Gaussian noise

Before calculating the HCRB, we note that \(\tilde {\mathbf {J}}_{k}\) defined in (25), henceforth dubbed the sub-hybrid Fisher information matrix (SHFIM), is not the actual HFIM. The full HFIM is a matrix which inverse is the complete HCRB on the entire parameter vector, while the inverse of \(\tilde {\mathbf {J}}_{k}\) in (25) is the sub-block of the HCRB that corresponds to \(\left [\mathbf {x}_{k}^{\top },{{\boldmath {\theta }}}^{\top }\right ]^{\top }\). Also note that the expression in (26) is recursive, where \(\tilde {\mathbf {J}}_{k}\) is expressed in terms of \(\tilde {\mathbf {J}}_{k-1}\). Thus, given \(\tilde {\mathbf {J}}_{k-1}\), it is sufficient to calculate the innovations \(\left \{\mathbf {D}_{k-1}^{ij}\right \}^{3}_{i,j=1}\), according to (27) in order to obtain the updated value of \(\tilde {\mathbf {J}}_{k}\). A road map and the proof summary is given in Table 1.

Proposition 1

The components of \(\tilde {\mathbf {J}}_{k}\) (cf. (25)-(26b)), are given by

for i=1,…,4 on top of each other where sign(x)=1 if x≥0. The second column is similarly obtained by replacing \(\left [\mathbf {D}^{23}_{k-1}\right ]_{i1}\) with

We would like to show that for the problem at hand (cf. Definition 1), (27a)-(27f) lead to \(\mathbf {D}^{ij}_{k}, i,j\in \{1,2,3\}\) given in Proposition 1. We begin by noting that our problem fits the general structure of the filtering problem [29] in the sense that both are hybrid with a linear state-space kinematic model and a nonlinear measurement model with additive Gaussian noise. This establishes (28c) and, in addition, it implies

Next, from (40a)–(40b), it follows that to derive \(\mathbf {D}_{k-1}^{ij}\) for i,j∈{2,3}, it is necessary to derive the Jacobian of h(x_{k},θ). For our kinematic and measurement model (3),(8), the Jacobian component with respect to x_{k} is given by

where ζ_{i} and χ_{i} are defined in (36) and (37), respectively. It follows that (46), (47), and (48) establish (35) thus establishing (34). □

6 Simulation results and discussion

In this section, we present numerical results of two simulation setups. In the first setup (Section 6.1), the signal is generated via pseudo-random sequences according to (3),(4), and (8), and the proposed EKF algorithm is compared to the HCRB. It will be shown that the proposed algorithm approaches the bound. Therefore, it is not compared to any additional estimation method. In the second setup (Section 6.2), the EKF algorithm is evaluated using audio chirp signals that were convoluted with a simulated room impulse response (RIR) using [42]. The result obtained in the second setup is not compared to the HCRB since the measurements do not suit the model under which the HCRB was developed. Explicitly, the HCRB was derived for noisy observation of the distances between the robot and the walls, whereas the observation in the second setup is the raw acoustic signal. To extract the distance from that measurement, it is necessary to preprocess it by an additional estimation pipeline. The reason we include the second setup is to show that the proposed algorithm is potentially applicable to actual problems. Nevertheless, we note that in real-life scenarios, this requires additional components, e.g., echo labeling block.

6.1 Evaluation of the EKF algorithm with respect to the HCRB

We consider a scenario in which the robot travels in an enclosed four-wall room, where each wall is characterized by an offset and angle as depicted in Fig 2. For each Monte-Carlo trial, the robot movement path was generated according to (3). For each k, the movement command u_{k} was defined by a constant step size μ=0.5 m and a uniformly distributed angle ϕ_{k}. In determining the robot step-size, we had the following consideration. Recall that for each k, the EKF uses a first-order series approximation of the distance function h around Φ_{k|k−1}, and note that a large step-size leads to a large variation in Φ. Thus, the linear approximation becomes crude if the step size is too large, causing the EKF to diverge. On the other hand, if the step-size is too small, the robot will not travel enough distance required for mapping. In addition, the parameter ρ in (4) was set to ρ=0.97. The variance of the robot step-error, w_{k} (cf. (3), and the subsequent paragraph), was taken as σ_{w}=0.02 m, which is 4% of the robot step-size. The measurements z_{k} were generated as follows. For each x_{k}, we calculated h(x_{k},θ) using (8), and then, to reflect the inaccuracy in measuring the distance, we added the Gaussian noise v_{k} with σ_{v}=0.02 m, which is also 4% of the step size. Note that these values, i.e., σ_{w}, and σ_{v} being 4%, are significant since it is accumulated as the algorithm progresses.

The experiment was repeated and averaged over 40 different rooms. All rooms are 4-wall rooms that differ in the angle between each adjacent wall. The angle was randomized from a uniform distribution in the range [− 5,5] degrees. Figures 3 and 4 depict the mean square error (MSE) of the estimation of the room geometry parameters, \(\left \{\alpha _{i},b_{i}\right \}_{i=1}^{4}\), using the proposed EKF algorithm. The MSE is defined as

Examining the result in Fig. 5, the estimation error of the EKF algorithm during the first steps, i.e., approximately until k=50, is lower than the HCRB. This low estimation error in the robot location follows from the fact that the initial robot location determines the origin of the coordinate system. Thus, with respect to this coordinate system, the robot location is known. As a result, we see a discrepancy between the localization results and the bound in the first 50 steps. Accordingly, this part of the graph should be ignored. This is not the case for the randomly initialized room geometry parameters as depicted in Figs. 3 and 4.

The results show that the HCRB is large for small values of k, which is expected in the case of a few observations. Moreover, the bound drops significantly as k increases up to k=20. This can be attributed to the significant contribution of each new measurement when the overall number of measurements is small. We note that at approximately k=80, the slope becomes very moderate. We also note that the estimation error approaches the bound as k is increased as depicted in Figs. 3, 4, and 5, and the estimation error approaches the HCRB, implying that the EKF is an asymptotically optimal estimator and that the HCRB is asymptotically tight.

Figure 6 depicts the estimated line-equation of one of the room’s walls, i.e., the line in the plane which is determined by the estimated angle and offset. Each line corresponds to a different value of k. The results show that the estimated line coincides with the true line as k increases.

6.2 Evaluation based on simulated room impulse response

In this simulation, the robot moves a small number of steps chosen randomly (up to 15 steps for each run) and estimates the mapping parameters of rectangular rooms. In Figs. 7 and 8, we present two examples of robot trajectories and the evolution of the room shape estimates, as the robot moves in the room, in comparison to the true room geometry. There is, however, a technical difficulty in carrying out the simulation using the RIR generator [42]. In the latter, it is impossible to place the emitting source and the microphone at the same point, an assumption that underlies our 2D model (c.f. Section 3). To overcome that, we fed the simulator with a three-dimensional room with dimensions 4×5×7 m. The idea is to only have horizontal reflections, as in a 2D room. To achieve that, a low reflection coefficient, of value 0.1, was set to the ceiling and floor, while the other four walls were attributed with a high reflection coefficient of 0.95, as mentioned. We used a feature of the RIR generator [42] that allows us to construct an RIR comprising only the direct-path and the first-order reflections (with pre-defined reflection coefficients). The reverberation-time, which indicates the decay of the reverberation tail is therefore meaningless. In addition, the robot walked horizontally, where the emitting source and the microphone were placed on the robot such that their heights (with respect to the room’s floor) were 3 m and 5 m, respectively, as depicted in Fig. 9. This way, the source and microphone were not at the same point, but their projections, as well as the projection of the distances to the walls, induced a 2D room in which the source and microphone were collocated. To project these distances, one has to discard the first peak, which corresponds to the direct-path between the microphone and source, and to take into account the incidence angle to each wall at every step. Explicitly, taking the midpoint between the microphone and the source as the robot position, we calculate the distance to each wall by compensating and incident angle, as depicted in Fig. 9.

In more explicit terms, let \(d^{\prime }_{i,k}, 1\leq i\leq 4\) be the distance between wall-i and the robot after step k and \(\hat {d}'_{i,k}\) be its corresponding estimate, extracted from the TOA. Then, z_{k} is set as \(z_{ik}=\hat {d}_{ik}\), where \(\hat {d}_{ik}\) is the two dimensional distance, given by \(\hat {d}_{ik}=\hat {d}'_{ik}\cos \left (\xi _{ik}\right),\) and ξ_{ik} is the incidence-angle^{Footnote 4}. The result demonstrates that the estimates improve as the number of steps increases. Additional scenarios were simulated for several trajectories demonstrating similar results. Note that the orientation of the coordinate system is determined by the first robot step and is different in each scenario.

The run time of each EKF estimation step on i7 Intel processor, implemented using MatlabⒸ, took 260 mSec. Convergence time depends on the model and measurement errors described in (3), (10), and (8), as well as the decay coefficients in (12). As mentioned before, this simulation is not compared to the HCRB since it involves a prior stage that is not taken into consideration while calculating the bound. This simulation study is merely added to demonstrate that the proposed algorithm may be applied to real-life scenarios, provided that additional algorithmic blocks are implemented.

Since a small number of steps are taken, we set ρ=1, where ρ is given in (4), and σ_{w}=0.05 m (c.f. Section 3.1). The measurements z_{k} were generated as follows. At each step, k, we extracted the corresponding RIR using [42], with only the first-order reflections simulated (although unrealistic, it still provides an interesting scenario for the current study. Full reverberation regime is left for future research.). The RIR was convoluted with a linear chirp signal of 128 mSec long, with start and stop frequencies 0.5 KHz and 5 KHz, respectively. Sampling rate was set to 16 KHz. The output was then contaminated with a white, additive Gaussian noise. The signal-to-noise ratio (SNR) at the microphone is set to 5.6 dB. The received signal was correlated with the known chirp signal, and from the result, we picked the TOA of all peaks that were higher than one ninth of the strongest peak. In the rare cases where less than four peaks satisfied this condition, we took the four strongest. The simulation reflection coefficient was set to 0.95, which is relatively high. This was done to ensure that all echoes corresponding to the walls’ direct paths would be discovered. If no matching echo is found for a specific wall, the mapping parameters (of that wall) cannot be estimated at the current robot step. We then solved the echo labeling^{Footnote 5} problem and set \(\hat {\tau }_{ik}\), the estimate of the TOA between the robot and wall-i (cf. Section 3.2), as half the TOA of the corresponding echo. Finally, we calculated the distance \(\hat {d}_{ik}\) according to (9) and concatenated all distances in the measurement vector z_{k}, i.e., \(z_{ik}= \hat {d}_{ik}, i=1,\ldots,4\).

7 Conclusions

In this paper, we considered the self-localization of a source that travels in an unknown environment, while simultaneously estimating its geometry from noisy time-of-arrival observations. We derived a recursive algorithm, based on the EKF, for this problem as well as the corresponding HCRB. Simulation results showed that the algorithm achieves the HCRB as the number of observations increases. Although this is not a formal proof, it implies that asymptotically, the EKF is an optimal estimator and that the HCRB a tight bound.

As a concrete application, we considered the case where the time-of-arrival is extracted from acoustic echoes and demonstrated the applicability of the proposed EKF algorithm.

An important direction for future research is to test theoretically if the tightness of the HCRB, as well as the optimally of the EKF that were observed in the simulation, is valid in general. The main challenge here is to extend the results about the tightness of HCRB [25] to cases where it is calculated recursively. Another important extension is to incorporate model order selection techniques to determine the number of walls, prior to applying the proposed EKF algorithm. In addition, other type of mismodelling could be considered [43], e.g., if the assumed kinematic model is not accurate.

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.

Notes

The BCRB, unlike its classic counterpart the CRB, is in general, not an asymptotically tight bound.

A possible scenario where this assumption might hold is if, during the algorithm initialization, the source remains in the same Voronoi region (a region in which the set of distances from the source to the walls maintains its order). Once the algorithm converges, it is possible to identify with a low probability of error, which Voronoi region the source is in, from which it is possible to label the echoes.

For further reading on different types of unbiased conditions, the reader is referred to [25].

Note that ξ_{ik}, is not an unknown parameter in the original problem (cf. Definition 1), but rather an artifact of the setup in Fig. 9, designated to overcome the constraint that the source and microphone cannot be collocated with the RIR simulator implementation that we have used. Thus, we used the robot’s true location at time k to calculate ξ_{ik}, since otherwise we would have incorporated an additional nuisance parameter.

Since echo-labeling is beyond the scope of this paper, we assume throughout that the labeling is given. In this simulation, we used the exact robot location to label the echoes, by picking the label combination that minimizes the square error between the true TOA and the estimated TOA.

Abbreviations

SLAM:

Simultaneous localization and mapping

RIR:

Room impulse response

SNR:

Signal-to-noise ratio

EKF:

Extended Kalman filter

CRB:

Cramér-Rao bound

FIM:

Fisher information matrix

HFIM:

Hybrid Fisher information matrix

MDS:

Markovian dynamic system

TOA:

Time of arrival

HCRB:

Hybrid CRB

BCRB:

Bayesian CRB

MSE:

Mean square error

SHCRB:

Sub-hybrid Cramér-Rao bound

SHFIM:

Sub-hybrid Fisher information matrix

TDOA:

Time difference of arrival

p.d.f.:

Probability density function

References

X. Chang, C. Yang, J. Wu, X. Shi, Z. Shi, in IEEE 10th Sensor Array and Multichannel Signal Processing Workshop (SAM). A surveillance system for drone localization and tracking using acoustic arrays (IEEESheffield, 2018), pp. 573–577.

A. Collin, A. T. Espinoza, in IEEE International Conference on Vehicular Electronics and Safety (ICVES). SLAM-based performance quantification of sensing architectures for autonomous vehicles (IEEEMadrid, 2018), pp. 1–6.

C. Bibby, I. Reid, in IEEE International Conference on Robotics and Automation. A hybrid slam representation for dynamic marine environments (IEEEAnchorage, 2010), pp. 257–264.

F. Demim, A. Nemra, H. Abdelkadri, A. Bazoula, K. Louadj, M. Hamerlain, in The IEEE 25th International Conference on Systems, Signals and Image Processing (IWSSIP). Slam problem for autonomous underwater vehicle using SVSF filter (IEEEMaribor, 2018), pp. 1–5.

X. Sheng, Y. -H. Hu, in Information Processing in Sensor Networks, ed. by F. Zhao, L. Guibas. Energy based acoustic source localization (SpringerBerlin, Heidelberg, 2003), pp. 285–300.

S. T. Birchfield, R. Gangishetty, in Proceedings. (ICASSP ’05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005, vol. 4. Acoustic localization by interaural level difference (IEEEPhiladelphia, 2005), pp. 1109–11124.

N. D. Gaubitch, W. B. Kleijn, R. Heusdens, in IEEE International Conference on Acoustics, Speech and Signal Processing. Auto-localization in ad-hoc microphone arrays (IEEEVancouver, 2013), pp. 106–110.

I. Dokmanić, L. Daudet, M. Vetterli, in 22nd European Signal Processing Conference (EUSIPCO). How to localize ten microphones in one finger snap (IEEELisbon, 2014), pp. 2275–2279.

S. Tervo, T. Tossavainen, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 3D room geometry estimation from measured impulse responses (IEEEKyoto, 2012), pp. 513–516.

I. Dokmanić, Y. M. Lu, M. Vetterli, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Can one hear the shape of a room: the 2-D polygonal case (IEEEPrague, 2011), pp. 321–324.

I. Jager, R. Heusdens, N. D. Gaubitch, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Room geometry estimation from acoustic echoes using graph-based echo labeling (IEEEShanghai, 2016).

T. S. Ho, Y. C. Fai, E. S. L. Ming, in 10th Asian Control Conference (ASCC). Simultaneous localization and mapping survey based on filtering techniques (IEEESabah, 2015).

C. Evers, A. H. Moore, P. A. Naylor, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Acoustic simultaneous localization and mapping (A-SLAM) of a moving microphone array and its surrounding speakers (IEEEShanghai, 2016), pp. 6–10.

M. W. M. G. Dissanayake, P. Newman, S. Clark, H. F. Durrant-Whyte, M. Csorba, A solution to the simultaneous localization and map building (SLAM) problem. IEEE Trans. Robot. Autom.17(3), 229–241 (2001).

M. Montemerlo, S. Thrun, D. Koller, B. Wegbreit, in Eighteenth National Conference on Artificial Intelligence. FastSLAM: a factored solution to the simultaneous localization and mapping problem (American Association for Artificial IntelligenceMenlo Park, CA, USA, 2002), pp. 593–598.

I. Dokmanić, L. Daudet, M. Vetterli, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). From acoustic room reconstruction to SLAM (Shanghai, 2016), pp. 6345–6349.

M. Krekovic, I. Dokmanić, M. Vetterli, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). EchoSLAM: simultaneous localization and mapping with acoustic echoes (Shanghai, 2016), pp. 11–15.

C. Gentner, T. Jost, W. Wang, S. Zhang, A. Dammann, U. C. Fiebig, Multipath assisted positioning with simultaneous localization and mapping. IEEE Trans. Wirel. Commun.15(9), 6104–6117 (2016).

M. Kreković, I. Dokmanić, M. Vetterli, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Omnidirectional bats, point-to-plane distances, and the price of uniqueness (New Orleans, 2017), pp. 3261–3265.

M. Krekovic, I. Dokmanić, M. Vetterli, Shapes from echoes: uniqueness from point-to-plane distance matrices. IEEE Trans. Sig. Proc.68:, 2480–2498 (2020).

Y. Rockah, P. Schultheiss, Array shape calibration using sources in unknown locations–part I: far-field sources. IEEE Trans. Acoust. Sig. Process. 35(3), 286–299 (1987).

H. Messer, in Fourth IEEE Workshop on Sensor Array and Multichannel Processing (SAM). The hybrid Cramér-Rao lower bound - from practice to theory (Waltham, 2006), pp. 304–307.

S. Bay, B. Geller, A. Renaux, J. Barbot, J. Brossier, On the hybrid Cramér Rao bound and its application to dynamical phase estimation. IEEE Sig. Process. Lett.15:, 453–456 (2008).

L. Xu, X. R. Li, in 12th International Conference on Information Fusion. Hybrid Cramér-Rao lower bound on tracking ground moving extended target (IEEESeattle, 2009), pp. 1037–1044.

R. G. Brown, P. Y. C. Hwang, Introduction to random signals and applied kalman filtering: with matlab exercises and solutions, 3rd edn. (John Wiley & Sons, Reading, Massachusetts, 1997).

E. A. Wan, R. Van Der Merwe, in Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium. The unscented Kalman filter for nonlinear estimation (IEEELake Louise, 2000), pp. 153–158.

D. Hertz, Time delay estimation by combining efficient algorithms and generalized cross-correlation methods. IEEE Trans. Acoust. Speech Sig. Process. 34(1), 1–7 (1986).

M. Crocco, A. Trucco, A. Del Bue, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Room reflectors estimation from sound by greedy iterative approach (IEEECalgary, 2018), pp. 6877–6881.

S. Gannot, M. Moonen, in The International Workshop on Acoustic Echo and Noise Control (IWAENC). On the application of the unscented Kalman filter to speech processing (Kyoto, Japan, 2003), pp. 8–11.

S. Haykin, Kalman filtering and neural networks, vol. 47 (John Wiley & Sons, New York, 2004).

F. Gini, R. Reggiannini, On the use of Cramér-Rao-like bounds in the presence of random nuisance parameters. IEEE Trans. Commun.48(12), 2120–2126 (2000).

C. Ren, J. Galy, E. Chaumette, F. Vincent, P. Larzabal, A. Renaux, Recursive hybrid Cramér Rao bound for discrete-time Markovian dynamic systems. IEEE Sig. Process. Lett.22(10), 1543–1547 (2015).

MV, YN, and SG developed the model. Experimental testing was carried out by MV and YN. The paper was written by MV, YS, and SG, with MV the first author. The authors read and approved the final manuscript.

All authors agree to publish the submitted paper in this journal.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Sharon Gannot is a Member of EURASIP

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Veisman, M., Noam, Y. & Gannot, S. The hybrid Cramér-Rao lower bound for simultaneous self-localization and room geometry estimation.
EURASIP J. Adv. Signal Process.2021, 3 (2021). https://doi.org/10.1186/s13634-020-00702-6