 Research
 Open Access
 Published:
The hybrid CramérRao lower bound for simultaneous selflocalization and room geometry estimation
EURASIP Journal on Advances in Signal Processing volume 2021, Article number: 3 (2021)
Abstract
This paper addresses the problem of tracking a moving source, e.g., a robot, equipped with both receivers and a source, that is tracking its own location and simultaneously estimating the locations of multiple plane reflectors. We assume a noisy knowledge of the robot’s movement. We formulate this problem, which is also known as simultaneous localization and mapping (SLAM), as a hybrid estimation problem. We derive the extended Kalman filter (EKF) for both tracking the robot’s own location and estimating the room geometry. Since the EKF employs linearization at every step, we incorporate a regulated kinematic model, which facilitates a successful tracking. In addition, we consider the echolabeling problem as solved and beyond the scope of this paper. We then develop the hybrid CramérRao lower bound on the estimation accuracy of both the localization and mapping parameters. The algorithm is evaluated with respect to the bound via simulations, which shows that the EKF approaches the hybrid CramérRao bound (CRB) (HCRB) as the number of observation increases. This result implies that for the examples tested in simulation, the HCRB is an asymptotically tight bound and that the EKF is an optimal estimator. Whether this property is true in general remains an open question.
Introduction
Consider a source that moves in an unknown area while simultaneously mapping the environment and tracking its own location, a problem known as SLAM. As a concrete example, we use a moving robot that emits sound in a room enclosure. Nevertheless, the analysis here applies to other similar scenarios, e.g., drone navigation [1], vehicles in urban areas [2], or marine vessels [3, 4] in ports. This problem is difficult since the robot must localize itself with respect to an incomplete map. The general SLAM problem has been the subject of substantial research with many different applications [5].
As implied by its name, SLAM consists of two components, localization and mapping. Source localization is a wellstudied problem in acoustic signal processing. Several energybased localization techniques exist [6, 7]. A common approach utilizes an array of microphones and their TDOA [8, 9]. Another technique exploits both the spatial and temporal information to localize and track a moving speaker [10]. Additional approaches and techniques are reviewed in [11]. In the case of mapping, a general solution for room geometry estimation is based on spatial roomimpulseresponses and the image source model [12]. In addition, echo labeling is used to map the shape of polyhedral rooms [13, 14]. The latter treat the acoustic echoes as nodes in a graph.
While the mapping and the localization subproblems can be considered well understood when separately treated, the combined SLAM problem is much more challenging. The reason is that localization and mapping depend on each other and are executed simultaneously in an intertwined way. The most common philosophy in addressing SLAM is the Bayesian theoreticalbased approach [5]. The reader is referred to the review [15] and references therein for further study on the subject. Notable is the work in [16], which employs particle filtering and uses a moving microphonearray. Kalmanbased approaches are in wide use for the tracking and mapping. These approaches provide a recursive solution to the localization problem and consistent estimates for mapping landmarks. This is done using statistical models for the speaker motion and the environment’s features [17]. A FastSLAM algorithm that integrates the RaoBlackwell particle filter and the Kalman filter is presented in [18].
We consider a robot that emits an acoustic signal and is equipped with one microphone. While moving in a fourwall room, the robot tracks its own location and simultaneously estimates the room geometry (namely, mapping) using the acoustic echoes. For this problem, some solutions exist. Notable is [19] which estimates the shape of a room as well as determines the location and geometry of an array of microphones using Euclidean distance and matrix completion algorithms. In [20], a Bayesian filtering method is applied for movement tracking and room shape estimation. Most algorithms are based on particle filtering, which is computationally intense and may suffer from convergence problems.
The paper is organized as follows. In Section 2, we summarize the main methodology of the paper and briefly describe the experimental study. The problem is formulated in Section 3, which includes the robot’s kinematic model, the observation model, and the unknown parameters. In Section 4, we present an online SLAM algorithm, and in Section 5, we derive the HCRB (cf. Proposition 1). Simulation results of the proposed algorithm with respect to the derived HCRB are presented in Section 6. Summary and conclusions are given in Section 7.
Methods
In this paper, we address the problem of selflocalizing a source and, simultaneously, inferring the room geometry using a single microphone and a single receiver that are mounted on a moving platform, e.g., a robot. The robot moves around the room according to given movement commands. The estimation is based on noisy distance measurements, corresponding to the firstorder echoes, taken at finite set of points along the trajectory. While the movement commands are known, the overall robot trajectory is unknown. This follows from the fact that room geometry is unknown (and therefore, the robot position with respect to the room). In addition, we assume that the robot movement inaccurate w.r.t. the known movement commands. This can occur due to imperfect execution of the robot commands. In general, if nothing is known about the trajectory, this problem is not identifiable for some important classes of room geometries, i.e., parallelogram roomgeometry cannot be uniquely determined from the distances between walls and trajectory points (cf. [21, 22], and for a comprehensive analysis, see [23]). Nevertheless, the robot movement commands provide partial information about the robot trajectory. We use these movement commands, as well as the distance measurements from each wall, to simultaneously map the room and localize the robot with respect to it. In contrast to particle filtering, we adopt a more analytic, computationally efficient approach, the EKF, which simultaneously tracks the robot moment and estimates the room geometry.
We formulate the problem as a hybrid estimation task that includes both random and deterministic unknown parameters. This combines both the classic and the Bayesian estimation philosophies, an approach that is very suitable for the problem at hand in which the room’s features are considered static and deterministic, while the robot movement as a random statespace process. To apply the Bayesian EKF approach to our hybrid trackingmapping estimation problem, we carefully incorporate randomness into the deterministic room parameters (cf. Section 4).
Moreover, we derive the HCRB, which is an important tool in estimation theory [24–29], for the SLAM problem at hand. We provide a recursive expression for the HCRB (see [30] for an overview of recursive CRBtype bounds) which constitutes, at each time instance, a lower bound on the estimation error of the current location and of the room geometry. The advantage of the HCRB is that it is valid for both random and deterministic parameters. A related bound, the Bayesian CRB (BCRB), was derived for a different SLAM problem in [21]. While the BCRB is a valid bound on the random parameters^{Footnote 1} (e.g., receiver location), it is generally not useful with respect to the deterministic parameters in hybrid setups. To apply the BCRB, one must transform the hybrid estimation problem into a pure Bayesian one. This can be done by incorporating some prior distribution to the deterministic parameters, which makes the BCRB susceptible to mismodelling. It is also possible to treat the deterministic parameters as random with zero variance. However, this will violate the BCRB regularity conditions [30]. A different approach would be to discard the prior distribution on the random parameters, thus transforming the problem into a pure classic one, i.e., an estimation problem where all unknown parameters are deterministic unknown. The drawback is that by discarding the prior distribution, one loses the useful information that is embedded in the prior. Moreover, it may render the estimation problem illposed (see, e.g., [24]). In this paper, we consider the HCRB since it is more suitable for hybrid estimation problems (see the discussion about the HCRB and other CRBtype bounds in [25], Sect. IA and Sect. III).
Finally, we evaluate the performance of the proposed algorithm by simulation results examination and by comparing them with the obtained HCRB. We demonstrate a very good match between the MonteCarlo simulation and the analytical bound, i.e., we show in simulation that EKF approaches the HCRB.
Notation: Boldface lowercase letters denote vectors. We use boldface uppercase letters, e.g., F,C,J, to denote matrices. All vectors are column vectors and (·)^{⊤} denotes the transpose operation. Let g(x) be scalar function of a vector x, then, ∂g(x)/∂x denotes the gradient operation which is organized as a row vector. In the case where g(x) is a vector, ∂g(x)/∂x is the Jacobian matrix of g(x). Let A be a matrix, then, [A]_{ij} denotes its i,jth entry. The operator spectral norm operation is denoted by ∥·∥. Let \(\mathbf {A}, \mathbf {B}\in \mathbb {R}^{n\times m}\), then blockdiag(A,B) is the blockdiagonal matrix whose upperleft and lowerright n×m blocks are equal to A and B, respectively.
Problem formulation
Consider a single omnidirectional acoustic source and a microphone, both mounted on a robot at the same point on the xy plane, but at different heights, i.e., with different zcoordinates. A fullyexciting signal that simplifies the subsequent correlation operation would be better than a speech signal for echolabeling.
The robot moves in a fourwall, closedroom according to a series of movement commands. In this work, we consider 2D environments. An extension to 3D environments is possible, and it involves a characterization of the third dimension. Such an extension can be made by adding characterization equations for additional walls.
Each command instructs the robot to take a single step of a given size and an orientation angle that indicate the direction and length of the movement as depicted in Fig. 1 and further discussed in the sequel. In each step, the robot takes a measurement (a full description is given in Section 3.2) and simultaneously estimates its location and the room geometry. The robot’s location is denoted by
where x_{k} and y_{k} denote the xcoordinate and the ycoordinate, respectively, at step k. The coordinate system is determined by the robot’s first step, i.e., the positive direction of the xaxis coincides with the first step. In this paper, each time instant corresponds to a step; thus, we use the terms timeinstant k and step k, interchangeably. The room shape is determined by the parameter vector
where (α_{i},b_{i}) represent the angle and offset of the ith wall, as depicted in Fig. 2.
The robot kinematic model
We assume that the robot travels according to the following linear model
where k is a time index, F is the statetransition matrix, given by
and
The latter denotes the robot’s movement command, where μ_{k} is the step size and ϕ_{k} is the orientation angle with respect to the positive edge of the horizontal axis, i.e., the angle of the robot first step is set as the zeroangle with respect to the xaxis of the coordinate system. The robot movement is depicted in Fig. 1. The noise w_{k} is zeromean Gaussian signal with covariance matrix \(\mathbf {C}_{w} =\sigma _{w}^{2}\mathbf {I}_{2\times 2}\), which expresses inaccuracies in the execution of the robot’s movement commands. As a result, the exact location of the robot is an unknown parameter, to be estimated as part of the estimation problem at hand.
Few comments are in place regarding the matrix F. If it had not been for mathematical regularity conditions, the robot location would have better been characterized by
In words, the location at time k+1 would have been determined by the previous location and by the step taken at time k. This, however, induces an unstable kinematic model, where the noise variance is accumulated and diverges to infinity as k increases. While such models are generally tractable using the Kalman filter [31, 32], it may be problematic for the EKF [33] due to its embedded linearization. If the robot movement is not regulated, the linear approximation may be too crude and induce an unstable kinematic model that diverges the filter. Therefore, we incorporate the matrix F, whose sole purpose is to regulate the robot movement by enforcing stability on the model. It can be shown that the model is stable for ρ<1. However, to make this model realistic, ρ should be close to “1” and accordingly, F to be close to the identity matrix.
We conclude with defining the evolution of the unknown deterministic parameter θ. Since the room features (wall angles and offsets) are constant, we use the following static propagation model:
Measurement model
The observation at each time instant, k, is a vector z_{k}=[z_{1,k}⋯z_{4,k}]^{T} which measures the distance between the robot and each of the four walls, as shown in Fig. 2. It can be verified by straightforward geometrical considerations that the distance is related to the robot’s location and to the room’s features by the following nonlinear equation:
Note that z_{i,k}∀i∈1,…,4 is an estimate of the distance between the robot and wall i at time k, as we explain in the sequel. Moreover, h(x_{k},θ) is a nonlinear function that expresses the true distance of the robot from each wall in terms of the robot true coordinates. The vector v_{k} is a white Gaussian noise with covariance matrix \(\mathbf {C}_{v}=\sigma _{v}^{2}\mathbf {I}_{4\times 4}\), which depicts the measurement error induced by an imperfect distance estimation. The Gaussian assumption is reasonable in cases where the measurement error is small enough compared to the actual distance [10].
Definition 1
The SLAM problem, considered in this paper, is to jointly estimate θ and x_{k}, for each k, based on the statespace models given in (3), (7) and the measurement model (8).
From a practical point of view, the measurement model in (8) needs some prior processing. The TOA which corresponds to the distance between the robot and a wall [34, 35] must be estimated.
Let τ_{ik} be the signal propagation time from the robot to the wall and back to the robot. The latter is related to the distance by
where c is the speed of sound (approximately 340 m/s in air), and d_{ik} is the distance between the robot and wall i at time k. Furthermore, in addition to estimating τ_{i,k}, it is necessary to identify which echo corresponds to each wall. This problem is known as echo labeling. The reader is referred to [14, 20, 36, 37] for further insights. Because echo labeling is beyond the scope of this paper we assume, for simplicity, that the echoes are perfectly labeled^{Footnote 2}. Finally, we would like to stress that the loudspeaker and the microphone are colocated only in their 2Dprojection into the xy plain. In Section 6.2, we place them in different heights, i.e., the sourcereceiver location differs only in their zcoordinate, such that they remain colocated in the xy plane.
Simultaneous localization and mapping: algorithm
This section presents a recursive algorithm based on the EKF [5] for the SLAM problem given in Definition 1. Note that this is not a Bayesian estimation problem, but rather a hybrid one. To estimate the deterministic parameter θ_{k} in (7) using the EKF, which is Bayesian, we incorporate randomness into θ_{k}. This, in addition, enables a more “lively” tracking results [38] that cannot be achieved by the algorithm initialization (see, e.g., [5]). Accordingly, we modify (7) as follows
where m_{k} is white Gaussian noise with covariance matrix R_{k}. The introduction of m_{k} in (10), as well as the initialization \(\hat {{{\boldmath {\theta }}}}_{0}\), incorporate an inaccurate prior probabilistic knowledge about θ. To avoid mismodelling, and guarantee that \(\hat {{{\boldmath {\theta }}}}_{0}\) is forgotten rapidly, the matrix R_{k} must be “large” at k=0 and ∥R_{k}∥ should decreases to zero as k increases.
where \(\sigma _{0}^{2}\) is large and Λ is the following diagonal matrix
The parameters λ_{1},λ_{2} act as decay coefficients that correspond to \(\{\alpha _{i}\}_{i=1}^{4}\) and \(\{b_{i}\}_{i=1}^{4}\), respectively, and determine the rate at which the corresponding driving noise decreases. After substituting θ with θ_{k} in (7), we are left with the nonlinear statespace model given by (3), (8), and (10), where the nonlinearity is due to (8). Similar to [5], it is possible to develop the EKF, which linearizes the measurement model and transition matrices [31, 39]. The algorithm consists of two steps for each iteration: prediction and update. In the prediction stage, the robot location is predicted according to (3) as follows
and the room parametervector is predicted according to (10) as follows:
The joint covariance matrix of the localization and mapping parameters is
where
As discussed above, C_{w} is the covariance matrix w.r.t. the localization parameters and R_{k} is the randomness inserted into the mapping parameters.
In the update stage, the algorithm estimates the location and the room parameters as
where
and K_{k} is the Kalman gain, which is given by
The matrix S_{k} is calculated as
where ∇h denotes the Jacobian of h at \(\hat {\mathbf {x}}_{kk1}\) and \(\hat {\boldsymbol {\theta }}_{kk1}\). In addition, the covariance matrix is updated considering the EKF extension [5]
The complete EKF algorithm is summarized in Algorithm 1.
The hybrid CramérRao lower bound
In this section, we derive the HCRB for the SLAM problem of Definition 1. The objective is to bound the estimation error of x_{k} and θ from the measurements z_{k} for each step k, while treating the previous locations as a nuisance parameter. We denote the overall unknown parameter vector at time k by \({{\boldsymbol {\psi }}_{k}}=\left [\mathbf {x}_{0}^{\top },...,\mathbf {x}_{k}^{\top },\boldsymbol { \theta }^{\top }\right ]^{\top }\) and collect the set of all robot’s locations up to time k into \(\mathbf { x}^{k}=\left [\mathbf {x}_{0}^{\top },\mathbf {x}_{1}^{\top },...,\mathbf {x}_{k}^{\top }\right ]^{\top } \). Similarly, \(\mathbf {z}^{k}=\left [\mathbf {z}_{0}^{\top },\mathbf {z}_{1}^{\top },...,\mathbf { z}_{k}^{\top } \right ]^{\top } \) comprises all the measurements up to time k. Since x_{k} is random (3) while θ is deterministic, the estimation of ψ_{k} is a hybrid estimation problem, i.e., a combination of classic and Bayesian estimation philosophies [26, 30, 40]. The HCRB lower bound [24] asserts that for every unbiased estimator \(\hat {{\boldmath {\psi }}}_{k}(\mathbf {z}_{k}) \), i.e.,
the estimation error is lower bounded as follows:
where the expectation \({\mathbb E}\) is taken w.r.t. the joint probability density function (p.d.f.) of (z^{k},x^{k}), i.e. p(z^{k},x^{k};θ), where we used semicolon to distinguish between the deterministic vector θ, and the random vectors z^{k} and x^{k}. Note that \(\hat {{\boldmath {\psi }}}_{k}\left (\mathbf {z}_{k}\right)\) is a function of the measurements z_{k}. We omit the explicit dependency for brevity. The matrix
is known as the hybrid Fisher information matrix (HFIM). In our case, the dimension of the full HFIM, J_{k}, is \({\mathbb {R}}^{k+8\times k+8}\). Since we focus on performance evaluation of online algorithms, for each current time step k, we are interested in \(\left [ \mathbf {J}_{k}^{1}\right ]^{\mathbf {x}_{k},\boldsymbol {\theta }}\), which is the lowerright 10×10 block of \(\mathbf {J}_{k}^{1}\) that corresponds to the subvector \(\left [{\mathbf {x}_{k}^{\top },\boldsymbol {\theta }^{\top }} \right ]^{\top } \) of ψ_{k}. The following proposition presents a recursive expression for \(\left [ \mathbf {J}_{k}^{1}\right ]^{\mathbf {x}_{k},\boldsymbol {\theta }}\) as a function of \(\left [ \mathbf {J}_{k1}^{1}\right ]^{\mathbf {x}_{k1},\boldsymbol {\theta }}\).
The following comment about assumption (22) is in place. Consider first the deterministic parameters; we note that in the finite sample regime, there are useful estimators that do not satisfy (22), e.g., the maximum likelihood (ML) in the pure classic case and the joint ML/MAP in the hybrid case (see [25]). Nevertheless, in the largesample regime, an estimator that does not satisfy (22) concerning the deterministic static parameters is not consistent. Since inconsistent estimators are useless in the largesample regime for these parameters, the assumption in (22) is not restricting.^{Footnote 3} Now to the random parameters. If there is a bias, it must be a knownfunction of the deterministic parameters. Thus, if the estimate of the deterministic parameters is consistent, the bias becomes known in the largesample regime, and therefore, could and should be subtracted from the estimator. This operation yields an asymptotically unbiased estimate.
Before presenting the bound, the following definitions are required. Let \(\left [\mathbf {J}_{k}^{1}\right ]^{\mathbf {x}_{k},\boldsymbol {\theta }}\) be the lowerright 10×10 block of \(\mathbf {J}_{k}^{1}\) defined in (24) and let
To calculate the HCRB, we will use the following recursive formula for \(\tilde {\mathbf {J}}_{k}^{\mathbf {x}_{k},\boldsymbol {\theta }}\), that was derived [41] for a general statespace model with additive Gaussian noise
where \(\left \{\mathbf {D}_{k1}^{ij}\right \}^{3}_{i,j=1}\) are defined by
Before calculating the HCRB, we note that \(\tilde {\mathbf {J}}_{k}\) defined in (25), henceforth dubbed the subhybrid Fisher information matrix (SHFIM), is not the actual HFIM. The full HFIM is a matrix which inverse is the complete HCRB on the entire parameter vector, while the inverse of \(\tilde {\mathbf {J}}_{k}\) in (25) is the subblock of the HCRB that corresponds to \(\left [\mathbf {x}_{k}^{\top },{{\boldmath {\theta }}}^{\top }\right ]^{\top }\). Also note that the expression in (26) is recursive, where \(\tilde {\mathbf {J}}_{k}\) is expressed in terms of \(\tilde {\mathbf {J}}_{k1}\). Thus, given \(\tilde {\mathbf {J}}_{k1}\), it is sufficient to calculate the innovations \(\left \{\mathbf {D}_{k1}^{ij}\right \}^{3}_{i,j=1}\), according to (27) in order to obtain the updated value of \(\tilde {\mathbf {J}}_{k}\). A road map and the proof summary is given in Table 1.
Proposition 1
The components of \(\tilde {\mathbf {J}}_{k}\) (cf. (25)(26b)), are given by
where the matrix F is defined in (4) and \(\mathbf {D}_{k1}^{11},\mathbf {D}_{k1}^{12}\in {\mathbb {R}}^{2\times 2}\). Furthermore,
where
The matrix \(\mathbf {D}_{k1}^{23}\in {\mathbb {R}}^{8\times 2}\) is given as follows: its first column is obtained by stacking
for i=1,…,4 on top of each other where sign(x)=1 if x≥0. The second column is similarly obtained by replacing \(\left [\mathbf {D}^{23}_{k1}\right ]_{i1}\) with
where
Finally, \(\mathbf {D}_{k1}^{33}\) is the following block diagonal matrix
where
i=1,2,3,4, and
with
Proof
We would like to show that for the problem at hand (cf. Definition 1), (27a)(27f) lead to \(\mathbf {D}^{ij}_{k}, i,j\in \{1,2,3\}\) given in Proposition 1. We begin by noting that our problem fits the general structure of the filtering problem [29] in the sense that both are hybrid with a linear statespace kinematic model and a nonlinear measurement model with additive Gaussian noise. This establishes (28c) and, in addition, it implies
Next, from (40a)–(40b), it follows that to derive \(\mathbf {D}_{k1}^{ij}\) for i,j∈{2,3}, it is necessary to derive the Jacobian of h(x_{k},θ). For our kinematic and measurement model (3),(8), the Jacobian component with respect to x_{k} is given by
where G(α) is defined in (30). Thus, by substituting (41) and (30) into (40a), we establish (29). To show (31) and (32), we note that
where
and
From (30) and (41), \(\mathbf {D}_{k1}^{23}\) can be written as
Then, a bruteforce derivation of (45) establishes (31) and (32).
To complete the proof, it remains to show (34). It can be shown that
for i=1,2,3,4 where γ_{i} is defined in (44). Taking the expectation while noting that x_{k} and y_{k} are statistically independent, it follows that
where ζ_{i} and χ_{i} are defined in (36) and (37), respectively. It follows that (46), (47), and (48) establish (35) thus establishing (34). □
Simulation results and discussion
In this section, we present numerical results of two simulation setups. In the first setup (Section 6.1), the signal is generated via pseudorandom sequences according to (3),(4), and (8), and the proposed EKF algorithm is compared to the HCRB. It will be shown that the proposed algorithm approaches the bound. Therefore, it is not compared to any additional estimation method. In the second setup (Section 6.2), the EKF algorithm is evaluated using audio chirp signals that were convoluted with a simulated room impulse response (RIR) using [42]. The result obtained in the second setup is not compared to the HCRB since the measurements do not suit the model under which the HCRB was developed. Explicitly, the HCRB was derived for noisy observation of the distances between the robot and the walls, whereas the observation in the second setup is the raw acoustic signal. To extract the distance from that measurement, it is necessary to preprocess it by an additional estimation pipeline. The reason we include the second setup is to show that the proposed algorithm is potentially applicable to actual problems. Nevertheless, we note that in reallife scenarios, this requires additional components, e.g., echo labeling block.
Evaluation of the EKF algorithm with respect to the HCRB
We consider a scenario in which the robot travels in an enclosed fourwall room, where each wall is characterized by an offset and angle as depicted in Fig 2. For each MonteCarlo trial, the robot movement path was generated according to (3). For each k, the movement command u_{k} was defined by a constant step size μ=0.5 m and a uniformly distributed angle ϕ_{k}. In determining the robot stepsize, we had the following consideration. Recall that for each k, the EKF uses a firstorder series approximation of the distance function h around Φ_{kk−1}, and note that a large stepsize leads to a large variation in Φ. Thus, the linear approximation becomes crude if the step size is too large, causing the EKF to diverge. On the other hand, if the stepsize is too small, the robot will not travel enough distance required for mapping. In addition, the parameter ρ in (4) was set to ρ=0.97. The variance of the robot steperror, w_{k} (cf. (3), and the subsequent paragraph), was taken as σ_{w}=0.02 m, which is 4% of the robot stepsize. The measurements z_{k} were generated as follows. For each x_{k}, we calculated h(x_{k},θ) using (8), and then, to reflect the inaccuracy in measuring the distance, we added the Gaussian noise v_{k} with σ_{v}=0.02 m, which is also 4% of the step size. Note that these values, i.e., σ_{w}, and σ_{v} being 4%, are significant since it is accumulated as the algorithm progresses.
The experiment was repeated and averaged over 40 different rooms. All rooms are 4wall rooms that differ in the angle between each adjacent wall. The angle was randomized from a uniform distribution in the range [− 5,5] degrees. Figures 3 and 4 depict the mean square error (MSE) of the estimation of the room geometry parameters, \(\left \{\alpha _{i},b_{i}\right \}_{i=1}^{4}\), using the proposed EKF algorithm. The MSE is defined as
and calculated via MonteCarlo. The figures also depict the corresponding HCRBs
Figure 5 depicts the estimation error of the robot’s location
and compares it to the HCRB
Examining the result in Fig. 5, the estimation error of the EKF algorithm during the first steps, i.e., approximately until k=50, is lower than the HCRB. This low estimation error in the robot location follows from the fact that the initial robot location determines the origin of the coordinate system. Thus, with respect to this coordinate system, the robot location is known. As a result, we see a discrepancy between the localization results and the bound in the first 50 steps. Accordingly, this part of the graph should be ignored. This is not the case for the randomly initialized room geometry parameters as depicted in Figs. 3 and 4.
The results show that the HCRB is large for small values of k, which is expected in the case of a few observations. Moreover, the bound drops significantly as k increases up to k=20. This can be attributed to the significant contribution of each new measurement when the overall number of measurements is small. We note that at approximately k=80, the slope becomes very moderate. We also note that the estimation error approaches the bound as k is increased as depicted in Figs. 3, 4, and 5, and the estimation error approaches the HCRB, implying that the EKF is an asymptotically optimal estimator and that the HCRB is asymptotically tight.
Figure 6 depicts the estimated lineequation of one of the room’s walls, i.e., the line in the plane which is determined by the estimated angle and offset. Each line corresponds to a different value of k. The results show that the estimated line coincides with the true line as k increases.
Evaluation based on simulated room impulse response
In this simulation, the robot moves a small number of steps chosen randomly (up to 15 steps for each run) and estimates the mapping parameters of rectangular rooms. In Figs. 7 and 8, we present two examples of robot trajectories and the evolution of the room shape estimates, as the robot moves in the room, in comparison to the true room geometry. There is, however, a technical difficulty in carrying out the simulation using the RIR generator [42]. In the latter, it is impossible to place the emitting source and the microphone at the same point, an assumption that underlies our 2D model (c.f. Section 3). To overcome that, we fed the simulator with a threedimensional room with dimensions 4×5×7 m. The idea is to only have horizontal reflections, as in a 2D room. To achieve that, a low reflection coefficient, of value 0.1, was set to the ceiling and floor, while the other four walls were attributed with a high reflection coefficient of 0.95, as mentioned. We used a feature of the RIR generator [42] that allows us to construct an RIR comprising only the directpath and the firstorder reflections (with predefined reflection coefficients). The reverberationtime, which indicates the decay of the reverberation tail is therefore meaningless. In addition, the robot walked horizontally, where the emitting source and the microphone were placed on the robot such that their heights (with respect to the room’s floor) were 3 m and 5 m, respectively, as depicted in Fig. 9. This way, the source and microphone were not at the same point, but their projections, as well as the projection of the distances to the walls, induced a 2D room in which the source and microphone were collocated. To project these distances, one has to discard the first peak, which corresponds to the directpath between the microphone and source, and to take into account the incidence angle to each wall at every step. Explicitly, taking the midpoint between the microphone and the source as the robot position, we calculate the distance to each wall by compensating and incident angle, as depicted in Fig. 9.
In more explicit terms, let \(d^{\prime }_{i,k}, 1\leq i\leq 4\) be the distance between walli and the robot after step k and \(\hat {d}'_{i,k}\) be its corresponding estimate, extracted from the TOA. Then, z_{k} is set as \(z_{ik}=\hat {d}_{ik}\), where \(\hat {d}_{ik}\) is the two dimensional distance, given by \(\hat {d}_{ik}=\hat {d}'_{ik}\cos \left (\xi _{ik}\right),\) and ξ_{ik} is the incidenceangle^{Footnote 4}. The result demonstrates that the estimates improve as the number of steps increases. Additional scenarios were simulated for several trajectories demonstrating similar results. Note that the orientation of the coordinate system is determined by the first robot step and is different in each scenario.
The run time of each EKF estimation step on i7 Intel processor, implemented using MatlabⒸ, took 260 mSec. Convergence time depends on the model and measurement errors described in (3), (10), and (8), as well as the decay coefficients in (12). As mentioned before, this simulation is not compared to the HCRB since it involves a prior stage that is not taken into consideration while calculating the bound. This simulation study is merely added to demonstrate that the proposed algorithm may be applied to reallife scenarios, provided that additional algorithmic blocks are implemented.
Since a small number of steps are taken, we set ρ=1, where ρ is given in (4), and σ_{w}=0.05 m (c.f. Section 3.1). The measurements z_{k} were generated as follows. At each step, k, we extracted the corresponding RIR using [42], with only the firstorder reflections simulated (although unrealistic, it still provides an interesting scenario for the current study. Full reverberation regime is left for future research.). The RIR was convoluted with a linear chirp signal of 128 mSec long, with start and stop frequencies 0.5 KHz and 5 KHz, respectively. Sampling rate was set to 16 KHz. The output was then contaminated with a white, additive Gaussian noise. The signaltonoise ratio (SNR) at the microphone is set to 5.6 dB. The received signal was correlated with the known chirp signal, and from the result, we picked the TOA of all peaks that were higher than one ninth of the strongest peak. In the rare cases where less than four peaks satisfied this condition, we took the four strongest. The simulation reflection coefficient was set to 0.95, which is relatively high. This was done to ensure that all echoes corresponding to the walls’ direct paths would be discovered. If no matching echo is found for a specific wall, the mapping parameters (of that wall) cannot be estimated at the current robot step. We then solved the echo labeling^{Footnote 5} problem and set \(\hat {\tau }_{ik}\), the estimate of the TOA between the robot and walli (cf. Section 3.2), as half the TOA of the corresponding echo. Finally, we calculated the distance \(\hat {d}_{ik}\) according to (9) and concatenated all distances in the measurement vector z_{k}, i.e., \(z_{ik}= \hat {d}_{ik}, i=1,\ldots,4\).
Conclusions
In this paper, we considered the selflocalization of a source that travels in an unknown environment, while simultaneously estimating its geometry from noisy timeofarrival observations. We derived a recursive algorithm, based on the EKF, for this problem as well as the corresponding HCRB. Simulation results showed that the algorithm achieves the HCRB as the number of observations increases. Although this is not a formal proof, it implies that asymptotically, the EKF is an optimal estimator and that the HCRB a tight bound.
As a concrete application, we considered the case where the timeofarrival is extracted from acoustic echoes and demonstrated the applicability of the proposed EKF algorithm.
An important direction for future research is to test theoretically if the tightness of the HCRB, as well as the optimally of the EKF that were observed in the simulation, is valid in general. The main challenge here is to extend the results about the tightness of HCRB [25] to cases where it is calculated recursively. Another important extension is to incorporate model order selection techniques to determine the number of walls, prior to applying the proposed EKF algorithm. In addition, other type of mismodelling could be considered [43], e.g., if the assumed kinematic model is not accurate.
Availability of data and materials
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Notes
The BCRB, unlike its classic counterpart the CRB, is in general, not an asymptotically tight bound.
A possible scenario where this assumption might hold is if, during the algorithm initialization, the source remains in the same Voronoi region (a region in which the set of distances from the source to the walls maintains its order). Once the algorithm converges, it is possible to identify with a low probability of error, which Voronoi region the source is in, from which it is possible to label the echoes.
For further reading on different types of unbiased conditions, the reader is referred to [25].
Note that ξ_{ik}, is not an unknown parameter in the original problem (cf. Definition 1), but rather an artifact of the setup in Fig. 9, designated to overcome the constraint that the source and microphone cannot be collocated with the RIR simulator implementation that we have used. Thus, we used the robot’s true location at time k to calculate ξ_{ik}, since otherwise we would have incorporated an additional nuisance parameter.
Since echolabeling is beyond the scope of this paper, we assume throughout that the labeling is given. In this simulation, we used the exact robot location to label the echoes, by picking the label combination that minimizes the square error between the true TOA and the estimated TOA.
Abbreviations
 SLAM:

Simultaneous localization and mapping
 RIR:

Room impulse response
 SNR:

Signaltonoise ratio
 EKF:

Extended Kalman filter
 CRB:

CramérRao bound
 FIM:

Fisher information matrix
 HFIM:

Hybrid Fisher information matrix
 MDS:

Markovian dynamic system
 TOA:

Time of arrival
 HCRB:

Hybrid CRB
 BCRB:

Bayesian CRB
 MSE:

Mean square error
 SHCRB:

Subhybrid CramérRao bound
 SHFIM:

Subhybrid Fisher information matrix
 TDOA:

Time difference of arrival
 p.d.f.:

Probability density function
References
X. Chang, C. Yang, J. Wu, X. Shi, Z. Shi, in IEEE 10th Sensor Array and Multichannel Signal Processing Workshop (SAM). A surveillance system for drone localization and tracking using acoustic arrays (IEEESheffield, 2018), pp. 573–577.
A. Collin, A. T. Espinoza, in IEEE International Conference on Vehicular Electronics and Safety (ICVES). SLAMbased performance quantification of sensing architectures for autonomous vehicles (IEEEMadrid, 2018), pp. 1–6.
C. Bibby, I. Reid, in IEEE International Conference on Robotics and Automation. A hybrid slam representation for dynamic marine environments (IEEEAnchorage, 2010), pp. 257–264.
F. Demim, A. Nemra, H. Abdelkadri, A. Bazoula, K. Louadj, M. Hamerlain, in The IEEE 25th International Conference on Systems, Signals and Image Processing (IWSSIP). Slam problem for autonomous underwater vehicle using SVSF filter (IEEEMaribor, 2018), pp. 1–5.
H. DurrantWhyte, T. Bailey, Simultaneous localization and mapping: part I. IEEE Robot. Autom. Mag.13(2), 99–110 (2006).
X. Sheng, Y. H. Hu, in Information Processing in Sensor Networks, ed. by F. Zhao, L. Guibas. Energy based acoustic source localization (SpringerBerlin, Heidelberg, 2003), pp. 285–300.
S. T. Birchfield, R. Gangishetty, in Proceedings. (ICASSP ’05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005, vol. 4. Acoustic localization by interaural level difference (IEEEPhiladelphia, 2005), pp. 1109–11124.
N. D. Gaubitch, W. B. Kleijn, R. Heusdens, in IEEE International Conference on Acoustics, Speech and Signal Processing. Autolocalization in adhoc microphone arrays (IEEEVancouver, 2013), pp. 106–110.
I. Dokmanić, L. Daudet, M. Vetterli, in 22nd European Signal Processing Conference (EUSIPCO). How to localize ten microphones in one finger snap (IEEELisbon, 2014), pp. 2275–2279.
S. Gannot, T. G. Dvorkind, Microphone array speaker localizers using spatialtemporal information. EURASIP J. Adv. Sig. Proc.2006(12), 1–17 (2006).
T. Kundu, Acoustic source localization. Ultrasonics. 54(1), 25–38 (2014). https://doi.org/10.1016/j.ultras.2013.06.009.
S. Tervo, T. Tossavainen, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 3D room geometry estimation from measured impulse responses (IEEEKyoto, 2012), pp. 513–516.
I. Dokmanić, Y. M. Lu, M. Vetterli, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Can one hear the shape of a room: the 2D polygonal case (IEEEPrague, 2011), pp. 321–324.
I. Jager, R. Heusdens, N. D. Gaubitch, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Room geometry estimation from acoustic echoes using graphbased echo labeling (IEEEShanghai, 2016).
T. S. Ho, Y. C. Fai, E. S. L. Ming, in 10th Asian Control Conference (ASCC). Simultaneous localization and mapping survey based on filtering techniques (IEEESabah, 2015).
C. Evers, A. H. Moore, P. A. Naylor, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Acoustic simultaneous localization and mapping (ASLAM) of a moving microphone array and its surrounding speakers (IEEEShanghai, 2016), pp. 6–10.
M. W. M. G. Dissanayake, P. Newman, S. Clark, H. F. DurrantWhyte, M. Csorba, A solution to the simultaneous localization and map building (SLAM) problem. IEEE Trans. Robot. Autom.17(3), 229–241 (2001).
M. Montemerlo, S. Thrun, D. Koller, B. Wegbreit, in Eighteenth National Conference on Artificial Intelligence. FastSLAM: a factored solution to the simultaneous localization and mapping problem (American Association for Artificial IntelligenceMenlo Park, CA, USA, 2002), pp. 593–598.
I. Dokmanić, L. Daudet, M. Vetterli, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). From acoustic room reconstruction to SLAM (Shanghai, 2016), pp. 6345–6349.
M. Krekovic, I. Dokmanić, M. Vetterli, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). EchoSLAM: simultaneous localization and mapping with acoustic echoes (Shanghai, 2016), pp. 11–15.
C. Gentner, T. Jost, W. Wang, S. Zhang, A. Dammann, U. C. Fiebig, Multipath assisted positioning with simultaneous localization and mapping. IEEE Trans. Wirel. Commun.15(9), 6104–6117 (2016).
M. Kreković, I. Dokmanić, M. Vetterli, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Omnidirectional bats, pointtoplane distances, and the price of uniqueness (New Orleans, 2017), pp. 3261–3265.
M. Krekovic, I. Dokmanić, M. Vetterli, Shapes from echoes: uniqueness from pointtoplane distance matrices. IEEE Trans. Sig. Proc.68:, 2480–2498 (2020).
Y. Rockah, P. Schultheiss, Array shape calibration using sources in unknown locations–part I: farfield sources. IEEE Trans. Acoust. Sig. Process. 35(3), 286–299 (1987).
Y. Noam, H. Messer, Notes on the tightness of the hybrid Cramér Rao lower bound. IEEE Trans. Sig. Process. 57(6), 2074–2084 (2009).
H. Messer, in Fourth IEEE Workshop on Sensor Array and Multichannel Processing (SAM). The hybrid CramérRao lower bound  from practice to theory (Waltham, 2006), pp. 304–307.
S. Bay, B. Geller, A. Renaux, J. Barbot, J. Brossier, On the hybrid Cramér Rao bound and its application to dynamical phase estimation. IEEE Sig. Process. Lett.15:, 453–456 (2008).
X. Zhang, P. Willett, Y. BarShalom, Dynamic CramérRao bound for target tracking in clutter. IEEE Trans. Aerosp. Electron. Syst.41(4), 1154–1167 (2005).
L. Xu, X. R. Li, in 12th International Conference on Information Fusion. Hybrid CramérRao lower bound on tracking ground moving extended target (IEEESeattle, 2009), pp. 1037–1044.
H. L. Van Trees, K. L. Bell, Bayesian bounds for parameter estimation and nonlinear filtering/tracking. AMC. 10:, 12 (2007).
J. Benesty, M. M. Sondhi, Y. Huang, Springer handbook of speech processing (Springer, Berlin, Heidelberg, 2008).
R. G. Brown, P. Y. C. Hwang, Introduction to random signals and applied kalman filtering: with matlab exercises and solutions, 3rd edn. (John Wiley & Sons, Reading, Massachusetts, 1997).
E. A. Wan, R. Van Der Merwe, in Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium. The unscented Kalman filter for nonlinear estimation (IEEELake Louise, 2000), pp. 153–158.
D. Hertz, Time delay estimation by combining efficient algorithms and generalized crosscorrelation methods. IEEE Trans. Acoust. Speech Sig. Process. 34(1), 1–7 (1986).
J. Chen, Y. Huang, J. Benesty, Time delay estimation. (Y. Huang, J. Benesty, eds.) (Springer, Boston, MA, 2004).
M. Crocco, A. Trucco, A. Del Bue, in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Room reflectors estimation from sound by greedy iterative approach (IEEECalgary, 2018), pp. 6877–6881.
I. Dokmanić, R. Parhizkar, A. Walther, Y. M. Lu, M. Vetterli, Acoustic echoes reveal room shape. Proc. Natl. Acad. Sci.110(30), 12186–12191 (2013).
S. Gannot, M. Moonen, in The International Workshop on Acoustic Echo and Noise Control (IWAENC). On the application of the unscented Kalman filter to speech processing (Kyoto, Japan, 2003), pp. 8–11.
S. Haykin, Kalman filtering and neural networks, vol. 47 (John Wiley & Sons, New York, 2004).
F. Gini, R. Reggiannini, On the use of CramérRaolike bounds in the presence of random nuisance parameters. IEEE Trans. Commun.48(12), 2120–2126 (2000).
C. Ren, J. Galy, E. Chaumette, F. Vincent, P. Larzabal, A. Renaux, Recursive hybrid Cramér Rao bound for discretetime Markovian dynamic systems. IEEE Sig. Process. Lett.22(10), 1543–1547 (2015).
E. A. P. Habets, Room impulse response generator (2016). https://github.com/ehabets/RIRGenerator. Accessed 2 Jan 2014.
Y. Noam, J. Tabrikian, Marginal likelihood for estimation and detection theory. IEEE Trans. Sig. Process. 55(8), 3963–3974 (2007).
Acknowledgements
Not applicable.
Funding
This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 871245.
Author information
Affiliations
Contributions
MV, YN, and SG developed the model. Experimental testing was carried out by MV and YN. The paper was written by MV, YS, and SG, with MV the first author. The authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
All authors agree to publish the submitted paper in this journal.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Sharon Gannot is a Member of EURASIP
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Veisman, M., Noam, Y. & Gannot, S. The hybrid CramérRao lower bound for simultaneous selflocalization and room geometry estimation. EURASIP J. Adv. Signal Process. 2021, 3 (2021). https://doi.org/10.1186/s13634020007026
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13634020007026
Keywords
 SLAM
 Speaker localization and tracking
 Room mapping
 Hybrid CramérRao lower bound