 Research
 Open access
 Published:
Urban localization using robust filtering at multiple linearization points
EURASIP Journal on Advances in Signal Processing volumeÂ 2023, ArticleÂ number:Â 100 (2023)
Abstract
We propose a robust Bayesian filtering framework for state and multimodal uncertainty estimation in urban settings by fusing diverse sensor measurements. Our framework addresses multimodal uncertainty from various error sources by tracking a separate probability distribution for linearization points corresponding to dynamics, measurements, and cost functions. Multiple parallel robust Extended Kalman filters (REKF) leverage these linearization points to characterize the state probability distribution. Employing Raoâ€“Blackwellization, we combine the linearization point distribution with the state distribution, resulting in a unified, efficient, and outlierresistant Bayesian filter that captures multimodal uncertainty. Furthermore, we introduce a gradient descentbased optimization method to refine the filter parameters using available data. Evaluating our filter on realworld data from a multisensor setup comprising camera, Global Navigation Satellite System (GNSS), and Attitude and Heading Reference System (AHRS) demonstrates improved performance in bounding position errors based on uncertainty, while maintaining competitive accuracy and comparable computation to existing methods. Our results suggest that our framework is a promising direction for safe and reliable localization in urban environments.
1 Introduction
In complex urban environments, accurate estimation of both the state and its uncertainty is essential for ensuring the safe navigation of vehicles. The dynamic and unpredictable nature of these environments creates numerous sources of errors that can affect measurements from different sensors. For instance, Global Navigation Satellite System (GNSS) measurements are susceptible to bias error from multipath and nonlineofsight signals in environments with tall structuresÂ [1]. Similarly, visual odometry measurements are influenced by dynamic objects and changes in lighting conditionsÂ [2]. Failing to account for these errors can lead to inaccurate or even misleading estimates of the state and uncertainty, which can have serious consequences for safety.
To address these challenges inherent in urban navigation, many systems rely on multisensor integrationÂ [3, 4]. These systems fuse data from a diverse array of complementary sensors, thereby mitigating individual sensor limitations and leading to a more reliable estimate of the state. However, integrating multisensor data presents its own set of challenges, such as the need to mitigate large errors in sensor measurements and accommodate diverse noise sourcesÂ [5, 6]. These diverse noise sources contribute to multiple modes of uncertainty that may yield multiple plausible solutions for the state. Hence, algorithms for multisensor state estimation must have the dual capability of being robust to large errors while accurately capturing the underlying multimodal uncertainty.
Various techniques have been proposed to account for large errors and incorporate measurements from multiple sensors, such as optimizationbased methods and robust Bayesian filters. Optimizationbased methods, such as factor graphs with robust cost functionsÂ [7,8,9,10], are capable of handling nonlinearities and uncertainties by optimizing over a large state space consisting of multiple states and uncertainty parameters. However, these approaches are computationally demanding due to the size of the optimization space, and require significant parameter tuning to achieve good estimation performanceÂ [11]. Moreover, state uncertainty is often not assessed in optimizationbased approaches due to the increased computational demands, which limits their utility in scenarios where reliable localization is crucial.
On the other hand, Bayesian filtersâ€”such as Extended Kalman filters (EKFs) and Unscented Kalman filters (UKFs)â€”estimate the probability distribution of the state based on the history of measurementsÂ [12]. These filters utilize Bayesian statistics to track both the state estimate and its associated uncertainty by combining prior state information, vehicle dynamics, and sensor measurements across time. For example, EKFs have been used to provide localization estimates by integrating stereo camera measurements and RealTime Kinematic GPS (RTKGPS)Â [13]. Another workÂ [14] proposed a similar method by fusing GPS, INS, a monovision camera, and a 3D cartographical model. An adaptive EKF framework inÂ [15] used Google Street View images with GPS for state estimation. Outlierrobust versions of these filters use strategies such as robust cost functions, statistical tests, minimax optimization, or heavytailed priors to handle outliers and model complex noise distributions. For example, robust UKFs were explored inÂ [16] for tight integration of multiple GPS receivers with a monocular camera and an IMU, and inÂ [17] for vehicle geolocalization with a GPS receiver, a video camera, and a 3D city model. While these filters are computationally efficient and easy to implement, they require careful parameter tuning based on the application to achieve good state and uncertainty estimation performance. This can become challenging in complex systems that involve multiple sensor measurements and their associated parametersÂ [18]. Moreover, these filters rely on a Gaussian distribution to model the state probability distribution, which restricts their capacity to adequately account for the multimodal uncertainty that can arise due to sensor measurements in complex urban environments.
Particle filter, another type of Bayesian filter, captures multimodal uncertainty over the state by tracking the distribution as a weighted collection of points in the state spaceÂ [19]. In a recent workÂ [20], a particle filtering framework was proposed for fusing GNSS with camera images and for characterizing the uncertainty in localization from sensor fusion. Similarly, inÂ [21], images from a monocular camera were fused with lowcost GPS sensors and a map to provide highaccuracy localization. Variants of particle filters have been explored for multimodal sensor fusion such as decentralized filtersÂ [22] and differentiable filtersÂ [23]. However, particle filters are often challenged by the â€ścurse of dimensionalityâ€ťâ€”or exponentially increasing computational complexity with higherdimensional state spacesâ€”leading to prohibitive computational costsÂ [24].
Recent literature has proposed two other promising approaches to efficiently capture multimodal uncertainty over the state: employing a combination of filters, also known as Interacting Multiple Model(IMM) filters Â [25], and modeling the distribution as a mixture of Gaussian distributionsÂ [26]. For example, IMM filters have been explored for providing positioning by integrating lowcost GPS and invehicle sensors while adapting the vehicle model to various driving conditionsÂ [27]. An IMM filter was proposed inÂ [28] to provide faulttolerant positioning by integrating IMU with wheel encoders in GPSdegraded environments for mobile robots. Selfadaptive Gaussian mixture models were used inÂ [29] to model the effects of nonGaussian GNSS outliers. These approaches have shown an improved ability in capturing realworld uncertainty in a variety of robotics and navigation applications. However, these techniques also have limitations, such as model complexity and difficulty in parameter selection.
In addition to conventional approaches, various studies have introduced datadriven adaptations of filtering algorithms by incorporating machine learning and deep learning techniques. To enhance localization accuracy during GNSS outages, an adaptive Kalman filter that incorporates neural networkderived position and velocity measurements was proposed inÂ [30]. A fully differentiable pipeline to train an Extended Kalman Filter (EKF) for visualinertial odometry was established inÂ [31]. Similarly, other researchers have explored neural networkbased methodologies for modeling diverse parameters within filtering techniquesÂ [23, 32,33,34]. These methods use existing data to automatically infer good parameters for improving the accuracy of localization. However, the lack of transparency within these methods poses a challenge, as relying on parameters finetuned for accuracy on a limited dataset may not adequately account for uncertainty in the general case.
The limitations of existing robust state estimation methods in effectively capturing uncertainty highlight the need for developing techniques to address these challenges. In this work, we propose a novel robust Bayesian filtering framework that uses Raoâ€“Blackwellization to effectively capture multimodal uncertainty while maintaining the accuracy and error resilience of existing robust Bayesian filtering approaches. Our approach enhances the robust EKF by tracking multiple points where the EKF and its robust cost function are linearized. By employing a particle filter to effectively track these linearization points, our framework can overcome the limitations of the linearization and Gaussian assumptions in the EKF, capturing the multimodal uncertainty in a computationally efficient manner.
We use Raoâ€“BlackwellizationÂ [35] to integrate the robust EKF and the particle filter into a single robust Bayesian filter that estimates a multimodal probability distribution of the state. This facilitates the reuse of the standard filtering components in the EKF for efficient implementationÂ [36]. We then apply this filter to a multisensor setup comprising of camera, GNSS, and attitude and heading reference system (AHRS), where the sensor modules are developed based on our previous workÂ [37]. However, tuning the parameters for our complex multisensor setup is challenging. To address this, we devise a gradient descentbased optimization strategy for efficient parameter tuning.
Next, we use the estimated multimodal uncertainty to develop an approach for computing position error bounds along the lateral, longitudinal, and vertical directions of motion. These bounds enable us to assess the positioning performance for safe operation in urban environmentsÂ [38]. Our approach is implemented in PyTorchÂ [39] to leverage parallelization for fast execution and automatic differentiation capabilities.
Our strategy for using Raoâ€“Blackwellization to improve the estimation of state and uncertainty shares similarities withÂ [40, 41] and distinguishes itself from the conventional RaoBlackwellized particle filter in several key aspects. In traditional applications of Raoâ€“Blackwellization, the overall state space of interest is partitioned into multiple substates, where each substate is tracked with a different filterÂ [26, 42, 43]. Our approach, on the other hand, introduces an additional termâ€”the linearization pointâ€”and employs Raoâ€“Blackwellization to factor it separately from the primary state of interest. Moreover, our approach emphasizes multisensor and robust filtering settings and focuses on characterizing multimodal uncertainty to compute position error bounds, whereas the previous approaches do not address these aspects.
The contributions of this work are summarized as follows:

We develop a novel Bayesian filtering framework that captures multimodal uncertainty while accommodating diverse sensor measurements and possible outliers. Our approach tracks a probability distribution of the points for linearizing the dynamics, observation, and robust cost models, effectively accounting for errors from approximation and diverse noise sources.

Drawing on recent advancements in differentiable filter design researchÂ [23, 44], we present a gradient descentbased optimization strategy for tuning the parameters in our filter. The optimization objective is expressed as a total loss function that includes both measurement and position loss terms.

We present a method for estimating position error bounds along the lateral, longitudinal, and vertical directions of the vehicleâ€™s motion. Our proposed approach uses the multimodal probability distribution obtained from our hybrid filter and explicitly accounts for uncertainties in both position and heading estimation.

We validate our approach using realworld data from Hong KongÂ [45], a dense urban environment, and demonstrate its effectiveness in achieving improved reliability in the position error bounds, while maintaining competitive state estimation performance compared to existing methods. Moreover, we demonstrate that the computational requirements of our approach are comparable to existing methods, making it a practical choice for state estimation in urban environments.
The rest of the paper is structured as follows: in Sect.Â 2, we present background on robust Bayesian filters. Next, we derive our proposed robust filter for multimodal uncertainty quantification in Sect.Â 3. SectionÂ 4 outlines the dynamics and the sensor modules in our multisensor setup. We describe the estimation of position error bounds from the captured multimodal uncertainty in Sect.Â 5. The loss functions and optimization strategy for tuning the filter parameters are presented in Sect.Â 6. Experimental results and discussions for the performance evaluation of our approach are provided in Sect.Â 7. Finally, we conclude the paper with Sect.Â 8.
2 Background on robust Bayesian filters
We consider a nonlinear discretetime system with a timeinvariant transition function f and a timevarying measurement function \(h_t\) given as follows:
where \(x_{t}\) is the system state and \(m_t\) is the measurement at time t. The process noise \(w_t\) and the measurement noise \(\nu _t\) are zero mean with covariance matrix \(Q_t\) and \(R_t\), respectively. The timevarying nature of \(h_t\) accommodates an asynchronous measurement setting, where measurements from different sensors are obtained at distinct time instances due to varying sampling rates.
The objective of Bayesian filtering is to estimate the probability distribution of the state vector \(x_t\) given the sequence of measurement vectors \(m_{1:t} = \{m_1, \ldots , m_t\}\) and the prior probability estimate \(p(x_{t1}  m_{t1})\). Using Bayesâ€™ rule and the law of total probability, the posterior probability of the state conditioned on the measurements is given as
where \(p(m_t  x_t)\) is the likelihood function representing the probability of observing measurement \(m_t\) from state \(x_t\). The term \(p(x_t  x_{t1})\) represents the transition probability from state \(x_{t1}\) to \(x_t\).
Conventional approaches to Bayesian filtering approximate the probability distribution \(p(x_t  m_t)\) as a Gaussian distribution for efficiency and tractability. For example, the extended Kalman filter (EKF) uses a linearized version of the transition and measurement functions, and the unscented Kalman filter (UKF) uses nonlinear function evaluations at preselected sigma points to model \(p(x_t  m_t)\)Â [46]. Using these approximations, \(p(x_t  m_t)\) can be efficiently estimated from \(p(x_{t1}  m_{t1})\) and the measurements through Eq.Â 3 and matrix operations. This process is commonly broken down into two steps, namely the predict and the update stepÂ [47]. However, the estimation accuracy of these approaches can substantially degrade when the process and measurement noise significantly deviates from the modeled Gaussian distribution, which is a common occurrence when using sensors in realworld settingsÂ [48,49,50].
To improve the accuracy in nonGaussian settings, we can replace the Gaussian likelihood used in conventional Bayesian filtering approaches with a robust cost functionbased likelihood
where \(r_t = m_t  h_t(x_t)\) is the residual, and \(\rho (r_t)\) is the robust cost function, such as Huber loss or Tukey biweight lossÂ [51]. The robust cost function reweights the influence of measurements based on their residuals, thereby reducing the impact of outliers and unmodeled errors.
The robust costbased likelihood can be integrated into the EKF and UKF frameworks by modifying the update stepÂ [52,53,54,55]. A common method to do this involves reweighting the measurement covariance matrices \(R_t\) based on the firstorder approximation of the cost function \(\psi (r_t)\), also known as the influence function,
The measurement covariance matrix \(R_t\) is then reweighted using \(\psi (r_t)\) to obtain a robust covariance matrix \(\tilde{R}_t\):
The robust measurement covariance matrix \(\tilde{R}_t\) is then used in the update step of the EKF and UKF to compute the Kalman gain and update the estimate of the state and covariance.
The performance of robust Bayesian filtering depends on the careful selection of a robust cost function and its associated parameters, which can be challenging to identify as they depend on the sensor configuration and environmental characteristicsÂ [56]. Furthermore, the approximations employed by these filters to enhance computational efficiencyâ€”such as linearizing in transition, measurements, and robust costâ€”can also introduce inaccuracies in both the estimated state and the quantification of uncertainty. Therefore, it is important to account for the impacts of these error sources to effectively represent the uncertainty within the filtering framework.
3 Proposed robust Bayesian filter with multimodal uncertainty
In the context of multisensor navigation in urban environments, the presence of diverse and dynamic noise can cause significant deviation from the typically assumed Gaussian behavior in the sensor measurements. Instead, the measurements may exhibit multimodal behavior, which then propagates to a multimodal uncertainty in the state estimation processÂ [29, 49]. While the robust Bayesian filters discussed in the previous section can improve state estimation accuracy by relying on a single Gaussian distribution to represent the state probability distribution, they are illequipped to effectively capture the underlying multimodal uncertainty.
To address this challenge, we propose a robust Bayesian filter that incorporates multimodal uncertainty modeling into the estimation process. Our approach extends the standard Bayesian filtering framework by explicitly incorporating a linearization point, denoted as \(x^l_t\), at each timestep t. The linearization point serves as a reference for linearizing the transition and measurement models, as well as for computing the influence function. The overall approach is illustrated in Fig.Â 1
We first rewrite Eq.Â 3 to estimate the posterior probability of the state \(x_t\) given the linearization point \(x^l_t\) and the measurements \(m_{1:t}\)
In this equation, the probability terms are computed in a manner analogous to the robust EKF approach described earlier in Sec.Â 2, with the key distinction that the linearization point \(x^l_t\) is used for linearizing the equations.
In order to incorporate the uncertainty in the selection of an appropriate linearization point \(x^l_t\), we model a separate probability distribution. The corresponding filtering equation is expressed as
This equation has a similar form to Eq.Â 3 and is used to update the linearization point \(x^l_t\) based on available measurements \(m_t\). To effectively model the multimodal uncertainty, we employ a particle filter to estimate \(p(x^l_t  m_{1:t})\). The filter uses a set of weighted particles to represent the probability distribution, where each particle is a linearization point. First, we use the prior distribution \(p(x^l_{t1}  m_{1:t1})\) to generate a set of prediction particles given by the state transition model
where \(w^l_t\) is the process noise for the linearization point at time t with covariance matrix \(Q^l_t\). For simplicity, we set the covariance \(Q^l_t = Q_t\) in this paper. However, we note that the choice of \(Q_t\) can have a significant impact on the estimation accuracy and robustness. Future work can consider exploring the different choices of \(Q^l_t\) as a separate termâ€”such as by taking advantage of the wellestablished methods in particle filter literature and the use of proposal distributionsÂ [57, 58]â€”to improve the estimation performance.
Based on the linearization point \(x^l_{t}\) corresponding to each particle, we can estimate the posterior state probability distribution \(p(x_t  x^l_t, m_{1:t})\) using the Bayesian filtering expression described in Eq.Â 8, while keeping the linearization point \(x^l_{t}\) unchanged. However, due to the randomness in the particle filter, tracking the linearization point in this manner can result in precision errors, particularly when the number of particles is low. To address this issue and improve the accuracy of the estimation process, we propose a twostep approach. First, we estimate \(p(x_t  x^l_t, m_{1:t})\) using the linearization point \(x^l_{t}\). Then, we refine the value of \(x^l_t\) by setting it to the mean of the posterior distribution \(p(x_t  x^l_t, m_{1:t})\). This refinement step ensures that the linearization point is more closely aligned with the state probability distribution, improving the accuracy of the estimation.
In the second step of the particle filter, we update the weight of each prediction particle based on the available measurements \(m_t\) and likelihood \(p(m_t  x^l_t)\). We model the likelihood as a normal distribution with mean based on the nonlinear measurement model \(h_t(\cdot )\) and covariance \(R^l_t\)
To improve estimation performance, appropriate values of \(R^l_t\) can be assigned based on the application. By modeling \(R^l_t\) separately from \(R_t\), we can incentivize tracking distinct linearization points in high uncertainty scenarios without compromising on the estimation accuracy.
Finally, we perform a resampling step on the set of linearization points based on the updated weights to improve our estimate of the posterior distribution \(p(x^l_t  m_{1:t})\). We use the soft resampling strategy proposed inÂ [59] to maintain differentiability in the parameter optimization.
To estimate the overall probability distribution of the state \(p(x_t m_t)\), we combine the estimates from the particle filter and the robust EKF using the law of total probability and the RaoBlackwell theorem
which allows us to estimate \(p(x_t m_{1:t})\) by conditioning on the linearization point \(x^l_t\). This approach has a smaller estimation variance than Eq.Â 3 based on the RaoBlackwell theorem, allowing us to better capture the uncertainty in the estimate of the state \(x_t\). The overall algorithm is outlined in Algorithm 1.
The use of a particle filter allows us to capture multiple modes in the distribution of the linearization point, which in turn enables us to better represent the multimodal uncertainty in state estimation. However, particle filters are often challenged by the â€ścurse of dimensionalityâ€ť when the state space is large. To address this challenge, we restrict the domain of the linearization point to the position domain, reducing the dimensionality of the particle filter state space. This enables us to maintain computational efficiency while modeling the uncertainty inherent in the system.
4 Multisensor state estimation
In this section, we describe how we apply the approach described in the previous sections for multisensor state and uncertainty estimation. FigureÂ 2 illustrates the multisensor setup and the individual sensor modules. The camera module captures an image at each timestep, and utilizes robust features and depth estimated using deep neural networks to assess the motion. The GNSS module combines pseudorange measurements from multiple satellite constellations and a base station. The attitude and heading reference system (AHRS) provides the orientation and angular velocity measurements. Our proposed filter integrates all of these measurements to estimate the state and uncertainty. Further details regarding the filter and each of the sensor modules are provided in the subsequent sections.
4.1 State space and dynamics model
The state vector \(x_t = \begin{bmatrix}p_t\ v_t\ q_t\ \tilde{\omega }_t\end{bmatrix}\) consists of position \(p_t\), velocity \(v_t\), orientation quaternion \(q_t\) and the bias in angular velocity measurements \(\tilde{\omega }_t\) at time t. We model the vehicle motion as a constant velocity model using the angular velocity \(\omega _t\) obtained from the AHRS:
where \(\Delta t\) is the time step between consecutive measurements, and \(\dot{q}(\omega )\) is the quaternion derivative corresponding to the angular velocity \(\omega\). To avoid numerical stability issues, we renormalize the quaternion \(q_t\) to a unit quaternion each time we update it based on the dynamics. The process noise covariance matrix \(Q_t\) is given as
where \(Q^p_t\), \(Q^v_t\), and \(Q^{\tilde{\omega }}_t\) denote the diagonal process noise covariance matrices for position, velocity, and angular velocity bias, respectively. We construct the process noise covariance for the quaternion \(Q^q_t\) by linearizing the Eulertoquaternion transformation about \(q_t\), which enables us to incorporate correlation across terms and obtain better uncertainty characterization in the dynamics. To compute \(Q^q_t\), we first obtain the Jacobian \(J^q_t\) of the Eulertoquaternion transformation. We then use the diagonal \(3\times 3\) process noise covariance matrix \(\tilde{Q}^q_t\) in the Euler space, and apply it to \(J^q_t\) as
To account for the varied acquisition rates of sensors, we utilize sequential or asynchronous updates within our filterÂ [60, 61]. Specifically, the state probability distribution is updated incrementally as the measurements arrive, with different update steps executed for the measurements. This approach both improves computational efficiency and enables modularity, allowing easy addition or removal of the different sensor components.
In the following sections, we provide a detailed description of each sensor module and the corresponding filter observation model.
4.2 Camera module
Our visual odometry pipeline utilizes consecutive image frames \(\{I_{t1}, I_t\}\) from an onboard monocular camera to estimate the vehicle motion. The choice of a monocular camera is motivated by its advantages in terms of affordability, compactness, and versatility over other visual sensorsÂ [62]. While traditional techniques based on featurematching from monocular images can estimate motion up to an unknown scale factorÂ [63, 64], we leverage recent research on visual odometry to develop a more accurate approach. Our pipeline incorporates neural networks to both detect features and estimate their associated depth from camera images, which allows us to obtain scaleconsistent and robust visual odometry. This strategy, which combines monocular depth estimates with keypoints detected on the image, has been previously explored in [65, 66] and has been shown to outperform both geometrybased and learningbased methods. Throughout the paper, we refer to this pipeline as MonocularDepth aided Visual Odometry (MDVO).
MDVO relies on keypoint detection and depth estimation to construct the observation model.
Keypoint detection In the first step, we detect a set of keypoints \(\textbf{k}_t = \{k^1_t, \ldots , k^M_t\}\) and descriptors \(\textbf{d}_t = \{d^1_t, \ldots , d^M_t\}\) from the input image \(I_t\) where M is the total number of detected features. Keypoints are the distinctive local regions in the image, while descriptors are vectors that represent these regions with invariance to motion and lighting deformations. However, realworld camera images are often subject to significant variations over time, leading to spurious feature matching and erroneous motion estimationÂ [2]. To mitigate the effects of spurious featurematching, we use the neural networkbased SuperPoint detectorÂ [67], which has been optimized using realworld images. This detector is designed to identify keypoints and descriptors that are robust to realworld deformations and suitable for multipleview geometry problems. By utilizing these features, we improve our feature extraction step and improve positioning performance in urban environments.
Next, we match the descriptor set \(\textbf{d}_t\) with the descriptor sets \(\{\textbf{d}_{t1}, \ldots , \textbf{d}_{tT_{SP}}\}\) from the previous \(T_{SP} = 10\) frames. We rank the matches based on the sum of closest matching scores \(\eta ^i_{SP}\) calculated as:
We apply a threshold \(\tau _{SP}\) to the scores \(\eta ^i_{SP}\) to select the 50 bestperforming keypoints for subsequent motion estimation. This strategy ensures that the selected features persist in multiple frames, leading to more robust feature detection. To further enhance the robustness of subsequent motion estimation against outliers, we employ Random Sample Consensus (RANSAC) algorithmÂ [68] to identify and remove large outliers.
For the remainder of this section, we will use the notation \(\textbf{k}_t\) to refer to the \(M'\) best keypoints identified using the aforementioned method. Additionally, we will use \(\textbf{k}_{t1}\) to denote the matching keypoints (i.e., the minimizers of \(\eta ^{i}_{SP}(t1)\)) from the previous frame.
Depth Estimation In the depth estimation step, we estimate the 3D depth \(I^D_{t1}(k)\) associated with every pixel k in frame \(I_{t1}\) from the previous time \(t1\). Projecting 2D image coordinates to 3D is an illposed and inherently ambiguous task. However, previous research has shown that this task can be effectively accomplished by leveraging patterns in the appearance of objectsÂ [69, 70].
In this work, we utilize pretrained models from MonoVOÂ [70] to estimate the depth. To reduce computation requirements, we initiate depth calculation from the previous image acquisition timestep, as we only require the 3D locations at the previous time.
Next, we estimate the 3D coordinates \(\tilde{\textbf{k}}_{t1}\) that correspond to the 2D keypoints \(\textbf{k}_{t1}\) as
where \(K_\text{cam}\) is the matrix containing camera intrinsic parameters. Using these estimated 3D coordinates, we construct observation models based on the current frame.
To construct the observation model for the camera module, we consider the sensor measurement \(m_t\) based on the current frame 2D keypoints \(\textbf{k}_{t}\) and an estimated body frame velocity \(\tilde{v}_t\). The measurement vector \(m_t\) is given as
here \(\textbf{k}_{t}\) is rearranged in a columnmajor fashion to obtain a single vector, which makes it easier to use within the filter. We obtain the velocity \(\tilde{v}_t\) from the matched 3D2D keypoints using perspectivenpoint algorithmsÂ [71]. We enforce \(\tilde{v}_t\) to be nonzero only along the axis that aligns with the forward direction of the vehicleâ€™s motion. Our empirical evaluations indicate that adding this velocity measurement to the overall measurement vector helps regularize the filterâ€™s estimate of velocity and improves the overall stability of the filter. Based on the estimated 3D coordinates \(\tilde{\textbf{k}}_{t1}\) of the previous frame keypoints \(\textbf{k}_{t1}\), we model the measurement \(m_t\) as a function of both the vehicleâ€™s state \(x_t\) and an estimate of the keyframeâ€™s position \(\tilde{p}_{t1}\) and orientation \(\tilde{q}_{t1}\)
where \(R(q_t)\) and \(R(\tilde{q}_{t1})\) are rotation matrices corresponding to the orientations \(q_t\) and \(\tilde{q}_{t1}\), respectively, and \(\zeta\) is the scaling parameter. The measurement noise matrix \(R_t\) corresponding to \(m_t\) is set as a diagonal matrix with identical values for each keypoint. In addition, we assign small values to the diagonal terms corresponding to the zerovalued entries in \(\tilde{v}_t\). This is because the vehicle can be assumed to have negligible or zero motion along nonforward directions with a high level of certainty. By assigning smaller values to these terms, the filter is incentivized to model the vehicleâ€™s motion primarily along the forward direction.
We estimate the keyframeâ€™s position \(\tilde{p}_{t1}\) and orientation \(\tilde{q}_{t1}\) separately from the vehicleâ€™s state \(x_t\) by running a concurrent EKF with identical dynamics and sensor modules, except for a modification to the camera module. In the camera module, we set the observation model \(h_t(x_t)\) to the expected body frame velocity obtained from Eq.Â 22, and measurement \(m_t\) to the velocity \(\tilde{v}_t\) derived through 3D2D matching. Based on our experiments, using a separate filter to estimate the keyframe location provides more stable estimates compared to using estimates based on the same filter.
4.3 GNSS module
Our GNSS pipeline utilizes pseudorange measurements \(\varvec{\rho }_t = \{\rho ^{1}_t, \ldots , \rho ^{M_t}_t\}\) from multiple constellations (GPS and Beidou) and a nearby base station, where \(M_t\) is the total number of visible GNSS satellites at time t. These measurements are obtained by calculating the time delay between the transmission of a signal from a satellite and its reception at the receiver onboard the vehicle. The kth pseudorange measurement \(\rho ^{k}_t\) is related to the distance between the receiver and the satellite by the following equationÂ [72]:
where \(p^{k}_t\) denotes the position of the satellite corresponding to the kth measurement at time t, \(b_\text{const}\) denotes the bias error in the receiverâ€™s clock that also depends on the GNSS constellation, \(b^k\) denotes the bias error in the satelliteâ€™s clock, \(\epsilon _\text{ion}, \epsilon _\text{trop}\) denote the errors due to ionospheric and tropospheric effects, \(\epsilon _\text{mp}\) denotes the error due to multipath effects and \(\epsilon\) denotes the random error. We use multiple GNSS constellations to compensate for limited satellite visibility in urban environments.
To utilize the pseudorange measurements \(\varvec{\rho }_t\) for positioning, it is necessary to account for the various error sources specified in Eq.Â 23. A common approach is to employ doubledifferenced (DD) measurementsÂ [73]. To construct DD measurements, measurements from the base station \(\tilde{\varvec{\rho }}_t\) are first subtracted from the received measurements \(\varvec{\rho }_t\), thereby canceling out the effects of ionospheric and tropospheric disturbances, as well as satellite clock bias. To account for the receiver clock bias, a reference satellite (index denoted as \(\mathrm ref\)) is selected and its measurements are subtracted from all other measurements. For the kth satellite, the DD measurement \(\nabla \Delta \rho ^{k}_t\) is expressed as follows:
The above approach successfully removes most error sources based on clock differences between the satellite and the receiver. However, DD measurements that form differences between measurements from different constellations still have a remaining receiver clock bias error due to lack of synchronization between the constellationsÂ [72]. This errorâ€”known as the InterSystem Bias (ISB)â€”does not change significantly over long periods of time. Therefore, we preestimate this error from initial measurement residuals and ground truth data.
The overall measurement vector \(m_t\) for the GNSS module is constructed using DD measurements as
The corresponding sensor observation model \(h_t(x_t)\) is as follows:
where the double differences are constructed in a similar way to the DD measurements in Eq.Â 25.
We set the measurement noise matrix \(R_t\) corresponding to \(m_t\) as a diagonal matrix with identical entries for all measurements for simplicity, which is common practice in filtering literature. However, it is worth noting that alternative methods of assigning covariance exist that take into account additional properties such as signal strength and satellite elevationÂ [74]. Investigating the impact of such techniques on filter performance could be a direction for future work.
4.4 AHRS module
In the AHRS module, we use the orientation measurements \(o_t\) to update the orientation quaternion \(q_t\) tracked by the filter. The sensor observation model \(h_t(x_t)\) is the estimated quaternion \(q_t\). The measurement \(m_t\) is set as the quaternion \(q(o_t)\) corresponding to the measured orientation \(o_t\), where \(q(\cdot )\) is the Eulertoquaternion transformation function. The measurement noise covariance matrix \(R_t\) is calculated using the Jacobian \(J^q_t\) of the Eulertoquaternion transformation and a \(3\times 3\) diagonal covariance matrix \(\tilde{R}_t\) for Euler angles, similar to Eq.Â 14
To ensure numerical stability, we set the offdiagonal terms in \(R_t\) to zero and only keep the diagonal terms.
5 Estimating position error bounds
To ensure the safety of our proposed filtering method for location estimation, we use the tracked probability distribution \(p(x_t  m_{1:t})\) to estimate probabilistic bounds on the estimation error. These error bounds can then be compared against carefully designed safety limits, such as the ones described in Â [38], to detect situations when the filterâ€™s location output is unsafe for navigation. Incorporating these checks adds a layer of protection to the localization system against uncorrectable estimation errors, improving the systemâ€™s reliability.
To estimate the position error bounds, we first utilize the vehicleâ€™s estimated orientation \(q_t\) to determine the transformation from the global coordinate frame to the coordinate frame aligned with the lateral (sidetoside), longitudinal (forwardâ€“backward), and vertical directions of the vehicleâ€™s motion. This transformation allows us to project the probability distribution \(p(x_t  m_{1:t})\) onto the motion axes of the vehicle, enabling us to estimate the error along directions that are more relevant from a safety perspective.
The position part of the state probability distribution is represented as a weighted sum of Gaussian distributions, where the ith componentâ€™s mean and covariance are denoted by \(p_{t, i}\) and \(\Sigma ^p_{t, i}\), respectively. To estimate the probability distribution corresponding to the axesaligned error vector \(e_t\), we project each Gaussian distribution component along the axes specified by \(q_t\).
To compute the mean \(\mu ^e_{t, i}\) parameter of the ith projected Gaussian distribution component, we apply the rotation matrix \(R(q_t)\) to compute the difference between \(p_{t, i}\) and the filter position estimate \(p_t\)
To estimate the covariance parameter \(\Sigma ^e_{t, i}\), we also need to account for the uncertainty in estimating the orientation. We first generate M samples of orientation \(\{q_1, \ldots , q_M\}\) as
where \(\Sigma ^q_{t}\) denotes the filter orientation covariance. We then project the covariance \(\Sigma ^p_{t, i}\) by applying the rotation matrix \(R(q_j)\) and use Monte Carlo integration to estimate \(\Sigma ^e_{t, i}\):
To obtain a probabilistic error bound for a given direction d and confidence level \(\alpha\), we seek to compute the quantile \(\tau ^{\alpha }_d\) from the error distribution \(p(e_t)\) that satisfies the following equation:
where \(\mathbbm {1}_d\) is a 3dimensional unit vector with the entry corresponding to direction d set to one and the rest set to zero. However, as the error distribution is represented as a mixture of Gaussian distributions, we cannot compute this quantile analytically. Instead, we overapproximate the bound by maximizing across the componentwise quantiles. The position error bound \(\tau ^{\alpha }_d\) for direction d is estimated as
where \(\tau ^{\alpha /2}_{d, i}\) denotes the quantile of the Gaussian distribution \(\mathcal {N}(\mu ^e_{t, i} \mathbbm {1}_d, \mathbbm {1}_i^\top \Sigma ^e_{t, i} \mathbbm {1}_d)\) for confidence level \(\alpha /2\). The quantile \(\tau ^{\alpha /2}_{d, i}\) is computed using the absolute value of the mean to provide a onesided bound on the position error, which is the maximum possible deviation in the positive or negative direction along the axis. Similarly, we use the confidence level of \(\alpha /2\) for these bounds since it includes both sides of the probability distribution.
The position error bounds estimated from our approach are designed to account for the uncertainty resulting from the various sources of noise and linear approximations inherent in the filtering process. In instances where the noise sources are unbiased and the approximations hold true, the quantiles derived from the different components would exhibit similar behavior, resulting in a performance that is similar to an EKF. However, in cases where these assumptions do not hold, our methodology enables us to capture and quantify the uncertainty present across the different components, thereby offering an advantage over the Gaussian approximations utilized in the EKF.
6 Optimizing filter parameters
Designing effective filtering methods involves selecting appropriate values for all the filter parameters. For our filter, these parameters include standard deviation terms for the dynamics, sensor observation models and the scaling factor in the camera module. However, choosing suitable values for these parameters can be challenging and timeconsuming due to the filterâ€™s complexity.
In settings with available prior ground truth data and measurements, we can address this challenge by leveraging ideas from recent research on differentiable filter designÂ [23, 44]. We present a gradientbased optimization strategy that uses available measurements and ground truth position data to quickly find values for the filter parameters that result in good filter performance. Specifically, we minimize a total loss function that comprises measurement loss terms and position loss terms.
We employ a windowbased strategy for optimization, wherein we randomly select windows of length \(T=5\) s from the available data. This approach allows us to incrementally tune the filter parameters on subsets of data, which is more computationally efficient than optimizing over the entire dataset. At the beginning of each window, we initialize the filter by sampling from a Gaussian distribution centered at the ground truth position with a standard deviation of 5Â ms, which helps to regularize the optimization and to prevent overfitting to specific data instances.
The measurement loss \(\mathcal {L}_m(\theta )\) with respect to the current parameters \(\theta\) is the negative log probability of observing the measurements \(m_{1:t}\) from the filter given the initial state.
where \(T_m\) denotes the total measurement instances available within the window, and \(x_t\) is estimated using the filter.
The terms in this loss are available at the acquisition rate of the sensor measurements. This loss is similar to the unsupervised optimization objectives used for tuning Bayesian filter parameters in existing researchÂ [75].
The position loss \(\mathcal {L}_p(\theta )\) is the mean squared error between the filter estimated positions \(p_{1:T}\) and the corresponding ground truth positions \(p^*_{1:T}\), which are usually available at a much slower rate than the sensor measurements. The position loss \(\mathcal {L}_p(\theta )\) is expressed as
where \(T_p\) denotes the total ground truth instances available within the window. This loss provides a stronger signal for tuning the filter parameters where explicit supervision is availableÂ [76]. The total loss \(\mathcal {L}(\theta )\) is expressed as the weighted sum of the position loss and each of the measurement losses
where \(\mathcal {L}_\text{cam}, \mathcal {L}_\text{gnss}, \mathcal {L}_\text{ahrs}\) are calculated according to Eq.Â 35. The parameters \(\lambda _\text{cam}, \lambda _\text{gnss}, \lambda _\text{ahrs}\) denote the weighting factors for the losses and are set to 0.2 in our case.
We optimize \(\mathcal {L}(\theta )\) using gradient descent to update the parameters \(\theta\). To ensure numerical stability and prevent nonpositive values during optimization, we use the softplus functionâ€”which is a smooth approximation of the rectified linear unit (ReLU) functionâ€”to enforce positivity for all parameters.
We note that the overall optimization problem consisting of several parameters is complex, and finding the global minimizer is not necessarily possible using this strategy since it depends on the initialization. Nevertheless, we find this strategy useful in refining the parameters starting from heuristically set initial values and reducing the overall tuning effort considerably.
7 Experimental results and discussion
7.1 UrbanNav dataset
We evaluate our proposed filter on the Hong Kong UrbanNav datasetÂ [45] which includes measurements from diverse sensors like GNSS, LiDAR, camera, and IMU, gathered in an urban environment with distinct regions. These regions comprise a wide street, onesided buildings, and mediumheight buildings, creating nonlineofsight and multipath errors in GNSS measurements. The environment also contains dynamic objects that contribute to errors in the camera module as shown in the dataset images in Fig.Â 3b. The dataset spans a total duration of 785 s, covering a path length of 3.64 km with two loops of the same trajectory. The second loop was used to tune the filterâ€™s parameters, while the first loop was used to evaluate its performance.
In our evaluation, we utilize GNSS (GPS and Beidou), a monocular camera, and AHRS measurements from the UrbanNav dataset. The GNSS measurements were captured at 1 Hz using a commercial ublox ZEDF9P receiver, with GPS and Beidou visibility ranging from 4 to 20 satellites at each time instant. AHRS data was obtained from an Xsens Mti 10, collected at 400 Hz. Images were captured at 15 Hz from a ZED2 camera with a resolution of \(1920 \times 1080\). Ground truth was obtained from a postprocessed RTK GNSSINS integrated system, available at 1 Hz. In our experiments, the filter and the baselines were executed at 4 Hz. We focus on positioning performance and its uncertainty in this work.
7.2 Baselines
We compare the performance of our algorithm with respect to five filtering baselines, namely: a) Robust Naive RBPF (RRBPF), b) Robust EKF with tight VO integration (REKF), c) Robust UKF with tight VO integration (RUKF), d) Robust EKF with MDVO only (REKFVO), and e) Robust EKF with GNSS only (REKFGNSS).
Some notes about these baselines are as follows:

RRBPF partitions the state space into position terms and tracks them using a standard particle filter with 20 particles to match our approach. The particle weights are computed using the robust version of the measurement likelihood function, as described in EqÂ 4. An EKF tracks the remaining terms, including orientation, velocity, and angular velocity bias. The position error bounds are determined by calculating the maximum deviation across the tracked particles along the lateral, longitudinal, and vertical directions.

REKF and RUKF use the robust cost function in their update step as described in Sect.Â 2. During the execution of RUKF, it was observed that the covariance matrix becomes nonpositive definite at certain time instances due to numerical errors associated with the UKF, a wellknown challengeÂ [77] attributed to the choice of filter parameters. To address this issue, we reinitialized the covariance matrix and continued executing the filter from the last estimated state.

We use REKFVO and REKFGNSS as simple baselines to gain insights into the filter performance when only a single sensor is used.
We evaluate the filterâ€™s performance on the entire dataset. However, for ease of visualization, we present the results categorized into three distinct regions, as illustrated in Fig.Â 4. This approach enables a comprehensive evaluation of the filter performance within each typical category of regions present in urban environments.
7.3 Experimental setup
The filter and all baselines were tuned and executed on the Stanford Research and Computing Centerâ€™s HPC cluster using an AMD 7502P CPU with 256 GB RAM and an NVIDIA Geforce RTX 2080Ti GPU. PyTorch is used for automatic differentiation and tensor operations. The implementation of the filter and baselines is built upon the multimodal sensor fusion codebase fromÂ [23].
To tune the filter parameters, we use REKF as a reference due to its fast execution time. We employ the same set of parameters for all filters to ensure a fair comparison. The parameters are tuned individually while the remaining parameters are held fixed. We optimize the parameters using the Adam optimizerÂ [78] with a learning rate of 0.01 for 100,000 iterations. The final parameter values, along with the key manually set parameters, are listed in TableÂ 1.
The Tukey biweight functionÂ [79] is used as the cost function in the filters. However, if the input residuals exceed a set threshold, this function becomes constant, causing the influence function to become noninvertible. To address this issue, we approximate the inverse by assigning a high value to the entries associated with residuals above the threshold. These threshold values are manually set and remain unchanged during the optimization of filter parameters using gradient descent.
7.4 Positioning performance
We first compare the positioning performance of our approach with respect to the baselines. FigureÂ 4 shows the trajectory tracking plots of each approach and Fig.Â 5 shows the positioning performance with respect to time for qualitative analysis. The plots demonstrate that our proposed method produces position estimates with higher accuracy than the baselines. Specifically, in the region characterized by mediumheight buildings, our approach exhibits clear improvement compared to REKF. Similarly, our approach outperforms RUKF in a few regions where numerical errors occurred during covariance estimation, necessitating reinitialization. Notably, both REKF and RUKF demonstrate smoother position estimates than our approach. Despite using the same number of particles as our approach, RRBPF performs substantially worse. This observation is consistent with the widely recognized fact that particle filter methods alone require a large number of particlesâ€”that scales exponentially with the size of the state spaceâ€”for good estimation performanceÂ [80]. However, for realworld applications, the computational costs associated with increasing the number of particles may become prohibitively high. We also observe that, in general, baselines with multisensor measurements provide better positioning performance compared to those relying on singlesensor measurements.
To quantitatively evaluate the performance of our approach, we computed the mean, median, and 95th percentile of positioning error for all three sections of the trajectory. The results, presented in TableÂ 2, demonstrate that our method produces positioning errors comparable to those of REKF and REKFGNSS in the wide street and onesided building regions. Additionally, our method exhibits smaller errors than all other baselines in these regions. In the medium urban region, our approach achieves smaller errors than all baselines except RUKF. These results highlight the usability of our approach in realworld sensor fusion settings, where accurate state estimation is important.
7.5 Position error bound evaluation
For qualitative evaluation of the position error bounding performance, we show plots that depict the estimated error bounds versus position error for each approach along the lateral, longitudinal, and vertical directions, as shown in Figs.Â 6, 7 and 8. The position error bounds lying in the upper half of the plotâ€”separated by the dotted red lineâ€”successfully enclose the position error.
Our approach generates position error bounds that are more clustered in the upper half of the plots compared to the baselines in all regions, except for RUKF in the medium urban region. For clarity purposes, we compare our approach qualitatively against REKF and RUKF only, while other baselines are included in the quantitative comparisons.
For quantitative comparison, we employ the failure rate metric to evaluate the effectiveness of each method in bounding the position error. The failure rate metric specifies the fraction of instances where the estimated error bound fails to capture the actual position error, where a smaller value indicates better performance. The results are included in TablesÂ 3, 4, and 5 for the failure rates along the lateral, longitudinal, and vertical directions, respectively.
Based on the presented results, our proposed approach outperforms all baseline filters, except for RUKF in the medium urban region and REKFGNSS in all three regions. REKFGNSS shows promising performance, despite its comparatively lower positioning accuracy, due to its sole reliance on GNSS measurements. The EKFâ€™s Gaussian uncertainty modeling assumptions, which are used in REKFGNSS, hold reasonably true for GNSS measurements. Similarly, RUKF demonstrates good bounding performance in the medium urban region where it also has good positioning performance. However, its performance deteriorates in the onesided building and wide street regions, where it exhibits worse positioning accuracy.
The results indicate that both RUKF and REKF exhibit a higher failure rate along the vertical direction, most likely due to overconfident uncertainty estimates. In contrast, our approach uses the uncertainty across multiple particles to generate more conservative error bounds in the vertical direction with considerably lower failure rates. These observations emphasize the significance of accurate uncertainty parameters in characterizing the position error bounds, particularly in challenging urban environments.
Overall, the qualitative and quantitative evaluations demonstrate that our proposed approach produces more reliable error bounds compared to the baselines. Our approachâ€™s improved position error bounding can be attributed to its capability to capture multimodal uncertainty through multiple linearization points, which effectively captures uncertainty in situations where multiple plausible solutions exist based on the measurements. This reduces the proposed methodâ€™s reliance on the choice of the uncertainty parameters, highlighting the potential of our approach for realworld applications in developing robust and reliable systems for autonomous navigation.
7.6 Computational statistics
We analyzed the average runtime per step for our proposed approach and the baselines, tabulated in TableÂ 6.
Our method and RRBPFâ€”both using 20 particlesâ€”exhibit an average step time that is less than twice that of the EKF and UKFbased baselines. This efficiency is attributed to the utilization of tensor operations to parallelize the EKF operations associated with the linearization points. From a theoretical standpoint, this aligns with the expected computational complexity of \(O(nm^3 + nm^2 + n)\) â€”contributed by the robust filtering, weighting, and resampling stepsâ€”where n is the number of particles and m is the size of the measurement vector. It is worth noting that upon assuming optimal parallelization across all particles, the complexity reduces to \(O(m^3 + n)\). This is comparable to the EKFâ€™s theoretical complexity of \(O(m^3)\).
We also investigate the effect of the number of landmark features used in the measurement vector and the number of particles on the computational time, as shown in Fig.Â 9. Our findings indicate that the computational time stays under 100 ms till 200 features in the measurement vector and then increases at a sublinear rate till 1000 features. Furthermore, the computational time grows linearly with the particles at a slope smaller than 1, demonstrating the computational benefits of efficient tensor operations in our approach. These observations show that our approach can be employed in realworld applications without imposing a significant computational burden.
7.7 Discussion
Our experiments on positioning performance demonstrate that the proposed robust Bayesian filtering framework for positioning produces state estimates that are competitive with the robust EKF and UKF baselines. Moreover, this performance can be achieved with minimal manual tuning through the proposed gradient descentbased optimization strategy. Additionally, our approach outperforms particle filter baselines with the same number of particles when applied to a realworld setting. However, the performance of our method is dependent on the underlying particle filter for tracking linearization points, which results in position estimates that are less smooth compared to the EKF and UKFbased methods. Therefore, it may be beneficial to explore improvements in particle filtering literature to further enhance the performance of our method. Incorporating improved methods for robust state estimation, such as factor graphs, in the framework is another direction for future research to enhance the overall performance.
Our position error bounding experiments indicate that the error bounds estimated using our approach are better correlated with the actual positioning error along each direction than the bounds from most baselines. It is important to note that a limitation of our approach is the difficulty in establishing theoretical guarantees on its performance due to the need for knowing the individual error distributions of each measurement at each time instant. Nevertheless, our empirical results suggest that our approach can identify situations where the positioning error grows large, making it useful in preventing the unsafe use of the estimated location. Hence, our approach has practical utility in safetycritical applications.
Finally, our experiments on computational statistics illustrate that our proposed framework has comparable computational requirements to the underlying robust state estimation method, which is achieved using parallelization with tensor operations. Therefore, our approach can be integrated into realtime localization systems without imposing excessive computational overhead. Future work can further explore adaptively varying the number of tracked linearization points in our framework to improve computational efficiency.
8 Conclusions
In this paper, we introduced a robust Bayesian filtering framework that effectively captures multimodal uncertainty in positioning using diverse sensor measurements while being robust to outliers. Our framework consists of two key components: the robust filter component, which uses techniques such as Extended Kalman Filters for efficient and robust state estimation, and the multimodal uncertainty component, which tracks a probability distribution over points for linearizing the dynamics, measurement models, and robust cost in the robust filter component. We combined these two components using the RaoBlackwell theorem to create a robust Bayesian filter that is resilient to measurement outliers and can capture multimodal state uncertainty.
We validated our proposed filter on realworld data from a multisensor setup comprising a camera, GNSS, and AHRS. To tune the filter parameters, we utilized a gradient descentbased optimization strategy that leverages available measurements and ground truth position data. The results demonstrate our filterâ€™s competitive state estimation performance compared to existing filterbased robust state estimation methods and improved performance in bounding the position errors based on uncertainty. Furthermore, these improvements are achieved with less than twice the computational time of existing Kalman filterbased methods.
These results suggest that our proposed framework is suitable for realworld localization applications where the reliability of the estimated location is critical. Our approach enables more robust and accurate localization in complex environments, thus proving to be a valuable tool for autonomous navigation systems in urban environments.
Availability of data and materials
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.
Abbreviations
 GNSS:

Global Navigation Satellite System
 GPS:

Global Positioning System
 ISB:

Intersystem bias
 AHRS:

Attitude and heading reference system
 EKF:

Extended Kalman filter
 UKF:

Unscented Kalman filter
 DD:

Double differenced
 RTK:

Realtime kinematic
 IMM:

Interacting multiple model
 RBPF:

Raoâ€“Blackwellized particle filter
 VO:

Visual odometry
 IMU:

Inertial measurement unit
 MDVO:

Monocular depthaided visual odometry
 RANSAC:

Random sample and consensus
References
Y.J. Morton, F.S.T. Van Diggelen, J.J. Spilker, B.W. Parkinson (eds.), Position, Navigation, and Timing Technologies in the 21st Century: Integrated Satellite Navigation, Sensor Systems, and Civil Applications (First edition edn. Wiley/IEEE Press, Hoboken, 2020)
B. Kitt, F. Moosmann, C. Stiller, Moving on to dynamic environments: visual odometry using feature classification. In: 2010 IEEE/RSJ international conference on intelligent robots and systems, pp. 5551â€“5556 (2010). https://doi.org/10.1109/IROS.2010.5650517. ISSN: 21530866
M. Camurri, M. Ramezani, S. Nobili, M. Fallon, Pronto: a multisensor state estimator for legged robots in realworld scenarios. Front. Robot. AI 7, 68 (2020)
H. Du, W. Wang, C. Xu, R. Xiao, C. Sun, Realtime onboard 3d state estimation of an unmanned aerial vehicle in multienvironments using multisensor data fusion. Sensors 20(3), 919 (2020)
Y. Jiang, S. Pan, Q. Meng, W. Gao, C. Ma, B. Yu, F. Jia, Robust Kalman filter enhanced by projection statistic detector for multisensor navigation in urban canyon environment. IEEE Sens. J. (2022). https://doi.org/10.1109/JSEN.2022.3230708
R. Sun, Y. Yang, K.W. Chiang, T.T. Duong, K.Y. Lin, G.J. Tsai, Robust IMU/GPS/VO integration for vehicle navigation in GNSS degraded urban areas. IEEE Sens. J. 20(17), 10110â€“10122 (2020)
H.P. Chiu, X.S. Zhou, L. Carlone, F. Dellaert, S. Samarasekera, R. Kumar, Constrained optimal selection for multisensor robot navigation using plugandplay factor graphs. In: 2014 IEEE international conference on robotics and automation (ICRA), (IEEE, 2014), pp. 663â€“670
Q. Zeng, W. Chen, J. Liu, H. Wang, An improved multisensor fusion navigation algorithm based on the factor graph. Sensors 17(3), 641 (2017)
W. Xiwei, X. Bing, W. Cihang, G. Yiming, L. Lingwei, Factor graph based navigation and positioning for control system design: a review. Chin. J. Aeronaut. 35(5), 25â€“39 (2022)
W. Wen, X. Bai, Y.C. Kan, L.T. Hsu, Tightly coupled GNSS/ins integration via factor graph and aided by fisheye camera. IEEE Trans. Veh. Technol. 68(11), 10651â€“10662 (2019)
F. Dellaert, Factor graphs: exploiting structure in robotics. Annu. Rev. Control Robot. Auton. Syst. 4, 141â€“166 (2021)
M. Rhudy, Y. Gu, M.R. Napolitano, An analytical approach for comparing linearization methods in EKF and UKF. Int. J. Adv. Robot. Syst. 10(4), 208 (2013)
L. Wei, C. Cappelle, Y. Ruichek, F. Zann, Intelligent vehicle localization in urban environments using EKFbased visual odometry and GPS fusion. IFAC Proc. Vol. 44(1), 13776â€“13781 (2011)
C. Cappelle, M.E.B. El Najjar, D. Pomorski, F. Charpillet, Localisation in urban environment using gps and ins aided by monocular vision system and 3d geographical model. In: 2007 IEEE intelligent vehicles symposium, (IEEE, 2007), pp. 811â€“816
S.V.S. Chauhan, G.X. Gao, Joint gps and vision estimation using an adaptive filter. In: Proceedings of the 30th international technical meeting of the satellite division of the institute of navigation (ION GNSS+ 2017), pp. 808â€“812 (2017)
A. Shetty, G.X. Gao, Visionaided measurement level integration of multiple GPS receivers for uavs. In: Proceedings of the 28th international technical meeting of the satellite division of the institute of navigation (ION GNSS+ 2015), pp. 834â€“840 (2015)
M. Dawood, C. Cappelle, M.E. El Najjar, M. Khalil, D. Pomorski, Vehicle geolocalization based on immukf data fusion using a GPS receiver, a video camera and a 3d city model. In: 2011 IEEE intelligent vehicles symposium (IV), (IEEE, 2011), pp. 510â€“515
S. Yazdkhasti, J.Z. Sasiadek, Multi sensor fusion based on adaptive Kalman filtering. In: Advances in aerospace guidance, navigation and control: selected papers of the fourth ceas specialist conference on guidance, navigation and control held in Warsaw, Poland, April 2017, (Springer, 2018), pp. 317â€“333
J.S. Liu, R. Chen, Sequential monte Carlo methods for dynamic systems. J. Am. Stat. Assoc. 93(443), 1032â€“1044 (1998). https://doi.org/10.1080/01621459.1998.10473765
A. Mohanty, S. Gupta, G.X. Gao, A particlefiltering framework for integrity risk of GNSScamera sensor fusion. Navigation 68(4), 709â€“726 (2021)
H. Li, F. Nashashibi, G. Toulminet, Localization for intelligent vehicle by fusing monocamera, lowcost gps and map data. In: 13th international IEEE conference on intelligent transportation systems, pp. 1657â€“1662 (2010). https://doi.org/10.1109/ITSC.2010.5625240
M. Rosencrantz, G. Gordon, S. Thrun, Decentralized sensor fusion with distributed particle filters. arXiv preprint arXiv:1212.2493 (2012)
M.A. Lee, B. Yi, R. MartĂnMartĂn, S. Savarese, J. Bohg, Multimodal sensor fusion with differentiable filters. In: 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), (IEEE, 2020), pp. 10444â€“10451
F. Daum, J. Huang, Curse of dimensionality and particle filters. In: 2003 IEEE aerospace conference proceedings (Cat. No. 03TH8652), vol. 4, (IEEE, 4â€“197941993) (2003)
E. Mazor, A. Averbuch, Y. BarShalom, J. Dayan, Interacting multiple model methods in target tracking: a survey. IEEE Trans. Aerosp. Electron. Syst. 34(1), 103â€“123 (1998). https://doi.org/10.1109/7.640267
M. Ulmschneider, C. Gentner, R. Faragher, T. Jost, Gaussian mixture filter for multipath assisted positioning. In: Proceedings of the 29th international technical meeting of the satellite division of the institute of navigation (ION GNSS+ 2016), (Institute of Navigation, Portland, Oregon, 2016), pp. 1263â€“1269
K. Jo, K. Chu, M. Sunwoo, Interacting multiple model filterbased sensor fusion of GPS with invehicle sensors for realtime vehicle positioning. IEEE Trans. Intell. Transp. Syst. 13(1), 329â€“343 (2012). https://doi.org/10.1109/TITS.2011.2171033
M. Kheirandish, E.A. Yazdi, H. Mohammadi, M. Mohammadi, A faulttolerant sensor fusion in mobile robots using multiple model Kalman filters. Robot. Auton. Syst. 161, 104343 (2023). https://doi.org/10.1016/j.robot.2022.104343
W. Wen, X. Bai, L.T. Hsu, T. Pfeifer, Gnss/lidar integration aided by selfadaptive gaussian mixture models in urban scenarios: an approach robust to nongaussian noise. In: 2020 IEEE/ION Position, location and navigation symposium (PLANS), (IEEE, 2020), pp. 647â€“654
J. Cheng, Y. Gao, J. Wu, An adaptive integrated positioning method for urban vehicles based on multitask heterogeneous deep learning during GNSS outages. IEEE Sens. J. (2023). https://doi.org/10.1109/JSEN.2023.3302796
C. Li, S.L. Waslander, Towards Endtoend learning of visual inertial odometry with an EKF. In: 2020 17th conference on computer and robot vision (CRV), pp. 190â€“197 (2020). https://doi.org/10.1109/CRV50864.2020.00033
J. Liu, G. Guo, Vehicle localization during GPS outages with extended Kalman filter and deep learning. IEEE Trans. Instrum. Meas. 70, 1â€“10 (2021). https://doi.org/10.1109/TIM.2021.3097401
C. Chen, C.X. Lu, B. Wang, N. Trigoni, A. Markham, DynaNet: neural Kalman dynamical model for motion estimation and prediction. IEEE Trans. Neural Netw. Learn. Syst. 32(12), 5479â€“5491 (2021). https://doi.org/10.1109/TNNLS.2021.3112460
Y.C. Lin, Y.W. Huang, K.W. Chiang, A neuralKF hybrid sensor fusion scheme for INS/GPS/odometer integrated land vehicular navigation system. In: Proceedings of the 19th international technical meeting of the satellite division of the institute of navigation (ION GNSS 2006), pp. 2174â€“2181, (2006)
K. Murphy, S. Russell, Raoblackwellised particle filtering for dynamic bayesian networks. In: Sequential Monte Carlo methods in practice, pp. 499â€“515 (2001)
G. Hendeby, R. Karlsson, F. Gustafsson, The raoblackwellized particle filter: a filter bank implementation. EURASIP J. Adv. Sign. process. 2010, 1â€“10 (2010)
S. Gupta, A. Mohanty, G. Gao, Getting the best of particle and Kalman filters: GNSS sensor fusion using raoblackwellized particle filter. In: Proceedings of the 35th international technical meeting of the satellite division of the institute of navigation (ION GNSS+ 2022), Denver, Colorado, pp. 1610â€“1623 (2022). https://doi.org/10.33012/2022.18470
T. Reid, S. Houts, R. Cammarata, G. Mills et al., Localization requirements for autonomous vehicles. SAE Intl. J CAV 2(3), 173â€“190 (2019). https://doi.org/10.4271/1202030012
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, PyTorch: an imperative style, highperformance deep learning library. https://pytorch.org/. Accessed 20 Apr 2023 (2019)
O. Stepanov, V. Vasiliev, A. Toropov, Mapaided navigation algorithms taking into account the variability of position errors of the corrected navigation system. In: 2022 29th saint petersburg international conference on integrated navigation systems (ICINS), (IEEE, 2022), pp. 1â€“5
A.S. Stordal, H.A. Karlsen, G. NĂ¦vdal, H.J. Skaug, B. VallĂ¨s, Bridging the ensemble Kalman filter and particle filters: the adaptive gaussian mixture filter. Comput. Geosci. 15, 293â€“305 (2011)
A. Giremus, A. Doucet, V. Calmettes, J.Y. Tourneret, A raoblackwellized particle filter for ins/gps integration. In: 2004 IEEE international conference on acoustics, speech, and signal processing, vol. 3, p. 964 (2004). https://doi.org/10.1109/ICASSP.2004.1326707
P. Vernaza, D.D. Lee, Robust GPS/insaided localization and mapping via GPS bias estimation, in Experimental Robotics. Springer Tracts in Advanced Robotics, vol. 39, ed. by O. Khatib, V. Kumar, D. Rus (Springer, Berlin, Heidelberg, 2008), pp.125â€“135
A. Corenflos, J. Thornton, G. Deligiannidis, A. Doucet, Differentiable particle filtering via entropyregularized optimal transport. In: International conference on machine learning, (PMLR, 2021), pp. 2100â€“2111
L.T. Hsu, W.W. Nobuaki Kubo, W. Chen, Z. Liu, T. Suzuki, J. Meguro, Urbannav: an opensourced multisensory dataset for benchmarking positioning algorithms designed for urban areas. In: Proceedings of the 34th international technical meeting of the satellite division of the institute of navigation, pp. 226â€“256 (2021)
S. Chen, Kalman filter for robot vision: a survey. IEEE Trans. Ind. Electron. 59(11), 4409â€“4420 (2011)
S.J. Julier, J.K. Uhlmann, Unscented filtering and nonlinear estimation. Proc. IEEE 92(3), 401â€“422 (2004)
L.T. Hsu, Analysis and modeling GPS NLOS effect in highly urbanized area. GPS Solut. 22(1), 7 (2018)
T. Cimiega, S. BadriHoeher, Enhanced state estimation based on particle filter and sensor data with nongaussian and multimodal noise. IEEE Access 9, 60704â€“60712 (2021)
Q. Li, J.P. Queralta, T.N. Gia, Z. Zou, T. Westerlund, Multisensor fusion for navigation and mapping in autonomous vehicles: accurate localization in urban environments. Unman. Syst. 8(03), 229â€“237 (2020)
P. Meer, D. Mintz, A. Rosenfeld, D.Y. Kim, Robust regression methods for computer vision: a review. Int. J. Comput. Vis. 6, 59â€“70 (1991)
M.A. Gandhi, L. Mili, Robust Kalman filter based on a generalized maximumlikelihoodtype estimator. IEEE Trans. Sign. Process. 58(5), 2509â€“2520 (2009)
G. Agamennoni, J.I. Nieto, E.M. Nebot, An outlierrobust Kalman filter. In: 2011 IEEE international conference on robotics and automation, (IEEE, 2011), pp. 1551â€“1558
L. Chang, B. Hu, G. Chang, A. Li, Huberbased novel robust unscented Kalman filter. IET Sci. Meas. Technol. 6(6), 502â€“509 (2012)
C.H. Park, J.H. Chang, Robust LMmedSbased WLS and Tukeybased EKF algorithms under LOS/NLOS mixture conditions. IEEE Access 7, 148198â€“148207 (2019)
H. Wang, H. Li, W. Zhang, J. Zuo, H. Wang, A unified framework for mestimation based robust Kalman smoothing. Sign. Process. 158, 61â€“65 (2019)
F. Gustafsson, Particle filter theory and practice with positioning applications. IEEE Aeros. Electron. Syst. Mag. 25(7), 53â€“82 (2010). https://doi.org/10.1109/MAES.2010.5546308. (Accessed 20220909)
J. Elfring, E. Torta, R. van de Molengraft, Particle filters: a handson tutorial. Sensors 21(2), 438 (2021)
P. Karkus, D. Hsu, W.S. Lee, Particle filter networks with application to visual localization. In: Conference on robot learning, (PMLR, 2018), pp. 169â€“178
Y. Junjun, P. Dongliang, G. Quanbo, Sequential fusion for asynchronous multisensor system based on Kalman filter. In: 2009 Chinese control and decision conference, (IEEE, 2009), pp. 4677â€“4681
S.M. Oh, Multisensor fusion for autonomous uav navigation based on the unscented Kalman filter with sequential measurement updates. In: 2010 IEEE conference on multisensor fusion and integration, (IEEE, 2010), pp. 217â€“222
K. Yokoyama, K. Morioka, Autonomous mobile robot with simple navigation system based on deep reinforcement learning and a monocular camera. In: 2020 IEEE/SICE international symposium on system integration (SII), (IEEE, 2020), pp. 525â€“530
M. He, C. Zhu, Q. Huang, B. Ren, J. Liu, A review of monocular visual odometry. Vis. Comput. 36(5), 1053â€“1065 (2020). https://doi.org/10.1007/s00371019017146. (Accessed 20220916)
D. Burschka, E. Mair, Direct pose estimation with a monocular camera. In: Robot vision: second international workshop, RobVis 2008, Auckland, New Zealand, February 18â€“20, 2008. Proceedings 2, (Springer, 2008), pp. 440â€“453
H. Zhan, C.S. Weerasekera, J.W. Bian, I. Reid, Visual odometry revisited: what should be learnt? In: 2020 IEEE international conference on robotics and automation (ICRA), (IEEE, 2020), pp. 4203â€“4210
M. Geng, S. Shang, B. Ding, H. Wang, P. Zhang, Unsupervised learningbased depth estimationaided visual slam approach. Circuits Syst. Sign. Process. 39, 543â€“570 (2020)
D. DeTone, T. Malisiewicz, A. Rabinovich, Superpoint: selfsupervised interest point detection and description. CVPR Workshop, (2017). https://doi.org/10.48550/ARXIV.1712.07629
M.A. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381â€“395 (1981). https://doi.org/10.1145/358669.358692
C. Zhao, Q. Sun, C. Zhang, Y. Tang, F. Qian, Monocular depth estimation based on deep learning: an overview. Sci. China Technol. Sci. 63(9), 1612â€“1627 (2020). https://doi.org/10.1007/s1143102015828. (Accessed 20220910)
C. Godard, O. Mac Aodha, M. Firman, G.J. Brostow, Digging into selfsupervised monocular depth estimation. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp. 3828â€“3838 (2019)
X.X. Lu, A review of solutions for perspectivenpoint problem in camera pose estimation. J. Phys.: Conf. Ser. 1087, 052009 (2018)
N. Nadarajah, P.J. Teunissen, N. Raziq, Beidou intersatellitetype bias evaluation and calibration for mixed receiver attitude determination. Sensors 13(7), 9435â€“9463 (2013)
P. Misra, P.K. Enge, The global positioning system: 6 signals, measurements, and performance. Int. J. Wirel. Inf. Netw. 1, 83â€“105 (1994). https://doi.org/10.1007/BF02106512
R. Sun, Z. Zhang, Q. Cheng, W.Y. Ochieng, Pseudorange error prediction for adaptive tightly coupled GNSS/IMU navigation in urban areas. GPS Solut. 26, 1â€“13 (2022)
G. Revach, N. Shlezinger, T. Locher, X. Ni, R.J. van Sloun, Y.C. Eldar, Unsupervised learned Kalman filtering. In: 2022 30th European signal processing conference (EUSIPCO), (IEEE, 2022), pp. 1571â€“1575
G. Revach, N. Shlezinger, X. Ni, A.L. Escoriza, R.J. Van Sloun, Y.C. Eldar, Kalmannet: neural network aided Kalman filtering for partially known dynamics. IEEE Trans. Sign. Process. 70, 1532â€“1547 (2022)
Y.G. Choi, J. Lim, A. Roy, J. Park, Positivedefinite correction of covariance matrix estimators via linear shrinkage (2015)
D.P. Kingma, J. Ba, Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
A.E. Beaton, J.W. Tukey, The fitting of power series, meaning polynomials, illustrated on bandspectroscopic data. Technometrics 16(2), 147â€“185 (1974). https://doi.org/10.1080/00401706.1974.10489171
A. Doucet, A.M. Johansen, A tutorial on particle filtering and smoothing: fifteen years later. Handb. Nonlinear Filter. 12(656â€“704), 3 (2009)
Acknowledgements
We would like to thank Asta Wu for peer reviewing this paper and the rest of the Stanford NAVLab for their insightful discussions and feedback.
Funding
We would like to thank Ford Motor Company for providing the funds to support this project.
Author information
Authors and Affiliations
Contributions
SG conceived the idea and wrote the first draft of the manuscript and implemented the experiments; AM provided technical concepts and guided the research; and, SG, AM, and GG finalized the manuscript writeup, proofread and checked the technical correctness of the manuscript, and provided future perspectives of the research. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisherâ€™s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Gupta, S., Mohanty, A. & Gao, G. Urban localization using robust filtering at multiple linearization points. EURASIP J. Adv. Signal Process. 2023, 100 (2023). https://doi.org/10.1186/s13634023010627
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13634023010627