
Table 3 Results of ablation study

From: Accurate visual localization with semantic masking and attention

 

| Method | King's College | Old Hospital | Shop Facade | St M. Church | Average |
| --- | --- | --- | --- | --- | --- |
| EssNet | 0.82m, 2.32\(^\circ\) | 1.58m, 4.37\(^\circ\) | 1.29m, 6.32\(^\circ\) | 1.74m, 6.17\(^\circ\) | 1.36m, 4.80\(^\circ\) |
| BiasNet | 0.79m, 2.77\(^\circ\) | 1.46m, 4.26\(^\circ\) | 1.27m, 6.63\(^\circ\) | 1.51m, 6.79\(^\circ\) | 1.26m, 4.99\(^\circ\) |
| BiasNet (without dilation) | 0.82m, 2.80\(^\circ\) | 1.40m, 4.00\(^\circ\) | 1.48m, 9.30\(^\circ\) | 1.52m, 6.27\(^\circ\) | 1.31m, 5.59\(^\circ\) |
| AttNet | 0.84m, 2.28\(^\circ\) | 1.60m, 4.41\(^\circ\) | 1.19m, 4.96\(^\circ\) | 1.57m, 5.56\(^\circ\) | 1.30m, 4.30\(^\circ\) |
| BiasNet + AttNet | 0.75m, 2.26\(^\circ\) | 1.44m, 3.23\(^\circ\) | 0.95m, 5.24\(^\circ\) | 1.62m, 5.27\(^\circ\) | 1.19m, 4.00\(^\circ\) |
| BiasAttNet (without RANSAC) | 0.82m, 2.11\(^\circ\) | 1.60m, 3.96\(^\circ\) | 1.00m, 5.39\(^\circ\) | 1.71m, 5.77\(^\circ\) | 1.28m, 4.31\(^\circ\) |
| Prior Mask + ResNet | 0.72m, 2.22\(^\circ\) | 1.68m, 3.62\(^\circ\) | 1.17m, 6.57\(^\circ\) | 1.78m, 6.29\(^\circ\) | 1.34m, 4.68\(^\circ\) |
| Prior Mask + AttNet | 0.84m, 2.25\(^\circ\) | 1.63m, 3.78\(^\circ\) | 1.30m, 6.84\(^\circ\) | 1.55m, 5.38\(^\circ\) | 1.33m, 4.56\(^\circ\) |
| Semantics only | 1.43m, 3.30\(^\circ\) | 4.32m, 9.87\(^\circ\) | 3.23m, 16.01\(^\circ\) | 4.53m, 21.04\(^\circ\) | 3.38m, 12.56\(^\circ\) |
| SIFT + RANSAC (448 \(\times\) 448) | 0.63m, 0.79\(^\circ\) | 1.75m, 1.81\(^\circ\) | 0.35m, 1.47\(^\circ\) | 0.50m, 1.47\(^\circ\) | 0.81m, 1.38\(^\circ\) |
| SIFT + RANSAC + Semantics | 0.59m, 0.81\(^\circ\) | 1.58m, 2.10\(^\circ\) | 0.32m, 1.50\(^\circ\) | 0.54m, 1.56\(^\circ\) | 0.76m, 1.49\(^\circ\) |

  1. We compare variants of the pipeline in which individual modules are removed or replaced. Each entry gives the position error (m) and the orientation error (\(^\circ\)). The best results are highlighted in bold, except for methods marked with a *.
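
As an illustration of how the Average column can be read, the minimal sketch below assumes it is the plain arithmetic mean of the four per-scene errors. The table does not state this, and the function name `average_errors` is hypothetical. Under that assumption the EssNet row is reproduced to the table's two-decimal precision; other rows may follow a different convention or rounding.

```python
from statistics import mean

# Per-scene errors for the EssNet row, copied from Table 3:
# (position error in metres, orientation error in degrees).
essnet = {
    "King's College": (0.82, 2.32),
    "Old Hospital": (1.58, 4.37),
    "Shop Facade": (1.29, 6.32),
    "St M. Church": (1.74, 6.17),
}


def average_errors(per_scene):
    """Return the mean position and orientation error over all scenes.

    Assumption: the table's Average column is an unweighted mean
    across scenes (not weighted by the number of test images).
    """
    positions = [pos for pos, _ in per_scene.values()]
    rotations = [rot for _, rot in per_scene.values()]
    return mean(positions), mean(rotations)


avg_pos, avg_rot = average_errors(essnet)
print(f"{avg_pos:.2f}m, {avg_rot:.2f} deg")
```

The same function can be applied to any other row by swapping in that row's four (position, orientation) pairs.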