
Table 2 Results on the Cambridge Landmarks dataset

From: Accurate visual localization with semantic masking and attention

  

| Category | Method | King's College | Old Hospital | Shop Facade | St M. Church | Average |
|---|---|---|---|---|---|---|
| 3D | *Active Search [17] | 0.48m, 0.67\(^\circ\) | 0.81m, 1.15\(^\circ\) | 0.17m, 0.65\(^\circ\) | 0.36m, 1.00\(^\circ\) | 0.46m, 0.87\(^\circ\) |
| IFB | *SIFT + RANSAC | 0.49m, 0.70\(^\circ\) | 1.04m, 1.29\(^\circ\) | 0.19m, 0.67\(^\circ\) | 0.36m, 1.03\(^\circ\) | 0.52m, 0.92\(^\circ\) |
| IR | DenseVLAD (D.VLAD) [20] | 2.80m, 5.72\(^\circ\) | 4.01m, 7.13\(^\circ\) | 1.11m, 7.61\(^\circ\) | 2.31m, 8.00\(^\circ\) | 2.56m, 7.12\(^\circ\) |
|  | D.VLAD + Inter. [23] | 1.48m, 4.45\(^\circ\) | 2.68m, 4.63\(^\circ\) | 0.90m, 4.32\(^\circ\) | 1.62m, 6.06\(^\circ\) | 1.67m, 4.87\(^\circ\) |
| APE | PoseNet (PN) [9] | 1.92m, 5.40\(^\circ\) | 2.31m, 5.38\(^\circ\) | 1.46m, 8.08\(^\circ\) | 2.65m, 8.48\(^\circ\) | 2.09m, 6.84\(^\circ\) |
|  | Bay. PN [10] | 1.74m, 4.06\(^\circ\) | 2.57m, 5.14\(^\circ\) | 1.25m, 7.54\(^\circ\) | 2.11m, 8.38\(^\circ\) | 1.92m, 6.28\(^\circ\) |
|  | LSTM PN [11] | 0.99m, 3.65\(^\circ\) | 1.51m, 4.29\(^\circ\) | 1.18m, 7.44\(^\circ\) | 1.52m, 6.68\(^\circ\) | 1.30m, 5.52\(^\circ\) |
|  | MapNet [12] | 1.07m, 1.89\(^\circ\) | 1.94m, 3.91\(^\circ\) | 1.49m, 4.22\(^\circ\) | 2.00m, 4.53\(^\circ\) | 1.63m, 3.64\(^\circ\) |
| RPE | EssNet [16] | 0.82m, 2.32\(^\circ\) | 1.58m, 4.37\(^\circ\) | 1.29m, 6.32\(^\circ\) | 1.74m, 6.17\(^\circ\) | 1.36m, 4.80\(^\circ\) |
|  | BiasAttNet (ours) | 0.75m, 2.26\(^\circ\) | 1.44m, 3.23\(^\circ\) | 0.95m, 5.24\(^\circ\) | 1.62m, 5.27\(^\circ\) | 1.19m, 4.00\(^\circ\) |

  1. We compare our approach against 3D structure-based (3D), image retrieval (IR), indirect feature-based localization (IFB), and absolute and relative pose estimation (APE and RPE) methods. For each scene, we report the median position error in meters and the median orientation error in degrees. The best results are highlighted in bold, except for methods marked with a *.
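
The two metrics named in the footnote can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. It assumes each pose is a pair of a 3D camera position and a quaternion in (w, x, y, z) order. The function names `position_error`, `orientation_error_deg`, and `median_errors` are hypothetical, and the sample poses are made up for the demonstration.

```python
import math
from statistics import median


def position_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth positions (meters)."""
    return math.dist(t_est, t_gt)


def orientation_error_deg(q_est, q_gt):
    """Angle, in degrees, of the rotation separating two quaternions (w, x, y, z)."""
    norm_est = math.sqrt(sum(c * c for c in q_est))
    norm_gt = math.sqrt(sum(c * c for c in q_gt))
    # abs() handles the q / -q ambiguity: both encode the same rotation.
    dot = abs(sum(a * b for a, b in zip(q_est, q_gt))) / (norm_est * norm_gt)
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


def median_errors(estimates, ground_truth):
    """Median position and orientation error over paired lists of (position, quaternion)."""
    pos = [position_error(te, tg) for (te, _), (tg, _) in zip(estimates, ground_truth)]
    rot = [orientation_error_deg(qe, qg) for (_, qe), (_, qg) in zip(estimates, ground_truth)]
    return median(pos), median(rot)


def z_rotation(deg):
    """Unit quaternion for a rotation of `deg` degrees about the z axis."""
    half = math.radians(deg) / 2.0
    return (math.cos(half), 0.0, 0.0, math.sin(half))


# Made-up example: three queries whose ground truth is the identity pose at the origin.
identity = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0))
ground_truth = [identity, identity, identity]
estimates = [
    ((0.3, 0.4, 0.0), z_rotation(10.0)),  # 0.5 m off, 10 degrees off
    ((1.0, 0.0, 0.0), z_rotation(20.0)),  # 1.0 m off, 20 degrees off
    ((0.0, 0.0, 2.0), z_rotation(0.0)),   # 2.0 m off, 0 degrees off
]
pos_med, rot_med = median_errors(estimates, ground_truth)
print(f"{pos_med:.2f}m, {rot_med:.2f} deg")
```

The medians are taken separately for position and orientation, so the two numbers in one table cell need not come from the same query image.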