1 Introduction
1.1 Motivation
1.2 State of the Art
1.3 Contribution
2 FAB-MAP
2.1 Training and Recognition
2.2 Modifications to OpenFABMAP
3 ABLE-M
3.1 Training and Recognition
3.2 Modifications Made to OpenABLE
4 Experimental Evaluation
4.1 Experimental Sites
PUT LC test and PUT LC test 2. The user was asked to walk roughly along the trajectory marked by the green dots in Fig. 3. The PUT LC test sequence consists of 466 images, and the PUT LC test 2 sequence contains 448 images.
PUT MC short and PUT MC long, recorded on the third floor. The trajectories are presented in Fig. 5 by green dots (PUT MC short) and by red dots (PUT MC long). Both trajectories started and ended in the place marked by 'S' in Fig. 5, and the direction of motion is marked by an arrow. The shorter trajectory is approximately 60 metres long and contains 620 images, while the longer trajectory is approximately 120 metres long with 996 images. Taking into account the choice of training places, 10 FAB-MAP test locations lie along the PUT MC short trajectory, and 15 test locations lie along the PUT MC long trajectory.
PUT MC people short consists of 525 images, and PUT MC people long contains 864 images.
4.2 Evaluation Procedure for OpenFABMAP
PUT LC test and PUT MC short were chosen for the initial validation, and the results of system testing are presented in Table 1. As already mentioned, setting the sequence size n (the comparison window \(c_w\) in Table 1) to 1 results in incorrect recognitions even if the recognition threshold \(t_p\) is set to 0.99. As any incorrect recognition may be critical for indoor localization [38], these results strongly suggest that OpenFABMAP in its original version is not capable of operating in the target environment. Increasing \(c_w\) makes it possible to reduce the number of incorrect recognitions to 0, but might also lower the number of correct recognitions, as the system needs to properly match several images in a sequence. Therefore, a window size of 3 or 5 is the best choice in our case. As recognition based on several images is more descriptive, it is possible to lower \(t_p\), which increases the number of correct recognitions without any additional incorrect matches.
Table 1: PUT MC short and PUT LC test
Sequence | \(t_p\) | Recognitions | \(c_w=1\) | \(c_w=3\) | \(c_w=5\) | \(c_w=8\) |
---|---|---|---|---|---|---|
PUT MC short | 0.6 | Correct | 241 | 136 | 92 | 67 |
PUT MC short | 0.6 | Incorrect | 36 | 0 | 0 | 0 |
PUT MC short | 0.6 | Distinct | 9 | 9 | 6 | 5 |
PUT MC short | 0.99 | Correct | 174 | 85 | 59 | 38 |
PUT MC short | 0.99 | Incorrect | 1 | 0 | 0 | 0 |
PUT MC short | 0.99 | Distinct | 9 | 6 | 5 | 4 |
PUT LC test | 0.6 | Correct | 223 | 140 | 94 | 63 |
PUT LC test | 0.6 | Incorrect | 207 | 23 | 2 | 0 |
PUT LC test | 0.6 | Distinct | 6 | 6 | 5 | 4 |
PUT LC test | 0.99 | Correct | 202 | 126 | 94 | 65 |
PUT LC test | 0.99 | Incorrect | 122 | 11 | 1 | 0 |
PUT LC test | 0.99 | Distinct | 6 | 6 | 5 | 4 |
Table 2
Sequence | Recognitions | \(n=1\) | \(n=3\) | \(n=5\) | \(n=8\) |
---|---|---|---|---|---|
PUT LC test 2 | Correct | 224 | 199 | 168 | 135 |
PUT LC test 2 | Incorrect | 52 | 6 | 1 | 0 |
PUT LC test 2 | Distinct | 6 | 6 | 6 | 6 |
PUT MC long | Correct | 301 | 229 | 173 | 119 |
PUT MC long | Incorrect | 22 | 2 | 0 | 0 |
PUT MC long | Distinct | 13 | 12 | 10 | 7 |
PUT MC people long | Correct | 150 | 84 | 42 | 15 |
PUT MC people long | Incorrect | 24 | 3 | 0 | 0 |
PUT MC people long | Distinct | 10 | 7 | 7 | 5 |
PUT MC people short | Correct | 34 | 18 | 10 | 2 |
PUT MC people short | Incorrect | 4 | 0 | 0 | 0 |
PUT MC people short | Distinct | 9 | 4 | 3 | 2 |
PUT LC test 2 and PUT MC long, and the datasets with people moving inside corridors, PUT MC people long and PUT MC people short. The results confirm that a proper choice of the window size \(c_w\) usually minimizes the number of incorrect matches. In some cases, the environment contains multiple misleading features, which makes recognition a challenging task. Such a case is presented in Fig. 6a, where multiple features visible on the windows mislead FAB-MAP. Similarly, people present in the corridors make it difficult to correctly recognize the places. Such a situation is presented in Fig. 6b and Table 2.
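The effect of the comparison window \(c_w\) and the threshold \(t_p\) can be illustrated with a minimal sketch. It assumes that a place is reported only when the last \(c_w\) frames all match the same place with probability of at least \(t_p\); this rule, the function name `windowed_recognitions`, and the per-frame input format are illustrative assumptions and are not taken from the OpenFABMAP code.

```python
from typing import List, Optional, Tuple


def windowed_recognitions(
    frame_matches: List[Tuple[Optional[int], float]],
    t_p: float,
    c_w: int,
) -> List[Optional[int]]:
    """Report a place only if the last c_w frames agree on it.

    frame_matches[i] is a hypothetical per-frame result:
    (best matching place id or None, match probability).
    """
    results: List[Optional[int]] = []
    for i in range(len(frame_matches)):
        if i + 1 < c_w:
            results.append(None)  # not enough frames to fill the window yet
            continue
        window = frame_matches[i + 1 - c_w : i + 1]
        place = window[0][0]
        # Every frame in the window must name the same place with prob >= t_p.
        consistent = place is not None and all(
            p == place and prob >= t_p for p, prob in window
        )
        results.append(place if consistent else None)
    return results


# Hypothetical per-frame matches: frame 2 is a spurious match to place 7.
frames = [(3, 0.95), (3, 0.97), (7, 0.99), (3, 0.96), (3, 0.98), (3, 0.99)]
single = windowed_recognitions(frames, t_p=0.9, c_w=1)
windowed = windowed_recognitions(frames, t_p=0.9, c_w=3)
print(single)
print(windowed)
```

With \(c_w=1\) the spurious match to place 7 is reported, whereas with \(c_w=3\) it is suppressed at the cost of fewer reported recognitions, which mirrors the trade-off visible in Tables 1 and 2.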
4.3 Evaluation Procedure for OpenABLE
PUT LC test dataset. The most significant results are presented in Table 3, which contains tests for \(c_l\) ranging from 2 to 40 images and two values of \(t_r\). For the convenience of the readers, we only provide the clustered recognized places, which is a more intuitive performance measure than the total number of positive results from OpenABLE.
Table 3: PUT LC test
\(t_r\) | Recognized places | \(c_l=2\) | \(c_l=5\) | \(c_l=10\) | \(c_l=20\) | \(c_l=40\) |
---|---|---|---|---|---|---|
0.4 | Correctly | 61 | 36 | 19 | 21 | 6 |
0.4 | Incorrectly | 387 | 105 | 42 | 4 | 0 |
0.3 | Correctly | 121 | 49 | 16 | 15 | 8 |
0.3 | Incorrectly | 57 | 4 | 0 | 0 | 0 |
PUT LC test 2. The system correctly recognized 6 places on the trajectory with 4529 positive recognitions from OpenABLE. Moreover, the system presented no false positives.
Sequence | \(t_r\) | Recognized places | \(c_l=40\) | \(c_l=60\) | \(c_l=80\) |
---|---|---|---|---|---|
PUT MC short | 0.4 | Correctly | 10 | 6 | 5 |
PUT MC short | 0.4 | Incorrectly | 6 | 1 | 0 |
PUT MC long | 0.4 | Correctly | 13 | 10 | 8 |
PUT MC long | 0.4 | Incorrectly | 8 | 0 | 0 |
PUT MC people short | 0.4 | Correctly | 5 | 3 | 1 |
PUT MC people short | 0.4 | Incorrectly | 1 | 0 | 0 |
PUT MC people long | 0.4 | Correctly | 8 | 1 | 0 |
PUT MC people long | 0.4 | Incorrectly | 0 | 0 | 0 |
PUT MC people short | 0.5 | Correctly | 9 | 8 | 2 |
PUT MC people short | 0.5 | Incorrectly | 33 | 5 | 1 |
PUT MC people long | 0.5 | Correctly | 12 | 11 | 2 |
PUT MC people long | 0.5 | Incorrectly | 24 | 0 | 0 |
The results for PUT MC short, PUT MC long and the sequences with people present in the field of view are shown in Table 4.
PUT MC short and PUT MC long. As corridors are hard to distinguish even for humans, the window size \(c_l\) was extended, as shorter sequences might not provide sufficient information for place recognition. This is necessary because the building contains similar corridors, like those presented in Fig. 7a. Increasing \(c_l\) to 80 images corresponds to 8 s of motion.
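The role of \(c_l\) and \(t_r\) can be illustrated with a minimal sketch of sequence matching on binary image descriptors. The sketch rests on several assumptions that are not stated in this section: each image is represented by one binary descriptor, a query of \(c_l\) consecutive descriptors is compared with every database window of the same length using a normalized Hamming distance, and a match is accepted when that distance falls below \(t_r\). The descriptor length `N_BITS` and all function names are hypothetical. The frame rate of 10 images per second follows from 80 images corresponding to 8 s of motion.

```python
import random
from typing import List, Optional, Tuple

N_BITS = 256  # hypothetical descriptor length, not taken from the paper


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as integers."""
    return bin(a ^ b).count("1")


def sequence_distance(query: List[int], candidate: List[int]) -> float:
    """Normalized Hamming distance between two equally long descriptor sequences."""
    total = sum(hamming(q, c) for q, c in zip(query, candidate))
    return total / (len(query) * N_BITS)


def match_sequence(
    query: List[int], database: List[int], t_r: float
) -> Optional[Tuple[int, float]]:
    """Return (start index, distance) of the best window below t_r, or None."""
    c_l = len(query)
    best: Optional[Tuple[int, float]] = None
    for start in range(len(database) - c_l + 1):
        d = sequence_distance(query, database[start : start + c_l])
        if d < t_r and (best is None or d < best[1]):
            best = (start, d)
    return best


def window_duration_s(c_l: int, frame_rate_hz: float = 10.0) -> float:
    """Duration of motion covered by a window of c_l images."""
    return c_l / frame_rate_hz


# Synthetic data: random descriptors, query is a noisy copy of frames 20..29.
rng = random.Random(0)
database = [rng.getrandbits(N_BITS) for _ in range(50)]
noise_mask = (1 << 10) - 1  # flips the 10 lowest bits of each descriptor
query = [d ^ noise_mask for d in database[20:30]]
unrelated = [rng.getrandbits(N_BITS) for _ in range(10)]

best = match_sequence(query, database, t_r=0.4)
print(best, match_sequence(unrelated, database, t_r=0.3), window_duration_s(80))
```

Under these assumptions, a longer window averages the distance over more images, so a single similar-looking corridor frame is less likely to push an unrelated sequence below \(t_r\).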
PUT MC people short and PUT MC people long, there were no incorrect recognitions when \(c_l\) was set to 60. Compared to sequences without people, the system recognizes fewer different places. This is expected, as people can occlude a significant part of an image, changing the image properties used for comparison. Therefore, larger values of \(c_l\) might result in more robust recognition in the case of small disturbances, like a single person in the field of view, but even very long windows are ineffective in a crowded scene. Nevertheless, OpenABLE performs well, correctly recognizing places even in situations such as the one presented in Fig. 7b.
4.4 Performance Comparison Between OpenFABMAP and OpenABLE
PUT LC test 2 sequence. It can be observed that OpenFABMAP correctly recognized all of the places along the test trajectory. OpenABLE failed to achieve recognitions in the middle part of the trajectory but provided continuous localization at the beginning and the end of the sequence. Depending on the application, both systems provided valuable information for localization purposes.
PUT MC long sequence. OpenFABMAP managed to correctly recognize the user location at 10 out of 15 possible places along the trajectory. OpenABLE recognized the user in 7 parts of the trajectory, but some of those parts last several seconds, providing the system with continuous user localization. The main difference between the algorithms is visible close to junctions: OpenABLE cannot recognize the user at those locations, while OpenFABMAP works correctly. This suggests that these solutions are suitable for buildings of different structure or might be combined in a robust visual place recognition system.
PUT MC long, but during the student break with many people walking along the corridors. In such conditions, OpenFABMAP recognized the user at 7 out of 15 possible places along the trajectory. OpenABLE recognized the user in 7 parts of the trajectory, but the recognized parts were shorter than those obtained when people were not present. Both systems correctly recognized the images of re-visited places if the presence of people only slightly disturbed the original view of the scene. Therefore, in many cases, OpenFABMAP was able to correctly recognize a place based on images taken when people just passed by, which was not possible for OpenABLE. On the other hand, OpenABLE required a longer sequence of similar images for successful recognition but was more robust to small but continuous disturbances introduced by the presence of people, which led to some surprisingly correct recognitions, as presented in Fig. 7b. Neither of the algorithms provided incorrect recognitions due to people obscuring the field of view.
5 Processing Time Analysis
6 Nordland Dataset
Database size n | OpenABLE PC (s) | OpenABLE SGN3 (s) | OpenABLE LS8-50 (s) |
---|---|---|---|
1000 | 0.169 | 0.627 | 0.366 |
10,000 | 2.273 | 8.593 | 5.069 |
100,000 | 24.476 | 85.480 | 52.103 |
7 Improved FastABLE for Mobile Devices
7.1 Increasing the Matching Speed of OpenABLE
7.2 Processing Time Analysis of FastABLE
Device | \(c_l\) | n = 500 | n = 1000 | n = 2000 | n = 4000 | n = 8000 |
---|---|---|---|---|---|---|
SGN3 | 20 | 11.4 | 9.8 | 9.3 | 9.2 | 9.2 |
SGN3 | 40 | 22.3 | 19.5 | 18.6 | 18.2 | 18.2 |
SGN3 | 60 | 32.5 | 28.0 | 27.5 | 27.1 | 26.9 |
SGN3 | 80 | 40.5 | 36.5 | 36.2 | 35.6 | 35.6 |
LS8-50 | 20 | 15.3 | 11.0 | 9.7 | 9.7 | 9.5 |
LS8-50 | 40 | 29.6 | 21.7 | 19.2 | 19.2 | 18.7 |
LS8-50 | 60 | 42.3 | 31.2 | 28.1 | 28.1 | 27.6 |
LS8-50 | 80 | 52.2 | 40.2 | 36.6 | 37.1 | 36.5 |
7.3 Verifying the Gains of FastABLE on the Nordland Dataset
Database size | 1000 (s) | 10,000 (s) | 100,000 (s) |
---|---|---|---|
OpenABLE PC | 0.169 | 2.273 | 24.476 |
FastABLE PC | 0.0016 | 0.019 | 0.205 |
OpenABLE SGN3 | 0.627 | 8.593 | 85.480 |
FastABLE SGN3 | 0.0056 | 0.075 | 0.751 |
OpenABLE LS8-50 | 0.366 | 5.069 | 52.103 |
FastABLE LS8-50 | 0.0036 | 0.043 | 0.442 |
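The gains reported in the table above can be summarized as speed-up factors, that is, the OpenABLE time divided by the FastABLE time for the same platform and database size. The short script below only recomputes these ratios from the tabulated values; the variable and function names are illustrative.

```python
# Processing times in seconds, copied from the table above.
# Columns correspond to database sizes of 1000, 10,000 and 100,000 images.
database_sizes = [1000, 10_000, 100_000]
times_s = {
    "PC": {
        "OpenABLE": [0.169, 2.273, 24.476],
        "FastABLE": [0.0016, 0.019, 0.205],
    },
    "SGN3": {
        "OpenABLE": [0.627, 8.593, 85.480],
        "FastABLE": [0.0056, 0.075, 0.751],
    },
    "LS8-50": {
        "OpenABLE": [0.366, 5.069, 52.103],
        "FastABLE": [0.0036, 0.043, 0.442],
    },
}


def speedups(open_able, fast_able):
    """Element-wise ratio of OpenABLE time to FastABLE time."""
    return [o / f for o, f in zip(open_able, fast_able)]


speedup_table = {
    platform: speedups(t["OpenABLE"], t["FastABLE"])
    for platform, t in times_s.items()
}

for platform, values in speedup_table.items():
    row = ", ".join(
        f"n={n}: {v:.1f}x" for n, v in zip(database_sizes, values)
    )
    print(f"{platform}: {row}")
```

For the tabulated values, every ratio falls between roughly 100 and 120, on the PC as well as on both mobile devices.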