Introduction
Research background
Visual SLAM
Scene text recognition
Large language model
Proposed methods
Overall framework
Text error correction chain (TECC)
Similarity classification
Two-stage memory strategy
Text clustering
Task-relevant selection
Position calculation and position clustering
Natural user interface
Experimental results
Real-world experiments
Scene text recognition
Similarity classification and two-stage memory
Threshold | 10 | 20 | 25–200 | 250 | 300 |
---|---|---|---|---|---|
MDR | 0 | 0 | 0 | 8.3% | 16.7% |
FDR | 25.0% | 7.6% | 0 | 0 | 0 |
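The sweep above trades the two error types against each other. MDR grows as the threshold increases (8.3% at 250, 16.7% at 300), FDR grows as it decreases (7.6% at 20, 25.0% at 10), and both are zero only in the 25–200 band. The minimal Python sketch below reads that band off the table. The values are transcribed from the table, and the helper name `zero_error_thresholds` is hypothetical, not taken from the paper.

```python
# Threshold sweep transcribed from the table above (rates in percent).
THRESHOLDS = ["10", "20", "25–200", "250", "300"]
MDR = [0.0, 0.0, 0.0, 8.3, 16.7]
FDR = [25.0, 7.6, 0.0, 0.0, 0.0]


def zero_error_thresholds(labels, mdr, fdr):
    """Return the threshold settings at which both MDR and FDR are zero.

    Hypothetical helper for illustration: it only filters tabulated values
    and does not reproduce how the paper measures either rate.
    """
    return [t for t, m, f in zip(labels, mdr, fdr) if m == 0.0 and f == 0.0]


print(zero_error_thresholds(THRESHOLDS, MDR, FDR))  # ['25–200']
```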
Text clustering
Task-relevant selection
Category | Texts |
---|---|
Landmark | [KFC], [必胜客欢乐餐厅 (Pizza Hut)], [ALIENWARE], [HUAWEI], [GUCCI] |
Non-landmark | [Donotbeat], [请切拍打 (Do not beat; one wrong character, 切 in place of 勿)], [NoSmoking], [Don’t Touch], [DANGER], [HIGHTEMPERATURE], [WASHROOM] |
Overall mapping
Natural language interface
Visualized comparison
Virtual experiments
Scene and text mapping
Response to natural language queries
Computational complexity analysis
No. | Detected text | Shop (Y/N) | Estimated text position | True text position | Estimated camera position | True camera position | Text error | Camera error |
---|---|---|---|---|---|---|---|---|
1 | 特步 (XTEP) | Y | (1.34, −3.13, 3.75) | (3.75, 1.01, −1.83) | (0.42, −3.54, 3.76) | (3.71, 0.08, −2.23) | 1.6 | 3.81 |
2 | Wendys | Y | (5.16, −2.42, 3.79) | (3.43, 4.24, −1.4) | (0.87, −4.21, 3.45) | (3.4, 0.46, −2.84) | 0.73 | 2.8 |
3 | DQ | Y | (5, −4.36, 3.71) | (3.67, 4.22, −2.89) | (1.28, −3.71, 1.09) | (1.26, 1.04, −2.5) | 2.82 | 2.02 |
4 | Levis | Y | (2.56, −4.25, 3.52) | (3.45, 1.92, −2.83) | (−0.2, −2.96, 3.4) | (3.39, −0.49, −1.67) | 2.73 | 4.35 |
5 | ANTA | Y | (12.11, −1.07, 3.2) | (3.04, 10.81, 0.05) | (3.75, −3.23, 0.84) | (1.4, 3.09, −2.07) | 2.23 | 2.7 |
6 | Adidas | Y | (12.51, −4.72, 3.81) | (3.57, 10.35, −3.18) | (8.27, −2.76, 1.85) | (2.04, 6.78, −1.5) | 2.45 | 3.58 |
7 | FILA | Y | (24.52, −3.09, 4.35) | (3.68, 20.28, −1.91) | (17.06, −6.15, −0.77) | (−0.35, 14.57, −4.31) | 0.95 | 2.63 |
8 | CHANEL | Y | (23.61, −5.43, −0.16) | (−0.05, 22.2, −4.25) | (3.7, −2.72, 3.36) | (3.43, 3.19, −1.51) | 3.48 | 4.37 |
9 | Burberry | Y | (29.65, −5.68, −0.09) | (0.23, 26.8, −4.43) | (17.2, −3.06, −0.73) | (−0.35, 14.7, −1.8) | 2.9 | 1.81 |
10 | McDonald’s | Y | (39.25, −3.73, 3.52) | (3.65, 35.02, −2.25) | (37.24, −3.68, 2.46) | (2.77, 33.49, −2.24) | 1.13 | 1.05 |
11 | Animate | Y | (41.66, −3.67, −6.27) | (−3.63, 38.13, −3.39) | (40.32, −4.4, 2.46) | (3.09, 36.58, −2.92) | 0.37 | 1.16 |
12 | iPhone | Y | (39.87, −4.44, −5.36) | (−4.09, 37.25, −2.76) | (41.69, −3.64, −3.73) | (−2.63, 38.79, −2.23) | 1.59 | 3.39 |
13 | LG | Y | (34.43, −4.15, −4.74) | (−2.84, 30.46, −2.9) | (23.1, −5.11, 3.52) | (3.4, 20.1, −3.36) | 0.65 | 0.7 |
14 | Intel | Y | (33.19, −2.54, −4.31) | (−3.23, 30.56, −1.15) | (26.5, −4.95, −0.84) | (−0.17, 23.41, −3.48) | 2.57 | 1.66 |
15 | 必胜客 (Pizza Hut) | Y | (26.43, −4.9, −6.05) | (−3.22, 25.75, −3.12) | (29.88, −3.81, −5.29) | (−3.18, 28.43, −2.3) | 2.58 | 3.34 |
16 | Dior | Y | (22.86, −4.77, −3.92) | (−2.99, 20.6, −3.29) | (18.33, −2.98, −0.75) | (−0.35, 15.72, −1.69) | 1.62 | 1.83 |
17 | Nike | Y | (18.54, −1.42, −6.47) | (−3.04, 18.31, −0.24) | (22.45, −2.56, −6.11) | (−2.91, 21.81, −1.29) | 2.33 | 1.24 |
18 | Vans | Y | (17.4, −3.47, −5.1) | (−3.31, 15.48, −2.13) | (10.99, −4.14, −0.33) | (0.45, 9.47, −2.85) | 0.98 | 0.85 |
19 | Lenovo | Y | (13.42, −2.96, −7.75) | (−3.61, 13.11, −1.76) | (20.56, −1.24, −6.61) | (−3.34, 19.98, −0.1) | 0.52 | 0.5 |
20 | Versace | Y | (5.79, −2.36, −8.43) | (−3.3, 6.47, −0.89) | (9.56, −3.14, −7.58) | (−2.86, 10.12, −1.72) | 0.94 | 1.29 |
21 | New Balance | Y | (5.69, −3.92, −8.43) | (−3.36, 6.52, −2.39) | (8.83, −3.57, −7.87) | (−3.03, 9.46, −2.13) | 2.33 | 1.71 |
22 | HERMES | Y | (14.22, −4.96, −0.24) | (0.04, 12.03, −3.54) | (9.03, −2.58, 1.85) | (2.02, 7.42, −1.33) | 2.31 | 3.77 |
23 | Panasonic | Y | (21.18, −6.41, −0.61) | (−0.08, 16.99, −4.22) | (9.33, −1.93, 1.07) | (1.09, 7.71, −0.79) | 2.92 | 4.04 |
24 | 卫生间 (Toilet) | N | (43.85, −5.26, 3.05) | (3.2, 39.23, −3.65) | (41.06, −4.05, 1.71) | (2.04, 36.92, −2.52) | 1.97 | 0.5 |
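The 24 per-row errors can be condensed into summary figures. The minimal Python sketch below averages the Text error and Camera error columns as transcribed from the table and reports the entry with the largest text error. It treats each column as a list of plain scalars and makes no assumption about how the paper computes either error.

```python
from statistics import mean

# (detected text, text error, camera error), transcribed from rows 1-24 above.
ROWS = [
    ("特步", 1.6, 3.81), ("Wendys", 0.73, 2.8), ("DQ", 2.82, 2.02),
    ("Levis", 2.73, 4.35), ("ANTA", 2.23, 2.7), ("Adidas", 2.45, 3.58),
    ("FILA", 0.95, 2.63), ("CHANEL", 3.48, 4.37), ("Burberry", 2.9, 1.81),
    ("McDonald's", 1.13, 1.05), ("Animate", 0.37, 1.16), ("iPhone", 1.59, 3.39),
    ("LG", 0.65, 0.7), ("Intel", 2.57, 1.66), ("必胜客", 2.58, 3.34),
    ("Dior", 1.62, 1.83), ("Nike", 2.33, 1.24), ("Vans", 0.98, 0.85),
    ("Lenovo", 0.52, 0.5), ("Versace", 0.94, 1.29), ("New Balance", 2.33, 1.71),
    ("HERMES", 2.31, 3.77), ("Panasonic", 2.92, 4.04), ("卫生间", 1.97, 0.5),
]

text_errors = [t for _, t, _ in ROWS]
camera_errors = [c for _, _, c in ROWS]

# Column means and the detected text with the largest text error.
mean_text = mean(text_errors)
mean_camera = mean(camera_errors)
worst_text = max(ROWS, key=lambda row: row[1])[0]

print(f"mean text error:    {mean_text:.2f}")
print(f"mean camera error:  {mean_camera:.2f}")
print(f"largest text error: {worst_text}")
```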
No. | Question | Answer | Position |
---|---|---|---|
1 | 特步店铺在什么位置? (Where is the XTEP store? XTEP is a Chinese sports brand.) | 特步 (XTEP) | (1.34, −3.13, 3.75) |
2 | 斐乐店铺在什么位置? (Where is the FILA store?) | FILA | (24.52, −3.09, 4.35) |
3 | 香奈儿店铺在什么位置? (Where is the CHANEL store?) | CHANEL | (23.61, −5.43, −0.16) |
4 | Where is Pizza and More? | 必胜客 (Pizza and More) | (26.43, −4.9, −6.05) |
5 | Where can I buy an iPhone? | iPhone | (39.87, −4.44, −5.36) |
6 | Where can I buy a necklace? | Dior | (22.86, −4.77, −3.92) |
7 | Where can I buy a ring? | CHANEL | (23.61, −5.43, −0.16) |
8 | 我想吃披萨,请问可以去哪里? (I’d like to have pizza; where can I go?) | 必胜客 (Pizza and More) | (26.43, −4.9, −6.05) |
9 | 我想买一双运动鞋,请问可以去哪里? (I’d like to buy sports shoes; where can I go?) | Adidas | (12.62, −4.66, 3.88) |
10 | I want to go to Dior and Channel, can you design a route for me? | Dior; CHANEL | (22.86, −4.77, −3.92); (23.61, −5.43, −0.16) |
11 | 我的手机需要维修,请问我应该去哪里? (My phone needs repair; where should I go?) | Panasonic | (19.8, −5.83, −0.19) |
12 | 我想先买一双运动鞋,然后去吃饭,但我不喜欢吃披萨,请帮我设计一条路线 (I’d like to buy sports shoes first and then have a meal, but I do not like pizza. Please design a route for me.) | Nike; McDonald’s | (18.54, −1.42, −6.47); (39.25, −3.73, 3.52) |
13 | I want to go to the toilet first, then I would like to buy a bag for my wife, can you design a route for me? | 卫生间 (Toilet); Burberry | (43.85, −5.26, 3.05); (29.65, −5.68, −0.09) |
14 | I am now at (0, 0, 0), please help me find the nearest sport shop | 特步 (XTEP) | (1.34, −3.13, 3.75) |
15 | 我现在在 (0, 0, 0),帮我找到离我最近的餐厅 (I am at (0, 0, 0); please help me find the nearest restaurant.) | 必胜客 (Pizza and More) | (26.43, −4.9, −6.05) |
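Query 14 combines a semantic filter (which mapped entries are sports shops) with a geometric one (which of them is closest to the stated position). Once the semantic step has produced candidates, the geometric step reduces to a nearest-neighbour lookup. The Python sketch below assumes Euclidean distance in the map frame and uses the estimated text positions from the mapping table. The `SPORT_SHOPS` grouping is hand-assigned for illustration and is hypothetical, since in the paper the LLM decides which entries match the query.

```python
import math

# Estimated text positions from the mapping table (map frame).
# Hypothetical, hand-assigned candidate set standing in for the LLM's
# semantic filtering of "sport shop".
SPORT_SHOPS = {
    "特步": (1.34, -3.13, 3.75),
    "ANTA": (12.11, -1.07, 3.2),
    "Adidas": (12.51, -4.72, 3.81),
    "FILA": (24.52, -3.09, 4.35),
    "Nike": (18.54, -1.42, -6.47),
    "Vans": (17.4, -3.47, -5.1),
    "New Balance": (5.69, -3.92, -8.43),
}


def nearest(candidates, query):
    """Return (name, distance) of the candidate closest to `query` (Euclidean)."""
    name = min(candidates, key=lambda n: math.dist(candidates[n], query))
    return name, math.dist(candidates[name], query)


name, d = nearest(SPORT_SHOPS, (0.0, 0.0, 0.0))
print(name, round(d, 2))  # prints: 特步 5.07
```

A brute-force scan is enough at this map size. A spatial index such as a k-d tree only becomes worthwhile once the number of mapped texts grows large.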