1 Introduction
- We propose MSA3C, a framework based on centralized training with decentralized execution (CTDE), for multi-robot cooperative planning with social safety and comfort awareness under a limited field of view (FOV). In multiple experiments, our method performs strongly against a range of baselines.
- We design a multi-agent rollout replay buffer that aligns the time-varying dimension of historical transitions, and we introduce a parameter-sharing social encoder, based on the TSG network, for each robot to help it better understand the relative social relationships among surrounding pedestrians.
- We incorporate a predictive K-step lookahead reward function into the MARL paradigm during training to enhance each robot's social comfort awareness and to prevent it from adopting unnatural, shortsighted policies.
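
To make the idea of a predictive K-step lookahead reward concrete, the following is a minimal sketch and not the formulation used in MSA3C. It assumes a constant-velocity prediction for both robot and pedestrians and a linear penalty on predicted intrusions into a comfort distance; the function name `lookahead_comfort_penalty`, the `discount` argument, and the use of center-to-center distance are all hypothetical. The defaults K = 5, timestep 0.25 s, and comfort distance 0.25 m follow the basic-setting table in Section 4.1.1.

```python
import numpy as np


def lookahead_comfort_penalty(robot_pos, robot_vel, ped_pos, ped_vel,
                              k_steps=5, dt=0.25, d_comfort=0.25,
                              discount=1.0):
    """Sum of penalties for predicted comfort-distance violations.

    Assumption: every agent keeps its current velocity for the next
    k_steps steps (constant-velocity prediction). Distances are measured
    between agent centers, ignoring body radii.
    """
    robot_pos = np.asarray(robot_pos, dtype=float)
    robot_vel = np.asarray(robot_vel, dtype=float)
    ped_pos = np.asarray(ped_pos, dtype=float).reshape(-1, 2)
    ped_vel = np.asarray(ped_vel, dtype=float).reshape(-1, 2)

    if ped_pos.shape[0] == 0:
        return 0.0  # no visible pedestrians, nothing to penalize

    penalty = 0.0
    for k in range(1, k_steps + 1):
        t = k * dt
        robot_k = robot_pos + t * robot_vel   # predicted robot position
        peds_k = ped_pos + t * ped_vel        # predicted pedestrian positions
        d_min = np.linalg.norm(peds_k - robot_k, axis=1).min()
        if d_min < d_comfort:
            # Negative term that grows as the predicted intrusion deepens.
            penalty += (discount ** k) * (d_min - d_comfort)
    return float(penalty)
```

Because the penalty is computed on predicted rather than current positions, a robot heading toward a pedestrian is penalized before the intrusion actually happens, which is the behavior a lookahead term is meant to encourage.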
2 Related work
2.1 Multi-robot cooperative planning
2.2 Learning-based social aware robot motion planning
3 Methodology
3.1 Dec-POMDP configuration
3.1.1 Observation
3.1.2 Action space
3.1.3 Reward setting
3.2 Algorithm description of MSA3C
3.2.1 Rollout replay buffer
3.2.2 Social encoder
3.2.3 Multi-robot global attention-based actor-critic
3.2.4 Decentralized cooperative planning
4 Experiments and discussion
4.1 Experimental configuration
4.1.1 Basic setting
Env setting | Value | RL setting | Value |
---|---|---|---|
\(r_\text {Safety}^{P}\) | 0.5–1.3 m | \(K_{\text {lookahead}}\) | 5 |
\(r_\text {Safety}^{R}\) | 0.6 m | \(r_{\text {time}}\) | −1e−3 |
\(v_{\text {pref}}^P\) | 0.5–1.5 m/s | lr | 5e−4 |
\(v_{\text {pref}}^R\) | 1 m/s | \(\tau\) | 0.01 |
\(d_{\text {comfort}}\) | 0.25 m | Policy delay | 2 |
\(R_{\text {scenario}}\) | [6 m, 8 m, 10 m] | Batch size | 256 |
\(N_{\text {peds}}\) | 5–20 | \(\alpha _\text {init}\) | 0.02 |
\(N_{\text {robots}}\) | 3 | Buffer size | 2e5 |
FOV | 2\(\pi\), [5 m, 10 m] | Episode | 5e4 |
Timestep | 0.25 s | \(l_{\text {rollout-tps}}\) | 10 |
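
For reproducibility, the settings in the table above could be gathered into a single configuration object. The sketch below is one possible encoding, not the authors' code: the class and field names (`EnvConfig`, `RLConfig`, `r_safety_ped_m`, and so on) are hypothetical and simply mirror the table's symbols, and ranges are stored as `(min, max)` tuples.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EnvConfig:
    # Field names mirror the table symbols; they are illustrative only.
    r_safety_ped_m: Tuple[float, float] = (0.5, 1.3)      # r_Safety^P
    r_safety_robot_m: float = 0.6                         # r_Safety^R
    v_pref_ped_mps: Tuple[float, float] = (0.5, 1.5)      # v_pref^P
    v_pref_robot_mps: float = 1.0                         # v_pref^R
    d_comfort_m: float = 0.25                             # d_comfort
    r_scenario_m: Tuple[float, ...] = (6.0, 8.0, 10.0)    # R_scenario
    n_peds: Tuple[int, int] = (5, 20)                     # N_peds
    n_robots: int = 3                                     # N_robots
    fov_angle_rad: float = 2 * math.pi                    # FOV angle
    fov_range_m: Tuple[float, float] = (5.0, 10.0)        # FOV ranges
    timestep_s: float = 0.25                              # Timestep


@dataclass(frozen=True)
class RLConfig:
    k_lookahead: int = 5          # K_lookahead
    r_time: float = -1e-3         # r_time
    lr: float = 5e-4              # lr
    tau: float = 0.01             # tau
    policy_delay: int = 2         # Policy delay
    batch_size: int = 256         # Batch size
    alpha_init: float = 0.02      # alpha_init
    buffer_size: int = 200_000    # Buffer size (2e5)
    episodes: int = 50_000        # Episode (5e4)
    l_rollout_tps: int = 10       # l_rollout-tps


env_cfg = EnvConfig()
rl_cfg = RLConfig()

# Assumption: each lookahead step spans one environment timestep.
lookahead_horizon_s = rl_cfg.k_lookahead * env_cfg.timestep_s
```

Under the stated assumption that one lookahead step equals one environment timestep, the lookahead horizon is 5 × 0.25 s = 1.25 s.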
4.1.2 Network setting of MSA3C
4.2 Metrics
Algorithms | Scene | CSR%\(\uparrow\) | CR%\(\downarrow\) | APL\(\downarrow\) | NTC\(\downarrow\) | CIR%\(\downarrow\) |
---|---|---|---|---|---|---|
MASAC-F10 | 5p3r | Nan | Nan | Nan | Nan | Nan |
MLGA2C-F10 | 5p3r | 98.0 | 44.2 | 26.8 | 45.7 | 10.2 |
MSA3C-F10 | 5p3r | 92.6 | 0.9 | 27.7 | 51.4 | 3.0 |
MSA3C-F10 | 10p3r | 80.2 | 2.2 | 31.8 | 71.6 | 4.1 |
MSA3C-F10 | 20p3r | 61.4 | 5.8 | 36.9 | 81.4 | 6.7 |
MSA3CPred-F10 | 5p3r | 98.8 | 2.4 | 28.7 | 51.7 | 1.2 |
MSA3CPred-F10 | 10p3r | 99.8 | 2.5 | 33.1 | 65.2 | 1.6 |
MSA3CPred-F10 | 20p3r | 92.6 | 8.4 | 41.2 | 81.0 | 2.9 |
MSA3CPred-F5 | 5p3r | 99.2 | 11.6 | 34.2 | 65.8 | 2.3 |
MSA3CPred-F5 | 10p3r | 97.2 | 14.9 | 40.2 | 79.3 | 2.7 |
MSA3CPred-F5 | 20p3r | 89.5 | 16.6 | 43.6 | 84.3 | 3.8 |
SARL-F10* | 5p3r | 85.2 | 12.2 | 40.4 | 56.5 | 4.9 |
ORCA-PS-F10* | 5p3r | 98.4 | 2.8 | 30.2 | 66.4 | 1.4 |
ORCA-PS-F10* | 10p3r | 92.4 | 3.5 | 38.2 | 89.7 | 1.6 |
ORCA-PS-F10* | 20p3r | 60.4 | 7.8 | 39.5 | 102.9 | 2.5 |
SF-F10* | 5p3r | 98.0 | 35.5 | 33.1 | 80.8 | 7.3 |
SF-F10* | 10p3r | 68.2 | 77.9 | 37.1 | 89.4 | 9.6 |
SF-F10* | 20p3r | 71.2 | 135.1 | 42.9 | 99.0 | 12.5 |
4.3 Quantitative and qualitative analysis
4.3.1 Baselines
4.3.2 Quantitative analysis
Scene | -Attn | -TSG | -Pred | CSR%\(\uparrow\) | CR%\(\downarrow\) | APL\(\downarrow\) | NTC\(\downarrow\) | CIR%\(\downarrow\) |
---|---|---|---|---|---|---|---|---|
5p3r | \(\checkmark\) | ✗ | ✗ | 95.6 | 29.2 | 30.8 | 52.5 | 6.6 |
10p3r | \(\checkmark\) | ✗ | ✗ | 54.5 | 45.9 | 39.8 | 78.7 | 15.5 |
20p3r | \(\checkmark\) | ✗ | ✗ | Nan | Nan | Nan | Nan | Nan |
5p3r | \(\checkmark\) | \(\checkmark\) | ✗ | 92.6 | 1.9 | 27.7 | 51.4 | 3.0 |
10p3r | \(\checkmark\) | \(\checkmark\) | ✗ | 80.2 | 2.2 | 31.8 | 71.6 | 4.1 |
20p3r | \(\checkmark\) | \(\checkmark\) | ✗ | 61.4 | 7.8 | 36.9 | 81.4 | 6.7 |
5p3r | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | 98.8 | 2.4 | 28.7 | 51.7 | 1.2 |
10p3r | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | 99.8 | 2.5 | 33.1 | 65.2 | 1.6 |
20p3r | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | 92.6 | 8.4 | 41.2 | 91.0 | 2.9 |