nach oben

Autonomous Agents and Multi-Agent Systems

Erschienen in:

01.04.2022

Privacy preserving planning in multi-agent stochastic environments

verfasst von: Tommy Hefner, Guy Shani, Roni Stern

Erschienen in: Autonomous Agents and Multi-Agent Systems | Ausgabe 1/2022

Einloggen

Aktivieren Sie unsere intelligente Suche, um passende Fachinhalte oder Patente zu finden.

search-config

KI-gestützte Suche

Aus

Abstract

Collaborative privacy preserving planning (cppp) gained much attention in the past decade. cppp aims to create solutions for multi agent planning problems where cooperation is required to achieve an efficient solution, without exposing information that the agent considers private in the process. To date, cppp has focused on domains with deterministic action effects. However, in real-world problems action effects are often non-deterministic, and actions can have multiple possible effects with varying probabilities. In this paper, we introduce Stochastic cppp (scppp), which is an extension of cppp to domains with stochastic action effects. We show how scppp can be modeled as a Markov decision process (mdp) and how the value-iteration algorithm can be adapted to solve it. This adaptation requires extending value-iteration to support multiple agents and privacy. Then, we present two adaptions of the real-time dynamic programming (rtdp) algorithm, a popular algorithm for solving mdps, designed to solve scppp problems. The first rtdp adaptation, called distributed rtdp (drtdp), yields identical behavior to applying rtdp in a centralized manner on the joint problem. To preserve privacy, drtdp uses a message passing mechanism adopted from the mafs algorithm. The second rtdp adaptation is an approximation of drtdp called public synchronization rtdp (ps-rtdp). ps-rtdp differs from drtdp mainly in its message passing mechanism, where ps-rtdp sends significantly fewer messages than drtdp. We experimented on domains adapted from the deterministic cppp literature by adding different stochastic effects to different actions. The results show that ps-rtdp can reduce the amount of messages compared to drtdp by orders of magnitude thus improving run-time, while producing policies with similar expected costs.

Vorheriger Artikel Quantifying the effects of environment and population diversity in multi-agent reinforcement learning

Nächster Artikel What values should an agent align with?

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Online-Abonnement

Mit Springer Professional "Wirtschaft+Technik" erhalten Sie Zugriff auf:

über 102.000 Bücher
über 537 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Finance + Banking
Management + Führung
Marketing + Vertrieb
Maschinenbau + Werkstoffe
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Technik"

Online-Abonnement

Mit Springer Professional "Technik" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 390 Zeitschriften

aus folgenden Fachgebieten:

Automobil + Motoren
Bauwesen + Immobilien
Business IT + Informatik
Elektrotechnik + Elektronik
Energie + Nachhaltigkeit
Maschinenbau + Werkstoffe

Jetzt Wissensvorsprung sichern!

Jetzt informieren

Springer Professional "Wirtschaft"

Online-Abonnement

Mit Springer Professional "Wirtschaft" erhalten Sie Zugriff auf:

über 67.000 Bücher
über 340 Zeitschriften

aus folgenden Fachgebieten:

Bauwesen + Immobilien
Business IT + Informatik
Finance + Banking
Management + Führung
Marketing + Vertrieb
Versicherung + Risiko

Jetzt Wissensvorsprung sichern!

Jetzt informieren

https://github.com/TmanHef/SCPPP-Algs

http://agents.fel.cvut.cz/codmap/

https://github.com/TmanHef/SCPPP-PDDLs

Hefner, T., Stern, R., & Shani, G. (2020). Privacy preserving planning in stochastic environments. In J. C. Beck, O. Buffet, J. Hoffmann, E. Karpas, & S. Sohrabi, (Eds.). Proceedings of the Thirtieth International Conference on Automated Planning and Scheduling, Nancy, France, October 26–30 (pp. 258–262). AAAI Press. https://aaai.org/ojs/index.php/ICAPS/article/view/6669

Brafman, R. I., & Domshlak, C. (2008). From one to many: Planning for loosely coupled multi-agent systems. In ICAPS (pp. 28–35).

Bellman, R. (1966). Dynamic programming. Science, 153(3731), 34–37.CrossRef

Kolobov, A. . (2012). Planning with Markov decision processes: An AI perspective. Synthesis Lectures on Artificial Intelligence and Machine Learning, 6(1), 1–210.MATH

Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1–2), 81–138.CrossRef

Nissim, R., & Brafman, R. I. (2014). Distributed heuristic forward search for multi-agent planning. JAIR, 51, 293–332.MathSciNetCrossRef

Štolba, M., Komenda, A., & Kovacs, D. L. (2015). Competition of distributed and multiagent planners (codmap). In The International Planning Competition (WIPC-15), (p. 24).

Brafman, R. I., & Domshlak, C. (2013). On the complexity of planning for agent teams and its implications for single agent planning. Artificial Intelligence, 198, 52–71.MathSciNetCrossRef

Brafman, R. I. (2015). A privacy preserving algorithm for multi-agent planning and search. In IJCAI (pp. 1530–1536).

10.

Tožička, J., Štolba, M., & Komenda, A. (2017). The limits of strong privacy preserving multi-agent planning. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 27).

11.

Maliah, S., Shani, G., & Stern, R. (2018). Action dependencies in privacy-preserving multi-agent planning. Autonomous Agents and Multi-Agent Systems, 32(6), 779–821.CrossRef

12.

Maliah, S., Shani, G., & Brafman, R. I. (2016a). Online macro generation for privacy preserving planning. In Twenty-Sixth International Conference on Automated Planning and Scheduling.

13.

Wu, F., Zilberstein, S., & Chen, X. (2018). Privacy-preserving policy iteration for decentralized pomdps. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI), New Orleans, USA (pp. 4759–4766).

14.

Štolba, M., Tožička, J., & Komenda, A. (2018). Quantifying privacy leakage in multi-agent planning. TOIT, 18(3), 28.CrossRef

15.

Štolba, M., Fišer, D., & Komenda, A. (2019). Privacy leakage of search-based multi-agent planning algorithms. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 29, pp. 482–490).

16.

Guestrin, C., Koller, D., & Parr, R. (2001). Max-norm projections for factored mdps. InIn IJCAI (Vol. 17, pp. 673–680).

17.

Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6, 679–684.MathSciNetMATH

18.

Puterman, M. L. (2014). Markov decision processes: Discrete stochastic dynamic programming. Wiley.MATH

19.

Trevizan, F. W., Teichteil-Königsbuch, F., & Thiébaux, S. (2017). Efficient solutions for stochastic shortest path problems with dead ends. In UAI.

20.

Du, Wenliang., & Zhan, Zhijun. (2002). A practical approach to solve secure multi-party computation problems. In Proceedings of the 2002 workshop on New security paradigms, pp 127–135.

21.

Bertsekas, D. (1982). Distributed dynamic programming. IEEE Transactions on Automatic Control, 27(3), 610–616.MathSciNetCrossRef

22.

Witzig, J., Beckenbach, I., Eifler, L., Fackeldey, K., Gleixner, A., Grever, A., & Weber, M. (2018). Mixed-integer programming for cycle detection in nonreversible Markov processes. Multiscale Modeling & Simulation, 16(1), 248–265.MathSciNetCrossRef

23.

Ameloot, T. J., & Van den Bussche, J. (2015). On the convergence of cycle detection for navigational reinforcement learning. http://arxiv.org/1511.08724

24.

Papadimitriou, C. H., & Tsitsiklis, J. N. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3), 441–450.MathSciNetCrossRef

25.

Bertsekas, D. P., & Tsitsiklis, J. N. (1991). An analysis of stochastic shortest path problems. Mathematics of Operations Research, 16(3), 580–595.MathSciNetCrossRef

26.

Littman, M. L., Dean, T. L., & Kaelbling, L. P. (2013). On the complexity of solving markov decision problems. http://arxiv.org/1302.4971

27.

Štolba, M., & Komenda, A. (2017). The Madla planner: Multi-agent planning by combination of distributed and local heuristic search. Artificial Intelligence, 252, 175–210.MathSciNetCrossRef

28.

Maliah, S., Shani, G., & Stern, R. (2016b). Stronger privacy preserving projections for multi-agent planning. In the International Conference on Automated Planning and Scheduling (ICAPS) (pp. 221–229).

29.

Maliah, S., Shani, G., & Stern, R. (2016c). Stronger privacy preserving projections for multi-agent planning. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 26).

30.

Maliah, S., Shani, G., & Stern, R. (2014). Privacy preserving landmark detection. In the European Conference on Artificial Intelligence (ECAI) (pp. 597–602).

31.

Hauskrecht, M., Meuleau, N., Kaelbling, L. P, Dean, T. L., & Boutilier, C. (2013). Hierarchical solution of markov decision processes using macro-actions. http://arxiv.org/1301.7381

32.

Gerevini, A. E., Lipovetzky, N., Percassi, F., Saetti, A., & Serina, I. (2019). Best-first width search for multi agent privacy-preserving planning. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 29, pp. 163–171).

33.

Gohari, P., Hale, M., & Topcu, U. (2020). Privacy-preserving policy synthesis in markov decision processes. In 2020 59th IEEE Conference on Decision and Control (CDC) (pp. 6266–6271). IEEE.

34.

Venkitasubramaniam, P. (2013). Privacy in stochastic control: A markov decision process perspective. In 2013 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton) (pp. 381–388). IEEE.

35.

Boutilier, C. (1999). Sequential optimality and coordination in multiagent systems. In In IJCAI (Vol. 99, pp. 478–485).

36.

Amato, C., Chowdhary, G., Geramifard, A., Kemal Üre, N., & Kochenderfer, M. J. (2013). Decentralized control of partially observable markov decision processes. In 52nd IEEE Conference on Decision and Control (pp. 2398–2405). IEEE.

37.

Kocsis, L., Szepesvári, C. (2006). Bandit based monte-carlo planning. In European conference on machine learning (pp. 282–293). Springer.

38.

Brendan, M. H., Likhachev, M., & Gordon, G. J. (2005). Bounded real-time dynamic programming: Rtdp with monotone upper bounds and performance guarantees. In Proceedings of the 22nd international conference on Machine learning (pp. 569–576). ACM.

39.

Smith, T., & Simmons, R. (2006). Focused real-time dynamic programming for mdps: Squeezing more out of a heuristic. In AAAI (pp. 1227–1232).

40.

Bonet, B., & Geffner, H. (2001). Planning as heuristic search. Artificial Intelligence, 129(1), 5–33.MathSciNetCrossRef

41.

Keller, T., & Eyerich, P. (2012). Prost: Probabilistic planning based on uct. In Proceedings of the International Conference on Automated Planning and Scheduling (Vol. 22).

42.

Bonet, B., & Geffner, H. (2006). Learning depth-first search: A unified approach to heuristic search in deterministic and non-deterministic settings, and its application to mdps. In In ICAPS (Vol. 6, pp. 142–151).

43.

Stolba, M., Tozicka, J., & Komenda, A. (2018). Quantifying privacy leakage in multi-agent planning. ACM Transactions on Internet Technology, 18(3), 28:1-28:21.CrossRef

44.

Stolba, M., Fiser, D., & Komenda, A. (2019). Privacy leakage of search-based multi-agent planning algorithms. In J. Benton, N. Lipovetzky, E. Onaindia, D. E. Smith, & S. Srivastava (Eds.). Proceedings of the Twenty-Ninth International Conference on Automated Planning and Scheduling, ICAPS 2018, Berkeley, CA, USA, July 11–15, 2019 (pp. 482–490). AAAI Press.

Titel: Privacy preserving planning in multi-agent stochastic environments
verfasst von: Tommy Hefner
Guy Shani
Roni Stern
Publikationsdatum: 01.04.2022
Verlag: Springer US
Erschienen in: Autonomous Agents and Multi-Agent Systems / Ausgabe 1/2022
Print ISSN: 1387-2532
Elektronische ISSN: 1573-7454
DOI: https://doi.org/10.1007/s10458-022-09554-w

Springer Professional

Abstract

Bitte loggen Sie sich ein, um Zugang zu Ihrer Lizenz zu erhalten.

Sie haben noch keine Lizenz? Dann Informieren Sie sich jetzt über unsere Produkte:

Springer Professional "Wirtschaft+Technik"

Springer Professional "Technik"

Springer Professional "Wirtschaft"

Weitere Artikel der Ausgabe 1/2022

Combining quantitative and qualitative reasoning in concurrent multi-player games

Candidate selections with proportional fairness constraints

Gini index based initial coin offering mechanism

Designing empathic virtual agents: manipulating animation, voice, rendering, and empathy to create persuasive agents

Evaluating approval-based multiwinner voting in terms of robustness to noise

A practical guide to multi-objective reinforcement learning and planning

Premium Partner