2020 | OriginalPaper | Book chapter

SAGE: Task-Environment Platform for Evaluating a Broad Range of AI Learners

Authors: Leonard M. Eberding, Kristinn R. Thórisson, Arash Sheikhlar, Sindri P. Andrason

Published in: Artificial General Intelligence

Publisher: Springer International Publishing

Abstract

While several tools exist for training and evaluating narrow machine learning (ML) algorithms, their design generally does not follow a particular or explicit evaluation methodology or theory. The inverse holds for more general learners: many evaluation methodologies and frameworks have been suggested, but few specific tools exist. In this paper we introduce a new framework for broad evaluation of artificial intelligence (AI) learners, and a new tool that builds on this methodology. The platform, called SAGE (Simulator for Autonomy & Generality Evaluation), works for training and evaluation of a broad range of systems and allows detailed comparison between narrow and general ML and AI. It provides a variety of tuning and task construction options, allowing isolation of single parameters across complexity dimensions. SAGE is aimed at helping AI researchers map out and compare strengths and weaknesses of divergent approaches. Our hope is that it can help deepen understanding of the various tasks we want AI systems to do and the relationship between their composition, complexity, and difficulty for various AI systems, as well as contribute to building a clearer research road map for the field. This paper provides an overview of the framework and presents results of an early use case.

Footnotes
1. https://index.ros.org/doc/ros2/, accessed Feb. 26, 2020.
2. http://gazebosim.org/, accessed Feb. 26, 2020.
Metadata
Title
SAGE: Task-Environment Platform for Evaluating a Broad Range of AI Learners
Authors
Leonard M. Eberding
Kristinn R. Thórisson
Arash Sheikhlar
Sindri P. Andrason
Copyright year
2020
DOI
https://doi.org/10.1007/978-3-030-52152-3_8
