Published in: Empirical Software Engineering 2/2024

March 1, 2024

Evaluating the impact of flaky simulators on testing autonomous driving systems

Authors: Mohammad Hossein Amini, Shervin Naseri, Shiva Nejati

Abstract

Simulators are widely used to test Autonomous Driving Systems (ADS), but their potential flakiness can lead to inconsistent test results. We investigate test flakiness in simulation-based testing of ADS by addressing two key questions: (1) How do flaky ADS simulations impact automated testing that relies on randomized algorithms? and (2) Can machine learning (ML) effectively identify flaky ADS tests while decreasing the required number of test reruns? Our empirical results, obtained from two widely used open-source ADS simulators and five diverse ADS test setups, show that test flakiness in ADS is common and can significantly impact the test results obtained by randomized algorithms. Further, our ML classifiers effectively identify flaky ADS tests using only a single test run, achieving F1-scores of 85%, 82%, and 96% for three different ADS test setups. Our classifiers significantly outperform our non-ML baseline, which requires executing tests at least twice, by 31%, 21%, and 13% in F1-score, respectively. We conclude with a discussion of the scope, implications, and limitations of our study. We provide our complete replication package in a GitHub repository (Github paper 2023).
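
To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of a rerun-based flakiness check (in the spirit of a baseline that executes each test at least twice) and of the F1-score used to evaluate a flakiness detector. The names `is_flaky`, `f1_score`, and `run_test` are hypothetical, and reducing a simulation run to a boolean pass/fail verdict is an assumption made only for illustration.

```python
from typing import Callable, Sequence


def is_flaky(run_test: Callable[[], bool], reruns: int = 2) -> bool:
    """Flag a test as flaky if repeated runs disagree on the verdict.

    Assumption: `run_test` is a hypothetical callable that executes one
    simulation and returns True for pass and False for fail.
    """
    if reruns < 2:
        raise ValueError("at least two runs are needed to observe disagreement")
    verdicts = {run_test() for _ in range(reruns)}
    return len(verdicts) > 1


def f1_score(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Harmonic mean of precision and recall for the 'flaky' class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # A toy test whose verdict alternates between runs (illustrative only).
    outcomes = iter([True, False])
    print(is_flaky(lambda: next(outcomes)))

    # Toy labels: which tests are truly flaky vs. predicted flaky.
    print(f1_score([True, True, False, False], [True, False, True, False]))
```

A rerun-based check like this pays for at least two simulations per test, which is the cost the abstract's single-run ML classifiers aim to avoid.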


Metadata
Title
Evaluating the impact of flaky simulators on testing autonomous driving systems
Authors
Mohammad Hossein Amini
Shervin Naseri
Shiva Nejati
Publication date
March 1, 2024
Publisher
Springer US
Published in
Empirical Software Engineering, Issue 2/2024
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-023-10433-5