Published in: Empirical Software Engineering, 01-03-2024

Evaluating the impact of flaky simulators on testing autonomous driving systems

Authors: Mohammad Hossein Amini, Shervin Naseri, Shiva Nejati

Published in: Empirical Software Engineering | Issue 2/2024

Abstract

Simulators are widely used to test Autonomous Driving Systems (ADS), but their potential flakiness can lead to inconsistent test results. We investigate test flakiness in simulation-based testing of ADS by addressing two key questions: (1) How do flaky ADS simulations impact automated testing that relies on randomized algorithms? and (2) Can machine learning (ML) effectively identify flaky ADS tests while decreasing the required number of test reruns? Our empirical results, obtained from two widely used open-source ADS simulators and five diverse ADS test setups, show that test flakiness in ADS is a common occurrence and can significantly impact the test results obtained by randomized algorithms. Further, our ML classifiers effectively identify flaky ADS tests using only a single test run, achieving F1-scores of 85%, 82%, and 96% for three different ADS test setups. Our classifiers significantly outperform our non-ML baseline, which requires executing tests at least twice, by 31%, 21%, and 13% in F1-score performance, respectively. We conclude with a discussion on the scope, implications, and limitations of our study. We provide our complete replication package in a GitHub repository (Github paper 2023).
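
To make the abstract's terms concrete, the following minimal Python sketch shows a rerun-based flakiness check and the standard F1-score used to evaluate a flaky-test detector. It is an illustration under stated assumptions, not the paper's baseline or classifier: it assumes a test counts as flaky when repeated runs of the same scenario return different pass/fail verdicts, and the names `is_flaky_by_rerun`, `f1_score`, and the stand-in tests are hypothetical.

```python
from itertools import cycle
from typing import Callable, Sequence


def is_flaky_by_rerun(run_test: Callable[[], bool], runs: int = 2) -> bool:
    """Flag a test as flaky if repeated runs of the same test disagree.

    Assumption: `run_test` executes one simulation and returns a
    pass/fail verdict. At least two runs are needed to see disagreement.
    """
    if runs < 2:
        raise ValueError("need at least two runs to observe disagreement")
    verdicts = {run_test() for _ in range(runs)}
    return len(verdicts) > 1


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 = 2PR / (P + R) for the positive ("flaky") class."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # flagged and flaky
    fp = sum(1 for p, a in pairs if p and not a)      # flagged, not flaky
    fn = sum(1 for p, a in pairs if a and not p)      # flaky, missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical stand-ins for simulator-backed tests: one alternates
    # between pass and fail, the other always passes.
    alternating_verdicts = cycle([True, False])
    predicted = [
        is_flaky_by_rerun(lambda: next(alternating_verdicts)),
        is_flaky_by_rerun(lambda: True),
    ]
    actual = [True, False]  # assumed ground-truth flakiness labels
    print(predicted, f1_score(predicted, actual))
```

A rerun-based check of this kind pays for at least two simulations per test, which is the cost the single-run ML classifiers described above aim to avoid.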

(2016) Udacity self-driving challenge 2. https://github.com/udacity/self-driving-car/tree/master/challenges/challenge-2. Accessed 11 Oct 2019

(2022a) Foundations. https://carla.readthedocs.io/en/latest/foundations/. Accessed 15 Nov 2022

(2022b) Quick start. https://carla.readthedocs.io/en/latest/start_quickstart/. Accessed 15 Nov 2022

(2022) Raquel Urtasun’s tech company develops self-driving vehicle simulator. https://www.thestar.com/business/2022/02/09/raquel-urtasuns-tech-company-develops-self-driving-vehicle-simulator.html. Accessed May 2022

(2023) BeamNG.tech Website. https://beamng.tech. Accessed 3 Mar 2023

(2023) Carla Challenge. https://carla.readthedocs.io/en/latest/adv_traffic_manager/. Accessed 1 Feb 2023

(2023) Github repo for cyber-physical systems testing tool competition. https://github.com/sbft-cps-tool-competition/cps-tool-competition. Accessed 10 Apr 2023

(2023) Github repo for svl simulator: an autonomous vehicle simulator. https://github.com/lgsvl/simulator. Accessed 10 Apr 2023

(2023) Github repo for the paper. https://github.com/anonoymous9423013/anonymous_paper/. Accessed 10 Apr 2023

(2023) Github repo for transfuser: imitation with transformer-based sensor fusion for autonomous driving. https://github.com/autonomousvision/transfuser. Accessed 10 Apr 2023

(2023) Online supplementary material for the paper. https://github.com/anonoymous9423013/anonymous_paper/tree/main/supplementary_materials. Accessed 26 Apr 2023

Abdessalem RB, Nejati S, Briand LC, Stifter T (2018) Testing vision-based control systems using learnable evolutionary algorithms. In: 2018 IEEE/ACM 40th international conference on software engineering (ICSE), IEEE, pp 1016–1026

Afzal A, Katz DS, Le Goues C, Timperley CS (2021) Simulation for robotics test automation: developer perspectives. In: 2021 14th IEEE conference on software testing, verification and validation (ICST), pp 263–274

Ahlgren J, Bojarczuk K, Drossopoulou S, Dvortsova I, George J, Gucevska N, Harman M, Lomeli M, Lucas SM, Meijer E, et al (2021) Facebook’s cyber–cyber and cyber–physical digital twins. In: Evaluation and assessment in software engineering, pp 1–9

Alshammari A, Morris C, Hilton M, Bell J (2021) Flakeflagger: predicting flakiness without rerunning tests. In: 43rd IEEE/ACM international conference on software engineering: companion proceedings, ICSE Companion 2021, Madrid, Spain, May 25-28, 2021, IEEE, p 187

Bell J, Legunsen O, Hilton M, Eloussi L, Yung T, Marinov D (2018) Deflaker: automatically detecting flaky tests. In: 2018 IEEE/ACM 40th international conference on software engineering (ICSE), pp 433–444

Birchler C, Khatiri S, Bosshard B, Gambi A, Panichella S (2023) Machine learning-based test selection for simulation-based testing of self-driving cars software. Empir Softw Eng 28(3):71

Borg M, Abdessalem RB, Nejati S, Jegeden F, Shin D (2021) Digital twins are not monozygotic - cross-replicating ADAS testing in two industry-grade automotive simulators. In: 14th IEEE conference on software testing, verification and validation, ICST 2021, Porto de Galinhas, Brazil, April 12-16, 2021, IEEE, pp 383–393

Capon JA (1991) Elementary Statistics for the Social Sciences: Study Guide. Wadsworth Publishing Company, Belmont, CA, USA

Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) CARLA: an open urban driving simulator. In: Proceedings of the 1st annual conference on robot learning, pp 1–16

Dutta S, Shi A, Choudhary R, Zhang Z, Jain A, Misailovic S (2020) Detecting flaky tests in probabilistic and machine learning applications. In: Proceedings of the 29th ACM SIGSOFT international symposium on software testing and analysis, association for computing machinery, New York, USA, ISSTA 2020, pp 211–224, https://doi.org/10.1145/3395363.3397366

Gog I, Kalra S, Schafhalter P, Wright MA, Gonzalez JE, Stoica I (2021) Pylot: a modular platform for exploring latency-accuracy tradeoffs in autonomous vehicles. In: 2021 IEEE international conference on robotics and automation (ICRA), IEEE, pp 8806–8813

Goutte C, Gaussier E (2005) A probabilistic interpretation of precision, recall and f-score, with implication for evaluation. In: Losada DE, Fernández-Luna JM (eds) Advances in Information Retrieval. Springer, Berlin, Heidelberg, pp 345–359

Hagan MT, Demuth HB, Beale M (1997) Neural network design. PWS Publishing Co

Haq FU, Shin D, Nejati S, Briand LC (2020) Comparing offline and online testing of deep neural networks: An autonomous car case study. In: 13th IEEE international conference on software testing, validation and verification, ICST 2020, Porto, Portugal, October 24-28, 2020, IEEE, pp 85–95

Haq FU, Shin D, Nejati S, Briand LC (2021) Can offline testing of deep neural networks replace their online testing? Empir Softw Eng 26(5):90

Haq FU, Shin D, Briand L (2022) Efficient online testing for dnn-enabled systems using surrogate-assisted and many-objective optimization. In: 2022 IEEE/ACM 44th international conference on software engineering (ICSE), pp 811–822, https://doi.org/10.1145/3510003.3510188

Haq FU, Shin D, Briand LC (2023) Many-objective reinforcement learning for online testing of dnn-enabled systems. In: 45th IEEE/ACM international conference on software engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023, IEEE, pp 1814–1826

Harman M, McMinn P (2010) A theoretical and empirical study of search-based testing: local, global, and hybrid search. IEEE Trans Softw Eng 36(2):226–247. https://doi.org/10.1109/TSE.2009.71

Herzig K, Nagappan N (2015) Empirically detecting false test alarms using association rules. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering, vol 2, pp 39–48

Luke S (2013) Essentials of Metaheuristics, 2nd edn. Lulu, available for free at http://cs.gmu.edu/~sean/book/metaheuristics/

Luo Q, Hariri F, Eloussi L, Marinov D (2014) An empirical analysis of flaky tests. In: Cheung S, Orso A, Storey MD (eds) Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering, (FSE-22), Hong Kong, China, November 16 - 22, 2014, ACM, pp 643–653

Matinnejad R, Nejati S, Briand LC (2017) Automated testing of hybrid simulink/stateflow controllers: industrial case studies. In: Bodden E, Schäfer W, van Deursen A, Zisman A (eds) Proceedings of the 2017 11th joint meeting on foundations of software engineering, ESEC/FSE 2017, Paderborn, Germany, September 4-8, 2017, ACM, pp 938–943

Micco J (2018) Advances in continuous integration testing at google

Nguyen V, Huber S, Gambi A (2021) Salvo: automated generation of diversified tests for self-driving cars from existing maps. In: 2021 IEEE international conference on artificial intelligence testing (AITest), pp 128–135

Parry O, Kapfhammer GM, Hilton M, McMinn P (2021) A survey of flaky tests. ACM Trans Softw Eng Methodol 31(1), https://doi.org/10.1145/3476105

Paydar S, Azamnouri A (2019) An experimental study on flakiness and fragility of randoop regression test suites. In: Fundamentals of software engineering

Riccio V, Tonella P (2023) When and why test generators for deep learning produce invalid inputs: an empirical study. In: 45th IEEE/ACM international conference on software engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023, IEEE, pp 1161–1173

Samak CV, Samak TV, Kandhasamy S (2020) Control strategies for autonomous vehicles. arXiv:2011.08729

Shi A, Gyori A, Legunsen O, Marinov D (2016) Detecting assumptions on deterministic implementations of non-deterministic specifications. In: 2016 IEEE international conference on software testing, verification and validation (ICST), pp 80–90

Ulbrich S, Menzel T, Reschka A, Schuldt F, Maurer M (2015) Defining and substantiating the terms scene, situation, and scenario for automated driving. In: 2015 IEEE 18th international conference on intelligent transportation systems, pp 982–988, https://doi.org/10.1109/ITSC.2015.164

Vargha A, Delaney HD (2000) A critique and improvement of the cl common language effect size statistics of mcgraw and wong. J Educ Behav Stat 25(2):101–132

Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques, 3rd edn. Morgan Kaufmann Series in Data Management Systems, Morgan Kaufmann, Amsterdam

Zeller A, Gopinath R, Böhme M, Fraser G, Holler C (2023) Code coverage. In: The Fuzzing Book, CISPA Helmholtz Center for Information Security, https://www.fuzzingbook.org/html/Coverage.html, retrieved 2023-01-07 13:54:15+01:00

Zhong Z, Kaiser G, Ray B (2023) Neural network guided evolutionary fuzzing for finding traffic violations of autonomous vehicles. IEEE Trans Softw Eng 49(4):1860–1875. https://doi.org/10.1109/TSE.2022.3195640

Zohdinasab T, Riccio V, Gambi A, Tonella P (2023) Deephyperion: Exploring the feature space of deep learning-based systems through illumination search. In: Engels G, Hebig R, Tichy M (eds) Software Engineering 2023, Fachtagung des GI-Fachbereichs Softwaretechnik, 20.-24. Februar 2023, Paderborn, Gesellschaft für Informatik e.V., LNI, vol P-332, pp 131–132

Title: Evaluating the impact of flaky simulators on testing autonomous driving systems
Authors: Mohammad Hossein Amini, Shervin Naseri, Shiva Nejati
Publication date: 01-03-2024
Publisher: Springer US
Published in: Empirical Software Engineering / Issue 2/2024
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-023-10433-5
