
Empirical Software Engineering

Open Access 01.09.2021

Can Offline Testing of Deep Neural Networks Replace Their Online Testing?

A Case Study of Automated Driving Systems

Authors: Fitash Ul Haq, Donghwan Shin, Shiva Nejati, Lionel Briand

Published in: Empirical Software Engineering | Issue 5/2021


Abstract

We distinguish two general modes of testing for Deep Neural Networks (DNNs): offline testing, where DNNs are tested as individual units based on test datasets obtained without involving the DNNs under test, and online testing, where DNNs are embedded into a specific application environment and tested in closed-loop mode in interaction with that environment. Typically, DNNs are subjected to both types of testing during their development life cycle: offline testing is applied immediately after DNN training, and online testing follows once a DNN is deployed within a specific application environment. In this paper, we study the relationship between offline and online testing. Our goal is to determine how offline testing and online testing differ from or complement one another, and whether offline testing results can be used to help reduce the cost of online testing. Though these questions are generally relevant to all autonomous systems, we study them in the context of automated driving systems where, as study subjects, we use DNNs automating end-to-end control of the steering functions of self-driving vehicles. Our results show that offline testing is less effective than online testing: many safety violations identified by online testing could not be identified by offline testing, while large prediction errors generated by offline testing always led to severe safety violations detectable by online testing. Further, we cannot exploit offline testing results to reduce the cost of online testing in practice, since we are not able to identify specific situations where offline testing could be as accurate as online testing in identifying safety requirement violations.
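To make the two testing modes concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the one-dimensional vehicle model, the `reference_driver` and `biased_model` functions, and the chosen numbers are hypothetical and are not the DNNs, dataset, or simulator used in the study.

```python
from typing import Callable, List, Tuple

# A "model" maps the current lateral offset from the lane centre (metres)
# to a steering command. This stands in for a steering DNN (assumption).
Model = Callable[[float], float]


def reference_driver(offset: float) -> float:
    """Hypothetical ground-truth policy: steer back toward the lane centre."""
    return -offset


def biased_model(offset: float) -> float:
    """Hypothetical model under test: reference policy plus a constant bias."""
    return -offset + 0.1


def offline_test(model: Model, dataset: List[Tuple[float, float]]) -> float:
    """Open loop: mean absolute prediction error on pre-recorded pairs.

    The dataset is fixed in advance, so the model's outputs never
    influence the inputs it is evaluated on.
    """
    return sum(abs(model(x) - y) for x, y in dataset) / len(dataset)


def online_test(model: Model, start_offset: float = 0.0,
                steps: int = 100, dt: float = 0.1) -> float:
    """Closed loop: every prediction moves the vehicle and thereby
    determines the next input. Returns the largest |offset| observed."""
    offset = start_offset
    worst = abs(offset)
    for _ in range(steps):
        offset += model(offset) * dt  # toy first-order vehicle dynamics
        worst = max(worst, abs(offset))
    return worst


# Recorded inputs labelled by the reference driver (toy test dataset).
dataset = [(x, reference_driver(x)) for x in (-0.4, -0.2, 0.0, 0.2, 0.4)]

offline_error = offline_test(biased_model, dataset)
online_deviation = online_test(biased_model)
```

In this toy setting, `offline_test` scores each prediction in isolation, whereas `online_test` feeds each prediction back into the next input, so it reports a system-level quantity (the largest deviation from the lane centre) rather than a per-input error.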


This is how Tian et al. (2018) have interpreted the steering angle values provided along with the Udacity dataset, and we follow their interpretation. We were not able to find any explicit information about the measurement unit of these values anywhere else.

Autumn’s RMSE is not presented in the final leaderboard.

If f has multiple minima, one of them is returned at random.
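The tie-breaking behaviour described in this note can be sketched as follows. The function name `argmin_random_tie`, the candidate set, and the use of exact equality to detect ties are assumptions made for illustration, since the definition of f is not given here.

```python
import random
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def argmin_random_tie(f: Callable[[T], float],
                      candidates: Iterable[T],
                      rng: Optional[random.Random] = None) -> T:
    """Return a candidate minimising f; ties are broken uniformly at random."""
    rng = rng or random.Random()
    items: List[T] = list(candidates)
    if not items:
        raise ValueError("candidates must not be empty")
    values = [f(x) for x in items]  # evaluate f once per candidate
    best = min(values)
    minimisers = [x for x, v in zip(items, values) if v == best]
    return rng.choice(minimisers)


# Unique minimum: always the same answer.
unique = argmin_random_tie(lambda x: (x - 2) ** 2, range(5))

# Two minima (-1 and 1 both give |x| = 1): either may be returned.
tied = argmin_random_tie(abs, [-1, 1, 3], rng=random.Random(0))
```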

We use PICT (https://github.com/microsoft/pict) to compute combinatorial coverage.
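As a rough illustration of what a combinatorial (here pairwise) coverage figure measures, the sketch below computes it in plain Python. It does not call PICT and is not the procedure used in the study; the parameter names and values are hypothetical.

```python
from itertools import combinations, product
from typing import Dict, Hashable, List


def pairwise_coverage(domains: Dict[str, List[Hashable]],
                      tests: List[Dict[str, Hashable]]) -> float:
    """Fraction of all parameter-value pairs that appear in at least one test."""
    names = sorted(domains)
    # Every value pair that could occur across every pair of parameters.
    required = {
        (a, va, b, vb)
        for a, b in combinations(names, 2)
        for va, vb in product(domains[a], domains[b])
    }
    # Value pairs actually exercised by the given tests.
    covered = {
        (a, t[a], b, t[b])
        for t in tests
        for a, b in combinations(names, 2)
    }
    return len(covered & required) / len(required)


# Hypothetical scenario parameters (not taken from the study).
domains = {"weather": ["sunny", "rain"], "road": ["straight", "curved"]}
tests = [
    {"weather": "sunny", "road": "straight"},
    {"weather": "rain", "road": "curved"},
]
coverage = pairwise_coverage(domains, tests)  # 2 of the 4 possible pairs
```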

Archer KJ, Kimes RV (2008) Empirical characterization of random forest variable importance measures, vol 52. https://doi.org/10.1016/j.csda.2007.08.015. http://www.sciencedirect.com/science/article/pii/S0167947307003076

Autumn T (2016) Autumn model. https://github.com/udacity/self-driving-car/tree/master/steering-models/community-models/autumn, Accessed: 2019-10-11

Barr ET, Harman M, McMinn P, Shahbaz M, Yoo S (2015) The oracle problem in software testing: A survey. IEEE Trans Softw Eng 41(5):507–525. https://doi.org/10.1109/TSE.2014.2372785

Chauffeur T (2016) Chauffeur model. https://github.com/udacity/self-driving-car/tree/master/steering-models/community-models/chauffeur, Accessed: 2019-10-11

Ciresan DC, Meier U, Schmidhuber J (2012) Multi-column deep neural networks for image classification. arXiv:1202.2745

Codevilla F, Lopez AM, Koltun V, Dosovitskiy A (2018) On offline evaluation of vision-based driving models. In: The european conference on computer vision (ECCV)

Cohen WW (1995) Fast effective rule induction. In: Prieditis A, Russell S (eds) Machine Learning Proceedings 1995, Morgan Kaufmann, San Francisco (CA), pp 115–123 https://doi.org/10.1016/B978-1-55860-377-6.50023-2. http://www.sciencedirect.com/science/article/pii/B9781558603776500232

Deng L, Hinton G, Kingsbury B (2013) New types of deep neural network learning for speech recognition and related applications: an overview. In: 2013 IEEE International conference on acoustics, speech and signal processing, pp 8599–8603 https://doi.org/10.1109/ICASSP.2013.6639344

Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) CARLA: An open urban driving simulator. In: Proceedings of the 1st annual conference on robot learning, pp 1–16

Dreossi T, Ghosh S, Sangiovanni-Vincentelli A, Seshia SA (2017) Systematic testing of convolutional neural networks for autonomous driving. arXiv:1708.03309

ESI Group (2019) Esi pro-sivic - 3d simulations of environments and sensors. https://www.esi-group.com/software-solutions/virtual-environment/virtual-systems-controls/esi-pro-sivictm-3d-simulations-environments-and-sensors, Accessed: 2019-10-11

Gambi A, Mueller M, Fraser G (2019) Automatically testing self-driving cars with search-based procedural content generation. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ACM, New York, NY, USA, ISSTA 2019, pp 318–328. https://doi.org/10.1145/3293882.3330566

Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite

Genuer R, Poggi JM, Tuleau-Malot C (2010) Variable selection using random forests. Pattern Recognit Lett 31(14):2225–2236. https://doi.org/10.1016/j.patrec.2010.03.014. http://www.sciencedirect.com/science/article/pii/S0167865510000954

Group OM (2014) Object constraint language specification. https://www.omg.org/spec/OCL/, Accessed: 2019-10-11

Haq FU, Shin D, Nejati S, Briand L (2020a) Comparing offline and online testing of deep neural networks: An autonomous car case study. In: 2020 IEEE International conference on software testing, verification and validation, p to appear

Haq FU, Shin D, Nejati S, Briand L (2020b) Supporting materials (temporal link for the double-blind review). http://tiny.cc/Experiment-data, Accessed: 2020-07-26

Kalra N, Paddock SM (2016) Driving to safety: How many miles of driving would it take to demonstrate autonomous vehicle reliability? Trans Res Part A Pol Pract 94:182–193. https://doi.org/10.1016/j.tra.2016.09.010. http://www.sciencedirect.com/science/article/pii/S0965856416302129

Kim J, Feldt R, Yoo S (2019) Guiding deep learning system testing using surprise adequacy. In: Proceedings of the 41st international conference on software engineering, IEEE Press, Piscataway, NJ, USA, ICSE ’19, pp 1039–1049 https://doi.org/10.1109/ICSE.2019.00108

Komanda T (2016) Komanda model. https://github.com/udacity/self-driving-car/tree/master/steering-models/community-models/komanda, Accessed: 2020-04-14

Ma L, Juefei-Xu F, Zhang F, Sun J, Xue M, Li B, Chen C, Su T, Li L, Liu Y, Zhao J, Wang Y (2018) Deepgauge: Multi-granularity testing criteria for deep learning systems. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ACM, New York, NY, USA, ASE 2018, pp 120–131. https://doi.org/10.1145/3238147.3238202

Majumdar R, Mathur A, Pirron M, Stegner L, Zufferey D (2019) Paracosm: A language and tool for testing autonomous driving systems. arXiv:1902.01084

McGehee DV, Mazzae EN, Baldwin GS (2000) Driver reaction time in crash avoidance research: Validation of a driving simulator study on a test track. In: Proceedings of the human factors and ergonomics society annual meeting 44(20):3–320–3–323 https://doi.org/10.1177/154193120004402026

Pei K, Cao Y, Yang J, Jana S (2017) Deepxplore: Automated whitebox testing of deep learning systems. In: Proceedings of the 26th symposium on operating systems principles, ACM, New York, NY, USA, SOSP ’17, pp 1–18 https://doi.org/10.1145/3132747.3132785

Pineau J (2019) Icse 2019 keynote: Building reproducible, reusable, and robust machine learning software. https://2019.icse-conferences.org/details/icse-2019-Plenary-Sessions/20/Building-Reproducible-Reusable-and-Robust-Machine-Learning-Software, Accessed: 2019-10-11

Pomerleau DA (1989) Alvinn: An autonomous land vehicle in a neural network. In: Advances in neural information processing systems, pp 305–313

Rong G, Shin BH, Tabatabaee H, Lu Q, Lemke S, Možeiko M, Boise E, Uhm G, Gerow M, Mehta S, Agafonov E, Kim TH, Sterner E, Ushiroda K, Reyes M, Zelenkovsky D, Kim S (2020) Lgsvl simulator: A high fidelity simulator for autonomous driving. In: 2020 IEEE 23rd International conference on intelligent transportation systems (ITSC), pp 1–6 https://doi.org/10.1109/ITSC45102.2020.9294422

Shah S, Dey D, Lovett C, Kapoor A (2018) Airsim: High-fidelity visual and physical simulation for autonomous vehicles. In: Hutter M, Siegwart R (eds) Field and service robotics, Springer International Publishing, Cham, pp 621–635

Sotiropoulos T, Waeselynck H, Guiochet J, Ingrand F (2017) Can robot navigation bugs be found in simulation? an exploratory study. In: 2017 IEEE International conference on software quality, reliability and security (QRS), pp 150–159 https://doi.org/10.1109/QRS.2017.25

Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in Neural Information Processing Systems 27, Curran Associates Inc., pp 3104–3112

TASS International - Siemens Group (2019) Prescan: Simulation of adas and active safety. https://tass.plm.automation.siemens.com, Accessed: 2019-10-11

Tian Y, Pei K, Jana S, Ray B (2018) Deeptest: Automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th international conference on software engineering, ACM, New York, NY, USA, ICSE ’18, pp 303–314. https://doi.org/10.1145/3180155.3180220

Tuncali CE, Fainekos G, Ito H, Kapinski J (2018) Simulation-based adversarial test generation for autonomous vehicles with machine learning components. In: 2018 IEEE intelligent vehicles symposium, IV pp 1555–1562. https://doi.org/10.1109/IVS.2018.8500421

Udacity (2016a) Udacity self-driving car challenge 2: Using deep learning to predict steering angles. https://github.com/udacity/self-driving-car/tree/master/challenges/challenge-2, Accessed: 2019-10-11

Udacity (2016b) Udacity self-driving challenge 2, ch2-001 (testing) and ch2-002 (training). https://github.com/udacity/self-driving-car/tree/master/datasets/CH2, Accessed: 2019-10-11

Wicker M, Huang X, Kwiatkowska M (2018) Feature-guided black-box safety testing of deep neural networks. In: Beyer D, Huisman M (eds) Tools and Algorithms for the Construction and Analysis of Systems. Springer International Publishing, Cham, pp 408–426

Zhang JM, Harman M, Ma L, Liu Y (2020) Machine learning testing: Survey, landscapes and horizons. IEEE Trans Softw Eng 1–1

Zhang M, Zhang Y, Zhang L, Liu C, Khurshid S (2018) Deeproad: Gan-based metamorphic testing and input validation framework for autonomous driving systems. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, ACM, New York, NY, USA, ASE 2018, pp 132–142. https://doi.org/10.1145/3238147.3238187

Zhou H, Li W, Zhu Y, Zhang Y, Yu B, Zhang L, Liu C (2018) Deepbillboard: Systematic physical-world testing of autonomous driving systems. arXiv:1812.10812

Zhou ZQ, Sun L (2019) Metamorphic testing of driverless cars. Commun ACM 62(3):61–67. https://doi.org/10.1145/3241979

Title: Can Offline Testing of Deep Neural Networks Replace Their Online Testing? A Case Study of Automated Driving Systems
Authors: Fitash Ul Haq, Donghwan Shin, Shiva Nejati, Lionel Briand
Publication date: 01.09.2021
Publisher: Springer US
Published in: Empirical Software Engineering / Issue 5/2021
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI: https://doi.org/10.1007/s10664-021-09982-4

