Published in: Empirical Software Engineering 5/2021

Open Access, 1 September 2021

Can Offline Testing of Deep Neural Networks Replace Their Online Testing?

A Case Study of Automated Driving Systems

Authors: Fitash Ul Haq, Donghwan Shin, Shiva Nejati, Lionel Briand


Abstract

We distinguish two general modes of testing for Deep Neural Networks (DNNs): offline testing, where DNNs are tested as individual units based on test datasets obtained without involving the DNNs under test, and online testing, where DNNs are embedded into a specific application environment and tested in a closed-loop mode in interaction with that environment. Typically, DNNs are subjected to both types of testing during their development life cycle: offline testing is applied immediately after DNN training, and online testing follows once a DNN is deployed within a specific application environment. In this paper, we study the relationship between offline and online testing. Our goal is to determine how offline and online testing differ from or complement one another, and whether offline testing results can be used to help reduce the cost of online testing. Though these questions are generally relevant to all autonomous systems, we study them in the context of automated driving systems where, as study subjects, we use DNNs automating end-to-end control of the steering functions of self-driving vehicles. Our results show that offline testing is less effective than online testing: many safety violations identified by online testing could not be identified by offline testing, while large prediction errors generated by offline testing always led to severe safety violations detectable by online testing. Further, we cannot exploit offline testing results to reduce the cost of online testing in practice, since we are not able to identify specific situations where offline testing could be as accurate as online testing in identifying safety requirement violations.
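The difference between the two testing modes can be illustrated with a toy sketch. It is not the study's setup: the one-dimensional vehicle model, the `biased_model` stand-in for a steering DNN, the recorded data, and the lane half-width are all hypothetical values chosen for illustration. Offline testing scores predictions against pre-recorded labels, while online testing feeds each prediction back into the environment, so that small per-frame errors can accumulate.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for a steering DNN: maps the observed lateral
# offset from the lane centre (metres) to a steering command.
Model = Callable[[float], float]


def offline_test(model: Model, dataset: Sequence[Tuple[float, float]]) -> float:
    """Open loop: mean absolute error against pre-recorded labels."""
    return sum(abs(model(x) - y) for x, y in dataset) / len(dataset)


def online_test(model: Model, steps: int, dt: float = 0.1) -> float:
    """Closed loop: each prediction moves the vehicle and therefore
    changes the next input. Returns the largest absolute lateral offset."""
    offset = 0.0
    max_offset = 0.0
    for _ in range(steps):
        offset += model(offset) * dt  # assumed toy dynamics
        max_offset = max(max_offset, abs(offset))
    return max_offset


def biased_model(offset: float) -> float:
    """Assumed faulty model: ignores its input and pulls to one side."""
    return 0.05


# Assumed recording of a driver who stays near the lane centre;
# the label is the corrective command -offset.
recorded = [(x, -x) for x in (-0.02, -0.01, 0.0, 0.01, 0.02)]

LANE_HALF_WIDTH = 1.5  # metres, assumed safety threshold

mae = offline_test(biased_model, recorded)
max_offset = online_test(biased_model, steps=400)
violation = max_offset > LANE_HALF_WIDTH

print(f"offline MAE: {mae:.3f}")
print(f"online max offset: {max_offset:.2f} m, lane departure: {violation}")
```

In this sketch the offline error stays small because the recorded inputs never leave the lane centre, whereas the closed-loop run drifts out of the lane. This mirrors, in a much simplified form, the kind of safety violation that only online testing exposes.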


Footnotes
1
This is how Tian et al. (2018) have interpreted the steering angle values provided along with the Udacity dataset, and we follow their interpretation. We were not able to find any explicit information about the measurement unit of these values anywhere else.
 
2
Autumn’s RMSE is not presented in the final leaderboard.
 
3
If f has multiple minimum points, one of them is returned at random.
 
4
We use PICT (https://github.com/microsoft/pict) to compute combinatorial coverage.
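The tie-breaking rule in footnote 3 can be sketched as follows. This is a minimal illustration over a finite candidate set, not the authors' implementation; the function name `argmin_random_tie` and its parameters are hypothetical.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def argmin_random_tie(
    f: Callable[[T], float],
    candidates: Iterable[T],
    rng: Optional[random.Random] = None,
) -> T:
    """Return a candidate minimising f; ties are broken uniformly at random."""
    rng = rng or random.Random()
    items = list(candidates)
    if not items:
        raise ValueError("candidates must not be empty")
    best = min(f(x) for x in items)
    # Collect every candidate that attains the minimum value.
    minima = [x for x in items if f(x) == best]
    return rng.choice(minima)


# f has two minimum points on this grid, at -2 and 2.
result = argmin_random_tie(lambda x: (x * x - 4) ** 2, range(-3, 4))
```

Passing a seeded `random.Random` instance makes the choice reproducible across runs.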
 
References
Ciresan DC, Meier U, Schmidhuber J (2012) Multi-column deep neural networks for image classification. arXiv:1202.2745
Codevilla F, Lopez AM, Koltun V, Dosovitskiy A (2018) On offline evaluation of vision-based driving models. In: The European Conference on Computer Vision (ECCV)
Dosovitskiy A, Ros G, Codevilla F, Lopez A, Koltun V (2017) CARLA: An open urban driving simulator. In: Proceedings of the 1st annual conference on robot learning, pp 1–16
Dreossi T, Ghosh S, Sangiovanni-Vincentelli A, Seshia SA (2017) Systematic testing of convolutional neural networks for autonomous driving. arXiv:1708.03309
Gambi A, Mueller M, Fraser G (2019) Automatically testing self-driving cars with search-based procedural content generation. In: Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ACM, New York, NY, USA, ISSTA 2019, pp 318–328. https://doi.org/10.1145/3293882.3330566
Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? The KITTI vision benchmark suite
Haq FU, Shin D, Nejati S, Briand L (2020a) Comparing offline and online testing of deep neural networks: An autonomous car case study. In: 2020 IEEE International conference on software testing, verification and validation, to appear
Ma L, Juefei-Xu F, Zhang F, Sun J, Xue M, Li B, Chen C, Su T, Li L, Liu Y, Zhao J, Wang Y (2018) DeepGauge: Multi-granularity testing criteria for deep learning systems. In: Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ACM, New York, NY, USA, ASE 2018, pp 120–131. https://doi.org/10.1145/3238147.3238202
Majumdar R, Mathur A, Pirron M, Stegner L, Zufferey D (2019) Paracosm: A language and tool for testing autonomous driving systems. arXiv:1902.01084
McGehee DV, Mazzae EN, Baldwin GS (2000) Driver reaction time in crash avoidance research: Validation of a driving simulator study on a test track. In: Proceedings of the human factors and ergonomics society annual meeting 44(20):3-320–3-323. https://doi.org/10.1177/154193120004402026
Pomerleau DA (1989) ALVINN: An autonomous land vehicle in a neural network. In: Advances in neural information processing systems, pp 305–313
Rong G, Shin BH, Tabatabaee H, Lu Q, Lemke S, Mozeikǒ M, Boise E, Uhm G, Gerow M, Mehta S, Agafonov E, Kim TH, Sterner E, Ushiroda K, Reyes M, Zelenkovsky D, Kim S (2020) LGSVL simulator: A high fidelity simulator for autonomous driving. In: 2020 IEEE 23rd International conference on intelligent transportation systems (ITSC), pp 1–6. https://doi.org/10.1109/ITSC45102.2020.9294422
Shah S, Dey D, Lovett C, Kapoor A (2018) AirSim: High-fidelity visual and physical simulation for autonomous vehicles. In: Hutter M, Siegwart R (eds) Field and service robotics, Springer International Publishing, Cham, pp 621–635
Sotiropoulos T, Waeselynck H, Guiochet J, Ingrand F (2017) Can robot navigation bugs be found in simulation? An exploratory study. In: 2017 IEEE International conference on software quality, reliability and security (QRS), pp 150–159. https://doi.org/10.1109/QRS.2017.25
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in Neural Information Processing Systems 27, Curran Associates Inc., pp 3104–3112
Tian Y, Pei K, Jana S, Ray B (2018) DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th international conference on software engineering, ACM, New York, NY, USA, ICSE ’18, pp 303–314. https://doi.org/10.1145/3180155.3180220
Wicker M, Huang X, Kwiatkowska M (2018) Feature-guided black-box safety testing of deep neural networks. In: Beyer D, Huisman M (eds) Tools and Algorithms for the Construction and Analysis of Systems. Springer International Publishing, Cham, pp 408–426
Zhang JM, Harman M, Ma L, Liu Y (2020) Machine learning testing: Survey, landscapes and horizons. IEEE Trans Softw Eng 1–1
Zhang M, Zhang Y, Zhang L, Liu C, Khurshid S (2018) DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, ACM, New York, NY, USA, ASE 2018, pp 132–142. https://doi.org/10.1145/3238147.3238187
Zhou H, Li W, Zhu Y, Zhang Y, Yu B, Zhang L, Liu C (2018) DeepBillboard: Systematic physical-world testing of autonomous driving systems. arXiv:1812.10812
Metadata
Title
Can Offline Testing of Deep Neural Networks Replace Their Online Testing?
A Case Study of Automated Driving Systems
Authors
Fitash Ul Haq
Donghwan Shin
Shiva Nejati
Lionel Briand
Publication date
1 September 2021
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 5/2021
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-021-09982-4
