Published in: Empirical Software Engineering 5/2021

1 September 2021

To what extent do DNN-based image classification models make unreliable inferences?

Authors: Yongqiang Tian, Shiqing Ma, Ming Wen, Yepang Liu, Shing-Chi Cheung, Xiangyu Zhang

Abstract

Deep Neural Network (DNN) models are widely used for image classification. While they achieve high accuracy, researchers are concerned about whether these models inappropriately make inferences using features irrelevant to the target object in a given image. To address this concern, we propose a metamorphic testing approach that assesses whether a given inference is made based on irrelevant features. Specifically, we propose two metamorphic relations (MRs) to detect such unreliable inferences. These relations expect (a) that, after the relevant features of an image are corrupted, the model returns a different label, or the same label with less certainty, and (b) that, after the irrelevant features are corrupted, the model returns the same label. Inferences that violate the metamorphic relations are regarded as unreliable inferences. Our evaluation demonstrated that our approach can effectively identify unreliable inferences for single-label classification models, with an average precision of 64.1% and 96.4% for the two MRs, respectively. For multi-label classification models, the corresponding precision for MR-1 and MR-2 is 78.2% and 86.5%, respectively. Further, we conducted an empirical study to understand the problem of unreliable inferences in practice. Specifically, we applied our approach to 18 pre-trained single-label image classification models and 3 multi-label classification models, and then examined their inferences on the ImageNet and COCO datasets. We found that unreliable inferences are pervasive: for each model, thousands of correct classifications are actually made using irrelevant features. Next, we investigated the effect of such pervasive unreliable inferences, and found that they can cause significant degradation of a model’s overall accuracy. Once these unreliable inferences in the test set are taken into account, the model’s accuracy can change significantly.
Therefore, we recommend that developers pay more attention to these unreliable inferences during model evaluation. We also explored the correlation between model accuracy and the size of unreliable inferences. We found that inferences on inputs with smaller objects are more likely to be unreliable. Lastly, we found that current model training methodologies can guide models to learn object-relevant features to a certain extent, but may not necessarily prevent them from making unreliable inferences. We encourage the community to propose more effective training methodologies to address this issue.
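The two metamorphic relations can be illustrated with a minimal Python sketch. The abstract does not say how relevant features are located or how they are corrupted, so everything concrete here is an assumption made for illustration: a boolean `object_mask` marks the target object, corruption simply overwrites the masked pixels with a fill value, and `classify` is any function returning a probability vector. The names `corrupt`, `violates_mr1`, `violates_mr2` and the two toy classifiers are hypothetical and are not taken from the paper.

```python
import numpy as np

def corrupt(image, mask, fill=0.0):
    """Return a copy of `image` with the pixels selected by `mask` overwritten.
    Assumption: constant fill stands in for whatever corruption is really used."""
    out = image.copy()
    out[mask] = fill
    return out

def violates_mr1(classify, image, object_mask):
    """MR-1: corrupting the object should change the label or lower its certainty.
    A violation is the same label with certainty that did not drop."""
    before = classify(image)
    after = classify(corrupt(image, object_mask))
    label = int(np.argmax(before))
    same_label = int(np.argmax(after)) == label
    return bool(same_label and after[label] >= before[label])

def violates_mr2(classify, image, object_mask):
    """MR-2: corrupting everything except the object should keep the label.
    A violation is any change of label."""
    before = classify(image)
    after = classify(corrupt(image, ~object_mask))
    return int(np.argmax(after)) != int(np.argmax(before))

# Hypothetical toy models over a 4x4 grayscale image with two classes.
def toy_background_classifier(image):
    # Looks only at the top-left pixel, which lies outside the object.
    p = 0.9 if image[0, 0] > 0.5 else 0.1
    return np.array([1.0 - p, p])

def toy_object_classifier(image):
    # Looks only at a pixel inside the object region.
    p = 0.9 if image[1, 1] > 0.5 else 0.1
    return np.array([1.0 - p, p])

image = np.ones((4, 4))
object_mask = np.zeros((4, 4), dtype=bool)
object_mask[1:3, 1:3] = True  # the object occupies the central 2x2 block
```

In this toy setting, the classifier that reads a background pixel violates both relations, while the one that reads an object pixel violates neither. A real check would need an actual model and a principled way to obtain the object region.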

Footnotes
1
The latter study refers to this concept as “prediction confidence”.

3
In the Chi-square test, it is usually referred to as Cramér’s V (Cramer 1946).

7
MR-3/4/5/6 are only our initial proposals. Their detailed definitions should be polished and their effectiveness thoroughly evaluated.

References
Aggarwal A, Lohia P, Nagar S, Dey K, Saha D (2019) Black box fairness testing of machine learning models. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, association for computing machinery, ESEC/FSE 2019, New York, NY, USA, pp 625–635. https://doi.org/10.1145/3338906.3338937
Barr ET, Harman M, McMinn P, Shahbaz M, Yoo S (2015) The oracle problem in software testing: A survey. IEEE Trans Softw Eng 41(5):507–525
Ben-Baruch E, Ridnik T, Zamir N, Noy A, Friedman I, Protter M, Zelnik-Manor L (2020) Asymmetric loss for multi-label classification. arXiv:2009.14119
Benesty J, Chen J, Huang Y, Cohen I (2009) Pearson correlation coefficient. In: Noise reduction in speech processing. Springer, pp 1–4
Chen TY, Cheung SC, Yiu SM (1998) Metamorphic testing: a new approach for generating next test cases. Tech. Rep. HKUST-CS98-01, Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong
Cochran W (1963) Sampling techniques, 2nd edn. [Wiley Publications in Statistics.], John Wiley & Sons, New York
Cramer H (1946) Mathematical methods of statistics. Princeton University Press, Princeton
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) ImageNet: A large-scale hierarchical image database. In: CVPR09
Devlin J, Chang M, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Burstein J, Doran C, Solorio T (eds) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Association for Computational Linguistics, pp 4171–4186. https://doi.org/10.18653/v1/n19-1423
Dwarakanath A, Ahuja M, Sikand S, Rao RM, Bose RPJC, Dubash N, Podder S (2018) Identifying implementation bugs in machine learning based image classifiers using metamorphic testing. In: Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis, ISSTA 2018. ACM, New York, NY, USA, pp 118–128. https://doi.org/10.1145/3213846.3213858
Fahmy H, Pastore F, Bagherzadeh M, Briand L (2020) Supporting dnn safety analysis and retraining through heatmap-based unsupervised learning. arXiv:2002.00863
Freund Y, Schapire RE (1995) A desicion-theoretic generalization of on-line learning and an application to boosting. In: Vitányi P (ed) Computational learning theory. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 23–37
Geirhos R, Rubisch P, Michaelis C, Bethge M, Wichmann FA, Brendel W (2019) Imagenet-trained cnns are biased towards texture; increasing shape bias improves accuracy and robustness. In: 7th International conference on learning representations, ICLR 2019, May 6-9, 2019, OpenReview.net, New Orleans, LA, USA. https://openreview.net/forum?id=Bygh9j09KX
Guo J, Jiang Y, Zhao Y, Chen Q, Sun J (2018) Dlfuzz: Differential fuzzing testing of deep learning systems. In: Proceedings of the 2018 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, association for computing machinery, ESEC/FSE 2018, New York, NY, USA, pp 739–743. https://doi.org/10.1145/3236024.3264835
Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv:1704.04861
Huang G, Liu Z, van der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: 2017 IEEE conference on computer vision and pattern recognition, CVPR 2017, July 21-26, 2017. IEEE Computer Society, Honolulu, HI, USA, pp 2261–2269. https://doi.org/10.1109/CVPR.2017.243
Krasin I, Duerig T, Alldrin N, Ferrari V, Abu-El-Haija S, Kuznetsova A, Rom H, Uijlings J, Popov S, Kamali S, Malloci M, Pont-Tuset J, Veit A, Belongie S, Gomes V, Gupta A, Sun C, Chechik G, Cai D, Feng Z, Narayanan D, Murphy K (2017) Openimages: A public dataset for large-scale multi-label and multi-class image classification. Dataset available from https://storage.googleapis.com/openimages/web/index.html
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33(1):159–174
Lin T, Maire M, Belongie SJ, Bourdev LD, Girshick RB, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft COCO: common objects in context. arXiv:1405.0312
Lin Y, Lv F, Zhu S, Yang M, Cour T, Yu K, Cao L, Huang T (2011) Large-scale image classification: Fast feature extraction and svm training. In: CVPR 2011, pp 1689–1696
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: Single shot multibox detector. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision – ECCV 2016. Springer International Publishing, Cham, pp 21–37
Ma L, Juefei-Xu F, Zhang F, Sun J, Xue M, Li B, Chen C, Su T, Li L, Liu Y, Zhao J, Wang Y (2018a) Deepgauge: Multi-granularity testing criteria for deep learning systems. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, ASE 2018. ACM, New York, NY, USA, pp 120–131. https://doi.org/10.1145/3238147.3238202
Ma L, Zhang F, Sun J, Xue M, Li B, Juefei-Xu F, Xie C, Li L, Liu Y, Zhao J, Wang Y (2018b) Deepmutation: Mutation testing of deep learning systems. In: Ghosh S, Natella R, Cukic B, Poston R, Laranjeiro N (eds) 29th IEEE international symposium on software reliability engineering, ISSRE 2018, October 15-18, 2018. IEEE Computer Society, Memphis, TN, USA, pp 100–111. https://doi.org/10.1109/ISSRE.2018.00021
Ma S, Liu Y, Lee WC, Zhang X, Grama A (2018c) Mode: Automated neural network model debugging via state differential analysis and input selection. In: Proceedings of the 2018 26th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, association for computing machinery, ESEC/FSE 2018, New York, NY, USA, pp 175–186. https://doi.org/10.1145/3236024.3236082
Montavon G, Binder A, Lapuschkin S, Samek W, Müller KR (2019) Layer-wise relevance propagation: an overview. In: Explainable AI: interpreting, explaining and visualizing deep learning. Springer, pp 193–209
Odena A, Olsson C, Andersen D, Goodfellow IJ (2019) Tensorfuzz: Debugging neural networks with coverage-guided fuzzing. In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th international conference on machine learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, PMLR, Proceedings of machine learning research, vol 97, pp 4901–4911. http://proceedings.mlr.press/v97/odena19a.html
Pham HV, Lutellier T, Qi W, Tan L (2019) CRADLE: cross-backend validation to detect and localize bugs in deep learning libraries. In: Proceedings of the 41st international conference on software engineering, ICSE ’19. IEEE Press, pp 1027–1038. https://doi.org/10.1109/ICSE.2019.00107
Ribeiro MT, Singh S, Guestrin C (2016) “why should I trust you?”: Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, August 13-17, 2016, San Francisco, CA, USA, pp 1135–1144
Roobaert D, Zillich M, Eklundh J (2001) A pure learning approach to background-invariant object recognition using pedagogical support vector learning. In: Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, vol 2, pp II–II. https://doi.org/10.1109/CVPR.2001.990982
Sanchez J, Perronnin F (2011) High-dimensional signature compression for large-scale image classification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, CVPR ’11. IEEE Computer Society, USA, pp 1665–1672. https://doi.org/10.1109/CVPR.2011.5995504
Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D (2017) Grad-cam: Visual explanations from deep networks via gradient-based localization. In: IEEE international conference on computer vision, ICCV 2017, October 22-29, 2017. IEEE Computer Society, Venice, Italy, pp 618–626. https://doi.org/10.1109/ICCV.2017.74
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Bengio Y, LeCun Y (eds) 3rd international conference on learning representations, ICLR 2015, May 7-9, 2015, conference track proceedings, San Diego, CA, USA
Stock P, Cissé M (2018) Convnets and imagenet beyond accuracy: Understanding mistakes and uncovering biases. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y (eds) Computer Vision - ECCV 2018 - 15th european conference, September 8-14, 2018, Proceedings, Part VI, Lecture Notes in Computer Science, vol 11210. Springer, Munich, Germany, pp 504–519. https://doi.org/10.1007/978-3-030-01231-1_31
Tian Y, Pei K, Jana S, Ray B (2018) Deeptest: Automated testing of deep-neural-network-driven autonomous cars. In: Proceedings of the 40th international conference on software engineering, ICSE ’18. ACM, New York, NY, USA, pp 303–314. https://doi.org/10.1145/3180155.3180220
Tian Y, Zeng Z, Wen M, Liu Y, Kuo Ty, Cheung SC (2020a) Evaldnn: A toolbox for evaluating deep neural network models. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering: companion proceedings, association for computing machinery, ICSE ’20, New York, NY, USA, pp 45–48. https://doi.org/10.1145/3377812.3382133
Tian Y, Zhong Z, Ordonez V, Kaiser G, Ray B (2020b) Testing dnn image classifiers for confusion & bias errors. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, association for computing machinery, ICSE ’20, New York, NY, USA, pp 1122–1134. https://doi.org/10.1145/3377811.3380400
Tramèr F, Atlidakis V, Geambasu R, Hsu D, Hubaux J, Humbert M, Juels A, Lin H (2017) Fairtest: Discovering unwarranted associations in data-driven applications. In: 2017 IEEE european symposium on security and privacy (EuroS P), pp 401–416. https://doi.org/10.1109/EuroSP.2017.29
Wang S, Su Z (2020) Metamorphic object insertion for testing object detection systems. In: Proceedings of the 35th ACM/IEEE international conference on automated software engineering, ASE 2020. ACM, New York, NY, USA, pp 1053–1065. https://doi.org/10.1145/3324884.3416584
Wilcoxon F (1945) Individual comparisons by ranking methods. Biom Bull 1(6):80–83
Xie X, Ma L, Juefei-Xu F, Xue M, Chen H, Liu Y, Zhao J, Li B, Yin J, See S (2019a) Deephunter: a coverage-guided fuzz testing framework for deep neural networks. In: Møller A, Zhang D (eds) Proceedings of the 28th ACM SIGSOFT international symposium on software testing and analysis, ISSTA 2019, July 15-19, 2019. ACM, Beijing, China, pp 146–157. https://doi.org/10.1145/3293882.3330579
Xie X, Ma L, Wang H, Li Y, Liu Y, Li X (2019b) Diffchaser: Detecting disagreements for deep neural networks. In: Proceedings of the twenty-eighth international joint conference on artificial intelligence, IJCAI-19, International joint conferences on artificial intelligence organization, pp 5772–5778. https://doi.org/10.24963/ijcai.2019/800
Yu J, Lin Z, Yang J, Shen X, Lu X, Huang TS (2018) Generative image inpainting with contextual attention. In: 2018 IEEE conference on computer vision and pattern recognition, CVPR 2018, June 18-22, 2018. IEEE Computer Society, Salt Lake City, UT, USA, pp 5505–5514. https://doi.org/10.1109/CVPR.2018.00577
Zhang M, Zhang Y, Zhang L, Liu C, Khurshid S (2018) Deeproad: Gan-based metamorphic testing and input validation framework for autonomous driving systems. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, ASE 2018. ACM, New York, NY, USA, pp 132–142. https://doi.org/10.1145/3238147.3238187
Zhang P, Wang J, Sun J, Dong G, Wang X, Wang X, Dong JS, Ting D (2020a) White-box fairness testing through adversarial sampling. In: Proceedings of the 42nd international conference on software engineering, association for computing machinery, ICSE ’20, New York, NY, USA
Zhang X, Xie X, Ma L, Du X, Hu Q, Liu Y, Zhao J, Sun M (2020b) Towards characterizing adversarial defects of deep learning software from the lens of uncertainty. In: Proceedings of the ACM/IEEE 42nd international conference on software engineering, association for computing machinery, ICSE ’20, New York, NY, USA, pp 739–751. https://doi.org/10.1145/3377811.3380368
Zhao J, Wang T, Yatskar M, Ordonez V, Chang KW (2017) Men also like shopping: Reducing gender bias amplification using corpus-level constraints. In: Proceedings of the 2017 conference on empirical methods in natural language processing, pp 2941–2951. https://www.aclweb.org/anthology/D17-1319
Metadata
Title
To what extent do DNN-based image classification models make unreliable inferences?
Authors
Yongqiang Tian
Shiqing Ma
Ming Wen
Yepang Liu
Shing-Chi Cheung
Xiangyu Zhang
Publication date
1 September 2021
Publisher
Springer US
Published in
Empirical Software Engineering / Issue 5/2021
Print ISSN: 1382-3256
Electronic ISSN: 1573-7616
DOI
https://doi.org/10.1007/s10664-021-09985-1
