Research Article (Open Access)

Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges

Published: 25 May 2021

Abstract

Machine learning has evolved into an enabling technology for a wide range of highly successful applications. The potential for this success to continue and accelerate has placed machine learning (ML) at the top of research, economic, and political agendas. Such unprecedented interest is fuelled by a vision of ML applicability extending to healthcare, transportation, defence, and other domains of great societal importance. Achieving this vision requires the use of ML in safety-critical applications that demand levels of assurance beyond those needed for current ML applications. Our article provides a comprehensive survey of the state of the art in the assurance of ML, i.e., in the generation of evidence that ML is sufficiently safe for its intended use. The survey covers the methods capable of providing such evidence at different stages of the machine learning lifecycle, i.e., of the complex, iterative process that starts with the collection of the data used to train an ML component for a system, and ends with the deployment of that component within the system. The article begins with a systematic presentation of the ML lifecycle and its stages. We then define assurance desiderata for each stage, review existing methods that contribute to achieving these desiderata, and identify open challenges that require further research.


References

  1. Mahdieh Abbasi, Arezoo Rajabi, Azadeh Sadat Mozafari, Rakesh B. Bobba, and Christian Gagne. 2018. Controlling over-generalization and its effect on adversarial examples generation and detection. arXiv:1808.08282. Retrieved from https://arxiv.org/abs/1808.08282.Google ScholarGoogle Scholar
  2. Amina Adadi and Mohammed Berrada. 2018. Peeking inside the black-box: A survey on Explainable Artificial Intelligence (XAI). IEEE Access 6 (2018), 52138--52160.Google ScholarGoogle ScholarCross RefCross Ref
  3. Ajaya Adhikari, D. M. Tax, Riccardo Satta, and Matthias Fath. 2018. Example and Feature importance-based Explanations for Black-box Machine Learning Models. arXiv:1812.09044. Retrieved from https://arxiv.org/abs/1812.09044.Google ScholarGoogle Scholar
  4. Rocío Alaiz-Rodríguez and Nathalie Japkowicz. 2008. Assessing the impact of changing environments on classifier performance. In Proceedings of the Conference of the Canadian Society for Computational Studies of Intelligence. Springer, 13--24.Google ScholarGoogle ScholarCross RefCross Ref
  5. Rob Alexander, Heather Rebecca Hawkins, and Andrew John Rae. 2015. Situation Coverage—A Coverage Criterion for Testing Autonomous Robots. Technical Report YCS-2015-496. Department of Computer Science, University of York.Google ScholarGoogle Scholar
  6. Hassan Abu Alhaija, Siva Karthik Mustikovela, Lars Mescheder, Andreas Geiger, and Carsten Rother. 2018. Augmented reality meets computer vision: Efficient data generation for urban driving scenes. Int. J. Comput. Vis. 126, 9 (2018), 961--972.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Maksym Andriushchenko and Matthias Hein. 2019. Provably robust boosted decision stumps and trees against adversarial attacks. In Advances in Neural Information Processing Systems. 13017--13028.Google ScholarGoogle Scholar
  8. D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz. 2012. Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine. In Proceedings of the International Workshop on Ambient Assisted Living. 216--223.Google ScholarGoogle Scholar
  9. Adina Aniculaesei, Daniel Arnsberger, Falk Howar, and Andreas Rausch. 2016. Towards the verification of safety-critical autonomous systems in dynamic environments. In Proceedings of the Workshop on Verification and Validation of Cyber-Physical Systems (V2CPS@IFM’16). 79--90.Google ScholarGoogle ScholarCross RefCross Ref
  10. Antreas Antoniou, Amos Storkey, and Harrison Edwards. 2017. Data augmentation generative adversarial networks. arXiv:1711.04340. Retrieved from https://arxiv.org/abs/1711.04340.Google ScholarGoogle Scholar
  11. Maziar Arjomandi, Shane Agostino, Matthew Mammone, Matthieu Nelson, and Tong Zhou. 2006. Classification of Unmanned Aerial Vehicles. Report for Mechanical Engineering Class. Technical Report. University of Adelaide, Australia.Google ScholarGoogle Scholar
  12. Rob Ashmore and Matthew Hill. 2018. Boxing clever: Practical techniques for gaining insights into training data and monitoring distribution shift. In Proceedings of the International Conference on Computer Safety, Reliability, and Security. Springer, 393--405.Google ScholarGoogle ScholarCross RefCross Ref
  13. Rob Ashmore and Elizabeth Lennon. 2017. Progress towards the assurance of non-traditional software. In Developments in System Safety Engineering, Proceedings of the 25th Safety-Critical Systems Symposium. 33--48.Google ScholarGoogle Scholar
  14. Rob Ashmore and Bhopinder Madahar. 2019. Rethinking diversity in the context of autonomous systems. In Engineering Safe Autonomy, Proceedings of the 27th Safety-Critical Systems Symposium. 175--192.Google ScholarGoogle Scholar
  15. Kamyar Azizzadenesheli, Anqi Liu, Fanny Yang, and Animashree Anandkumar. 2019. Regularized learning for domain adaptation under label shifts. arXiv:1903.09734. Retrieved from https://arxiv.org/abs/1903.09734.Google ScholarGoogle Scholar
  16. R. K. E. Bellamy, K. Dey, M. Hind, S. C. Hoffman, S. Houde, K. Kannan, P. Lohia, J. Martino, S. Mehta, A. Mojsilović, S. Nagar, K. N. Ramamurthy, J. Richards, D. Saha, P. Sattigeri, M. Singh, K. R. Varshney, and Y. Zhang. 2019. AI fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM J. Res. Dev. 63, 4/5 (2019), 4:1--4:15.Google ScholarGoogle ScholarCross RefCross Ref
  17. James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(Feb.2012), 281--305.Google ScholarGoogle Scholar
  18. Steffen Bickel, Michael Brückner, and Tobias Scheffer. 2009. Discriminative learning under covariate shift. J. Mach. Learn. Res. 10, 9 (2009), 2137--2155.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Arijit Bishnu, Sameer Desai, Arijit Ghosh, Mayank Goswami, and Paul Subhabrata. 2015. Uniformity of point samples in metric spaces using gap ratio. In Proceedings of the 12th Annual Conference on Theory and Applications of Models of Computation. 347--358.Google ScholarGoogle ScholarCross RefCross Ref
  20. Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer.Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Robin Bloomfield and Peter Bishop. 2010. Safety and assurance cases: Past, present and possible future—An Adelard perspective. In Making Systems Safer. Springer, 51--67.Google ScholarGoogle Scholar
  22. Barry Boehm and Wilfred J. Hansen. 2000. Spiral Development: Experience, Principles, and Refinements. Technical Report CMU/SEI-2000-SR-008. Carnegie Mellon University.Google ScholarGoogle Scholar
  23. Chris Bogdiukiewicz, Michael Butler, Thai Son Hoang, Martin Paxton, James Snook, Xanthippe Waldron, and Toby Wilkinson. 2017. Formal development of policing functions for intelligent systems. In Proceedings of the 28th International Symposium on Software Reliability Engineering. IEEE, 194--204.Google ScholarGoogle ScholarCross RefCross Ref
  24. Andrew P. Bradley. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30, 7 (1997), 1145--1159.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Houssem Ben Braiek and Foutse Khomh. 2018. On Testing Machine Learning Programs. arXiv:1812.02257. Retrieved from https://arxiv.org/abs/1812.02257.Google ScholarGoogle Scholar
  26. Carla E. Brodley and Mark A. Friedl. 1999. Identifying mislabeled training data. J. Artif. Intell. Res. 11 (1999), 131--167.Google ScholarGoogle ScholarCross RefCross Ref
  27. Atilla Bulmus, Axel Freiwald, and Chris Wunderlich. 2017. Over the Air Software Update Realization within Generic Modules with Microcontrollers Using External Serial FLASH. Technical Report. SAE Technical Paper.Google ScholarGoogle Scholar
  28. Jonathod Byrd and Zachary Lipton. 2019. What is the effect of importance weighting in deep learning? arXiv:1812.03372. Retrieved from https://arxiv.org/abs/1812.03372.Google ScholarGoogle Scholar
  29. Radu Calinescu, Danny Weyns, Simos Gerasimou, Muhammad Usman Iftikhar, Ibrahim Habli, and Tim Kelly. 2018. Engineering trustworthy self-adaptive software with dynamic assurance cases. IEEE Trans. Softw. Eng. 44, 11 (2018), 1039--1069.Google ScholarGoogle ScholarCross RefCross Ref
  30. Cristian S. Calude and Giuseppe Longo. 2017. The deluge of spurious correlations in big data. Found. Sci. 22, 3 (2017), 595--612.Google ScholarGoogle ScholarCross RefCross Ref
  31. Richard Carlsson, Björn Gustavsson, Erik Johansson, Thomas Lindgren, Sven-Olof Nyström, Mikael Pettersson, and Robert Virding. 2000. Core Erlang 1.0 Language Specification. Technical Report. Information Technology Department, Uppsala University.Google ScholarGoogle Scholar
  32. Paul Caseley. 2016. Claims and architectures to rationate on automatic and autonomous functions. In Proceedings of the 11th International Conference on System Safety and Cyber-Security. IET, 1--6.Google ScholarGoogle ScholarCross RefCross Ref
  33. Nitesh V. Chawla, Aleksandar Lazarevic, Lawrence O. Hall, and Kevin W. Bowyer. 2003. SMOTEBoost: Improving prediction of the minority class in boosting. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery. 107--119.Google ScholarGoogle Scholar
  34. Liming Chen and Algirdas Avizienis. 1978. N-version programming: A fault-tolerance approach to reliability of software operation. In Proceedings of the 8th IEEE International Symposium on Fault-Tolerant Computing, Vol. 1. 3--9.Google ScholarGoogle Scholar
  35. Xinyun Chen, Chang Liu, Bo Li, Kimberley Lu, and Dawn Song. 2017. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv:1712.05526. Retrieved from https://arxiv.org/abs/1712.05526.Google ScholarGoogle Scholar
  36. Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Zakaria Anil, Rohan an Haque, Lichan Hong, Vihan Jain, Xiabing Liu, and Hemal Shah. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7--10.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Patryk Chrabaszcz, Ilya Loshchilov, and Frank Hutter. 2018. Back to basics: Benchmarking canonical evolution strategies for playing Atari. arXiv:1802.08842. Retrieved from https://arxiv.org/abs/1802.08842.Google ScholarGoogle Scholar
  38. David A. Cieslak and Nitesh V. Chawla. 2009. A framework for monitoring classifiers performance: When and why failure occurs? Knowl. Inf. Syst. 18, 1 (2009), 83--108.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Adnan Darwiche. 2018. Human-level intelligence or animal-like abilities? Comm. ACM 61, 10 (2018), 56--67.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 248--255.Google ScholarGoogle ScholarCross RefCross Ref
  41. Yue Deng, Feng Bao, Youyong Kong, Zhiquan Ren, and Qionghai Dai. 2017. Deep direct reinforcement learning for financial signal representation and trading. IEEE Trans. Neural Netw. Learn. Syst. 28, 3 (2017), 653--664.Google ScholarGoogle ScholarCross RefCross Ref
  42. Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. arXiv:1702.08608. Retrieved from https://arxiv.org/abs/1702.08608.Google ScholarGoogle Scholar
  43. Tommaso Dreossi, Daniel J. Fremont, Shromona Ghosh, Edward Kim, Hadi Ravanbakhsh, Marcell Vazquez-Chanlatte, and Sanjit A Seshia. 2019. VERIFAI: A toolkit for the design and analysis of artificial intelligence-based systems. arXiv:1902.04245. Retrieved from https://arxiv.org/abs/1902.04245.Google ScholarGoogle Scholar
  44. Tommaso Dreossi, Shromona Ghosh, Xiangyu Yue, Kurt Keutzer, Alberto Sangiovanni-Vincentelli, and Sanjit A Seshia. 2018. Counterexample-guided data augmentation. arXiv:1805.06962. Retrieved from https://arxiv.org/abs/1805.06962.Google ScholarGoogle Scholar
  45. Tommaso Dreossi, Somesh Jha, and Sanjit A. Seshia. 2018. Semantic adversarial deep learning. arXiv:1804.07045. Retrieved from https://arxiv.org/abs/1804.07045.Google ScholarGoogle Scholar
  46. Chris Drummond and Robert C. Holte. 2006. Cost curves: An improved method for visualizing classifier performance. Mach. Learn. 65, 1 (2006), 95--130.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Souradeep Dutta, Xin Chen, Susmit Jha, Sriram Sankaranarayanan, and Ashish Tiwari. 2019. Sherlock-A tool for verification of neural network feedback systems: Demo abstract. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control. 262--263.Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Ruediger Ehlers. 2017. Formal verification of piece-wise linear feed-forward neural networks. In Proceedings of the International Symposium on Automated Technology for Verification and Analysis. Springer, 269--286.Google ScholarGoogle ScholarCross RefCross Ref
  49. Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. 2018. Adversarial vulnerability for any classifier. arXiv:1802.08686. Retrieved from https://arxiv.org/abs/1802.08686.Google ScholarGoogle Scholar
  50. Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. 2015. Fundamental limits on adversarial robustness. In Proceedings of the ICML Workshop on Deep Learning.Google ScholarGoogle Scholar
  51. Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. 2016. Robustness of classifiers: From adversarial to random noise. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’16). Curran Associates Inc., Red Hook, NY, 1632--1640.Google ScholarGoogle Scholar
  52. Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 259--268.Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Angelo Ferrando, Louise A. Dennis, Davide Ancona, Michael Fisher, and Viviana Mascardi. 2018. Verifying and validating autonomous systems: Towards an integrated approach. In Proceedings of the International Conference on Runtime Verification. Springer, 263--281.Google ScholarGoogle ScholarCross RefCross Ref
  54. Peter Flach. 2019. Performance evaluation in machine learning: The good, the bad, the ugly and the way forward. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence. 9808--9814.Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Michael Forsting. 2017. Machine learning will change medicine. J. Nucl. Med. 58, 3 (2017), 357--358.Google ScholarGoogle ScholarCross RefCross Ref
  56. Yoav Freund, Robert Schapire, and Naoki Abe. 1999. A short introduction to boosting. J. Jpn. Soc. Artif. Intell. 14, 771--780 (1999), 1612.Google ScholarGoogle Scholar
  57. Yoav Freund and Robert E. Schapire. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 1 (1997), 119--139.Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. 2018. AI2: Safety and robustness certification of neural networks with abstract interpretation. In Proceedings of the 2018 IEEE Symposium on Security and Privacy. IEEE, 3--18.Google ScholarGoogle ScholarCross RefCross Ref
  59. Aurélien Géron. 2017. Hands-on Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc.Google ScholarGoogle Scholar
  60. Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio. 2016. Deep Learning. Vol. 1. MIT Press.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv:1412.6572. Retrieved from https://arxiv.org/abs/1412.6572.Google ScholarGoogle Scholar
  62. Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. 2017. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv:1708.06733. Retrieved from https://arxiv.org/abs/1708.06733.Google ScholarGoogle Scholar
  63. Guo Haixiang, Li Yijing, Jennifer Shang, Gu Mingyun, Huang Yuanyue, and Gong Bing. 2017. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 73 (2017), 220--239.Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Jeff Heaton. 2016. An empirical analysis of feature engineering for predictive modeling. In Proceedings of SoutheastCon’16. IEEE, 1--6.Google ScholarGoogle ScholarCross RefCross Ref
  65. Constance L. Heitmeyer, Ralph D. Jeffords, and Bruce G. Labaw. 1996. Automated consistency checking of requirements specifications. ACM Trans. Softw. Eng. Methodol. 5, 3 (1996), 231--261.Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Parker Hill, Babak Zamirai, Shengshuo Lu, Yu-Wei Chao, Michael Laurenzano, Mehrzad Samadi, Marios C. Papaefthymiou, Scott A. Mahlke, Thomas F. Wenisch, Jia Deng, Lingjia Tang, and Jason Mars. [n.d.]. Rethinking Numerical Representations for Deep Neural Networks. arXiv:1808.02513. Retrieved from https://arxiv.org/abs/1808.02513.Google ScholarGoogle Scholar
  67. Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. [n.d.]. Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580. Retrieved from https://arxiv.org/abs/1207.0580.Google ScholarGoogle Scholar
  68. Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. 2017. Safety verification of deep neural networks. In Proceedings of the 29th International Conference on Computer Aided Verification, Rupak Majumdar and Viktor Kuncak (Eds.), Lecture Notes in Computer Science, Vol. 10426. Springer, 3--29.Google ScholarGoogle Scholar
  69. Zhongling Huang, Zongxu Pan, and Bin Lei. 2017. Transfer learning with deep convolutional neural network for SAR target classification with limited labeled data. Remote Sens. 9, 9 (2017), 907.Google ScholarGoogle ScholarCross RefCross Ref
  70. Casidhe Hutchison, Milda Zizyte, Patrick E. Lanigan, David Guttendorf, Michael Wagner, Claire Le Goues, and Philip Koopman. 2018. Robustness testing of autonomy software. In Proceedings of the 40th IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice. 276--285.Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Frank Hutter, Jörg Lücke, and Lars Schmidt-Thieme. 2015. Beyond manual tuning of hyperparameters. Künstl. Intell. 29, 4 (2015), 329--337.Google ScholarGoogle ScholarCross RefCross Ref
  72. Didac Gil De La Iglesia and Danny Weyns. 2015. MAPE-K formal templates to rigorously design behaviors for self-adaptive systems. ACM Trans. Auton. Adapt. Syst. 10, 3 (2015), 15.Google ScholarGoogle Scholar
  73. Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167. Retrieved from https://arxiv.org/abs/1502.03167.Google ScholarGoogle Scholar
  74. Bandar Seri Iskandar. 2017. Terrorism detection based on sentiment analysis using machine learning. J. Eng. Appl. Sci. 12, 3 (2017), 691--698.Google ScholarGoogle Scholar
  75. ISO. 2018. Road Vehicles—Functional Safety: Part 6. Technical Report BS ISO 26262-6:2018. ISO.Google ScholarGoogle Scholar
  76. Nathalie Japkowicz. 2001. Concept-learning in the presence of between-class and within-class imbalances. In Proceedings of the Conference of the Canadian Society for Computational Studies of Intelligence. Springer, 67--77.Google ScholarGoogle ScholarCross RefCross Ref
  77. Nikita Johnson and Tim Kelly. 2019. Devil’s in the detail: Through-life safety and security co-assurance using SSAF. In Proceedings of the 38th International Conference on Computer Safety, Reliability, and Security. Springer, 299--314.Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. Taylor T. Johnson, Stanley Bak, Marco Caccamo, and Lui Sha. 2016. Real-time reachability for verified Simplex design. ACM Trans. Embed. Comput. Syst. 15, 2 (2016), 1--27.Google ScholarGoogle ScholarDigital LibraryDigital Library
  79. M. H. Kabir, M. R. Hoque, H. Seo, and S. H. Yang. 2015. Machine learning based adaptive context-aware system for smart home environment. Int. J. Smart Home 9, 11 (2015), 55--62.Google ScholarGoogle ScholarCross RefCross Ref
  80. Faisal Kamiran and Toon Calders. 2012. Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst. 33, 1 (2012), 1--33.Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. 2017. Reluplex: An efficient SMT solver for verifying deep neural networks. In Proceedings of the International Conference on Computer Aided Verification. Springer, 97--117.Google ScholarGoogle Scholar
  82. Guy Katz, Derek A. Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, et al. 2019. The marabou framework for verification and analysis of deep neural networks. In Proceedings of the International Conference on Computer Aided Verification. Springer, 443--452.Google ScholarGoogle ScholarCross RefCross Ref
  83. Shachar Kaufman, Saharon Rosset, Claudia Perlich, and Ori Stitelman. 2012. Leakage in data mining: Formulation, detection, and avoidance. ACM Trans. Knowl. Discov. Data 6, 4 (2012), 15.Google ScholarGoogle ScholarDigital LibraryDigital Library
  84. Jeffrey O. Kephart and David M. Chess. 2003. The vision of autonomic computing. Computer 36, 1 (2003), 41--50.Google ScholarGoogle ScholarDigital LibraryDigital Library
  85. Muhammad Taimoor Khan, Dimitrios Serpanos, and Howard Shrobe. 2016. A rigorous and efficient run-time security monitor for real-time critical embedded system applications. In Proceedings of the 3rd World Forum on Internet of Things. IEEE, 100--105.Google ScholarGoogle ScholarCross RefCross Ref
  86. Udayan Khurana, Horst Samulowitz, and Deepak Turaga. 2018. Feature engineering for predictive modeling using reinforcement learning. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 3407--3414.Google ScholarGoogle Scholar
  87. Roger E. Kirk. 2007. Experimental design. Wiley Online Library.Google ScholarGoogle Scholar
  88. Tom Ko, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. 2015. Audio augmentation for speech recognition. In Proceedings of the 16th Annual Conference of the International Speech Communication Association.Google ScholarGoogle ScholarCross RefCross Ref
  89. Patrick Koch, Brett Wujek, Oleg Golovidov, and Steven Gardner. 2017. Automated hyperparameter tuning for effective machine learning. In Proceedings of the SAS Global Forum Conference.Google ScholarGoogle Scholar
  90. Matthieu Komorowski, Leo A. Celi, Omar Badawi, Anthony C. Gordon, and A. Aldo Faisal. 2018. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat. Med. 24, 11 (2018), 1716--1720.Google ScholarGoogle ScholarCross RefCross Ref
  91. Philip Koopman and Frank Fratrik. 2019. How many operational design domains, objects, and events? In Proceedings of the AAAI Workshop on Artificial Intelligence Safety.Google ScholarGoogle Scholar
  92. Philip Koopman, Aaron Kane, and Jen Black. 2019. Credible autonomy safety argumentation. In Proceedings of the 27th Safety-Critical Systems Symposium.Google ScholarGoogle Scholar
  93. S. B. Kotsiantis, Dimitris Kanellopoulos, and P. E. Pintelas. 2006. Data preprocessing for supervised leaning. Int. J. Comput. Sci. 1, 2 (2006), 111--117.Google ScholarGoogle Scholar
  94. S. B. Kotsiantis, D. Kanellopoulos, and P. E. Pintelas. 2007. Data preprocessing for supervised leaning. Int. J. Comput. Electr. Autom. Contr. Inf. Eng. 1, 12 (2007), 4104--4109.Google ScholarGoogle Scholar
  95. Samantha Krening, Brent Harrison, Karen M. Feigh, Charles Lee Isbell, Mark Riedl, and Andrea Thomaz. 2017. Learning from explanations using sentiment and advice in RL. IEEE Trans. Cogn. Dev. Syst. 9, 1 (2017), 44--55.Google ScholarGoogle ScholarCross RefCross Ref
  96. Isaac Lage, Andrew Ross, Kim Been, Samuel Gershman, and Finale Doshi-Velez. 2018. Human-in-the-loop interpretability prior. In Proceedings of the Conference on Neural Information Processing Systems. 10180--10189.Google ScholarGoogle Scholar
  97. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278--2323.Google ScholarGoogle ScholarCross RefCross Ref
  98. Joseph Lemley, Filip Jagodzinski, and Razvan Andonie. 2016. Big holes in big data: A Monte Carlo algorithm for detecting large hyper-rectangles in high dimensional data. In Proceedings of the IEEE Computer Software and Applications Conference. 563--571.Google ScholarGoogle ScholarCross RefCross Ref
  99. Zachary C. Lipton. 2016. The mythos of model interpretability. arXiv:1606.03490. Retrieved from https://arxiv.org/abs/1606.03490.Google ScholarGoogle Scholar
  100. Yingqi Liu, Wen-Chuan Lee, Guanhong Tao, Shiqing Ma, Yousra Aafer, and Xiangyu Zhang. 2019. ABS: Scanning neural networks for back-doors by artificial brain stimulation. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 1265--1282.Google ScholarGoogle ScholarDigital LibraryDigital Library
  101. Victoria López, Alberto Fernández, Salvador García, Vasile Palade, and Francisco Herrera. 2013. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 250 (2013), 113--141.Google ScholarGoogle ScholarCross RefCross Ref
  102. Gustavo A. Lujan-Moreno, Phillip R. Howard, Omar G. Rojas, and Douglas C. Montgomery. 2018. Design of experiments and response surface methodology to tune machine learning hyperparameters, with a random forest case-study. Expert Syst. Appl. 109 (2018), 195--205.Google ScholarGoogle ScholarDigital LibraryDigital Library
  103. Lei Ma, Felix Juefei-Xu, Minhui Xue, Bo Li, Li Li, Yang Liu, and Jianjun Zhao. 2019. DeepCT: Tomographic combinatorial testing for deep learning systems. In Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering. IEEE, 614--618.Google ScholarGoogle ScholarCross RefCross Ref
  104. Lei Ma, Felix Juefei-Xu, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Chunyang Chen, Ting Su, Li Li, Yang Liu, Jianjun Zhao, and Yadong Wang. 2018. DeepGauge: Multi-granularity testing criteria for deep learning systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 120--131.Google ScholarGoogle ScholarDigital LibraryDigital Library
  105. Mathilde Machin, Jérémie Guiochet, Hélène Waeselynck, Jean-Paul Blanquart, Matthieu Roy, and Lola Masson. 2018. SMOF: A safety monitoring framework for autonomous systems. IEEE Trans. Syst. Man Cybernet. Syst. 48, 5 (2018), 702--715.Google ScholarGoogle ScholarCross RefCross Ref
  106. Aravindh Mahendran and Andrea Vedaldi. 2015. Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5188--5196.Google ScholarGoogle ScholarCross RefCross Ref
  107. Spyros Makridakis. 2017. The forthcoming Artificial Intelligence (AI) revolution: Its impact on society and firms. Futures 90 (2017), 46--60.Google ScholarGoogle ScholarCross RefCross Ref
  108. Pedro Marcelino. 2018. Transfer learning from pre-trained models. In Towards Data Science (2018).Google ScholarGoogle Scholar
  109. George Mason, Radu Calinescu, Daniel Kudenko, and Alec Banks. 2017. Assured reinforcement learning with formally verified abstract policies. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence. 105--117.Google ScholarGoogle ScholarCross RefCross Ref
  110. Michael Maurer, Ivan Breskovic, Vincent C. Emeakaroha, and Ivona Brandic. 2011. Revealing the MAPE loop for the autonomic management of cloud infrastructures. In Proceedings of the Symposium on Computers and Communications. IEEE, 147--152.Google ScholarGoogle ScholarDigital LibraryDigital Library
  111. Markus Maurer, J. Christian Gerdes, Barbara Lenz, and Hermann Winner. 2016. Autonomous Driving: Technical, Legal and Social Aspects. Springer Nature.Google ScholarGoogle Scholar
  112. Christopher Meyer and Jörg Schwenk. 2013. SoK: Lessons learned from SSL/TLS attacks. In Proceedings of the International Workshop on Information Security Applications. Springer, 189--209.Google ScholarGoogle Scholar
  113. Microsoft. 2019. How to choose algorithms for Azure Machine Learning Studio. Retrieved February 2019 from https://docs.microsoft.com/en-us/azure/machine-learning/studio/algorithm-choice.Google ScholarGoogle Scholar
  114. Tom M. Mitchell. 1997. Machine Learning. McGraw–Hill.Google ScholarGoogle Scholar
  115. Model Zoos Caffe 2019. Caffe Model Zoo. Retrieved March 2019 from http://caffe.berkeleyvision.org/model_zoo.html.Google ScholarGoogle Scholar
  116. Model Zoos Github 2019. Model Zoos of machine and deep learning technologies. Retrieved March 2019 from https://github.com/collections/ai-model-zoos.
  117. Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. 2017. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1765--1773.
  118. Jose G. Moreno-Torres, Troy Raeder, Rocío Alaiz-Rodríguez, Nitesh V. Chawla, and Francisco Herrera. 2012. A unifying view on dataset shift in classification. Pattern Recogn. 45, 1 (2012), 521--530.
  119. Pamela A. Munro and Barbara G. Kanki. 2003. An analysis of ASRS maintenance reports on the use of minimum equipment lists. In Proceedings of the 12th International Symposium on Aviation Psychology.
  120. Kevin P. Murphy. 2012. Machine Learning: A Probabilistic Perspective. The MIT Press.
  121. Partha Niyogi and Federico Girosi. 1996. On the relationship between generalization error, hypothesis complexity, and sample complexity for radial basis functions. Neural Comput. 8, 4 (1996), 819--842.
  122. Object Management Group. 2018. Structured Assurance Case Metamodel (SACM). Version 2.0.
  123. Augustus Odena and Ian Goodfellow. 2018. TensorFuzz: Debugging neural networks with coverage-guided fuzzing. arXiv:1807.10875. Retrieved from https://arxiv.org/abs/1807.10875.
  124. Maxime Oquab, Leon Bottou, Ivan Laptev, and Josef Sivic. 2014. Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1717--1724.
  125. Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. 2017. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 506--519.
  126. Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. 2017. DeepXplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles. ACM, 1--18.
  127. Teresa Placho, Christoph Schmittner, Arndt Bonitz, and Oliver Wana. 2020. Management of automotive software updates. Microprocess. Microsyst. 78 (2020), 103257.
  128. Michael J. Pont and Royan H. L. Ong. 2002. Using watchdog timers to improve the reliability of single-processor embedded systems: Seven new patterns and a case study. In Proceedings of the 1st Nordic Conference on Pattern Languages of Programs.
  129. Lutz Prechelt. 1998. Early stopping - but when? In Neural Networks: Tricks of the Trade. Springer, 55--69.
  130. Philipp Probst, Bernd Bischl, and Anne-Laure Boulesteix. 2018. Tunability: Importance of hyperparameters of machine learning algorithms. arXiv:1802.09596. Retrieved from https://arxiv.org/abs/1802.09596.
  131. Foster Provost and Tom Fawcett. 2001. Robust classification for imprecise environments. Mach. Learn. 42, 3 (2001), 203--231.
  132. Foster J. Provost, Tom Fawcett, and Ron Kohavi. 1998. The case against accuracy estimation for comparing induction algorithms. In Proceedings of the 15th International Conference on Machine Learning. 445--453.
  133. R-Bloggers Data Analysis 2019. How to Use Data Analysis for Machine Learning. Retrieved February 2019 from https://www.r-bloggers.com/how-to-use-data-analysis-for-machine-learning-example-part-1.
  134. Stephan Rabanser, Stephan Günnemann, and Zachary C. Lipton. 2019. Failing loudly: An empirical study of methods for detecting dataset shift. In Advances in Neural Information Processing Systems 32 (2019).
  135. Jan Ramon, Kurt Driessens, and Tom Croonenborghs. 2007. Transfer learning in reinforcement learning problems through partial policy recycling. In Proceedings of the European Conference on Machine Learning. Springer, 699--707.
  136. Francesco Ranzato and Marco Zanella. 2019. Robustness verification of support vector machines. In Proceedings of the International Static Analysis Symposium. Springer, 271--295.
  137. Jorge-L. Reyes-Ortiz, Luca Oneto, Albert Samà, Xavier Parra, and Davide Anguita. 2016. Transition-aware human activity recognition using smartphones. Neurocomputing 171 (2016), 754--767.
  138. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135--1144.
  139. Francesco Ricci, Lior Rokach, and Bracha Shapira. 2015. Recommender systems: Introduction and challenges. In Recommender Systems Handbook (2015), 1--34.
  140. German Ros, Laura Sellart, Joanna Materzynska, David Vazquez, and Antonio M. Lopez. 2016. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3234--3243.
  141. Andrew Slavin Ross and Finale Doshi-Velez. 2018. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 1660--1669.
  142. Saharon Rosset, Claudia Perlich, Grzegorz Świrszcz, Prem Melville, and Yan Liu. 2010. Medical data mining: Insights from winning two competitions. Data Min. Knowl. Discov. 20, 3 (2010), 439--468.
  143. RTCA. 2011. Software Considerations in Airborne Systems and Equipment Certification. Technical Report DO-178C.
  144. Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 5 (2019), 206--215.
  145. Stuart J. Russell and Peter Norvig. 2016. Artificial Intelligence: A Modern Approach. Pearson Education Limited.
  146. Jerome Sacks, William J. Welch, Toby J. Mitchell, and Henry P. Wynn. 1989. Design and analysis of computer experiments. Stat. Sci. (1989), 409--423.
  147. Omer Sagi and Lior Rokach. 2018. Ensemble learning: A survey. WIREs Data Min. Knowl. Discov. 8, 4 (2018), e1249.
  148. Ahmed Salem, Michael Backes, and Yang Zhang. 2020. Don’t trigger me! A triggerless backdoor attack against deep neural networks. arXiv:2010.03282. Retrieved from https://arxiv.org/abs/2010.03282.
  149. Robert G. Sargent. 2009. Verification and validation of simulation models. In Proceedings of the Winter Simulation Conference. 162--176.
  150. Lawrence K. Saul and Sam T. Roweis. 2003. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 4 (Jun. 2003), 119--155.
  151. Christoph Schorn, Andre Guntoro, and Gerd Ascheid. 2018. Efficient on-line error detection and mitigation for deep neural network accelerators. In Proceedings of the International Conference on Computer Safety, Reliability, and Security. Springer, 205--219.
  152. Scikit-Taxonomy 2019. Scikit: Choosing the right estimator. Retrieved February 2019 from https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html.
  153. Noam Segev, Maayan Harel, Shie Mannor, Koby Crammer, and Ran El-Yaniv. 2017. Learn on source, refine on target: A model transfer learning framework with random forests. IEEE Trans. Pattern Anal. Mach. Intell. 39, 9 (2017), 1811--1824.
  154. Daniel Selsam, Percy Liang, and David L. Dill. 2017. Developing bug-free machine learning systems with formal mathematics. In Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 3047--3056.
  155. Victor S. Sheng and Jing Zhang. 2019. Machine learning with crowdsourcing: A brief summary of the past research and future directions. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 9837--9843.
  156. Andy Shih, Arthur Choi, and Adnan Darwiche. 2018. Formal verification of Bayesian network classifiers. In Proceedings of the International Conference on Probabilistic Graphical Models. 427--438.
  157. Padhraic Smyth. 1996. Bounds on the mean classification error rate of multiple experts. Pattern Recogn. Lett. 17, 12 (1996), 1253--1257.
  158. Marina Sokolova and Guy Lapalme. 2009. A systematic analysis of performance measures for classification tasks. Inf. Process. Manage. 45, 4 (2009), 427--437.
  159. Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1 (2014), 1929--1958.
  160. Sanatan Sukhija, Narayanan C. Krishnan, and Deepak Kumar. 2018. Supervised heterogeneous transfer learning using random forests. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data. ACM, 157--166.
  161. Youcheng Sun, Min Wu, Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska, and Daniel Kroening. 2018. Concolic testing for deep neural networks. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 109--119.
  162. Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv:1312.6199. Retrieved from https://arxiv.org/abs/1312.6199.
  163. A. Taber and E. Normand. 1993. Single event upset in avionics. IEEE Trans. Nucl. Sci. 40, 2 (1993), 120--126.
  164. Mariarosaria Taddeo, Tom McCutcheon, and Luciano Floridi. 2019. Trusting artificial intelligence in cybersecurity is a double-edged sword. Nat. Mach. Intell. (2019), 557--560.
  165. Luke Taylor and Geoff Nitschke. 2017. Improving deep learning using generic data augmentation. arXiv:1708.06020. Retrieved from https://arxiv.org/abs/1708.06020.
  166. Chris Thornton, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. 2013. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 847--855.
  167. Yuchi Tian, Kexin Pei, Suman Jana, and Baishakhi Ray. 2018. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering. ACM, 303--314.
  168. John Törnblom and Simin Nadjm-Tehrani. 2018. Formal verification of random forests in safety-critical applications. In Proceedings of the International Workshop on Formal Techniques for Safety-Critical Systems. Springer, 55--71.
  169. Hoang-Dung Tran, Stanley Bak, Weiming Xiang, and Taylor T. Johnson. 2020. Verification of deep convolutional neural networks using ImageStars. arXiv:2004.05511. Retrieved from https://arxiv.org/abs/2004.05511.
  170. Hoang-Dung Tran, Xiaodong Yang, Diego Manzanas Lopez, Patrick Musau, Luan Viet Nguyen, Weiming Xiang, Stanley Bak, and Taylor T. Johnson. 2020. NNV: The neural network verification tool for deep neural networks and learning-enabled cyber-physical systems. arXiv:2004.05519. Retrieved from https://arxiv.org/abs/2004.05519.
  171. John W. Tukey. 1977. Exploratory Data Analysis. Addison-Wesley, Reading, MA.
  172. Jasper van der Waa, Jurriaan van Diggelen, Mark A. Neerincx, and Stephan Raaijmakers. 2018. ICM: An intuitive model independent and accurate certainty measure for machine learning. In Proceedings of the International Conference on Agents and Artificial Intelligence (ICAART’18). 314--321.
  173. Perry Van Wesel and Alwyn E. Goodloe. 2017. Challenges in the verification of reinforcement learning algorithms. NASA Technical Memorandum.
  174. Kiri Wagstaff. 2012. Machine learning that matters. arXiv:1206.4656. Retrieved from https://arxiv.org/abs/1206.4656.
  175. Kiri L. Wagstaff and Benjamin Bornstein. 2009. K-means in space: A radiation sensitivity evaluation. In Proceedings of the 26th Annual International Conference on Machine Learning. 1097--1104.
  176. Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. 2013. Regularization of neural networks using DropConnect. In Proceedings of the International Conference on Machine Learning. 1058--1066.
  177. Binghui Wang and Neil Zhenqiang Gong. 2018. Stealing hyperparameters in machine learning. In Proceedings of the 2018 IEEE Symposium on Security and Privacy. IEEE, 36--52.
  178. Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. 2019. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In Proceedings of the 2019 IEEE Symposium on Security and Privacy. IEEE, 707--723.
  179. Ke Wang, Senqiang Zhou, Ada Wai-Chee Fu, and Jeffrey Xu Yu. 2003. Mining changes of classification by correspondence tracing. In Proceedings of the 2003 SIAM International Conference on Data Mining. SIAM, 95--106.
  180. Lu Wang, Xuanqing Liu, Jinfeng Yi, Zhi-Hua Zhou, and Cho-Jui Hsieh. 2019. Evaluating the robustness of nearest neighbor classifiers: A primal-dual perspective. arXiv:1906.03972. Retrieved from https://arxiv.org/abs/1906.03972.
  181. Yihan Wang, Huan Zhang, Hongge Chen, Duane Boning, and Cho-Jui Hsieh. 2020. On ℓp-norm robustness of ensemble stumps and trees. arXiv:2008.08755. Retrieved from https://arxiv.org/abs/2008.08755.
  182. Gary M. Weiss. 2004. Mining with rarity: A unifying framework. ACM SIGKDD Explor. Newsl. 6, 1 (2004), 7--19.
  183. Karl Weiss, Taghi M. Khoshgoftaar, and DingDing Wang. 2016. A survey of transfer learning. J. Big Data 3, 1 (2016), 9.
  184. Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter Puschner, Jan Staschulat, and Per Stenström. 2008. The worst-case execution-time problem—Overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst. 7, 3 (2008), 36.
  185. Sebastien C. Wong, Adam Gatt, Victor Stamatescu, and Mark D. McDonnell. 2016. Understanding data augmentation for classification: When to warp? In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications. IEEE, 1--6.
  186. Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144. Retrieved from https://arxiv.org/abs/1609.08144.
  187. Steven R. Young, Derek C. Rose, Thomas P. Karnowski, Seung-Hwan Lim, and Robert M. Patton. 2015. Optimizing deep learning hyper-parameters through an evolutionary algorithm. In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments. ACM, 4.
  188. X. Yuan, Y. Chen, Y. Zhao, Y. Long, X. Liu, K. Chen, S. Zhang, H. Huang, X. Wang, and C. A. Gunter. 2018. CommanderSong: A systematic approach for practical adversarial voice recognition. arXiv:1801.08535. Retrieved from https://arxiv.org/abs/1801.08535.
  189. Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. 2018. Accelerating the machine learning lifecycle with MLflow. Data Eng. 41, 4 (2018), 39--45.
  190. Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, and Sarfraz Khurshid. 2018. DeepRoad: GAN-based metamorphic autonomous driving system testing. arXiv:1802.02295. Retrieved from https://arxiv.org/abs/1802.02295.
  191. Shichao Zhang, Chengqi Zhang, and Qiang Yang. 2003. Data preparation for data mining. Appl. Artif. Intell. 17, 5--6 (2003), 375--381.
  192. Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. 2016. Improving the robustness of deep neural networks via stability training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4480--4488.
  193. Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. 2017. Random erasing data augmentation. arXiv:1708.04896. Retrieved from https://arxiv.org/abs/1708.04896.

Published in ACM Computing Surveys, Volume 54, Issue 5 (June 2022), 719 pages. ISSN: 0360-0300. EISSN: 1557-7341. DOI: 10.1145/3467690.

Copyright © 2021 ACM. Publisher: Association for Computing Machinery, New York, NY, United States.

Publication history: Received 1 May 2019; Revised 1 December 2020; Accepted 1 February 2021; Published 25 May 2021.
