Abstract
Machine learning has evolved into an enabling technology for a wide range of highly successful applications. The potential for this success to continue and accelerate has placed machine learning (ML) at the top of research, economic, and political agendas. Such unprecedented interest is fuelled by a vision of ML applicability extending to healthcare, transportation, defence, and other domains of great societal importance. Achieving this vision requires the use of ML in safety-critical applications that demand levels of assurance beyond those needed for current ML applications. Our article provides a comprehensive survey of the state of the art in the assurance of ML, i.e., in the generation of evidence that ML is sufficiently safe for its intended use. The survey covers the methods capable of providing such evidence at different stages of the machine learning lifecycle, i.e., of the complex, iterative process that starts with the collection of the data used to train an ML component for a system, and ends with the deployment of that component within the system. The article begins with a systematic presentation of the ML lifecycle and its stages. We then define assurance desiderata for each stage, review existing methods that contribute to achieving these desiderata, and identify open challenges that require further research.
Supplemental Material
Supplemental movie, appendix, image, and software files for "Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges" are available for download.
- Mahdieh Abbasi, Arezoo Rajabi, Azadeh Sadat Mozafari, Rakesh B. Bobba, and Christian Gagne. 2018. Controlling over-generalization and its effect on adversarial examples generation and detection. arXiv:1808.08282. Retrieved from https://arxiv.org/abs/1808.08282.Google Scholar
- Amina Adadi and Mohammed Berrada. 2018. Peeking inside the black-box: A survey on Explainable Artificial Intelligence (XAI). IEEE Access 6 (2018), 52138--52160.Google ScholarCross Ref
- Ajaya Adhikari, D. M. Tax, Riccardo Satta, and Matthias Fath. 2018. Example and Feature importance-based Explanations for Black-box Machine Learning Models. arXiv:1812.09044. Retrieved from https://arxiv.org/abs/1812.09044.Google Scholar
- Rocío Alaiz-Rodríguez and Nathalie Japkowicz. 2008. Assessing the impact of changing environments on classifier performance. In Proceedings of the Conference of the Canadian Society for Computational Studies of Intelligence. Springer, 13--24.Google ScholarCross Ref
- Rob Alexander, Heather Rebecca Hawkins, and Andrew John Rae. 2015. Situation Coverage—A Coverage Criterion for Testing Autonomous Robots. Technical Report YCS-2015-496. Department of Computer Science, University of York.Google Scholar
- Hassan Abu Alhaija, Siva Karthik Mustikovela, Lars Mescheder, Andreas Geiger, and Carsten Rother. 2018. Augmented reality meets computer vision: Efficient data generation for urban driving scenes. Int. J. Comput. Vis. 126, 9 (2018), 961--972.Google ScholarDigital Library
- Maksym Andriushchenko and Matthias Hein. 2019. Provably robust boosted decision stumps and trees against adversarial attacks. In Advances in Neural Information Processing Systems. 13017--13028.Google Scholar
- D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz. 2012. Human activity recognition on smartphones using a multiclass hardware-friendly support vector machine. In Proceedings of the International Workshop on Ambient Assisted Living. 216--223.Google Scholar
- Adina Aniculaesei, Daniel Arnsberger, Falk Howar, and Andreas Rausch. 2016. Towards the verification of safety-critical autonomous systems in dynamic environments. In Proceedings of the Workshop on Verification and Validation of Cyber-Physical Systems (V2CPS@IFM’16). 79--90.Google ScholarCross Ref
- Antreas Antoniou, Amos Storkey, and Harrison Edwards. 2017. Data augmentation generative adversarial networks. arXiv:1711.04340. Retrieved from https://arxiv.org/abs/1711.04340.Google Scholar
- Maziar Arjomandi, Shane Agostino, Matthew Mammone, Matthieu Nelson, and Tong Zhou. 2006. Classification of Unmanned Aerial Vehicles. Report for Mechanical Engineering Class. Technical Report. University of Adelaide, Australia.Google Scholar
- Rob Ashmore and Matthew Hill. 2018. Boxing clever: Practical techniques for gaining insights into training data and monitoring distribution shift. In Proceedings of the International Conference on Computer Safety, Reliability, and Security. Springer, 393--405.Google ScholarCross Ref
- Rob Ashmore and Elizabeth Lennon. 2017. Progress towards the assurance of non-traditional software. In Developments in System Safety Engineering, Proceedings of the 25th Safety-Critical Systems Symposium. 33--48.Google Scholar
- Rob Ashmore and Bhopinder Madahar. 2019. Rethinking diversity in the context of autonomous systems. In Engineering Safe Autonomy, Proceedings of the 27th Safety-Critical Systems Symposium. 175--192.Google Scholar
- Kamyar Azizzadenesheli, Anqi Liu, Fanny Yang, and Animashree Anandkumar. 2019. Regularized learning for domain adaptation under label shifts. arXiv:1903.09734. Retrieved from https://arxiv.org/abs/1903.09734.Google Scholar
- R. K. E. Bellamy, K. Dey, M. Hind, S. C. Hoffman, S. Houde, K. Kannan, P. Lohia, J. Martino, S. Mehta, A. Mojsilović, S. Nagar, K. N. Ramamurthy, J. Richards, D. Saha, P. Sattigeri, M. Singh, K. R. Varshney, and Y. Zhang. 2019. AI fairness 360: An extensible toolkit for detecting and mitigating algorithmic bias. IBM J. Res. Dev. 63, 4/5 (2019), 4:1--4:15.Google ScholarCross Ref
- James Bergstra and Yoshua Bengio. 2012. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(Feb.2012), 281--305.Google Scholar
- Steffen Bickel, Michael Brückner, and Tobias Scheffer. 2009. Discriminative learning under covariate shift. J. Mach. Learn. Res. 10, 9 (2009), 2137--2155.Google ScholarDigital Library
- Arijit Bishnu, Sameer Desai, Arijit Ghosh, Mayank Goswami, and Paul Subhabrata. 2015. Uniformity of point samples in metric spaces using gap ratio. In Proceedings of the 12th Annual Conference on Theory and Applications of Models of Computation. 347--358.Google ScholarCross Ref
- Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer.Google ScholarDigital Library
- Robin Bloomfield and Peter Bishop. 2010. Safety and assurance cases: Past, present and possible future—An Adelard perspective. In Making Systems Safer. Springer, 51--67.Google Scholar
- Barry Boehm and Wilfred J. Hansen. 2000. Spiral Development: Experience, Principles, and Refinements. Technical Report CMU/SEI-2000-SR-008. Carnegie Mellon University.Google Scholar
- Chris Bogdiukiewicz, Michael Butler, Thai Son Hoang, Martin Paxton, James Snook, Xanthippe Waldron, and Toby Wilkinson. 2017. Formal development of policing functions for intelligent systems. In Proceedings of the 28th International Symposium on Software Reliability Engineering. IEEE, 194--204.Google ScholarCross Ref
- Andrew P. Bradley. 1997. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn. 30, 7 (1997), 1145--1159.Google ScholarDigital Library
- Houssem Ben Braiek and Foutse Khomh. 2018. On Testing Machine Learning Programs. arXiv:1812.02257. Retrieved from https://arxiv.org/abs/1812.02257.Google Scholar
- Carla E. Brodley and Mark A. Friedl. 1999. Identifying mislabeled training data. J. Artif. Intell. Res. 11 (1999), 131--167.Google ScholarCross Ref
- Atilla Bulmus, Axel Freiwald, and Chris Wunderlich. 2017. Over the Air Software Update Realization within Generic Modules with Microcontrollers Using External Serial FLASH. Technical Report. SAE Technical Paper.Google Scholar
- Jonathod Byrd and Zachary Lipton. 2019. What is the effect of importance weighting in deep learning? arXiv:1812.03372. Retrieved from https://arxiv.org/abs/1812.03372.Google Scholar
- Radu Calinescu, Danny Weyns, Simos Gerasimou, Muhammad Usman Iftikhar, Ibrahim Habli, and Tim Kelly. 2018. Engineering trustworthy self-adaptive software with dynamic assurance cases. IEEE Trans. Softw. Eng. 44, 11 (2018), 1039--1069.Google ScholarCross Ref
- Cristian S. Calude and Giuseppe Longo. 2017. The deluge of spurious correlations in big data. Found. Sci. 22, 3 (2017), 595--612.Google ScholarCross Ref
- Richard Carlsson, Björn Gustavsson, Erik Johansson, Thomas Lindgren, Sven-Olof Nyström, Mikael Pettersson, and Robert Virding. 2000. Core Erlang 1.0 Language Specification. Technical Report. Information Technology Department, Uppsala University.Google Scholar
- Paul Caseley. 2016. Claims and architectures to rationate on automatic and autonomous functions. In Proceedings of the 11th International Conference on System Safety and Cyber-Security. IET, 1--6.Google ScholarCross Ref
- Nitesh V. Chawla, Aleksandar Lazarevic, Lawrence O. Hall, and Kevin W. Bowyer. 2003. SMOTEBoost: Improving prediction of the minority class in boosting. In Proceedings of the European Conference on Principles of Data Mining and Knowledge Discovery. 107--119.Google Scholar
- Liming Chen and Algirdas Avizienis. 1978. N-version programming: A fault-tolerance approach to reliability of software operation. In Proceedings of the 8th IEEE International Symposium on Fault-Tolerant Computing, Vol. 1. 3--9.Google Scholar
- Xinyun Chen, Chang Liu, Bo Li, Kimberley Lu, and Dawn Song. 2017. Targeted Backdoor Attacks on Deep Learning Systems Using Data Poisoning. arXiv:1712.05526. Retrieved from https://arxiv.org/abs/1712.05526.Google Scholar
- Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, Tal Shaked, Tushar Chandra, Hrishi Aradhye, Glen Anderson, Greg Corrado, Wei Chai, Mustafa Ispir, Zakaria Anil, Rohan an Haque, Lichan Hong, Vihan Jain, Xiabing Liu, and Hemal Shah. 2016. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems. ACM, 7--10.Google ScholarDigital Library
- Patryk Chrabaszcz, Ilya Loshchilov, and Frank Hutter. 2018. Back to basics: Benchmarking canonical evolution strategies for playing Atari. arXiv:1802.08842. Retrieved from https://arxiv.org/abs/1802.08842.Google Scholar
- David A. Cieslak and Nitesh V. Chawla. 2009. A framework for monitoring classifiers performance: When and why failure occurs? Knowl. Inf. Syst. 18, 1 (2009), 83--108.Google ScholarDigital Library
- Adnan Darwiche. 2018. Human-level intelligence or animal-like abilities? Comm. ACM 61, 10 (2018), 56--67.Google ScholarDigital Library
- Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 248--255.Google ScholarCross Ref
- Yue Deng, Feng Bao, Youyong Kong, Zhiquan Ren, and Qionghai Dai. 2017. Deep direct reinforcement learning for financial signal representation and trading. IEEE Trans. Neural Netw. Learn. Syst. 28, 3 (2017), 653--664.Google ScholarCross Ref
- Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. arXiv:1702.08608. Retrieved from https://arxiv.org/abs/1702.08608.Google Scholar
- Tommaso Dreossi, Daniel J. Fremont, Shromona Ghosh, Edward Kim, Hadi Ravanbakhsh, Marcell Vazquez-Chanlatte, and Sanjit A Seshia. 2019. VERIFAI: A toolkit for the design and analysis of artificial intelligence-based systems. arXiv:1902.04245. Retrieved from https://arxiv.org/abs/1902.04245.Google Scholar
- Tommaso Dreossi, Shromona Ghosh, Xiangyu Yue, Kurt Keutzer, Alberto Sangiovanni-Vincentelli, and Sanjit A Seshia. 2018. Counterexample-guided data augmentation. arXiv:1805.06962. Retrieved from https://arxiv.org/abs/1805.06962.Google Scholar
- Tommaso Dreossi, Somesh Jha, and Sanjit A. Seshia. 2018. Semantic adversarial deep learning. arXiv:1804.07045. Retrieved from https://arxiv.org/abs/1804.07045.Google Scholar
- Chris Drummond and Robert C. Holte. 2006. Cost curves: An improved method for visualizing classifier performance. Mach. Learn. 65, 1 (2006), 95--130.Google ScholarDigital Library
- Souradeep Dutta, Xin Chen, Susmit Jha, Sriram Sankaranarayanan, and Ashish Tiwari. 2019. Sherlock-A tool for verification of neural network feedback systems: Demo abstract. In Proceedings of the 22nd ACM International Conference on Hybrid Systems: Computation and Control. 262--263.Google ScholarDigital Library
- Ruediger Ehlers. 2017. Formal verification of piece-wise linear feed-forward neural networks. In Proceedings of the International Symposium on Automated Technology for Verification and Analysis. Springer, 269--286.Google ScholarCross Ref
- Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. 2018. Adversarial vulnerability for any classifier. arXiv:1802.08686. Retrieved from https://arxiv.org/abs/1802.08686.Google Scholar
- Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. 2015. Fundamental limits on adversarial robustness. In Proceedings of the ICML Workshop on Deep Learning.Google Scholar
- Alhussein Fawzi, Seyed-Mohsen Moosavi-Dezfooli, and Pascal Frossard. 2016. Robustness of classifiers: From adversarial to random noise. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS’16). Curran Associates Inc., Red Hook, NY, 1632--1640.Google Scholar
- Michael Feldman, Sorelle A. Friedler, John Moeller, Carlos Scheidegger, and Suresh Venkatasubramanian. 2015. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 259--268.Google ScholarDigital Library
- Angelo Ferrando, Louise A. Dennis, Davide Ancona, Michael Fisher, and Viviana Mascardi. 2018. Verifying and validating autonomous systems: Towards an integrated approach. In Proceedings of the International Conference on Runtime Verification. Springer, 263--281.Google ScholarCross Ref
- Peter Flach. 2019. Performance evaluation in machine learning: The good, the bad, the ugly and the way forward. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence. 9808--9814.Google ScholarDigital Library
- Michael Forsting. 2017. Machine learning will change medicine. J. Nucl. Med. 58, 3 (2017), 357--358.Google ScholarCross Ref
- Yoav Freund, Robert Schapire, and Naoki Abe. 1999. A short introduction to boosting. J. Jpn. Soc. Artif. Intell. 14, 771--780 (1999), 1612.Google Scholar
- Yoav Freund and Robert E. Schapire. 1997. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 1 (1997), 119--139.Google ScholarDigital Library
- Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. 2018. AI2: Safety and robustness certification of neural networks with abstract interpretation. In Proceedings of the 2018 IEEE Symposium on Security and Privacy. IEEE, 3--18.Google ScholarCross Ref
- Aurélien Géron. 2017. Hands-on Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc.Google Scholar
- Ian Goodfellow, Yoshua Bengio, Aaron Courville, and Yoshua Bengio. 2016. Deep Learning. Vol. 1. MIT Press.Google ScholarDigital Library
- Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv:1412.6572. Retrieved from https://arxiv.org/abs/1412.6572.Google Scholar
- Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. 2017. BadNets: Identifying vulnerabilities in the machine learning model supply chain. arXiv:1708.06733. Retrieved from https://arxiv.org/abs/1708.06733.Google Scholar
- Guo Haixiang, Li Yijing, Jennifer Shang, Gu Mingyun, Huang Yuanyue, and Gong Bing. 2017. Learning from class-imbalanced data: Review of methods and applications. Expert Syst. Appl. 73 (2017), 220--239.Google ScholarDigital Library
- Jeff Heaton. 2016. An empirical analysis of feature engineering for predictive modeling. In Proceedings of SoutheastCon’16. IEEE, 1--6.Google ScholarCross Ref
- Constance L. Heitmeyer, Ralph D. Jeffords, and Bruce G. Labaw. 1996. Automated consistency checking of requirements specifications. ACM Trans. Softw. Eng. Methodol. 5, 3 (1996), 231--261.Google ScholarDigital Library
- Parker Hill, Babak Zamirai, Shengshuo Lu, Yu-Wei Chao, Michael Laurenzano, Mehrzad Samadi, Marios C. Papaefthymiou, Scott A. Mahlke, Thomas F. Wenisch, Jia Deng, Lingjia Tang, and Jason Mars. [n.d.]. Rethinking Numerical Representations for Deep Neural Networks. arXiv:1808.02513. Retrieved from https://arxiv.org/abs/1808.02513.Google Scholar
- Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. [n.d.]. Improving neural networks by preventing co-adaptation of feature detectors. arXiv:1207.0580. Retrieved from https://arxiv.org/abs/1207.0580.Google Scholar
- Xiaowei Huang, Marta Kwiatkowska, Sen Wang, and Min Wu. 2017. Safety verification of deep neural networks. In Proceedings of the 29th International Conference on Computer Aided Verification, Rupak Majumdar and Viktor Kuncak (Eds.), Lecture Notes in Computer Science, Vol. 10426. Springer, 3--29.Google Scholar
- Zhongling Huang, Zongxu Pan, and Bin Lei. 2017. Transfer learning with deep convolutional neural network for SAR target classification with limited labeled data. Remote Sens. 9, 9 (2017), 907.Google ScholarCross Ref
- Casidhe Hutchison, Milda Zizyte, Patrick E. Lanigan, David Guttendorf, Michael Wagner, Claire Le Goues, and Philip Koopman. 2018. Robustness testing of autonomy software. In Proceedings of the 40th IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice. 276--285.Google ScholarDigital Library
- Frank Hutter, Jörg Lücke, and Lars Schmidt-Thieme. 2015. Beyond manual tuning of hyperparameters. Künstl. Intell. 29, 4 (2015), 329--337.Google ScholarCross Ref
- Didac Gil De La Iglesia and Danny Weyns. 2015. MAPE-K formal templates to rigorously design behaviors for self-adaptive systems. ACM Trans. Auton. Adapt. Syst. 10, 3 (2015), 15.Google Scholar
- Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167. Retrieved from https://arxiv.org/abs/1502.03167.Google Scholar
- Bandar Seri Iskandar. 2017. Terrorism detection based on sentiment analysis using machine learning. J. Eng. Appl. Sci. 12, 3 (2017), 691--698.Google Scholar
- ISO. 2018. Road Vehicles—Functional Safety: Part 6. Technical Report BS ISO 26262-6:2018. ISO.Google Scholar
- Nathalie Japkowicz. 2001. Concept-learning in the presence of between-class and within-class imbalances. In Proceedings of the Conference of the Canadian Society for Computational Studies of Intelligence. Springer, 67--77.Google ScholarCross Ref
- Nikita Johnson and Tim Kelly. 2019. Devil’s in the detail: Through-life safety and security co-assurance using SSAF. In Proceedings of the 38th International Conference on Computer Safety, Reliability, and Security. Springer, 299--314.Google ScholarDigital Library
- Taylor T. Johnson, Stanley Bak, Marco Caccamo, and Lui Sha. 2016. Real-time reachability for verified Simplex design. ACM Trans. Embed. Comput. Syst. 15, 2 (2016), 1--27.Google ScholarDigital Library
- M. H. Kabir, M. R. Hoque, H. Seo, and S. H. Yang. 2015. Machine learning based adaptive context-aware system for smart home environment. Int. J. Smart Home 9, 11 (2015), 55--62.Google ScholarCross Ref
- Faisal Kamiran and Toon Calders. 2012. Data preprocessing techniques for classification without discrimination. Knowl. Inf. Syst. 33, 1 (2012), 1--33.Google ScholarDigital Library
- Guy Katz, Clark Barrett, David L. Dill, Kyle Julian, and Mykel J. Kochenderfer. 2017. Reluplex: An efficient SMT solver for verifying deep neural networks. In Proceedings of the International Conference on Computer Aided Verification. Springer, 97--117.Google Scholar
- Guy Katz, Derek A. Huang, Duligur Ibeling, Kyle Julian, Christopher Lazarus, Rachel Lim, Parth Shah, Shantanu Thakoor, Haoze Wu, Aleksandar Zeljić, et al. 2019. The marabou framework for verification and analysis of deep neural networks. In Proceedings of the International Conference on Computer Aided Verification. Springer, 443--452.Google ScholarCross Ref
- Shachar Kaufman, Saharon Rosset, Claudia Perlich, and Ori Stitelman. 2012. Leakage in data mining: Formulation, detection, and avoidance. ACM Trans. Knowl. Discov. Data 6, 4 (2012), 15.Google ScholarDigital Library
- Jeffrey O. Kephart and David M. Chess. 2003. The vision of autonomic computing. Computer 36, 1 (2003), 41--50.Google ScholarDigital Library
- Muhammad Taimoor Khan, Dimitrios Serpanos, and Howard Shrobe. 2016. A rigorous and efficient run-time security monitor for real-time critical embedded system applications. In Proceedings of the 3rd World Forum on Internet of Things. IEEE, 100--105.Google ScholarCross Ref
- Udayan Khurana, Horst Samulowitz, and Deepak Turaga. 2018. Feature engineering for predictive modeling using reinforcement learning. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 3407--3414.Google Scholar
- Roger E. Kirk. 2007. Experimental design. Wiley Online Library.Google Scholar
- Tom Ko, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur. 2015. Audio augmentation for speech recognition. In Proceedings of the 16th Annual Conference of the International Speech Communication Association.Google ScholarCross Ref
- Patrick Koch, Brett Wujek, Oleg Golovidov, and Steven Gardner. 2017. Automated hyperparameter tuning for effective machine learning. In Proceedings of the SAS Global Forum Conference.Google Scholar
- Matthieu Komorowski, Leo A. Celi, Omar Badawi, Anthony C. Gordon, and A. Aldo Faisal. 2018. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat. Med. 24, 11 (2018), 1716--1720.Google ScholarCross Ref
- Philip Koopman and Frank Fratrik. 2019. How many operational design domains, objects, and events? In Proceedings of the AAAI Workshop on Artificial Intelligence Safety.Google Scholar
- Philip Koopman, Aaron Kane, and Jen Black. 2019. Credible autonomy safety argumentation. In Proceedings of the 27th Safety-Critical Systems Symposium.Google Scholar
- S. B. Kotsiantis, Dimitris Kanellopoulos, and P. E. Pintelas. 2006. Data preprocessing for supervised leaning. Int. J. Comput. Sci. 1, 2 (2006), 111--117.Google Scholar
- S. B. Kotsiantis, D. Kanellopoulos, and P. E. Pintelas. 2007. Data preprocessing for supervised leaning. Int. J. Comput. Electr. Autom. Contr. Inf. Eng. 1, 12 (2007), 4104--4109.Google Scholar
- Samantha Krening, Brent Harrison, Karen M. Feigh, Charles Lee Isbell, Mark Riedl, and Andrea Thomaz. 2017. Learning from explanations using sentiment and advice in RL. IEEE Trans. Cogn. Dev. Syst. 9, 1 (2017), 44--55.Google ScholarCross Ref
- Isaac Lage, Andrew Ross, Kim Been, Samuel Gershman, and Finale Doshi-Velez. 2018. Human-in-the-loop interpretability prior. In Proceedings of the Conference on Neural Information Processing Systems. 10180--10189.Google Scholar
- Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278--2323.Google ScholarCross Ref
- Joseph Lemley, Filip Jagodzinski, and Razvan Andonie. 2016. Big holes in big data: A Monte Carlo algorithm for detecting large hyper-rectangles in high dimensional data. In Proceedings of the IEEE Computer Software and Applications Conference. 563--571.Google ScholarCross Ref
- Zachary C. Lipton. 2016. The mythos of model interpretability. arXiv:1606.03490. Retrieved from https://arxiv.org/abs/1606.03490.Google Scholar
- Yingqi Liu, Wen-Chuan Lee, Guanhong Tao, Shiqing Ma, Yousra Aafer, and Xiangyu Zhang. 2019. ABS: Scanning neural networks for back-doors by artificial brain stimulation. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 1265--1282.Google ScholarDigital Library
- Victoria López, Alberto Fernández, Salvador García, Vasile Palade, and Francisco Herrera. 2013. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 250 (2013), 113--141.Google ScholarCross Ref
- Gustavo A. Lujan-Moreno, Phillip R. Howard, Omar G. Rojas, and Douglas C. Montgomery. 2018. Design of experiments and response surface methodology to tune machine learning hyperparameters, with a random forest case-study. Expert Syst. Appl. 109 (2018), 195--205.Google ScholarDigital Library
- Lei Ma, Felix Juefei-Xu, Minhui Xue, Bo Li, Li Li, Yang Liu, and Jianjun Zhao. 2019. DeepCT: Tomographic combinatorial testing for deep learning systems. In Proceedings of the 26th IEEE International Conference on Software Analysis, Evolution and Reengineering. IEEE, 614--618.Google ScholarCross Ref
- Lei Ma, Felix Juefei-Xu, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Chunyang Chen, Ting Su, Li Li, Yang Liu, Jianjun Zhao, and Yadong Wang. 2018. DeepGauge: Multi-granularity testing criteria for deep learning systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 120--131.Google ScholarDigital Library
- Mathilde Machin, Jérémie Guiochet, Hélène Waeselynck, Jean-Paul Blanquart, Matthieu Roy, and Lola Masson. 2018. SMOF: A safety monitoring framework for autonomous systems. IEEE Trans. Syst. Man Cybernet. Syst. 48, 5 (2018), 702--715.Google ScholarCross Ref
- Aravindh Mahendran and Andrea Vedaldi. 2015. Understanding deep image representations by inverting them. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5188--5196.Google ScholarCross Ref
- Spyros Makridakis. 2017. The forthcoming Artificial Intelligence (AI) revolution: Its impact on society and firms. Futures 90 (2017), 46--60.Google ScholarCross Ref
- Pedro Marcelino. 2018. Transfer learning from pre-trained models. In Towards Data Science (2018).Google Scholar
- George Mason, Radu Calinescu, Daniel Kudenko, and Alec Banks. 2017. Assured reinforcement learning with formally verified abstract policies. In Proceedings of the 9th International Conference on Agents and Artificial Intelligence. 105--117.Google ScholarCross Ref
- Michael Maurer, Ivan Breskovic, Vincent C. Emeakaroha, and Ivona Brandic. 2011. Revealing the MAPE loop for the autonomic management of cloud infrastructures. In Proceedings of the Symposium on Computers and Communications. IEEE, 147--152.Google ScholarDigital Library
- Markus Maurer, J. Christian Gerdes, Barbara Lenz, and Hermann Winner. 2016. Autonomous Driving: Technical, Legal and Social Aspects. Springer Nature.Google Scholar
- Christopher Meyer and Jörg Schwenk. 2013. SoK: Lessons learned from SSL/TLS attacks. In Proceedings of the International Workshop on Information Security Applications. Springer, 189--209.Google Scholar
- Microsoft. 2019. How to choose algorithms for Azure Machine Learning Studio. Retrieved February 2019 from https://docs.microsoft.com/en-us/azure/machine-learning/studio/algorithm-choice.Google Scholar
- Tom M. Mitchell. 1997. Machine Learning. McGraw–Hill.Google Scholar
- Model Zoos Caffe 2019. Caffe Model Zoo. Retrieved March 2019 from http://caffe.berkeleyvision.org/model_zoo.html.Google Scholar
- Model Zoos Github 2019. Model Zoos of machine and deep learning technologies. Retrieved March 2019 from https://github.com/collections/ai-model-zoos.Google Scholar
- Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Omar Fawzi, and Pascal Frossard. 2017. Universal adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1765--1773.Google ScholarCross Ref
- Jose G. Moreno-Torres, Troy Raeder, Rocío Alaiz-Rodríguez, Nitesh V. Chawla, and Francisco Herrera. 2012. A unifying view on dataset shift in classification. Pattern Recogn. 45, 1 (2012), 521--530.Google ScholarDigital Library
- Pamela A. Munro and Barbara G. Kanki. 2003. An analysis of ASRS maintenance reports on the use of minimum equipment lists. In Proceedings of the 12th International Symposium on Aviation Psychology.Google Scholar
- Kevin P. Murphy. 2012. Machine Learning: A Probabilistic Perspective. The MIT Press.Google ScholarDigital Library
- Partha Niyogi and Federico Girosi. 1996. On the relationship between generalization error, hypothesis complexity, and sample complexity for radial basis functions. Neural Comput. 8, 4 (1996), 819--842.Google ScholarDigital Library
- Object Management Group. 2018. Structured Assurance Case Metamodel (SACM). Version 2.0.Google Scholar
- Augustus Odena and Ian Goodfellow. 2018. TensorFuzz: Debugging neural networks with coverage-guided fuzzing. arXiv:1807.10875. Retrived from https://arxiv.org/abs/1807.10875.Google Scholar
- Maxime Oquab, Leon Bottou, Ivan Laptev, and Josef Sivic. 2014. Learning and transferring mid-level image representations using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1717--1724.Google ScholarDigital Library
- Nicolas Papernot, Patrick McDaniel, Ian Goodfellow, Somesh Jha, Z. Berkay Celik, and Ananthram Swami. 2017. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. ACM, 506--519.Google ScholarDigital Library
- Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. 2017. DeepXplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles. ACM, 1--18.Google ScholarDigital Library
- Teresa Placho, Christoph Schmittner, Arndt Bonitz, and Oliver Wana. 2020. Management of automotive software updates. Microprocess. Microsystems. 78 (2020), 103257.Google ScholarCross Ref
- Michael J. Pont and Royan H. L. Ong. 2002. Using watchdog timers to improve the reliability of single-processor embedded systems: Seven new patterns and a case study. In Proceedings of the 1st Nordic Conference on Pattern Languages of Programs.Google Scholar
- Lutz Prechelt. 1998. Early stopping-but when? In Neural Networks: Tricks of the Trade. Springer, 55--69.Google ScholarDigital Library
- Philipp Probst, Bernd Bischl, and Anne-Laure Boulesteix. 2018. Tunability: Importance of hyperparameters of machine learning algorithms. arXiv:1802.09596. Retrieved from https://arxiv.org/abs/1802.09596.Google Scholar
- Foster Provost and Tom Fawcett. 2001. Robust classification for imprecise environments. Mach. Learn. 42, 3 (2001), 203--231.Google ScholarDigital Library
- J. Provost Foster, Fawcett Tom, and Kohavi Ron. 1998. The case against accuracy estimation for comparing induction algorithms. In Proceedings of the 15th International Conference on Machine Learning. 445--453.Google Scholar
- R-Bloggers Data Analysis 2019. How to Use Data Analysis for Machine Learning. Retrieved February 2019 from https://www.r-bloggers.com/how-to-use-data-analysis-for-machine-learning-example-part-1.
- Stephan Rabanser, Stephan Günnemann, and Zachary C. Lipton. 2019. Failing loudly: An empirical study of methods for detecting dataset shift. Advances in Neural Information Processing Systems 32 (2019).
- Jan Ramon, Kurt Driessens, and Tom Croonenborghs. 2007. Transfer learning in reinforcement learning problems through partial policy recycling. In Proceedings of the European Conference on Machine Learning. Springer, 699--707.
- Francesco Ranzato and Marco Zanella. 2019. Robustness verification of support vector machines. In Proceedings of the International Static Analysis Symposium. Springer, 271--295.
- Jorge-L. Reyes-Ortiz, Luca Oneto, Albert Samà, Xavier Parra, and Davide Anguita. 2016. Transition-aware human activity recognition using smartphones. Neurocomputing 171 (2016), 754--767.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135--1144.
- F. Ricci, L. Rokach, and B. Shapira. 2015. Recommender systems: Introduction and challenges. Recommender Systems Handbook (2015), 1--34.
- German Ros, Laura Sellart, Joanna Materzynska, David Vazquez, and Antonio M. Lopez. 2016. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3234--3243.
- Andrew Slavin Ross and Finale Doshi-Velez. 2018. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 1660--1669.
- Saharon Rosset, Claudia Perlich, Grzegorz Świrszcz, Prem Melville, and Yan Liu. 2010. Medical data mining: Insights from winning two competitions. Data Min. Knowl. Discov. 20, 3 (2010), 439--468.
- RTCA. 2011. Software Considerations in Airborne Systems and Equipment Certification. Technical Report DO-178C.
- Cynthia Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 5 (2019), 206--215.
- Stuart J. Russell and Peter Norvig. 2016. Artificial Intelligence: A Modern Approach. Pearson Education Limited.
- Jerome Sacks, William J. Welch, Toby J. Mitchell, and Henry P. Wynn. 1989. Design and analysis of computer experiments. Stat. Sci. (1989), 409--423.
- Omer Sagi and Lior Rokach. 2018. Ensemble learning: A survey. Data Min. Knowl. Discov. 8, 4 (2018), e1249.
- Ahmed Salem, Michael Backes, and Yang Zhang. 2020. Don’t Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks. arXiv:2010.03282. Retrieved from https://arxiv.org/abs/2010.03282.
- Robert G. Sargent. 2009. Verification and validation of simulation models. In Proceedings of the Winter Simulation Conference. 162--176.
- Lawrence K. Saul and Sam T. Roweis. 2003. Think globally, fit locally: Unsupervised learning of low dimensional manifolds. J. Mach. Learn. Res. 4 (Jun. 2003), 119--155.
- Christoph Schorn, Andre Guntoro, and Gerd Ascheid. 2018. Efficient on-line error detection and mitigation for deep neural network accelerators. In Proceedings of the International Conference on Computer Safety, Reliability, and Security. Springer, 205--219.
- Scikit-Taxonomy 2019. Scikit—Choosing the right estimator. Retrieved February 2019 from https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html.
- Noam Segev, Maayan Harel, Shie Mannor, Koby Crammer, and Ran El-Yaniv. 2017. Learn on source, refine on target: A model transfer learning framework with random forests. IEEE Trans. Pattern Anal. Mach. Intell. 39, 9 (2017), 1811--1824.
- Daniel Selsam, Percy Liang, and David L. Dill. 2017. Developing bug-free machine learning systems with formal mathematics. In Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 3047--3056.
- Victor S. Sheng and Jing Zhang. 2019. Machine learning with crowdsourcing: A brief summary of the past research and future directions. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33. 9837--9843.
- Andy Shih, Arthur Choi, and Adnan Darwiche. 2018. Formal verification of Bayesian network classifiers. In Proceedings of the International Conference on Probabilistic Graphical Models. 427--438.
- Padhraic Smyth. 1996. Bounds on the mean classification error rate of multiple experts. Pattern Recogn. Lett. 17, 12 (1996), 1253--1257.
- Marina Sokolova and Guy Lapalme. 2009. A systematic analysis of performance measures for classification tasks. Inf. Process. Manage. 45, 4 (2009), 427--437.
- Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2014. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1 (2014), 1929--1958.
- Sanatan Sukhija, Narayanan C. Krishnan, and Deepak Kumar. 2018. Supervised heterogeneous transfer learning using random forests. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data. ACM, 157--166.
- Youcheng Sun, Min Wu, Wenjie Ruan, Xiaowei Huang, Marta Kwiatkowska, and Daniel Kroening. 2018. Concolic testing for deep neural networks. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 109--119.
- Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv:1312.6199. Retrieved from https://arxiv.org/abs/1312.6199.
- A. Taber and E. Normand. 1993. Single event upset in avionics. IEEE Trans. Nucl. Sci. 40, 2 (1993), 120--126.
- Mariarosaria Taddeo, Tom McCutcheon, and Luciano Floridi. 2019. Trusting artificial intelligence in cybersecurity is a double-edged sword. Nat. Mach. Intell. (2019), 557--560.
- Luke Taylor and Geoff Nitschke. 2017. Improving deep learning using generic data augmentation. arXiv:1708.06020. Retrieved from https://arxiv.org/abs/1708.06020.
- Chris Thornton, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. 2013. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 847--855.
- Yuchi Tian, Kexin Pei, Suman Jana, and Baishakhi Ray. 2018. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering. ACM, 303--314.
- John Törnblom and Simin Nadjm-Tehrani. 2018. Formal verification of random forests in safety-critical applications. In Proceedings of the International Workshop on Formal Techniques for Safety-Critical Systems. Springer, 55--71.
- Hoang-Dung Tran, Stanley Bak, Weiming Xiang, and Taylor T. Johnson. 2020. Verification of deep convolutional neural networks using ImageStars. arXiv:2004.05511. Retrieved from https://arxiv.org/abs/2004.05511.
- Hoang-Dung Tran, Xiaodong Yang, Diego Manzanas Lopez, Patrick Musau, Luan Viet Nguyen, Weiming Xiang, Stanley Bak, and Taylor T. Johnson. 2020. NNV: The neural network verification tool for deep neural networks and learning-enabled cyber-physical systems. arXiv:2004.05519. Retrieved from https://arxiv.org/abs/2004.05519.
- John W. Tukey. 1977. Exploratory Data Analysis. Vol. 2. Reading, MA.
- Jasper van der Waa, Jurriaan van Diggelen, Mark A. Neerincx, and Stephan Raaijmakers. 2018. ICM: An intuitive model independent and accurate certainty measure for machine learning. In Proceedings of the International Conference on Agents and Artificial Intelligence (ICAART’18). 314--321.
- Perry Van Wesel and Alwyn E. Goodloe. 2017. Challenges in the verification of reinforcement learning algorithms. (2017).
- Kiri Wagstaff. 2012. Machine learning that matters. arXiv:1206.4656. Retrieved from https://arxiv.org/abs/1206.4656.
- Kiri L. Wagstaff and Benjamin Bornstein. 2009. K-means in space: A radiation sensitivity evaluation. In Proceedings of the 26th Annual International Conference on Machine Learning. 1097--1104.
- Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Rob Fergus. 2013. Regularization of neural networks using dropconnect. In Proceedings of the International Conference on Machine Learning. 1058--1066.
- Binghui Wang and Neil Zhenqiang Gong. 2018. Stealing hyperparameters in machine learning. In Proceedings of the 2018 IEEE Symposium on Security and Privacy. IEEE, 36--52.
- Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. 2019. Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In Proceedings of the 2019 IEEE Symposium on Security and Privacy. IEEE, 707--723.
- Ke Wang, Senqiang Zhou, Chee Ada Fu, and Jeffrey Xu Yu. 2003. Mining changes of classification by correspondence tracing. In Proceedings of the 2003 SIAM International Conference on Data Mining. SIAM, 95--106.
- Lu Wang, Xuanqing Liu, Jinfeng Yi, Zhi-Hua Zhou, and Cho-Jui Hsieh. 2019. Evaluating the robustness of nearest neighbor classifiers: A primal-dual perspective. arXiv:1906.03972. Retrieved from https://arxiv.org/abs/1906.03972.
- Yihan Wang, Huan Zhang, Hongge Chen, Duane Boning, and Cho-Jui Hsieh. 2020. On Lp-norm robustness of ensemble stumps and trees. arXiv:2008.08755. Retrieved from https://arxiv.org/abs/2008.08755.
- Gary M. Weiss. 2004. Mining with rarity: A unifying framework. ACM SIGKDD Expl. Newslett. 6, 1 (2004), 7--19.
- Karl Weiss, Taghi M. Khoshgoftaar, and DingDing Wang. 2016. A survey of transfer learning. J. Big Data 3, 1 (2016), 9.
- Reinhard Wilhelm, Jakob Engblom, Andreas Ermedahl, Niklas Holsti, Stephan Thesing, David Whalley, Guillem Bernat, Christian Ferdinand, Reinhold Heckmann, Tulika Mitra, Frank Mueller, Isabelle Puaut, Peter Puschner, Jan Staschulat, and Per Stenström. 2008. The worst-case execution-time problem—Overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst. 7, 3 (2008), 36.
- Sebastien C. Wong, Adam Gatt, Victor Stamatescu, and Mark D. McDonnell. 2016. Understanding data augmentation for classification: When to warp? In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications. IEEE, 1--6.
- Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv:1609.08144. Retrieved from https://arxiv.org/abs/1609.08144.
- Steven R. Young, Derek C. Rose, Thomas P. Karnowski, Seung-Hwan Lim, and Robert M. Patton. 2015. Optimizing deep learning hyper-parameters through an evolutionary algorithm. In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments. ACM, 4.
- X. Yuan, Y. Chen, Y. Zhao, Y. Long, X. Liu, K. Chen, S. Zhang, H. Huang, X. Wang, and C. A. Gunter. 2018. CommanderSong: A systematic approach for practical adversarial voice recognition. arXiv:1801.08535. Retrieved from https://arxiv.org/abs/1801.08535.
- Matei Zaharia, Andrew Chen, Aaron Davidson, Ali Ghodsi, Sue Ann Hong, Andy Konwinski, Siddharth Murching, Tomas Nykodym, Paul Ogilvie, Mani Parkhe, Fen Xie, and Corey Zumar. 2018. Accelerating the machine learning lifecycle with MLflow. Data Eng. 41, 4 (2018), 39--45.
- Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, and Sarfraz Khurshid. 2018. DeepRoad: GAN-based metamorphic autonomous driving system testing. arXiv:1802.02295. Retrieved from https://arxiv.org/abs/1802.02295.
- Shichao Zhang, Chengqi Zhang, and Qiang Yang. 2003. Data preparation for data mining. Appl. Artif. Intell. 17, 5--6 (2003), 375--381.
- Stephan Zheng, Yang Song, Thomas Leung, and Ian Goodfellow. 2016. Improving the robustness of deep neural networks via stability training. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4480--4488.
- Zhun Zhong, Liang Zheng, Guoliang Kang, Shaozi Li, and Yi Yang. 2017. Random erasing data augmentation. arXiv:1708.04896. Retrieved from https://arxiv.org/abs/1708.04896.