research-article

Open Access

Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing

Authors:
Inioluwa Deborah Raji (Partnership on AI)
Andrew Smart (Google)
Rebecca N. White (Google)
Margaret Mitchell (Google)
Timnit Gebru (Google)
Ben Hutchinson (Google)
Jamila Smith-Loud (Google)
Daniel Theron (Google)
Parker Barnes (Google)

FAT* '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency
January 2020
Pages 33–44
https://doi.org/10.1145/3351095.3372873

Published: 27 January 2020

ABSTRACT

Rising concern for the societal implications of artificial intelligence systems has inspired a wave of academic and journalistic literature in which deployed systems are audited for harm by investigators from outside the organizations deploying the algorithms. However, it remains challenging for practitioners to identify the harmful repercussions of their own systems prior to deployment, and, once deployed, emergent issues can become difficult or impossible to trace back to their source.

In this paper, we introduce a framework for algorithmic auditing that supports artificial intelligence system development end-to-end, to be applied throughout the internal organization development life-cycle. Each stage of the audit yields a set of documents that together form an overall audit report, drawing on an organization's values or principles to assess the fit of decisions made throughout the process. The proposed auditing framework is intended to contribute to closing the accountability gap in the development and deployment of large-scale artificial intelligence systems by embedding a robust process to ensure audit integrity.

Supplemental Material

p33-raji-supp.pdf (932.5 KB)

Index Terms

Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing
1. Social and professional topics
  1. Professional topics
    1. Management of computing and information systems
      1. System management
        Technology audits
2. Software and its engineering
  1. Software creation and management
    1. Software development process management

Published in
FAT* '20: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency
January 2020
895 pages
ISBN: 9781450369367
DOI: 10.1145/3351095
General Chairs: Mireille Hildebrandt, Carlos Castillo
Program Chairs: Elisa Celis, Salvatore Ruggieri, Linnet Taylor, Gabriela Zanfir-Fortuna
Copyright © 2020 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
accountability
algorithmic audits
machine learning
responsible innovation