research-article

Responsible data management

Published: 01 August 2020

Abstract

The need for responsible data management intensifies with the growing impact of data on society. One central locus of the societal impact of data is Automated Decision Systems (ADS), socio-legal-technical systems that are used broadly in industry, non-profits, and government. ADS process data about people, help make decisions that are consequential to people's lives, are designed with the stated goals of improving efficiency and promoting equitable access to opportunity, involve a combination of human and automated decision making, and are subject to auditing for legal compliance and to public disclosure. They may or may not use AI, and may or may not operate with a high degree of autonomy, but they rely heavily on data.

In this article, we argue that the data management community is uniquely positioned to lead the responsible design, development, use, and oversight of ADS. We outline a technical research agenda that requires that we step outside our comfort zone of engineering for efficiency and accuracy, to also incorporate reasoning about values and beliefs. This seems high-risk, but one of the upsides is being able to explain to our children what we do and why it matters.

Published in

Proceedings of the VLDB Endowment, Volume 13, Issue 12
August 2020
1710 pages
ISSN: 2150-8097

Publisher

VLDB Endowment
