research-article
Open Access

Responsible data management

Published: 20 May 2022
Abstract

Perspectives on the role and responsibility of the data-management research community in designing, developing, using, and overseeing automated decision systems.

Published in

Communications of the ACM, Volume 65, Issue 6
June 2022
98 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3538687

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Qualifiers

• research-article
• Popular
• Refereed