Responsible data management

Authors:
Julia Stoyanovich, New York University, New York, NY
Serge Abiteboul, Inria & École Normale Supérieure, Paris, France
Bill Howe, University of Washington, Seattle, WA
H. V. Jagadish, University of Michigan, Ann Arbor, MI
Sebastian Schelter, University of Amsterdam, Amsterdam, The Netherlands

Communications of the ACM, Volume 65, Issue 6, June 2022, pp. 64–74. https://doi.org/10.1145/3488717

Published: 20 May 2022

Abstract

Perspectives on the role and responsibility of the data-management research community in designing, developing, using, and overseeing automated decision systems.

Published in
Communications of the ACM, Volume 65, Issue 6 (June 2022), 98 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3538687
Editor: Andrew A. Chien
Publisher: Association for Computing Machinery, New York, NY
Copyright © 2022 ACM