research-article

Responsible data management

Authors:
Julia Stoyanovich, New York University
Bill Howe, University of Washington
H. V. Jagadish, University of Michigan

Proceedings of the VLDB Endowment, Volume 13, Issue 12, pp. 3474–3488. https://doi.org/10.14778/3415478.3415570

Published: 01 August 2020

Abstract

The need for responsible data management intensifies with the growing impact of data on society. Automated Decision Systems (ADS) are one central locus of the societal impact of data: they are socio-legal-technical systems used broadly in industry, non-profits, and government. ADS process data about people, help make decisions that are consequential to people's lives, are designed with the stated goals of improving efficiency and promoting equitable access to opportunity, involve a combination of human and automated decision making, and are subject to auditing for legal compliance and to public disclosure. They may or may not use AI, and may or may not operate with a high degree of autonomy, but they rely heavily on data.

In this article, we argue that the data management community is uniquely positioned to lead the responsible design, development, use, and oversight of ADS. We outline a technical research agenda that requires that we step outside our comfort zone of engineering for efficiency and accuracy, to also incorporate reasoning about values and beliefs. This seems high-risk, but one of the upsides is being able to explain to our children what we do and why it matters.

Published in
Proceedings of the VLDB Endowment, Volume 13, Issue 12
August 2020
1710 pages
ISSN: 2150-8097
Editors: Magdalena Balazinska (University of Washington), Xiaofang Zhou (University of Queensland, Australia)
Publisher: VLDB Endowment