research-article

Fairly evaluating and scoring items in a data set

Authors:
Abolfazl Asudeh, University of Illinois at Chicago
H. V. Jagadish, University of Michigan

Proceedings of the VLDB Endowment, Volume 13, Issue 12, pp. 3445–3448. https://doi.org/10.14778/3415478.3415566

Published: 01 August 2020

Abstract

We frequently compute a score for each item in a data set, sometimes for its intrinsic value, but more often as a step towards classification, ranking, and so forth. The importance of computing this score fairly cannot be overstated. In this tutorial, we develop a framework for thinking about this task, present techniques for responsible scoring, and link these techniques to traditional data management challenges.

Published in
Proceedings of the VLDB Endowment, Volume 13, Issue 12
August 2020
1710 pages
ISSN: 2150-8097
Editors: Magdalena Balazinska (University of Washington), Xiaofang Zhou (University of Queensland, Australia)
Publisher: VLDB Endowment
Qualifiers: research-article