Research article · Open Access
DOI: 10.1145/3278721.3278729

Measuring and Mitigating Unintended Bias in Text Classification

Published: 27 December 2018

ABSTRACT

We introduce and illustrate a new approach to measuring and mitigating unintended bias in machine learning models. Our definition of unintended bias is parameterized by a test set and a subset of input features. We illustrate how this can be used to evaluate text classifiers using a synthetic test set and a public corpus of comments annotated for toxicity from Wikipedia Talk pages. We also demonstrate how imbalances in training data can lead to unintended bias in the resulting models, and therefore potentially unfair applications. We use a set of common demographic identity terms as the subset of input features on which we measure bias. This technique permits analysis in the common scenario where demographic information on authors and readers is unavailable, so that bias mitigation must focus on the content of the text itself. The mitigation method we introduce is an unsupervised approach based on balancing the training dataset. We demonstrate that this approach reduces the unintended bias without compromising overall model quality.
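To make the measurement concrete, here is a minimal sketch, not the authors' released code: it builds the kind of templated synthetic test set described above and computes an AUC restricted to each identity term, so that a term whose score diverges from the others flags unintended bias. The term list, the templates, and the `predict` interface (a function from text to a toxicity score) are illustrative assumptions.

```python
from sklearn.metrics import roc_auc_score

# Illustrative identity terms and templates; the paper uses a larger,
# curated set of demographic identity terms.
IDENTITY_TERMS = ["gay", "straight", "muslim", "christian", "american"]
TEMPLATES = [
    ("Being {term} is wonderful.", 0),        # non-toxic
    ("I am a proud {term} person.", 0),       # non-toxic
    ("All {term} people are disgusting.", 1), # toxic
]

def synthetic_test_set(terms=IDENTITY_TERMS, templates=TEMPLATES):
    """Cross every identity term with every template sentence."""
    return [(text.format(term=t), label, t)
            for t in terms for text, label in templates]

def per_term_auc(predict, rows):
    """AUC over the examples mentioning each term; a term whose AUC
    sits well below the rest is being scored unfairly by the model."""
    aucs = {}
    for term in {t for _, _, t in rows}:
        labels = [y for _, y, t in rows if t == term]
        scores = [predict(x) for x, _, t in rows if t == term]
        aucs[term] = roc_auc_score(labels, scores)
    return aucs
```

The balancing step admits an equally small sketch. `mine_nontoxic` below is a hypothetical helper that supplies a non-toxic sentence containing a given term; the idea is to append such examples until each term's toxic fraction in the training data matches the corpus-wide rate, which weakens the spurious term-toxicity correlation without requiring any demographic labels.

```python
def toxic_fraction(rows, term=None):
    """Fraction of toxic labels, optionally restricted to examples
    whose text contains `term`."""
    labels = [y for x, y in rows if term is None or term in x]
    return sum(labels) / len(labels) if labels else 0.0

def balance(train, terms, mine_nontoxic):
    """Append mined non-toxic (label 0) examples until no identity
    term is more strongly associated with toxicity than the corpus
    as a whole. `mine_nontoxic` is an assumed helper, not a real API."""
    target = toxic_fraction(train)
    for term in terms:
        while toxic_fraction(train, term) > target:
            train.append((mine_nontoxic(term), 0))
    return train
```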


Published in

AIES '18: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society
December 2018, 406 pages
ISBN: 9781450360128
DOI: 10.1145/3278721

Copyright © 2018 Owner/Author. This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States




Acceptance Rates

AIES '18 paper acceptance rate: 61 of 162 submissions, 38%.
