DOI: 10.1145/3308558.3313507
research-article

Debiasing Vandalism Detection Models at Wikidata

Published: 13 May 2019

ABSTRACT

Crowdsourced knowledge bases like Wikidata suffer from low-quality edits and vandalism, and machine learning-based approaches are employed to detect both kinds of damage. We reveal that state-of-the-art detection approaches discriminate against anonymous and new users: benign edits from these users receive much higher vandalism scores than benign edits from long-standing users, causing newcomers to abandon the project prematurely. We address this problem for the first time by analyzing and measuring the sources of bias, and by developing a new vandalism detection model that avoids them. Our model FAIR-S reduces the bias ratio of the state-of-the-art vandalism detector WDVD from 310.7 to only 11.9 while maintaining high predictive performance at 0.963 ROC and 0.316 PR.
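The abstract does not define how the bias ratio is computed, so the following is only an illustrative sketch of the underlying idea: comparing the vandalism scores that benign edits receive across user groups. The function `benign_score_ratio`, the record fields, the group labels, and the toy data are all hypothetical and are not taken from the paper; the only figures taken from the abstract are 310.7 and 11.9.

```python
from statistics import mean


def benign_score_ratio(edits, protected_groups):
    """Ratio of mean vandalism scores on benign edits: protected vs. other users.

    Assumption: this simple ratio of means is a stand-in for illustration,
    not the bias measure defined in the paper.
    """
    benign = [e for e in edits if not e["vandalism"]]
    protected = [e["score"] for e in benign if e["group"] in protected_groups]
    others = [e["score"] for e in benign if e["group"] not in protected_groups]
    if not protected or not others:
        raise ValueError("need benign edits in both groups")
    return mean(protected) / mean(others)


# Hypothetical toy data: each record is one edit with a detector score in [0, 1].
toy_edits = [
    {"group": "anonymous", "vandalism": False, "score": 0.30},
    {"group": "new", "vandalism": False, "score": 0.10},
    {"group": "established", "vandalism": False, "score": 0.01},
    {"group": "established", "vandalism": False, "score": 0.03},
    {"group": "anonymous", "vandalism": True, "score": 0.90},  # ignored: not benign
]

ratio = benign_score_ratio(toy_edits, {"anonymous", "new"})

# Arithmetic on the two figures reported in the abstract:
# how many times smaller the bias ratio of FAIR-S is than that of WDVD.
reduction = 310.7 / 11.9

print(f"toy benign-score ratio: {ratio:.1f}")
print(f"reported reduction factor: {reduction:.1f}")
```

On the toy data, benign edits from anonymous and new users receive scores about ten times higher on average than those from established users. The reported drop from 310.7 to 11.9 corresponds to roughly a 26-fold reduction in the bias ratio.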


  • Published in

    WWW '19: The World Wide Web Conference
    May 2019
    3620 pages
    ISBN: 9781450366748
    DOI: 10.1145/3308558

    Copyright © 2019 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
