DOI: 10.1145/2339530.2339571

Learning from crowds in the presence of schools of thought

Published: 12 August 2012

ABSTRACT

Crowdsourcing has recently become popular among machine learning researchers and social scientists as an effective way to collect large-scale experimental data from distributed workers. To extract useful information from these cheap but potentially unreliable answers, a key problem is to identify reliable workers as well as unambiguous tasks. For objective tasks that have one correct answer each, previous work can estimate worker reliability and task clarity under the single gold standard assumption. For subjective tasks, however, multiple answers can be reasonable and workers may split into groups around them, a phenomenon called schools of thought, and existing models cannot be applied directly. In this work, we present a statistical model to estimate worker reliability and task clarity without resorting to the single gold standard assumption. The model explicitly characterizes the grouping behavior that forms schools of thought through a rank-1 factorization of a worker-task group-size matrix. Instead of performing an intermediate inference step, which can be expensive and unstable, we present an algorithm that computes the sizes of the different groups analytically. Extensive empirical studies on real data collected from Amazon Mechanical Turk show that our method discovers the schools of thought, yields reasonable estimates of worker reliability and task clarity, and is robust to hyperparameter changes. Furthermore, the estimated worker reliability can be used to improve gold standard prediction for objective tasks.
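To make the rank-1 idea concrete, here is a minimal sketch, not the paper's actual estimator: assume a worker-task group-size matrix G in which G[i, j] is the size of the answer group that worker i falls into on task j (a worker who agrees with many peers on a clear task lands in a large group). Under a rank-1 assumption, G ≈ r cᵀ, the row factor r acts as a per-worker reliability score and the column factor c as a per-task clarity score. The SVD-based fit and the toy matrix below are illustrative assumptions; the paper presents a statistical model rather than this decomposition.

```python
import numpy as np

def rank1_scores(G):
    """Best rank-1 fit G ~ outer(r, c) via the leading SVD pair.

    G[i, j]: size of the answer group worker i joins on task j.
    Returns (r, c): per-worker and per-task scores, scaled so that
    outer(r, c) is the best rank-1 approximation of G in Frobenius norm.
    """
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    u, v = U[:, 0], Vt[0, :]
    # For a nonnegative matrix the leading singular pair can be chosen
    # nonnegative (Perron-Frobenius); flip signs if needed.
    if u.sum() < 0:
        u, v = -u, -v
    r = np.sqrt(s[0]) * u  # worker reliability proxy
    c = np.sqrt(s[0]) * v  # task clarity proxy
    return r, c

# Hypothetical toy data: 4 workers x 3 tasks; entries are group sizes.
G = np.array([[3., 3., 2.],
              [3., 3., 2.],
              [3., 3., 2.],
              [1., 1., 1.]])
r, c = rank1_scores(G)
print("worker scores:", np.round(r, 2))
print("task scores:  ", np.round(c, 2))
```

In this toy run, the three workers who consistently join the majority group score higher than the outlier worker, and the two tasks with larger agreement groups score as clearer than the third; the paper's contribution additionally covers computing the group sizes analytically from raw answers rather than assuming them given.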

Supplemental Material

306_m_talk_6.mp4 (mp4, 144.8 MB)


Published in

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012, 1616 pages
ISBN: 9781450314626
DOI: 10.1145/2339530
Copyright © 2012 ACM


Publisher: Association for Computing Machinery, New York, NY, United States

