DOI: 10.1145/2396761.2398697
poster

The face of quality in crowdsourcing relevance labels: demographics, personality and labeling accuracy

Published: 29 October 2012

ABSTRACT

Information retrieval systems require human-contributed relevance labels for their training and evaluation. Increasingly, such labels are collected under the anonymous, uncontrolled conditions of crowdsourcing, leading to varied output quality. While a range of quality assurance and control techniques have been developed to reduce noise during or after task completion, little is known about the workers themselves and the possible relationships between workers' characteristics and the quality of their work. In this paper, we ask what the relatively well- or poorly-performing crowds, working under specific task conditions, actually look like in terms of worker characteristics such as demographics or personality traits. Our findings show that the face of a crowd is in fact indicative of the quality of its work.
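
The paper does not publish analysis code; the sketch below is only a minimal illustration, under assumed data structures, of the kind of measurement the abstract describes: scoring each worker's labeling accuracy against a gold standard and then comparing accuracy across self-reported worker attributes (the abstract's attribute side also covers personality traits, not just demographics). All identifiers here (judgments, gold, profiles, the gender field) are hypothetical.

```python
# Minimal sketch (not the authors' code): per-worker label accuracy against a
# gold set, then mean accuracy per self-reported worker attribute.
from collections import defaultdict

# Each judgment: (worker_id, document_id, label); gold maps document_id -> label.
judgments = [
    ("w1", "d1", 1), ("w1", "d2", 0),
    ("w2", "d1", 0), ("w2", "d2", 0),
]
gold = {"d1": 1, "d2": 0}
# Hypothetical self-reported worker profiles.
profiles = {"w1": {"gender": "female"}, "w2": {"gender": "male"}}

# Per-worker accuracy: fraction of judgments agreeing with the gold label.
correct = defaultdict(int)
total = defaultdict(int)
for worker, doc, label in judgments:
    total[worker] += 1
    correct[worker] += int(label == gold[doc])
accuracy = {w: correct[w] / total[w] for w in total}

# Group mean accuracy by a demographic attribute.
group_acc = defaultdict(list)
for worker, acc in accuracy.items():
    group_acc[profiles[worker]["gender"]].append(acc)
for group, accs in sorted(group_acc.items()):
    print(f"{group}: mean accuracy {sum(accs) / len(accs):.2f} (n={len(accs)})")
```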

Published in

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
