Published in: Data Mining and Knowledge Discovery, Issue 5-6/2014

1 September 2014

Approximating the crowd

Authors: Şeyda Ertekin, Cynthia Rudin, Haym Hirsh


Abstract

The problem of “approximating the crowd” is that of estimating the crowd’s majority opinion by querying only a subset of it. Algorithms that approximate the crowd can intelligently stretch a limited budget for a crowdsourcing task. We present an algorithm, “CrowdSense,” that works in an online fashion where items come one at a time. CrowdSense dynamically samples subsets of the crowd based on an exploration/exploitation criterion. The algorithm produces a weighted combination of the subset’s votes that approximates the crowd’s opinion. We then introduce two variations of CrowdSense that make various distributional approximations to handle distinct crowd characteristics. In particular, the first algorithm makes a statistical independence approximation of the labelers for large crowds, whereas the second algorithm finds a lower bound on how often the current subcrowd agrees with the crowd’s majority vote. Our experiments on CrowdSense and several baselines demonstrate that we can reliably approximate the entire crowd’s vote by collecting opinions from a representative subset of the crowd.
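The online loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' CrowdSense implementation: the abstract does not give the update rules, so the smoothed agreement weights, the choice of two top-weighted labelers plus one random labeler, the stopping test, and the names `approximate_crowd`, `epsilon`, and `smoothing` are all assumptions made for illustration. Labelers are modeled as hypothetical callables that return +1 or -1 for an item.

```python
import random


def approximate_crowd(items, labelers, epsilon=0.1, smoothing=10.0, seed=0):
    """Illustrative sketch (assumed details, not the published algorithm).

    Processes items one at a time, queries a subset of labelers for each,
    and returns (predicted labels, total number of votes purchased).
    """
    rng = random.Random(seed)
    n = len(labelers)
    agree = [0.0] * n  # times labeler i agreed with the subset's weighted vote
    asked = [0.0] * n  # times labeler i was queried
    predictions, cost = [], 0

    for item in items:
        # Assumption: weight = smoothed agreement rate, starting at 0.5.
        weight = [(agree[i] + smoothing) / (asked[i] + 2 * smoothing)
                  for i in range(n)]
        ranked = sorted(range(n), key=lambda i: weight[i], reverse=True)

        # Exploitation: the two labelers with the highest weights.
        chosen = ranked[:2]
        # Exploration: one labeler drawn uniformly from the remainder.
        rest = ranked[2:]
        if rest:
            pick = rng.choice(rest)
            chosen.append(pick)
            rest.remove(pick)

        votes = {i: labelers[i](item) for i in chosen}
        score = sum(weight[i] * votes[i] for i in chosen)

        # Assumed stopping test: keep buying votes, best labeler first,
        # while the next vote could still leave the outcome uncertain.
        for i in rest:
            if (abs(score) - weight[i]) / (len(votes) + 1) >= epsilon:
                break
            votes[i] = labelers[i](item)
            score += weight[i] * votes[i]

        label = 1 if score >= 0 else -1
        for i, vote in votes.items():
            asked[i] += 1
            agree[i] += (vote == label)
        predictions.append(label)
        cost += len(votes)

    return predictions, cost
```

In this sketch, a larger `epsilon` makes the loop buy more votes per item before committing to a label, which trades budget for confidence. Agreement is measured against the subset's own weighted vote, since the full crowd's vote is never observed.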


Metadata
Title
Approximating the crowd
Authors
Şeyda Ertekin
Cynthia Rudin
Haym Hirsh
Publication date
1 September 2014
Publisher
Springer US
Published in
Data Mining and Knowledge Discovery / Issue 5-6/2014
Print ISSN: 1384-5810
Electronic ISSN: 1573-756X
DOI
https://doi.org/10.1007/s10618-014-0354-1
