Abstract
Some complex problems, such as image tagging and natural language processing, remain very challenging for computers: even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on developing new and better algorithms to handle such tasks, we look to the crowdsourcing solution -- employing human participation -- to make good the shortfall in current technology. Crowdsourcing is a good supplement to many computational tasks: a complex job can be divided into computer-oriented tasks and human-oriented tasks, which are then assigned to machines and humans respectively.
To leverage the power of crowdsourcing, we design and implement a Crowdsourcing Data Analytics System, CDAS. CDAS is a framework designed to support the deployment of various crowdsourcing applications. The core of CDAS is a quality-sensitive answering model, which guides the crowdsourcing engine in processing and monitoring the human tasks. In this paper, we introduce the principles of our quality-sensitive model. To satisfy the user-specified accuracy requirement, the model guides the crowdsourcing query engine in the design and processing of the corresponding crowdsourcing jobs. It provides an estimated accuracy for each generated result based on the human workers' historical performance. When verifying the quality of a result, the model employs an online strategy to reduce waiting time. To show the effectiveness of the model, we implement and deploy two analytics jobs on CDAS: a Twitter sentiment analytics job and an image tagging job, using real Twitter and Flickr data as our queries respectively. We compare our approaches with state-of-the-art classification and image annotation techniques. The results show that the human-assisted methods can indeed achieve much higher accuracy. By embedding the quality-sensitive model into the crowdsourcing query engine, we effectively reduce the processing cost while maintaining the required query answer quality.
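The core idea described above -- estimating the accuracy of a crowdsourced result from the workers' historical performance -- can be illustrated with a minimal sketch. The code below assumes independent workers casting binary votes (e.g., positive/negative sentiment) and combines them by likelihood, weighted by each worker's historical accuracy; `answer_confidence` is a hypothetical helper, not the actual CDAS implementation.

```python
from math import prod

def answer_confidence(votes, accuracies):
    """Estimate the probability that the combined answer is correct.

    votes      -- binary votes (0 or 1), one per worker
    accuracies -- each worker's historical accuracy (probability of
                  answering correctly), assumed independent

    Illustrative sketch only; the actual CDAS model may differ.
    """
    # Likelihood of the observed votes if the true answer is 1
    like_one = prod(a if v == 1 else 1 - a
                    for v, a in zip(votes, accuracies))
    # Likelihood of the observed votes if the true answer is 0
    like_zero = prod(1 - a if v == 1 else a
                     for v, a in zip(votes, accuracies))
    p_one = like_one / (like_one + like_zero)
    answer = 1 if p_one >= 0.5 else 0
    return answer, max(p_one, 1 - p_one)

# Two reliable workers say "positive" (1), one weaker worker disagrees
answer, confidence = answer_confidence([1, 1, 0], [0.9, 0.8, 0.6])
```

An online verification strategy in this spirit would recompute the confidence as each vote arrives and stop assigning further human tasks as soon as the confidence reaches the user-required accuracy, reducing both waiting time and cost.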