CDAS: a crowdsourcing data analytics system

Published: 01 June 2012

Abstract

Some complex problems, such as image tagging and natural language processing, are very challenging for computers, and even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on developing new and better algorithms to handle such tasks, we look to the crowdsourcing solution -- employing human participation -- to make up for the shortfall in current technology. Crowdsourcing is a good supplement to many computer tasks: a complex job may be divided into computer-oriented tasks and human-oriented tasks, which are then assigned to machines and humans respectively.

To leverage the power of crowdsourcing, we design and implement a Crowdsourcing Data Analytics System, CDAS. CDAS is a framework designed to support the deployment of various crowdsourcing applications. The core of CDAS is a quality-sensitive answering model, which guides the crowdsourcing engine in processing and monitoring the human tasks. In this paper, we introduce the principles of our quality-sensitive model. To satisfy the user-required accuracy, the model guides the crowdsourcing query engine in designing and processing the corresponding crowdsourcing jobs. It provides an estimated accuracy for each generated result based on the human workers' historical performance. When verifying the quality of a result, the model employs an online strategy to reduce waiting time. To show the effectiveness of the model, we implement and deploy two analytics jobs on CDAS: a Twitter sentiment analytics job and an image tagging job, using real Twitter and Flickr data as queries respectively. We compare our approaches with state-of-the-art classification and image annotation techniques. The results show that the human-assisted methods can indeed achieve much higher accuracy. By embedding the quality-sensitive model into the crowdsourcing query engine, we effectively reduce the processing cost while maintaining the required query answer quality.
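The abstract does not spell out the model's formulas, but the two ideas it describes -- estimating a result's accuracy from the answering workers' historical accuracies, and stopping early once the user-required accuracy is reached -- can be illustrated with a minimal sketch. The independent-worker vote combination and the function names below (answer_confidence, enough_confidence) are illustrative assumptions, not the actual CDAS model.

    def answer_confidence(votes):
        """Estimate how likely each candidate answer is to be correct, given
        (answer, worker_historical_accuracy) pairs. Assumes workers answer
        independently and that a worker's historical accuracy is the probability
        she answers a single task correctly (an illustrative simplification)."""
        candidates = {answer for answer, _ in votes}
        scores = {}
        for cand in candidates:
            score = 1.0
            for answer, acc in votes:
                # A vote for `cand` supports it with weight `acc`;
                # a vote for anything else counts against it with weight 1 - acc.
                score *= acc if answer == cand else (1.0 - acc)
            scores[cand] = score
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()} if total else scores

    def enough_confidence(votes, required_accuracy):
        """Online stopping check: return the leading answer once its estimated
        confidence meets the user-required accuracy, otherwise None
        (i.e., assign the task to more workers)."""
        conf = answer_confidence(votes)
        best, prob = max(conf.items(), key=lambda kv: kv[1])
        return best if prob >= required_accuracy else None

    # Example: three workers label the sentiment of one tweet.
    votes = [("positive", 0.9), ("positive", 0.8), ("negative", 0.6)]
    print(answer_confidence(votes))        # roughly {'positive': 0.96, 'negative': 0.04}
    print(enough_confidence(votes, 0.95))  # 'positive' -- no further workers needed

Run as answers stream in, a check of this kind lets a task stop consuming paid worker assignments as soon as the estimated accuracy clears the user's threshold, which is the abstract's point about reducing processing cost and waiting time while maintaining answer quality.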

  • Published in

Proceedings of the VLDB Endowment, Volume 5, Issue 10
    June 2012
    180 pages

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 June 2012
• Published in PVLDB Volume 5, Issue 10

    Qualifiers

    • research-article
