CDAS: a crowdsourcing data analytics system

Published: 01 June 2012

Abstract

Some complex problems, such as image tagging and natural language processing, are very challenging for computers, and even state-of-the-art technology is not yet able to provide satisfactory accuracy. Therefore, rather than relying solely on developing new and better algorithms to handle such tasks, we look to the crowdsourcing solution -- employing human participation -- to make up for the shortfall in current technology. Crowdsourcing is a good supplement to many computer tasks: a complex job may be divided into computer-oriented tasks and human-oriented tasks, which are then assigned to machines and humans respectively.

To leverage the power of crowdsourcing, we design and implement a Crowdsourcing Data Analytics System, CDAS. CDAS is a framework designed to support the deployment of various crowdsourcing applications. The core of CDAS is a quality-sensitive answering model, which guides the crowdsourcing engine in processing and monitoring the human tasks. In this paper, we introduce the principles of our quality-sensitive model. To satisfy the user-required accuracy, the model guides the crowdsourcing query engine in designing and processing the corresponding crowdsourcing jobs. It provides an estimated accuracy for each generated result based on the human workers' historical performance. When verifying the quality of a result, the model employs an online strategy to reduce waiting time. To show the effectiveness of the model, we implement and deploy two analytics jobs on CDAS: a Twitter sentiment analytics job and an image tagging job, using real Twitter and Flickr data as queries respectively. We compare our approaches with state-of-the-art classification and image annotation techniques. The results show that the human-assisted methods can indeed achieve much higher accuracy. By embedding the quality-sensitive model into the crowdsourcing query engine, we effectively reduce the processing cost while maintaining the required query answer quality.
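The abstract does not spell out the model's formulas, but the two ideas it describes -- estimating a result's accuracy from the answering workers' historical accuracies, and stopping early once the user-required accuracy is reached -- can be illustrated with a minimal sketch. The independent-worker vote combination and the function names below (answer_confidence, enough_confidence) are illustrative assumptions, not the actual CDAS model.

    def answer_confidence(votes):
        """Estimate how likely each candidate answer is to be correct, given
        (answer, worker_historical_accuracy) pairs. Assumes workers answer
        independently and that a worker's historical accuracy is the probability
        she answers a single task correctly (an illustrative simplification)."""
        candidates = {answer for answer, _ in votes}
        scores = {}
        for cand in candidates:
            score = 1.0
            for answer, acc in votes:
                # A vote for `cand` supports it with weight `acc`;
                # a vote for anything else counts against it with weight 1 - acc.
                score *= acc if answer == cand else (1.0 - acc)
            scores[cand] = score
        total = sum(scores.values())
        return {c: s / total for c, s in scores.items()} if total else scores

    def enough_confidence(votes, required_accuracy):
        """Online stopping check: return the leading answer once its estimated
        confidence meets the user-required accuracy, otherwise None
        (i.e., assign the task to more workers)."""
        conf = answer_confidence(votes)
        best, prob = max(conf.items(), key=lambda kv: kv[1])
        return best if prob >= required_accuracy else None

    # Example: three workers label the sentiment of one tweet.
    votes = [("positive", 0.9), ("positive", 0.8), ("negative", 0.6)]
    print(answer_confidence(votes))        # roughly {'positive': 0.96, 'negative': 0.04}
    print(enough_confidence(votes, 0.95))  # 'positive' -- no further workers needed

Run as answers stream in, a check of this kind lets a task stop consuming paid worker assignments as soon as the estimated accuracy clears the user's threshold, which is the abstract's point about reducing processing cost and waiting time while maintaining answer quality.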

  • Published in

Proceedings of the VLDB Endowment, Volume 5, Issue 10
    June 2012
    180 pages

    Publisher

    VLDB Endowment

    Publication History

    • Published: 1 June 2012
• Published in PVLDB Volume 5, Issue 10

    Qualifiers

    • research-article
