Abstract
Crowdsourcing is a computing paradigm that harnesses human effort to solve computer-hard problems, such as entity resolution and photo tagging. Crowd workers vary in quality, so it is important to model each worker's quality effectively. Most existing worker models assume that a worker has the same quality on every task. In practice, however, tasks belong to a variety of domains, and a worker's quality differs from one domain to another. For example, a worker who is a basketball fan is likely to answer more accurately when labeling a photo related to 'Stephen Curry' than one related to 'Leonardo DiCaprio'. In this paper, we study how to leverage domain knowledge to accurately model a worker's quality. We examine the use of a knowledge base (KB), e.g., Wikipedia or Freebase, to detect the domains of tasks and workers. We develop Domain Vector Estimation, which analyzes the domains of a task with respect to the KB. We also study Truth Inference, which uses the domain-sensitive worker model to accurately infer the true answer of a task. We design an Online Task Assignment algorithm, which judiciously and efficiently assigns tasks to appropriate workers. To implement these solutions, we have built DOCS, a system deployed on Amazon Mechanical Turk. Experiments show that DOCS performs much better than state-of-the-art approaches.
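To make the idea of a domain-sensitive worker model concrete, the following is a minimal sketch and not the DOCS algorithm itself. It assumes that a task is described by a domain vector (a probability distribution over domains) and a worker by a per-domain accuracy vector, that a worker's expected accuracy on a task is the dot product of the two, and that a wrong answer is spread uniformly over the remaining choices. The function names, the two domains, and all numbers are hypothetical.

```python
def expected_accuracy(task_domains, worker_quality):
    """Expected accuracy of a worker on a task: the per-domain accuracies
    weighted by how strongly the task belongs to each domain."""
    return sum(r * q for r, q in zip(task_domains, worker_quality))


def infer_truth(task_domains, answers, worker_qualities, num_choices):
    """Posterior over choices under a uniform prior (illustrative assumption).

    A worker picks the true choice with probability p (the expected accuracy)
    and each wrong choice with probability (1 - p) / (num_choices - 1).
    """
    scores = []
    for choice in range(num_choices):
        likelihood = 1.0
        for worker, answer in answers.items():
            p = expected_accuracy(task_domains, worker_qualities[worker])
            if answer == choice:
                likelihood *= p
            else:
                likelihood *= (1.0 - p) / (num_choices - 1)
        scores.append(likelihood)
    total = sum(scores)
    return [s / total for s in scores]


# Hypothetical domains: [sports, film]. The task is mostly about sports.
task_domains = [0.9, 0.1]

# Hypothetical per-domain accuracies for three workers.
worker_qualities = {
    "basketball_fan": [0.9, 0.5],
    "film_buff": [0.5, 0.9],
    "generalist": [0.5, 0.6],
}

# Two of the three workers vote for choice 1, the basketball fan for choice 0.
answers = {"basketball_fan": 0, "film_buff": 1, "generalist": 1}

posterior = infer_truth(task_domains, answers, worker_qualities, num_choices=2)
best_choice = max(range(len(posterior)), key=posterior.__getitem__)
```

In this toy setting, a plain majority vote would select choice 1, while the domain-weighted posterior favors choice 0 because the only worker with high accuracy in the task's dominant domain voted for it.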