ABSTRACT
Crowdsourced entity resolution has recently attracted significant attentions because it can harness the wisdom of crowd to improve the quality of entity resolution. However existing techniques either cannot achieve high quality or incur huge monetary costs. To address these problems, we propose a cost-effective crowdsourced entity resolution framework, which significantly reduces the monetary cost while keeping high quality. We first define a partial order on the pairs of records. Then we select a pair as a question and ask the crowd to check whether the records in the pair refer to the same entity. After getting the answer of this pair, we infer the answers of other pairs based on the partial order. Next we iteratively select pairs without answers to ask until we get the answers of all pairs. We devise effective algorithms to judiciously select the pairs to ask in order to minimize the number of asked pairs. To further reduce the cost, we propose a grouping technique to group the pairs and we only ask one pair instead of all pairs in each group. We develop error-tolerant techniques to tolerate the errors introduced by the partial order and the crowd. Experimental results show that our method reduces the cost to 1.25% of existing approaches (or existing approaches take 80* monetary cost of our method) while not sacrificing the quality.
- C. C. Cao, J. She, Y. Tong, and L. Chen. Whom to ask? jury selection for decision making tasks on micro-blog services. PVLDB, 5(11):1495--1506, 2012. Google ScholarDigital Library
- X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In WSDM, pages 193--202, 2013. Google ScholarDigital Library
- J. Fan, G. Li, B. C. Ooi, K. Tan, and J. Feng. icrowd: An adaptive crowdsourcing framework. In SIGMOD, pages 1015--1030, 2015. Google ScholarDigital Library
- S. Felsner, V. Raghavan, and J. P. Spinrad. Recognition algorithms for orders of small width and graphs of small dilworth number. Order, 20(4):351--364, 2003.Google ScholarCross Ref
- M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. Crowddb: answering queries with crowdsourcing. In SIGMOD, pages 61--72, 2011. Google ScholarDigital Library
- D. R. Fulkerson. Note on dilworth's decomposition theorem for partially ordered sets. American Mathematical Society, 7(4):701--702, 1956.Google Scholar
- C. Gokhale, S. Das, A. Doan, J. F. Naughton, N. Rampalli, J. W. Shavlik, and X. Zhu. Corleone: hands-off crowdsourcing for entity matching. In SIGMOD, pages 601--612, 2014. Google ScholarDigital Library
- S. Guo, A. G. Parameswaran, and H. Garcia-Molina. So who won?: dynamic max discovery with the crowd. In SIGMOD, pages 385--396, 2012. Google ScholarDigital Library
- P. G. Ipeirotis, F. Provost, and J. Wang. Quality management on amazon mechanical turk. In SIGKDD workshop on human computation, pages 64--67. ACM, 2010. Google ScholarDigital Library
- M. Kreveld, M. Overmars, O. Schwarzkopf, M. d. Berg, and O. Schwartskopf. Computational geometry: algorithms and applications, 1997. Google ScholarDigital Library
- X. Liu, M. Lu, B. C. Ooi, Y. Shen, S. Wu, and M. Zhang. CDAS: A crowdsourcing data analytics system. PVLDB, 5(10):1040--1051, 2012. Google ScholarDigital Library
- A. Marcus, E. Wu, D. R. Karger, S. Madden, and R. C. Miller. Human-powered sorts and joins. PVLDB, 5(1):13--24, 2011. Google ScholarDigital Library
- A. Marcus, E. Wu, S. Madden, and R. C. Miller. Crowdsourced databases: Query processing with people. In CIDR, pages 211--214, 2011.Google Scholar
- N. Megiddo and K. J. Supowit. On the complexity of some common geometric location problems. SIAM journal on computing, 13(1):182--196, 1984.Google Scholar
- W. R. Ouyang, L. M. Kaplan, and et al. Debiasing crowdsourced quantitative characteristics in local businesses and services. In IPSN, pages 190--201, 2015. Google ScholarDigital Library
- A. G. Parameswaran, H. Park, H. Garcia-Molina, N. Polyzotis, and J. Widom. Deco: declarative crowdsourcing. In CIKM, pages 1203--1212, 2012. Google ScholarDigital Library
- H. Park and J. Widom. Query optimization over crowdsourced data. PVLDB, 6(10):781--792, 2013. Google ScholarDigital Library
- T. Pfeiffer, X. A. Gao, Y. Chen, A. Mao, and D. G. Rand. Adaptive polling for information aggregation. In AAAI, 2012. Google ScholarDigital Library
- P. Venetis, H. Garcia-Molina, K. Huang, and N. Polyzotis. Max algorithms in crowdsourcing environments. In WWW, pages 989--998, 2012. Google ScholarDigital Library
- V. Verroios and H. Garcia-Molina. Entity resolution with crowd errors. In ICDE, pages 219--230, 2015.Google ScholarCross Ref
- N. Vesdapunt, K. Bellare, and N. N. Dalvi. Crowdsourcing algorithms for entity resolution. PVLDB, 2014. Google ScholarDigital Library
- J. Wang, T. Kraska, M. J. Franklin, and J. Feng. Crowder: Crowdsourcing entity resolution. PVLDB, 2012. Google ScholarDigital Library
- J. Wang, G. Li, T. Kraska, M. J. Franklin, and J. Feng. Leveraging transitive relations for crowdsourced joins. In SIGMOD, pages 229--240, 2013. Google ScholarDigital Library
- S. Wang, X. Xiao, and C. Lee. Crowd-based deduplication: An adaptive approach. In SIGMOD, pages 1263--1277, 2015. Google ScholarDigital Library
- S. E. Whang, P. Lofgren, and H. Garcia-Molina. Question selection for crowd entity resolution. PVLDB, 2013. Google ScholarDigital Library
- Y. Zheng, R. Cheng, S. Maniu, and L. Mo. On optimality of jury selection in crowdsourcing. In EDBT, pages 193--204, 2015.Google Scholar
- Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng. QASCA: A quality-aware task assignment system for crowdsourcing applications. In SIGMOD, pages 1031--1046, 2015. Google ScholarDigital Library
Index Terms
- Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach
Recommendations
A partial-order-based framework for cost-effective crowdsourced entity resolution
Crowdsourced entity resolution has recently attracted significant attentions because it can harness the wisdom of crowd to improve the quality of entity resolution. However, existing techniques either cannot achieve high quality or incur huge monetary ...
Cost-effective crowdsourced join queries for entity resolution without prior knowledge
AbstractThe join query, which finds matching pairs from two object sets, is a fundamental operation in computer systems and helps to solve many real problems, e.g., entity resolution. In this paper, we address the problem of join queries by ...
Highlights- To leverage crowdsourcing to obtain matching relationships in join queries.
- ...
Attribute-based Crowd Entity Resolution
CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge ManagementWe study the problem of using the crowd to perform entity resolution (ER) on a set of records. For many types of records, especially those involving images, such a task can be difficult for machines, but relatively easy for humans. Typical crowd-based ...
Comments