skip to main content
10.1145/2882903.2915252acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach

Published:14 June 2016Publication History

ABSTRACT

Crowdsourced entity resolution has recently attracted significant attentions because it can harness the wisdom of crowd to improve the quality of entity resolution. However existing techniques either cannot achieve high quality or incur huge monetary costs. To address these problems, we propose a cost-effective crowdsourced entity resolution framework, which significantly reduces the monetary cost while keeping high quality. We first define a partial order on the pairs of records. Then we select a pair as a question and ask the crowd to check whether the records in the pair refer to the same entity. After getting the answer of this pair, we infer the answers of other pairs based on the partial order. Next we iteratively select pairs without answers to ask until we get the answers of all pairs. We devise effective algorithms to judiciously select the pairs to ask in order to minimize the number of asked pairs. To further reduce the cost, we propose a grouping technique to group the pairs and we only ask one pair instead of all pairs in each group. We develop error-tolerant techniques to tolerate the errors introduced by the partial order and the crowd. Experimental results show that our method reduces the cost to 1.25% of existing approaches (or existing approaches take 80* monetary cost of our method) while not sacrificing the quality.

References

  1. C. C. Cao, J. She, Y. Tong, and L. Chen. Whom to ask? jury selection for decision making tasks on micro-blog services. PVLDB, 5(11):1495--1506, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In WSDM, pages 193--202, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. Fan, G. Li, B. C. Ooi, K. Tan, and J. Feng. icrowd: An adaptive crowdsourcing framework. In SIGMOD, pages 1015--1030, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Felsner, V. Raghavan, and J. P. Spinrad. Recognition algorithms for orders of small width and graphs of small dilworth number. Order, 20(4):351--364, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  5. M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. Crowddb: answering queries with crowdsourcing. In SIGMOD, pages 61--72, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. R. Fulkerson. Note on dilworth's decomposition theorem for partially ordered sets. American Mathematical Society, 7(4):701--702, 1956.Google ScholarGoogle Scholar
  7. C. Gokhale, S. Das, A. Doan, J. F. Naughton, N. Rampalli, J. W. Shavlik, and X. Zhu. Corleone: hands-off crowdsourcing for entity matching. In SIGMOD, pages 601--612, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Guo, A. G. Parameswaran, and H. Garcia-Molina. So who won?: dynamic max discovery with the crowd. In SIGMOD, pages 385--396, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. P. G. Ipeirotis, F. Provost, and J. Wang. Quality management on amazon mechanical turk. In SIGKDD workshop on human computation, pages 64--67. ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. M. Kreveld, M. Overmars, O. Schwarzkopf, M. d. Berg, and O. Schwartskopf. Computational geometry: algorithms and applications, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. X. Liu, M. Lu, B. C. Ooi, Y. Shen, S. Wu, and M. Zhang. CDAS: A crowdsourcing data analytics system. PVLDB, 5(10):1040--1051, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Marcus, E. Wu, D. R. Karger, S. Madden, and R. C. Miller. Human-powered sorts and joins. PVLDB, 5(1):13--24, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Marcus, E. Wu, S. Madden, and R. C. Miller. Crowdsourced databases: Query processing with people. In CIDR, pages 211--214, 2011.Google ScholarGoogle Scholar
  14. N. Megiddo and K. J. Supowit. On the complexity of some common geometric location problems. SIAM journal on computing, 13(1):182--196, 1984.Google ScholarGoogle Scholar
  15. W. R. Ouyang, L. M. Kaplan, and et al. Debiasing crowdsourced quantitative characteristics in local businesses and services. In IPSN, pages 190--201, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. G. Parameswaran, H. Park, H. Garcia-Molina, N. Polyzotis, and J. Widom. Deco: declarative crowdsourcing. In CIKM, pages 1203--1212, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. H. Park and J. Widom. Query optimization over crowdsourced data. PVLDB, 6(10):781--792, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. T. Pfeiffer, X. A. Gao, Y. Chen, A. Mao, and D. G. Rand. Adaptive polling for information aggregation. In AAAI, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. P. Venetis, H. Garcia-Molina, K. Huang, and N. Polyzotis. Max algorithms in crowdsourcing environments. In WWW, pages 989--998, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. V. Verroios and H. Garcia-Molina. Entity resolution with crowd errors. In ICDE, pages 219--230, 2015.Google ScholarGoogle ScholarCross RefCross Ref
  21. N. Vesdapunt, K. Bellare, and N. N. Dalvi. Crowdsourcing algorithms for entity resolution. PVLDB, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Wang, T. Kraska, M. J. Franklin, and J. Feng. Crowder: Crowdsourcing entity resolution. PVLDB, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. J. Wang, G. Li, T. Kraska, M. J. Franklin, and J. Feng. Leveraging transitive relations for crowdsourced joins. In SIGMOD, pages 229--240, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. Wang, X. Xiao, and C. Lee. Crowd-based deduplication: An adaptive approach. In SIGMOD, pages 1263--1277, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. E. Whang, P. Lofgren, and H. Garcia-Molina. Question selection for crowd entity resolution. PVLDB, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Y. Zheng, R. Cheng, S. Maniu, and L. Mo. On optimality of jury selection in crowdsourcing. In EDBT, pages 193--204, 2015.Google ScholarGoogle Scholar
  27. Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng. QASCA: A quality-aware task assignment system for crowdsourcing applications. In SIGMOD, pages 1031--1046, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
        June 2016
        2300 pages
        ISBN:9781450335317
        DOI:10.1145/2882903

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 14 June 2016

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader