research-article

Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach

Authors:
Chengliang Chai

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China
View Profile

,
Guoliang Li

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China
View Profile

,
Jian Li

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China
View Profile

,
Dong Deng

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China
View Profile

,
Jianhua Feng

Tsinghua University, Beijing, China

Tsinghua University, Beijing, China
View Profile

SIGMOD '16: Proceedings of the 2016 International Conference on Management of DataJune 2016Pages 969–984https://doi.org/10.1145/2882903.2915252

Published:14 June 2016Publication History

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data

Pages 969–984

ABSTRACT

Crowdsourced entity resolution has recently attracted significant attentions because it can harness the wisdom of crowd to improve the quality of entity resolution. However existing techniques either cannot achieve high quality or incur huge monetary costs. To address these problems, we propose a cost-effective crowdsourced entity resolution framework, which significantly reduces the monetary cost while keeping high quality. We first define a partial order on the pairs of records. Then we select a pair as a question and ask the crowd to check whether the records in the pair refer to the same entity. After getting the answer of this pair, we infer the answers of other pairs based on the partial order. Next we iteratively select pairs without answers to ask until we get the answers of all pairs. We devise effective algorithms to judiciously select the pairs to ask in order to minimize the number of asked pairs. To further reduce the cost, we propose a grouping technique to group the pairs and we only ask one pair instead of all pairs in each group. We develop error-tolerant techniques to tolerate the errors introduced by the partial order and the crowd. Experimental results show that our method reduces the cost to 1.25% of existing approaches (or existing approaches take 80* monetary cost of our method) while not sacrificing the quality.

References

C. C. Cao, J. She, Y. Tong, and L. Chen. Whom to ask? jury selection for decision making tasks on micro-blog services. PVLDB, 5(11):1495--1506, 2012. Google ScholarDigital Library
X. Chen, P. N. Bennett, K. Collins-Thompson, and E. Horvitz. Pairwise ranking aggregation in a crowdsourced setting. In WSDM, pages 193--202, 2013. Google ScholarDigital Library
J. Fan, G. Li, B. C. Ooi, K. Tan, and J. Feng. icrowd: An adaptive crowdsourcing framework. In SIGMOD, pages 1015--1030, 2015. Google ScholarDigital Library
S. Felsner, V. Raghavan, and J. P. Spinrad. Recognition algorithms for orders of small width and graphs of small dilworth number. Order, 20(4):351--364, 2003.Google ScholarCross Ref
M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, and R. Xin. Crowddb: answering queries with crowdsourcing. In SIGMOD, pages 61--72, 2011. Google ScholarDigital Library
D. R. Fulkerson. Note on dilworth's decomposition theorem for partially ordered sets. American Mathematical Society, 7(4):701--702, 1956.Google Scholar
C. Gokhale, S. Das, A. Doan, J. F. Naughton, N. Rampalli, J. W. Shavlik, and X. Zhu. Corleone: hands-off crowdsourcing for entity matching. In SIGMOD, pages 601--612, 2014. Google ScholarDigital Library
S. Guo, A. G. Parameswaran, and H. Garcia-Molina. So who won?: dynamic max discovery with the crowd. In SIGMOD, pages 385--396, 2012. Google ScholarDigital Library
P. G. Ipeirotis, F. Provost, and J. Wang. Quality management on amazon mechanical turk. In SIGKDD workshop on human computation, pages 64--67. ACM, 2010. Google ScholarDigital Library
M. Kreveld, M. Overmars, O. Schwarzkopf, M. d. Berg, and O. Schwartskopf. Computational geometry: algorithms and applications, 1997. Google ScholarDigital Library
X. Liu, M. Lu, B. C. Ooi, Y. Shen, S. Wu, and M. Zhang. CDAS: A crowdsourcing data analytics system. PVLDB, 5(10):1040--1051, 2012. Google ScholarDigital Library
A. Marcus, E. Wu, D. R. Karger, S. Madden, and R. C. Miller. Human-powered sorts and joins. PVLDB, 5(1):13--24, 2011. Google ScholarDigital Library
A. Marcus, E. Wu, S. Madden, and R. C. Miller. Crowdsourced databases: Query processing with people. In CIDR, pages 211--214, 2011.Google Scholar
N. Megiddo and K. J. Supowit. On the complexity of some common geometric location problems. SIAM journal on computing, 13(1):182--196, 1984.Google Scholar
W. R. Ouyang, L. M. Kaplan, and et al. Debiasing crowdsourced quantitative characteristics in local businesses and services. In IPSN, pages 190--201, 2015. Google ScholarDigital Library
A. G. Parameswaran, H. Park, H. Garcia-Molina, N. Polyzotis, and J. Widom. Deco: declarative crowdsourcing. In CIKM, pages 1203--1212, 2012. Google ScholarDigital Library
H. Park and J. Widom. Query optimization over crowdsourced data. PVLDB, 6(10):781--792, 2013. Google ScholarDigital Library
T. Pfeiffer, X. A. Gao, Y. Chen, A. Mao, and D. G. Rand. Adaptive polling for information aggregation. In AAAI, 2012. Google ScholarDigital Library
P. Venetis, H. Garcia-Molina, K. Huang, and N. Polyzotis. Max algorithms in crowdsourcing environments. In WWW, pages 989--998, 2012. Google ScholarDigital Library
V. Verroios and H. Garcia-Molina. Entity resolution with crowd errors. In ICDE, pages 219--230, 2015.Google ScholarCross Ref
N. Vesdapunt, K. Bellare, and N. N. Dalvi. Crowdsourcing algorithms for entity resolution. PVLDB, 2014. Google ScholarDigital Library
J. Wang, T. Kraska, M. J. Franklin, and J. Feng. Crowder: Crowdsourcing entity resolution. PVLDB, 2012. Google ScholarDigital Library
J. Wang, G. Li, T. Kraska, M. J. Franklin, and J. Feng. Leveraging transitive relations for crowdsourced joins. In SIGMOD, pages 229--240, 2013. Google ScholarDigital Library
S. Wang, X. Xiao, and C. Lee. Crowd-based deduplication: An adaptive approach. In SIGMOD, pages 1263--1277, 2015. Google ScholarDigital Library
S. E. Whang, P. Lofgren, and H. Garcia-Molina. Question selection for crowd entity resolution. PVLDB, 2013. Google ScholarDigital Library
Y. Zheng, R. Cheng, S. Maniu, and L. Mo. On optimality of jury selection in crowdsourcing. In EDBT, pages 193--204, 2015.Google Scholar
Y. Zheng, J. Wang, G. Li, R. Cheng, and J. Feng. QASCA: A quality-aware task assignment system for crowdsourcing applications. In SIGMOD, pages 1031--1046, 2015. Google ScholarDigital Library

Index Terms

Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
        Join algorithms
  2. World Wide Web
    1. Web applications
      1. Crowdsourcing

Recommendations

A partial-order-based framework for cost-effective crowdsourced entity resolution

Crowdsourced entity resolution has recently attracted significant attentions because it can harness the wisdom of crowd to improve the quality of entity resolution. However, existing techniques either cannot achieve high quality or incur huge monetary ...
Read More
Cost-effective crowdsourced join queries for entity resolution without prior knowledge
Abstract
The join query, which finds matching pairs from two object sets, is a fundamental operation in computer systems and helps to solve many real problems, e.g., entity resolution. In this paper, we address the problem of join queries by ...
Highlights
- To leverage crowdsourcing to obtain matching relationships in join queries.
- ...
Read More
Attribute-based Crowd Entity Resolution
CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

We study the problem of using the crowd to perform entity resolution (ER) on a set of records. For many types of records, especially those involving images, such a task can be difficult for machines, but relatively easy for humans. Typical crowd-based ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN:9781450335317
DOI:10.1145/2882903
General Chairs:
Fatma Özcan
IBM Research, USA
,
Georgia Koutrika
HP Labs, USA
,
Program Chair:
Sam Madden
Massachusetts Institute of Technology, USA
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 June 2016
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
crowdsourcing
entity resolution
partial order
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 67
  Total Citations
  View Citations
- 674
  Total Downloads
- Downloads (Last 12 months)43
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

A partial-order-based framework for cost-effective crowdsourced entity resolution

Cost-effective crowdsourced join queries for entity resolution without prior knowledge

Attribute-based Crowd Entity Resolution

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

A partial-order-based framework for cost-effective crowdsourced entity resolution

Cost-effective crowdsourced join queries for entity resolution without prior knowledge

Attribute-based Crowd Entity Resolution

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media