Reducing uncertainty of schema matching via crowdsourcing

Authors:
Chen Jason Zhang, Hong Kong University of Science and Technology, Hong Kong, China
Lei Chen, Hong Kong University of Science and Technology, Hong Kong, China
H. V. Jagadish, University of Michigan, Ann Arbor, MI
Chen Caleb Cao, Hong Kong University of Science and Technology, Hong Kong, China

Proceedings of the VLDB Endowment, Volume 6, Issue 9, pp 757–768
https://doi.org/10.14778/2536360.2536374

Published: 01 July 2013

Abstract

Schema matching is a central challenge for data integration systems. Automated tools are often uncertain about schema matchings they suggest, and this uncertainty is inherent since it arises from the inability of the schema to fully capture the semantics of the represented data. Human common sense can often help. Inspired by the popularity and the success of easily accessible crowdsourcing platforms, we explore the use of crowdsourcing to reduce the uncertainty of schema matching.

Since it is typical to ask simple questions on crowdsourcing platforms, we assume that each question, called a Correspondence Correctness Question (CCQ), asks the crowd to decide whether a given correspondence should exist in the correct matching. We propose frameworks and efficient algorithms to dynamically manage the CCQs in order to maximize the uncertainty reduction within a limited budget of questions. We develop two novel approaches, namely "Single CCQ" and "Multiple CCQ", which adaptively select, publish and manage the questions. We verified the value of our solutions with simulation and a real implementation.
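
The idea of choosing the question that most reduces uncertainty can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: uncertainty is measured here as Shannon entropy over a probability distribution of candidate matchings, crowd answers are treated as always correct, and all function names and the example data are hypothetical. It shows only one greedy selection step and is not the paper's "Single CCQ" or "Multiple CCQ" algorithm.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def condition(matchings, corr, answer):
    """Keep the matchings consistent with a yes/no answer about `corr`
    and renormalise their probabilities (assumes a reliable answer)."""
    kept = [(m, p) for m, p in matchings if (corr in m) == answer]
    total = sum(p for _, p in kept)
    return [(m, p / total) for m, p in kept]

def expected_entropy_after(matchings, corr):
    """Expected entropy left after asking whether `corr` is correct."""
    expected = 0.0
    for answer in (True, False):
        # Probability of this answer under the current distribution.
        p_answer = sum(p for m, p in matchings if (corr in m) == answer)
        if p_answer > 0:
            posterior = condition(matchings, corr, answer)
            expected += p_answer * entropy([p for _, p in posterior])
    return expected

def pick_ccq(matchings):
    """Greedy choice: the correspondence with the lowest expected
    remaining entropy, i.e. the largest expected uncertainty reduction."""
    candidates = sorted(set().union(*(m for m, _ in matchings)))
    return min(candidates, key=lambda c: expected_entropy_after(matchings, c))

# Hypothetical candidate matchings between two small schemas, each a set of
# (source attribute, target attribute) correspondences with a probability.
example = [
    (frozenset({("name", "full_name"), ("phone", "tel")}), 0.5),
    (frozenset({("name", "full_name"), ("phone", "fax")}), 0.3),
    (frozenset({("name", "surname"), ("phone", "tel")}), 0.2),
]
best = pick_ccq(example)
```

With a budget of several questions, one could repeat this step: ask about `best`, call `condition` with the crowd's answer, and pick again from the updated distribution. Handling unreliable workers would require a different update rule than the hard filtering assumed here.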


Index Terms

Reducing uncertainty of schema matching via crowdsourcing
1. Information systems
  1. Data management systems
    1. Database management system engines

Index terms have been assigned to the content through auto-classification.


Published in

Proceedings of the VLDB Endowment, Volume 6, Issue 9
July 2013, 180 pages
ISSN: 2150-8097
Publisher: VLDB Endowment
Published: 1 July 2013
Qualifier: article