
Reducing uncertainty of schema matching via crowdsourcing

Published: 01 July 2013

Abstract

Schema matching is a central challenge for data integration systems. Automated tools are often uncertain about schema matchings they suggest, and this uncertainty is inherent since it arises from the inability of the schema to fully capture the semantics of the represented data. Human common sense can often help. Inspired by the popularity and the success of easily accessible crowdsourcing platforms, we explore the use of crowdsourcing to reduce the uncertainty of schema matching.

Because crowdsourcing platforms are best suited to simple questions, we assume that each question, called a Correspondence Correctness Question (CCQ), asks the crowd to decide whether a given correspondence should exist in the correct matching. We propose frameworks and efficient algorithms that dynamically manage the CCQs in order to maximize the uncertainty reduction within a limited budget of questions. We develop two novel approaches, "Single CCQ" and "Multiple CCQ", which adaptively select, publish, and manage the questions. We verify the value of our solutions through simulation and a real implementation.
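The idea of choosing the next CCQ by its expected uncertainty reduction can be illustrated with a small sketch. It is not the paper's algorithm: it assumes that uncertainty is measured as the Shannon entropy of a probability distribution over candidate matchings, that crowd answers are always correct, and that questions are asked one at a time. The function names (`entropy`, `condition`, `expected_reduction`, `best_ccq`, `single_ccq_loop`) and the toy attribute names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def condition(matchings, corr, answer):
    """Keep the matchings consistent with a CCQ answer and renormalize.

    Returns the posterior distribution and the prior probability of the answer.
    Assumption: the answer is always correct (no crowd error model).
    """
    kept = [(m, p) for m, p in matchings if (corr in m) == answer]
    total = sum(p for _, p in kept)
    if total == 0:  # answer contradicts every candidate: leave distribution as is
        return list(matchings), 0.0
    return [(m, p / total) for m, p in kept], total

def expected_reduction(matchings, corr):
    """Expected entropy drop from asking whether `corr` is in the correct matching."""
    before = entropy(p for _, p in matchings)
    after = 0.0
    for answer in (True, False):
        posterior, weight = condition(matchings, corr, answer)
        after += weight * entropy(p for _, p in posterior)
    return before - after

def best_ccq(matchings):
    """Pick the correspondence whose CCQ has the largest expected reduction."""
    candidates = sorted({c for m, _ in matchings for c in m})
    return max(candidates, key=lambda c: expected_reduction(matchings, c))

def single_ccq_loop(matchings, budget, ask):
    """Ask at most `budget` CCQs, one at a time, updating after each answer."""
    for _ in range(budget):
        if entropy(p for _, p in matchings) == 0:
            break  # nothing left to resolve
        corr = best_ccq(matchings)
        matchings, _ = condition(matchings, corr, ask(corr))
    return matchings

# Toy input (hypothetical): three candidate matchings with their probabilities.
m1 = frozenset({("name", "full_name"), ("phone", "tel")})
m2 = frozenset({("name", "full_name"), ("phone", "fax")})
m3 = frozenset({("name", "nickname"), ("phone", "tel")})
matchings = [(m1, 0.5), (m2, 0.3), (m3, 0.2)]

# Simulated crowd that answers according to a hidden correct matching (m1).
result = single_ccq_loop(matchings, budget=2, ask=lambda c: c in m1)
```

Under the perfect-answer assumption, the expected reduction for a correspondence equals the binary entropy of its marginal probability, so the sketch prefers correspondences whose probability is closest to 0.5. A noisy-crowd model or a batch of simultaneous questions would need a different update rule.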

Published in: Proceedings of the VLDB Endowment, Volume 6, Issue 9, July 2013, 180 pages

Publisher: VLDB Endowment
