DOI: 10.1145/2588555.2610509
research-article

Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation

Published: 18 June 2014

ABSTRACT

In many applications, descriptions of the same objects or events can be obtained from a variety of sources, which inevitably leads to conflicting data or information. One important problem is to identify the true information (i.e., the truths) among conflicting sources of data. It is intuitive to trust reliable sources more when deriving the truths, but it is usually unknown a priori which sources are more reliable. Moreover, each source possesses a variety of properties with different data types. An accurate estimate of source reliability requires modeling these properties in a single unified model. Existing conflict resolution work either does not estimate source reliability or models multiple properties separately. In this paper, we propose to resolve conflicts among multiple sources of heterogeneous data types. We model the problem using an optimization framework in which truths and source reliability are defined as two sets of unknown variables. The objective is to minimize the overall weighted deviation between the truths and the multi-source observations, where each source is weighted by its reliability. Different loss functions can be incorporated into this framework to capture the characteristics of various data types, and efficient computation approaches are developed. Experiments on real-world weather, stock, and flight data, as well as simulated multi-source data, demonstrate the necessity of jointly modeling different data types in the proposed framework.
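
The abstract describes the framework only at a high level, so the following Python sketch is an illustration under stated assumptions rather than the paper's algorithm. It alternates between two steps: with source weights fixed, it picks the truths that minimize the weighted loss (a weighted mean for continuous properties under squared loss, a weighted vote for categorical properties under 0-1 loss); with truths fixed, it re-estimates the weights. The function name `resolve_conflicts`, the input layout, the specific loss functions, and the negative-log weight update are all assumptions made for this example. The sketch also does not normalize losses across properties, which a practical implementation would likely need.

```python
import math
from collections import defaultdict


def resolve_conflicts(observations, categorical, iterations=20, eps=1e-9):
    """Alternate between truth and source-weight updates (illustrative only).

    observations: {source: {(object_id, property): value}}
    categorical:  set of property names compared with 0-1 loss; all other
                  properties are treated as continuous with squared loss.
    """
    sources = sorted(observations)
    entries = sorted({e for s in sources for e in observations[s]})
    weights = {s: 1.0 for s in sources}  # assumption: start with equal trust
    truths = {}

    for _ in range(iterations):
        # Step 1: fix the weights, pick truths minimizing the weighted loss.
        for entry in entries:
            claims = [(observations[s][entry], weights[s])
                      for s in sources if entry in observations[s]]
            if entry[1] in categorical:
                votes = defaultdict(float)
                for value, w in claims:
                    votes[value] += w
                # Weighted vote; sorting makes tie-breaking deterministic.
                truths[entry] = max(sorted(votes), key=votes.get)
            else:
                total = sum(w for _, w in claims)
                if total > 0:
                    truths[entry] = sum(v * w for v, w in claims) / total
                else:  # all weights are zero: fall back to a plain mean
                    truths[entry] = sum(v for v, _ in claims) / len(claims)

        # Step 2: fix the truths, re-estimate each source's weight.
        losses = {}
        for s in sources:
            loss = eps  # small constant keeps the logarithm finite
            for entry, value in observations[s].items():
                if entry[1] in categorical:
                    loss += 0.0 if value == truths[entry] else 1.0
                else:
                    loss += (value - truths[entry]) ** 2
            losses[s] = loss
        total_loss = sum(losses.values())
        # Assumed rule: a smaller share of the total loss gives a larger weight.
        weights = {s: -math.log(losses[s] / total_loss) for s in sources}

    return truths, weights
```

Called with, for example, three hypothetical weather sources that each report a `temperature` and a `condition` per city, and with `categorical={"condition"}`, the function returns one estimated truth per (object, property) pair and one weight per source. Because a single weight per source is shared across both data types, a source that is wrong on categorical values also loses influence on continuous ones, which is the joint-modeling idea the abstract argues for.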

Published in

SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
June 2014
1645 pages
ISBN: 9781450323765
DOI: 10.1145/2588555

Copyright © 2014 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

SIGMOD '14 Paper Acceptance Rate: 107 of 421 submissions, 25%
Overall Acceptance Rate: 785 of 4,003 submissions, 20%
