research-article

A Survey on Truth Discovery

Published: 25 February 2016

Abstract

Thanks to the information explosion, data about objects of interest can be collected from an increasing number of sources. However, the information collected from multiple sources about the same object often conflicts. To tackle this challenge, truth discovery, which integrates noisy multi-source information by estimating the reliability of each source, has emerged as a hot topic. Several truth discovery methods have been proposed for various scenarios, and they have been successfully applied in diverse application domains. In this survey, we provide a comprehensive overview of truth discovery methods and summarize them from different aspects. We also discuss some future directions of truth discovery research. We hope that this survey will promote a better understanding of the current progress on truth discovery and offer some guidelines on how to apply these approaches in application domains.
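The core idea stated above, resolving conflicts by estimating the reliability of each source, can be illustrated with a minimal sketch. It is not any specific method covered by the survey: it is a generic iterative weighted-voting scheme, and the function name `discover_truths`, the `(source, object, value)` claim format, the add-one smoothing, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def discover_truths(claims, iterations=10):
    """Toy truth discovery over categorical claims.

    claims: list of (source, obj, value) triples (hypothetical format).
    Returns (truths, weights): the estimated value per object and an
    estimated reliability score per source.
    """
    sources = {s for s, _, _ in claims}
    weights = {s: 1.0 for s in sources}  # start by trusting all sources equally
    truths = {}
    for _ in range(iterations):
        # Step 1: for each object, pick the value with the largest total
        # weight among the sources that claim it.
        votes = defaultdict(lambda: defaultdict(float))
        for s, o, v in claims:
            votes[o][v] += weights[s]
        truths = {o: max(vals, key=vals.get) for o, vals in votes.items()}

        # Step 2: re-estimate each source's weight as its smoothed rate of
        # agreement with the current truth estimates (add-one smoothing is
        # an assumption that keeps weights strictly between 0 and 1).
        agree = defaultdict(int)
        total = defaultdict(int)
        for s, o, v in claims:
            total[s] += 1
            agree[s] += int(truths[o] == v)
        weights = {s: (agree[s] + 1) / (total[s] + 2) for s in sources}
    return truths, weights


# Toy data: sources A and C disagree on object "w" with one vote each,
# so plain majority voting cannot decide; estimated reliability can.
EXAMPLE_CLAIMS = [
    ("A", "x", "1"), ("B", "x", "1"), ("C", "x", "2"),
    ("A", "y", "p"), ("B", "y", "q"), ("C", "y", "q"),
    ("A", "z", "m"), ("B", "z", "m"), ("C", "z", "n"),
    ("A", "v", "g"), ("B", "v", "g"), ("C", "v", "h"),
    ("C", "w", "u"), ("A", "w", "t"),
]

if __name__ == "__main__":
    truths, weights = discover_truths(EXAMPLE_CLAIMS)
    print(truths)
    print(weights)
```

In this toy data, source A agrees with the consensus more often than source C, so its claim about object "w" prevails even though the raw vote count is tied. Real methods differ in how they model reliability, handle continuous values, and account for dependence between sources.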


Published in ACM SIGKDD Explorations Newsletter, Volume 17, Issue 2, December 2015, 41 pages
ISSN: 1931-0145
EISSN: 1931-0153
DOI: 10.1145/2897350
Copyright © 2016 Authors
Publisher: Association for Computing Machinery, New York, NY, United States
