research-article

Knowledge-based trust: estimating the trustworthiness of web sources

Published: 01 May 2015

Abstract

The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy.

The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases. We propose a way to distinguish errors made in the extraction process from factual errors in the web source per se, by using joint inference in a novel multi-layer probabilistic model.

We call the trustworthiness score we computed Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources. We then apply it to a database of 2.8B facts extracted from the web, and thereby estimate the trustworthiness of 119M webpages. Manual evaluation of a subset of the results confirms the effectiveness of the method.
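
As a rough illustration of the central idea above (a source with few false facts is considered trustworthy), the following Python sketch alternates between scoring candidate values and scoring sources. It is not the multi-layer model proposed in the paper: it ignores extraction errors entirely and uses a simple accuracy-weighted voting heuristic chosen here for illustration. The function name `estimate_trust`, its parameters, and the example sources and facts are all hypothetical.

```python
from collections import defaultdict


def estimate_trust(claims, iterations=20, initial_accuracy=0.8):
    """Simplified, single-layer trust estimation (illustrative assumption,
    not the paper's multi-layer model).

    claims: iterable of (source, data_item, value) triples.
    Returns (accuracy per source, belief per (data_item, value)).
    """
    claims = sorted(set(claims))  # drop duplicate triples, fix the order
    accuracy = {source: initial_accuracy for source, _, _ in claims}
    belief = {}

    for _ in range(iterations):
        # Step 1: each value's belief is the accuracy-weighted share of
        # the votes cast for its data item.
        votes = defaultdict(float)
        totals = defaultdict(float)
        for source, item, value in claims:
            votes[(item, value)] += accuracy[source]
            totals[item] += accuracy[source]
        belief = {
            (item, value): (score / totals[item] if totals[item] > 0 else 0.0)
            for (item, value), score in votes.items()
        }

        # Step 2: each source's accuracy is the mean belief of the
        # values it provides.
        sums = defaultdict(float)
        counts = defaultdict(int)
        for source, item, value in claims:
            sums[source] += belief[(item, value)]
            counts[source] += 1
        accuracy = {source: sums[source] / counts[source] for source in sums}

    return accuracy, belief


# Hypothetical toy data: two sources agree, one disagrees on both items.
claims = [
    ("a.example", "capital_of_france", "Paris"),
    ("b.example", "capital_of_france", "Paris"),
    ("c.example", "capital_of_france", "Lyon"),
    ("a.example", "capital_of_italy", "Rome"),
    ("b.example", "capital_of_italy", "Rome"),
    ("c.example", "capital_of_italy", "Milan"),
]

accuracy, belief = estimate_trust(claims)
for source in sorted(accuracy):
    print(source, round(accuracy[source], 3))
```

In this toy setting the two agreeing sources end up with a much higher score than the dissenting one. A voting scheme like this can only reward agreement, so it cannot tell a wrong source from a wrongly extracted fact; separating those two kinds of error is what the paper's joint inference is designed to do.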

Published in

Proceedings of the VLDB Endowment, Volume 8, Issue 9
May 2015
76 pages

Publisher: VLDB Endowment
