Abstract
The quality of web sources has traditionally been evaluated using exogenous signals such as the hyperlink structure of the web graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy.
The facts are automatically extracted from each source by information extraction methods commonly used to construct knowledge bases. We propose a way to distinguish errors made in the extraction process from factual errors in the web source per se, by using joint inference in a novel multi-layer probabilistic model.
We call the resulting trustworthiness score Knowledge-Based Trust (KBT). On synthetic data, we show that our method can reliably compute the true trustworthiness levels of the sources. We then apply it to a database of 2.8B facts extracted from the web, and thereby estimate the trustworthiness of 119M webpages. Manual evaluation of a subset of the results confirms the effectiveness of the method.
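To make the idea of scoring a source by the correctness of its facts concrete, the following is a minimal sketch and not the multi-layer model described above. It assumes a much simpler single-layer setting in which every extracted triple is taken at face value (extraction errors are ignored), the true value of each data item is chosen by an accuracy-weighted vote, and a source's trust is the fraction of its claims that match the chosen values. The function name `estimate_trust`, the parameters `n_iters` and `init_accuracy`, and the toy claims are all hypothetical.

```python
from collections import defaultdict


def estimate_trust(claims, n_iters=10, init_accuracy=0.8):
    """Simplified single-layer truth finding (illustrative assumption only).

    claims: list of (source, data_item, value) triples.
    Returns (accuracy per source, chosen value per data item).
    """
    sources = {source for source, _, _ in claims}
    accuracy = {source: init_accuracy for source in sources}
    truth = {}

    for _ in range(n_iters):
        # Step 1: pick a value for each data item by accuracy-weighted voting.
        votes = defaultdict(lambda: defaultdict(float))
        for source, item, value in claims:
            votes[item][value] += accuracy[source]
        truth = {item: max(vals, key=vals.get) for item, vals in votes.items()}

        # Step 2: re-estimate each source's accuracy as the fraction of
        # its claims that agree with the currently chosen values.
        correct = defaultdict(int)
        total = defaultdict(int)
        for source, item, value in claims:
            total[source] += 1
            if truth[item] == value:
                correct[source] += 1
        accuracy = {source: correct[source] / total[source] for source in sources}

    return accuracy, truth


# Toy input: three hypothetical sources making claims about three data items.
claims = [
    ("a.example", "capital_of_france", "Paris"),
    ("a.example", "capital_of_japan", "Tokyo"),
    ("a.example", "capital_of_australia", "Canberra"),
    ("b.example", "capital_of_france", "Paris"),
    ("b.example", "capital_of_japan", "Tokyo"),
    ("b.example", "capital_of_australia", "Canberra"),
    ("c.example", "capital_of_france", "Lyon"),
    ("c.example", "capital_of_japan", "Tokyo"),
    ("c.example", "capital_of_australia", "Sydney"),
]

trust, truth = estimate_trust(claims)
```

In this toy input, `c.example` disagrees with the majority on two of its three claims, so it ends up with a lower score than the other two sources. Separating mistakes made by the extractor from mistakes made by the source itself would require the additional layer of inference that the abstract describes, which this sketch deliberately leaves out.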