ABSTRACT
The world-wide web has become the most important information source for most of us. Unfortunately, there is no guarantee that information on the web is correct. Moreover, different web sites often provide conflicting information on a subject, such as different specifications for the same product. In this paper we propose a new problem called Veracity, i.e., conformity to truth, which studies how to find true facts among a large amount of conflicting information on many subjects provided by various web sites. We design a general framework for the Veracity problem and present an algorithm called TruthFinder, which exploits the relationships between web sites and the information they provide: a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. Our experiments show that TruthFinder successfully finds true facts among conflicting information and identifies trustworthy web sites better than popular search engines.
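The mutual dependence described in the abstract (trustworthy sites provide true facts, and true facts come from trustworthy sites) can be illustrated with a small fixed-point iteration. The following is a minimal sketch under stated assumptions, not the TruthFinder algorithm itself: the function name `mutual_trust_sketch`, the starting trust of 0.5, the noisy-or combination rule, and the averaging step are all choices made here for illustration, and the sketch does not model how conflicting facts about the same subject affect one another.

```python
import math


def mutual_trust_sketch(claims, initial_trust=0.5, iterations=10):
    """Iteratively estimate site trust and fact confidence.

    claims: dict mapping a site name to the set of facts it asserts.
    Returns (trust, confidence) dictionaries.

    Assumptions (illustrative only, not taken from the paper):
      - every site starts with the same trust value;
      - a fact is wrong only if all of its providers are wrong,
        treated as independent events (noisy-or);
      - a site's trust is the mean confidence of the facts it provides.
    """
    # Invert the mapping: fact -> list of sites that provide it.
    providers = {}
    for site, facts in claims.items():
        for fact in facts:
            providers.setdefault(fact, []).append(site)

    trust = {site: initial_trust for site in claims}
    confidence = {}

    for _ in range(iterations):
        # Step 1: a fact is more likely true if many trusted sites provide it.
        for fact, sites in providers.items():
            confidence[fact] = 1.0 - math.prod(1.0 - trust[s] for s in sites)

        # Step 2: a site is more trustworthy if its facts are likely true.
        for site, facts in claims.items():
            if facts:
                trust[site] = sum(confidence[f] for f in facts) / len(facts)

    return trust, confidence


# Hypothetical input: three sites agree on one value, one site disagrees.
example_claims = {
    "site_a": {"weight=1.2kg"},
    "site_b": {"weight=1.2kg"},
    "site_c": {"weight=1.2kg"},
    "site_d": {"weight=2.0kg"},
}
trust, confidence = mutual_trust_sketch(example_claims)
```

In this hypothetical input, the fact asserted by three sites ends up with higher confidence than the fact asserted by one, and the three agreeing sites end up with higher trust than the dissenting one. A simple rule like this rewards agreement alone, so it would be misled by sites that copy each other's errors.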