Abstract
Our news is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-picking"? This paper proposes a framework that models claims based on structured data as parameterized queries. A key insight is that we can learn a lot about a claim by perturbing its parameters and seeing how its conclusion changes. This framework lets us formulate practical fact-checking tasks as computational problems, namely reverse-engineering (often intentionally) vague claims and countering questionable claims. Along with the modeling framework, we develop an algorithmic framework that enables efficient instantiations of "meta" algorithms by supplying appropriate algorithmic building blocks. We present real-world examples and experiments that demonstrate the power of our model, the efficiency of our algorithms, and the usefulness of their results.
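To make the idea of perturbing a claim's parameters concrete, the following is a minimal sketch rather than the paper's algorithms. It assumes a hypothetical claim of the form "the total over the latest window is X times the total over the window before it", with two parameters (the end position and the window width). The data values, the function names `window_sum`, `claim`, `perturb`, and `stronger_share`, and the brute-force enumeration are all illustrative assumptions; the paper's contribution includes efficient alternatives to this kind of exhaustive evaluation.

```python
# Synthetic yearly values, invented for illustration (not from the paper).
series = [10, 12, 11, 13, 12, 14, 13, 15, 21, 14]


def window_sum(data, end, width):
    """Sum of the `width` values ending at index `end` (inclusive)."""
    return sum(data[end - width + 1 : end + 1])


def claim(data, end, width):
    """Hypothetical parameterized query: latest window total divided by
    the total of the immediately preceding window of the same width."""
    return window_sum(data, end, width) / window_sum(data, end - width, width)


def perturb(data, widths):
    """Evaluate the claim at every valid (end, width) parameter setting."""
    results = {}
    for width in widths:
        # The earliest valid end leaves room for two full windows.
        for end in range(2 * width - 1, len(data)):
            results[(end, width)] = claim(data, end, width)
    return results


def stronger_share(results, original):
    """Fraction of settings whose result is at least as strong as the original."""
    baseline = results[original]
    return sum(1 for value in results.values() if value >= baseline) / len(results)


original = (8, 1)  # "the value at position 8 versus position 7"
results = perturb(series, widths=range(1, 4))
best = max(results, key=results.get)

print(f"original claim ratio: {results[original]:.3f}")
print(f"strongest setting:    {best}")
print(f"share at least as strong as original: {stronger_share(results, original):.3f}")
```

In this synthetic example the original parameter setting is also the strongest of all settings tried, so almost every nearby perturbation weakens the conclusion. Under the assumptions above, that pattern is one simple signal that a claim's parameters may have been chosen selectively.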