research-article

Toward computational fact-checking

Published: 01 March 2014

Abstract

Our news is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-picking"? This paper proposes a framework that models claims based on structured data as parameterized queries. A key insight is that we can learn a lot about a claim by perturbing its parameters and seeing how its conclusion changes. This framework lets us formulate practical fact-checking tasks, namely reverse-engineering vague (often intentionally so) claims and countering questionable claims, as computational problems. Along with the modeling framework, we develop an algorithmic framework that enables efficient instantiations of "meta" algorithms by supplying appropriate algorithmic building blocks. We present real-world examples and experiments that demonstrate the power of our model, the efficiency of our algorithms, and the usefulness of their results.
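The perturbation idea in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm: it simply enumerates every parameter setting by brute force, whereas the paper develops efficient algorithms for this kind of task. The yearly values, the window-comparison form of the claim, and all function names (`window_change`, `perturbations`, `fraction_at_least`) are assumptions made up for this illustration.

```python
# Hypothetical yearly values; illustrative only, not data from the paper.
series = {2000: 10, 2001: 12, 2002: 11, 2003: 9,
          2004: 8, 2005: 14, 2006: 13, 2007: 10}

def window_change(data, end_year, width):
    """Parameterized claim query (assumed form): total over the `width` years
    ending at `end_year` minus total over the preceding `width` years."""
    recent = [data[y] for y in range(end_year - width + 1, end_year + 1)]
    earlier = [data[y] for y in range(end_year - 2 * width + 1,
                                      end_year - width + 1)]
    return sum(recent) - sum(earlier)

def perturbations(data, widths):
    """Evaluate the query at every (end_year, width) setting the data supports."""
    years = sorted(data)
    results = {}
    for width in widths:
        for end_year in years:
            # Both windows must fit inside the available years.
            if end_year - 2 * width + 1 >= years[0]:
                results[(end_year, width)] = window_change(data, end_year, width)
    return results

def fraction_at_least(results, claimed_value):
    """Share of perturbed settings whose result is at least as large as the claim's."""
    hits = sum(1 for value in results.values() if value >= claimed_value)
    return hits / len(results)

# The claim under scrutiny: "the last two years ending in 2006 beat the
# two years before them by this much".
original_setting = (2006, 2)
claimed = window_change(series, *original_setting)

results = perturbations(series, widths=[1, 2, 3])
share = fraction_at_least(results, claimed)
best_setting = max(results, key=results.get)

print(f"claimed change: {claimed}")
print(f"settings at least as strong: {share:.0%} of {len(results)}")
print(f"most favorable setting: {best_setting}")
```

With this made-up series, the claim's own parameter setting turns out to be the single most favorable one among all those enumerated, which is the kind of signal a reader might take as a hint of cherry-picking. How to score such signals, and how to avoid the brute-force enumeration, is left to the paper itself.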


Published in

    Proceedings of the VLDB Endowment, Volume 7, Issue 7
    March 2014
    108 pages
    ISSN: 2150-8097

Publisher

    VLDB Endowment

Qualifiers

    research-article
