On the provenance of non-answers to queries over extracted data

Published: 01 August 2008

Abstract

In information extraction, uncertainty is ubiquitous. For this reason, it is useful to provide users querying extracted data with explanations for the answers they receive. Providing the provenance for tuples in a query result partially addresses this problem, in that provenance can explain why a tuple is in the result of a query. However, in some cases explaining why a tuple is not in the result may be just as helpful. In this work we focus on providing provenance-style explanations for non-answers and develop a mechanism for providing this new type of provenance. Our experience with an information extraction prototype suggests that our approach can provide effective provenance information that can help a user resolve their doubts over non-answers to a query.

