Abstract
In information extraction, uncertainty is ubiquitous. For this reason, it is useful to provide users querying extracted data with explanations for the answers they receive. Providing the provenance for tuples in a query result partially addresses this problem, in that provenance can explain why a tuple is in the result of a query. However, in some cases explaining why a tuple is not in the result may be just as helpful. In this work we focus on providing provenance-style explanations for non-answers and develop a mechanism for providing this new type of provenance. Our experience with an information extraction prototype suggests that our approach can provide effective provenance information that can help a user resolve their doubts over non-answers to a query.
Index Terms
- On the provenance of non-answers to queries over extracted data