ABSTRACT
We consider the problem of approximately verifying the truth of sentences of tuple relational calculus in a given relation M by considering only a random sample of M. We define two different measures for the error of a universal sentence in a relation. For a set of n universal sentences each with at most k universal quantifiers, we give upper and lower bounds for the sample sizes required for having a high probability that all the sentences with error at least ε can be detected as false by considering the sample. The sample sizes are O((log n)/ε) or O((|M|1–1/k)log n/ε), depending on the error measure used. We also consider universal-existential sentences.
- Coh93.William W. Cohen. PAC-learning a restricted class of recursive logic programs. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 86-92, July 1993. AAAI Press, Menlo Park, CA.Google ScholarDigital Library
- Fag82.Ronald Fagin. Horn clauses and database dependencies. Journal of the A CM, 29(4):952-985, October 1982. Google ScholarDigital Library
- GHK92.Adam J. Grove, Joseph Y. Halpern and Daphne Koller. Asymptotic conditional probabilities for first-order logic. In Proceedings of the Twenty Fourth Symposium on the Theory of Computin9 (STOC 'gP), pages 294-305, May 1992. ACM. Google ScholarDigital Library
- HKM94.Lauri Hells, Jyrki Kivinen and Heikki Mannila. Error measures and generalized quantifiers. In preparation, 1994.Google Scholar
- HO91.Weh-Chi Hou and Gultekin Ozsoyoglu. Statistical estimators for aggregate relational algebra queries. A CM Transactions on Database Systems, 16(4):600-654, December 1991. Google ScholarDigital Library
- HS92.Peter J. Haas and Arun N. Swami. Sequential sampling procedures for query size optimization. In Michael Stonebraker, editor, Proceedings of the 199P A GM SIGMOD Conference on Management of Data, pages 341-350, June 1992. ACM. Google ScholarDigital Library
- KI91.Ravi Krishnamurthy and Tomasz imielinski. Research directions in knowledge discovery. SIG- MOD Record, 20(3):76-78, September 1991.Google Scholar
- KM92.Jyrki Kivinen and Heikki Mannila. Approximate dependency inference from relations. In Joachim Biskup and Richard Hull, editors, Proceedings of the Fourth International Conference on Database Theory (ICDT '9P), pages 86-98, October 1992. Springer-Verlag. Google ScholarDigital Library
- LNS90.Richard J. Lipton, Jeffrey F. Naughton and D. A. Schneider. Practical selectivity estimation through adaptive sampling. In Hector Garcia- Molina and H. V. Jagadish, editors, Proceedings of the 1990 A GM SIGMOD Conference on Management o/Data, pages 1-11, May 1990. ACM. Google ScholarDigital Library
- Mug92.Steven Muggleton, editor. Inductive Logic Programmin9. Academic Press, 1992.Google Scholar
- NS90.Jeffrey F. Naughton and S. Seshadri. On estimating the size of projections. In Serge Abiteboul and Paris C. Kanellalds, editors, Proceedings of the Third International Conference on Database Theory (IGDT '90), pages 499-513, December 1990. Springer-Verlag. Google ScholarDigital Library
- Nii87.Ilkka Niiniluoto. Truthlikeness. D. Reidel Pubfishing Company, Dordrecht, Holland, 1987.Google Scholar
- Odd86.Graham Oddie. Likeness to Truth. D. Reidel Publishing Company, Dordrecht, Holland, 1986.Google ScholarCross Ref
- OSW89.Daniel N. Osherson, Michael Stob and Scott Weinstein. On approximate truth. In Ronald Rivest, David Haussler and Manfred K. Warmuth, editors, Proceedings of the Second Workshop on Computational Learning Theory, pages 88-101, July 1989. Morgan Kaufmann, San Mateo, CA. Google ScholarDigital Library
- PS91.Gregory Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In Gregory Piatetsky-Shapiro and William J. Frawley, editors, Knowledge Discovery in Databases, pages 229-248, 1991. AAAI Press, Menlo Park, CA.Google Scholar
- PSF91.Gregory Piatetsky-Shapiro and William J. Frawley, editors. Knowledge Discovery in Databases. AAAI Press, Menlo Park, CA, 1991.Google ScholarDigital Library
- Roy91.Shaibal Roy. Semantic complexity of classes of relational queries and query independent data partitioning. In Proceedings of the Tenth Symposium on Principles of Database Systems (PODS '90), pages 259-267, May 1991. ACM. Google ScholarDigital Library
- Ull88.Jeffrey D. Ullman. Principles of Database and Knowledge-Base Systems, Volume L Computer Science Press, 1988. Google ScholarDigital Library
- Val84.Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, November 1984. Google ScholarDigital Library
- ZZ93.Robert Zembowicz and Jan M. Zytkov. Testing the existence of functional relationship in data. In Proceedings of the AAAI-93 Workshop on Knowledge Discovery in Databases, 1993.Google Scholar
Index Terms
- The power of sampling in knowledge discovery
Recommendations
Stratified random sampling for power estimation
In this paper, we present new statistical sampling techniques for performing power estimation at the circuit level. These techniques first transform the power estimation problem to a survey sampling problem, and then apply stratified random sampling to ...
Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms
Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest research challenges is to develop methods that allow to use large amounts of data. One possible approach for dealing with huge amounts of data is to take a ...
AND/OR importance sampling
UAI'08: Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial IntelligenceThe paper introduces AND/OR importance sampling for probabilistic graphical models. In contrast to importance sampling, AND/OR importance sampling caches samples in the AND/OR space and then extracts a new sample mean from the stored samples. We prove ...
Comments