skip to main content
10.1145/182591.182601acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article
Free Access

The power of sampling in knowledge discovery

Published:24 May 1994Publication History

ABSTRACT

We consider the problem of approximately verifying the truth of sentences of tuple relational calculus in a given relation M by considering only a random sample of M. We define two different measures for the error of a universal sentence in a relation. For a set of n universal sentences each with at most k universal quantifiers, we give upper and lower bounds for the sample sizes required for having a high probability that all the sentences with error at least ε can be detected as false by considering the sample. The sample sizes are O((log n)/ε) or O((|M|1–1/k)log n/ε), depending on the error measure used. We also consider universal-existential sentences.

References

  1. Coh93.William W. Cohen. PAC-learning a restricted class of recursive logic programs. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 86-92, July 1993. AAAI Press, Menlo Park, CA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Fag82.Ronald Fagin. Horn clauses and database dependencies. Journal of the A CM, 29(4):952-985, October 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. GHK92.Adam J. Grove, Joseph Y. Halpern and Daphne Koller. Asymptotic conditional probabilities for first-order logic. In Proceedings of the Twenty Fourth Symposium on the Theory of Computin9 (STOC 'gP), pages 294-305, May 1992. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. HKM94.Lauri Hells, Jyrki Kivinen and Heikki Mannila. Error measures and generalized quantifiers. In preparation, 1994.Google ScholarGoogle Scholar
  5. HO91.Weh-Chi Hou and Gultekin Ozsoyoglu. Statistical estimators for aggregate relational algebra queries. A CM Transactions on Database Systems, 16(4):600-654, December 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. HS92.Peter J. Haas and Arun N. Swami. Sequential sampling procedures for query size optimization. In Michael Stonebraker, editor, Proceedings of the 199P A GM SIGMOD Conference on Management of Data, pages 341-350, June 1992. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. KI91.Ravi Krishnamurthy and Tomasz imielinski. Research directions in knowledge discovery. SIG- MOD Record, 20(3):76-78, September 1991.Google ScholarGoogle Scholar
  8. KM92.Jyrki Kivinen and Heikki Mannila. Approximate dependency inference from relations. In Joachim Biskup and Richard Hull, editors, Proceedings of the Fourth International Conference on Database Theory (ICDT '9P), pages 86-98, October 1992. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. LNS90.Richard J. Lipton, Jeffrey F. Naughton and D. A. Schneider. Practical selectivity estimation through adaptive sampling. In Hector Garcia- Molina and H. V. Jagadish, editors, Proceedings of the 1990 A GM SIGMOD Conference on Management o/Data, pages 1-11, May 1990. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Mug92.Steven Muggleton, editor. Inductive Logic Programmin9. Academic Press, 1992.Google ScholarGoogle Scholar
  11. NS90.Jeffrey F. Naughton and S. Seshadri. On estimating the size of projections. In Serge Abiteboul and Paris C. Kanellalds, editors, Proceedings of the Third International Conference on Database Theory (IGDT '90), pages 499-513, December 1990. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Nii87.Ilkka Niiniluoto. Truthlikeness. D. Reidel Pubfishing Company, Dordrecht, Holland, 1987.Google ScholarGoogle Scholar
  13. Odd86.Graham Oddie. Likeness to Truth. D. Reidel Publishing Company, Dordrecht, Holland, 1986.Google ScholarGoogle ScholarCross RefCross Ref
  14. OSW89.Daniel N. Osherson, Michael Stob and Scott Weinstein. On approximate truth. In Ronald Rivest, David Haussler and Manfred K. Warmuth, editors, Proceedings of the Second Workshop on Computational Learning Theory, pages 88-101, July 1989. Morgan Kaufmann, San Mateo, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. PS91.Gregory Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In Gregory Piatetsky-Shapiro and William J. Frawley, editors, Knowledge Discovery in Databases, pages 229-248, 1991. AAAI Press, Menlo Park, CA.Google ScholarGoogle Scholar
  16. PSF91.Gregory Piatetsky-Shapiro and William J. Frawley, editors. Knowledge Discovery in Databases. AAAI Press, Menlo Park, CA, 1991.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Roy91.Shaibal Roy. Semantic complexity of classes of relational queries and query independent data partitioning. In Proceedings of the Tenth Symposium on Principles of Database Systems (PODS '90), pages 259-267, May 1991. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Ull88.Jeffrey D. Ullman. Principles of Database and Knowledge-Base Systems, Volume L Computer Science Press, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Val84.Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134-1142, November 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. ZZ93.Robert Zembowicz and Jan M. Zytkov. Testing the existence of functional relationship in data. In Proceedings of the AAAI-93 Workshop on Knowledge Discovery in Databases, 1993.Google ScholarGoogle Scholar

Index Terms

  1. The power of sampling in knowledge discovery

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in
            • Published in

              cover image ACM Conferences
              PODS '94: Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
              May 1994
              313 pages
              ISBN:0897916425
              DOI:10.1145/182591
              • Chairman:
              • Victor Vianu

              Copyright © 1994 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 24 May 1994

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • Article

              Acceptance Rates

              PODS '94 Paper Acceptance Rate28of117submissions,24%Overall Acceptance Rate642of2,707submissions,24%

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader