skip to main content
article

Adaptive algorithms for set containment joins

Published:01 March 2003Publication History
Skip Abstract Section

Abstract

A set containment join is a join between set-valued attributes of two relations, whose join condition is specified using the subset (⊆) operator. Set containment joins are deployed in many database applications, even those that do not support set-valued attributes. In this article, we propose two novel partitioning algorithms, called the Adaptive Pick-and-Sweep Join (APSJ) and the Adaptive Divide-and-Conquer Join (ADCJ), which allow computing set containment joins efficiently. We show that APSJ outperforms previously suggested algorithms for many data sets, often by an order of magnitude. We present a detailed analysis of the algorithms and study their performance on real and synthetic data using an implemented testbed.

References

  1. Böhm, C. and Kriegel, H.-P. 2000. Dynamically optimizing high-dimensional index structures. In Proceedings of Advances in Database Technology---EDBT 2000, 7th International Conference on Extending Database Technology (Konstanz, Germany, Mar. 27--31). C. Zaniolo, P. C. Lockemann, M. H. Scholl, and T. Grust, Eds. Lecture Notes in Computer Science, vol. 1777. Springer-Verlag, New York.Google ScholarGoogle ScholarCross RefCross Ref
  2. Cai, J.-Y., Chakaravarthy, V. T., Kaushik, R., and Naughton, J. 2001. On the complexity of join predicates. In PODS'01, Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Santa Barbara, Calif., May 21--23). ACM Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Faloutsos, C. and Christodoulakis, S. 1984. Signature files: An access method for documents and its analytical performance evaluation. ACM Trans. Inf. Syst. 2, 4, 267--288. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Gray, J., Sundaresan, P., Englert, S., Baclawski, K., and Weinberger, P. J. 1994. Quickly generating billion-record synthetic databases. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (Minneapolis, Minn., May 24--27). R. T. Snodgrass and M. Winslett, Eds. ACM, New York, 243--252. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Hellerstein, J. M., Koutsoupias, E., and Papadimitriou, C. H. 1997. On the analysis of indexing schemes. In PODS'97, Proceedings of the 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Tucson, AZ., May 12--14). ACM, New York, 249--256. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Helmer, S. and Moerkotte, G. 1997. Evaluation of main memory join algorithms for joins with set comparison join predicates. In VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases (Athens, Greece, Aug. 25--29). M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds. Morgan Kaufmann, Reading, Mass., 386--395. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Hirai, J., Raghavan, S., Garcia-Molina, H., and Paepcke, A. 2000. Webbase: A repository of web pages. In WWW'00, Proceedings of the 9th International World Wide Web Conference. Computer Networks, vol. 33. Elsevier-North Holland, Amsterdam, The Netherlands. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Ishikawa, Y., Kitagawa, H., and Ohbo, N. 1993. Evaluation of signature files as set access facilities in oodbs. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (Washington, D.C., May 26--28). P. Buneman and S. Jajodia, Eds. ACM, New York, 247--256. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Melnik, S. and Garcia-Molina, H. 2002. Divide-and-conquer algorithm for computing set containment joins. In Proceedings of Advances in Database Technology---EDBT 2002, 8th International Conference on Extending Database Technology (Prague, Czech Republic, Mar. 25--27). C. S. Jensen, K. G. Jeffery, J. Pokorný, S. Saltenis, E. Bertino, K. Böhm, and M. Jarke, Eds. Lecture Notes in Computer Science, vol. 2287. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Patel, J. M. and DeWitt, D. J. 1996. Partition based spatial-merge join. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (Montreal, Que, Canada, June 4--6). H. V. Jagadish and I. S. Mumick, Eds. ACM, New York, 259--270. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Ramasamy, K., Patel, J. M., Naughton, J. F., and Kaushik, R. 2000. Set containment joins: The good, the bad and the ugly. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases (Cairo, Egypt, Sept. 10--14). A. E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, and K.-Y. Whang, Eds. Morgan-Kaufmann, Reading, Mass. 351--362. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Adaptive algorithms for set containment joins

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Database Systems
        ACM Transactions on Database Systems  Volume 28, Issue 1
        March 2003
        99 pages
        ISSN:0362-5915
        EISSN:1557-4644
        DOI:10.1145/762471
        Issue’s Table of Contents

        Copyright © 2003 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 March 2003
        Published in tods Volume 28, Issue 1

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader