Abstract
A set containment join is a join between set-valued attributes of two relations, whose join condition is specified using the subset (⊆) operator. Set containment joins are deployed in many database applications, even those that do not support set-valued attributes. In this article, we propose two novel partitioning algorithms, called the Adaptive Pick-and-Sweep Join (APSJ) and the Adaptive Divide-and-Conquer Join (ADCJ), which allow computing set containment joins efficiently. We show that APSJ outperforms previously suggested algorithms for many data sets, often by an order of magnitude. We present a detailed analysis of the algorithms and study their performance on real and synthetic data using an implemented testbed.
- Böhm, C. and Kriegel, H.-P. 2000. Dynamically optimizing high-dimensional index structures. In Proceedings of Advances in Database Technology---EDBT 2000, 7th International Conference on Extending Database Technology (Konstanz, Germany, Mar. 27--31). C. Zaniolo, P. C. Lockemann, M. H. Scholl, and T. Grust, Eds. Lecture Notes in Computer Science, vol. 1777. Springer-Verlag, New York.Google ScholarCross Ref
- Cai, J.-Y., Chakaravarthy, V. T., Kaushik, R., and Naughton, J. 2001. On the complexity of join predicates. In PODS'01, Proceedings of the 20th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Santa Barbara, Calif., May 21--23). ACM Press. Google ScholarDigital Library
- Faloutsos, C. and Christodoulakis, S. 1984. Signature files: An access method for documents and its analytical performance evaluation. ACM Trans. Inf. Syst. 2, 4, 267--288. Google ScholarDigital Library
- Gray, J., Sundaresan, P., Englert, S., Baclawski, K., and Weinberger, P. J. 1994. Quickly generating billion-record synthetic databases. In Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (Minneapolis, Minn., May 24--27). R. T. Snodgrass and M. Winslett, Eds. ACM, New York, 243--252. Google ScholarDigital Library
- Hellerstein, J. M., Koutsoupias, E., and Papadimitriou, C. H. 1997. On the analysis of indexing schemes. In PODS'97, Proceedings of the 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Tucson, AZ., May 12--14). ACM, New York, 249--256. Google ScholarDigital Library
- Helmer, S. and Moerkotte, G. 1997. Evaluation of main memory join algorithms for joins with set comparison join predicates. In VLDB'97, Proceedings of 23rd International Conference on Very Large Data Bases (Athens, Greece, Aug. 25--29). M. Jarke, M. J. Carey, K. R. Dittrich, F. H. Lochovsky, P. Loucopoulos, and M. A. Jeusfeld, Eds. Morgan Kaufmann, Reading, Mass., 386--395. Google ScholarDigital Library
- Hirai, J., Raghavan, S., Garcia-Molina, H., and Paepcke, A. 2000. Webbase: A repository of web pages. In WWW'00, Proceedings of the 9th International World Wide Web Conference. Computer Networks, vol. 33. Elsevier-North Holland, Amsterdam, The Netherlands. Google ScholarDigital Library
- Ishikawa, Y., Kitagawa, H., and Ohbo, N. 1993. Evaluation of signature files as set access facilities in oodbs. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (Washington, D.C., May 26--28). P. Buneman and S. Jajodia, Eds. ACM, New York, 247--256. Google ScholarDigital Library
- Melnik, S. and Garcia-Molina, H. 2002. Divide-and-conquer algorithm for computing set containment joins. In Proceedings of Advances in Database Technology---EDBT 2002, 8th International Conference on Extending Database Technology (Prague, Czech Republic, Mar. 25--27). C. S. Jensen, K. G. Jeffery, J. Pokorný, S. Saltenis, E. Bertino, K. Böhm, and M. Jarke, Eds. Lecture Notes in Computer Science, vol. 2287. Springer-Verlag. Google ScholarDigital Library
- Patel, J. M. and DeWitt, D. J. 1996. Partition based spatial-merge join. In Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (Montreal, Que, Canada, June 4--6). H. V. Jagadish and I. S. Mumick, Eds. ACM, New York, 259--270. Google ScholarDigital Library
- Ramasamy, K., Patel, J. M., Naughton, J. F., and Kaushik, R. 2000. Set containment joins: The good, the bad and the ugly. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases (Cairo, Egypt, Sept. 10--14). A. E. Abbadi, M. L. Brodie, S. Chakravarthy, U. Dayal, N. Kamel, G. Schlageter, and K.-Y. Whang, Eds. Morgan-Kaufmann, Reading, Mass. 351--362. Google ScholarDigital Library
Index Terms
- Adaptive algorithms for set containment joins
Recommendations
Query containment under bag and bag-set semantics
Conjunctive queries (CQs) are at the core of query languages encountered in many logic-based research fields such as AI, or database systems. The majority of existing work assumes set semantics but often in real applications the manipulation of ...
View-based query containment
PODS '03: Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsQuery containment is the problem of checking whether for all databases the answer to a query is a subset of the answer to a second query. In several data management tasks, such as data integration, mobile computing, etc., the data of interest are only ...
Divide-and-Conquer Algorithm for Computing Set Containment Joins
EDBT '02: Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database TechnologyA set containment join is a join between set-valued attributes of two relations, whose join condition is specified using the subset ( ) operator. Set containment joins are used in a variety of database applications. In this paper, we propose a novel ...
Comments