ABSTRACT
We consider an architecture of mediators and wrappers for Internet accessible WebSources of limited query capability. Each call to a source is a WebSource Implementation (WSI) and it is associated with both a capability and (a possibly dynamic) cost. The multiplicity of WSIs with varying costs and capabilities increases the complexity of a traditional optimizer that must assign WSIs for each remote relation in the query while generating an (optimal) plan. We present a two-phase Web Query Optimizer (WQO). In a pre-optimization phase, the WQO selects one or more WSIs for a pre-plan; a pre-plan represents a space of query evaluation plans (plans) based on this choice of WSIs. The WQO uses cost-based heuristics to evaluate the choice of WSI assignment in the pre-plan and to choose a good pre-plan. The WQO uses the pre-plan to drive the extended relational optimizer to obtain the best plan for a pre-plan. A prototype of the WQO has been developed. We compare the effectiveness of the WQO, i.e., its ability to efficiently search a large space of plans and obtain a low cost plan, in comparison to a traditional optimizer. We also validate the cost-based heuristics by experimental evaluation of queries in the noisy Internet environment.
- S. Adali, K. S. Candan, Y. Papakonstantinou, and V. S. Subrahmanian. Query caching and optimization in distributed mediator systems. Proceedings of the ACM SIGMOD Conference, 1996.]] Google ScholarDigital Library
- L. Amsaleg, M. Franklin, A. Tomasic, and T. Urhan. Dynamic query operator scheduling for wide-area remote access. Journal of Distributed and Parallel Databases, 6(3), 1998.]] Google ScholarDigital Library
- R. Avnur and J. Hellerstein. Eddies: Continuously adaptive query processing. Proceedings of the ACM Sigmod Conference, 2000.]] Google ScholarDigital Library
- L. Bright, J.-R. Gruser, L. Raschid, and M. E. Vidal. A wrapper generation toolkit to specify and construct wrappers for webaccesible data sources (websources). Journal of Computer Systems Special Issue on Semantics in the WWW, 14(2):83-98, 1999.]]Google Scholar
- L. Bright and L. Raschid. Cost modeling of wrappers for web accesible data sources (websources). http://www.umiacs.umd.edu/labs/CLIP/DARPA/ww97.html. (under review), 1998.]]Google Scholar
- S. Chaudhuri and K. Shim. Query optimization in the presence of foreing fucntions. Proceedings of the VLDB Conference, 1993.]] Google ScholarDigital Library
- J. Chen, D. DeWitt, F. Tian, and Y. Wang. Niagaracq: A scalable continuous query system for internet databases. Proceedings of the ACM SIGMOD Conference, 2000.]] Google ScholarDigital Library
- W. Du, R. Krishnamurthy, and M. C. Shan. Query optimization in a heterogeneous dbms. Proceedings of the VLDB Conference, 1992.]] Google ScholarDigital Library
- B. Eckman, A. Kosky, and L. Laroco. Extending traditional query-based integration approaches for functional characterization of post-genomic data. BioInformatics, 17(7):587-601, 2001.]]Google ScholarCross Ref
- B. Eckman, Z. Lacroix, and L. Raschid. Optimized seamless integration of biomolecular data. Proceedings of the IEEE International Symposium on Bio-Informatics and Biomedical Engineering (BIBE), 2001.]] Google ScholarDigital Library
- D. Florescu, A. Levy, I. Manolescu, and D. Suciu. Query optimization in the presence of limited access patterns. Proceedings of the ACM SIGMOD Conference, 1999.]] Google ScholarDigital Library
- H. Garcia-Molina, Labio W., and Yerneni R. Capability-sensitive query processing on internet sources. Proceedings of the International Conference on Database Engineering, 1999.]] Google ScholarDigital Library
- G. Gardarin, B. Finance, and P. Fankhauser. IRO-DB: A Distributed System Federating Object and Relational Databases, In Object-Oriented Multidatabase Systems : A solution for Advanced Applications, Bukhres, O. and Elmagarmid, A. Prentice Hall, 1996.]] Google ScholarDigital Library
- R. Goldman and J. Widom. Wsq/dsq: A practical approach for combined querying of databases and the web. Proceedings of the ACM SIGMOD Conference, pages 285-296, 2000.]] Google ScholarDigital Library
- J.-R. Gruser, L. Raschid, V. Zadorozhny, and T. Zhan. Learning response time for websources using query feedback and application in query optimization. VLDB Journal, Special Issue on Databases and the Web, Mendelzon, A. and Atzeni, P., editors., 9(1):18-37, 2000.]] Google ScholarDigital Library
- L. Haas, D. Kossmann, E. Wimmers, and J. Yang. Optimizing queries across diverse data sources. Proceedings of the VLDB Conference, 1997.]] Google ScholarDigital Library
- L. Haas, P. Schwarz, P. Kodali, E. Kotlar, J. Rice, and W. Swope. Discoverylink: A system for integrating life sciences data. IBM Systems Journal, 40(2), 2001.]] Google ScholarDigital Library
- P. Haas and J. Hellerstein. Ripple joins for online aggregation. Proceedings of the ACM Sigmod Conference, 1999.]] Google ScholarDigital Library
- J. Hellerstein et al. Adaptive query processing: Technology in evolution. IEEE Data Engineering Bulletin, 2000.]]Google Scholar
- Y. Ioanidis and Y. Kang. Randomized algorithms for optimizing large join queries. Proceedings of the ACM SIGMOD Conference, 1990.]] Google ScholarDigital Library
- Z. Ives et al. An adaptive query execution system for data integration. Proceedings of the ACM SIGMOD Conference, 1999.]] Google ScholarDigital Library
- Z. Ives, A. Levy, D. Weld, D. Florescu, and M. Friedman. Adaptive query processing for internet applications. IEEE Data Engineering Bulletin, 23(2):19-26, 2000.]]Google Scholar
- A. Y. Levy et al. Querying heterogeneous information sources using source descriptions. Proceedings of VLDB, 1996.]] Google ScholarDigital Library
- C. Li and E. Chang. Query planning with limited source capabilities. Proceedings of ICDE, 2000.]]Google ScholarCross Ref
- C. Li and E. Chang. On answering queries in the presence of limited access patterns. Proceedings of the International Conference on Database Theory, 2001.]] Google ScholarDigital Library
- ACM Digital Library. http://www.acm.org/dl/Search.html.]]Google Scholar
- L. Haas M. Tork Roth, F. Ozcan. Cost models do matter: Providing cost information for diverse data sources in a federated system. Proceedings of the VLDB Conference, 1999.]] Google ScholarDigital Library
- H. Naacke, G. Gardarin, and A. Tomasic. Leveraging mediator cost models with heterogeneous data sources. Proceedings of the International Conference on Data Engineering, 1998.]] Google ScholarDigital Library
- Bureau of Labor Statistics. http://stats.bls.gov.]]Google Scholar
- Y. Papakonstantinou, A. Gupta, and L. Haas. Capabilities-based query rewriting in mediator systems. Proceedings of the International Conference on Parallel and Distributed Information Systems, 1996.]] Google ScholarDigital Library
- R. Ramakrishnan P. Seshadri, M. Livny. The case for enhanced abstract data types. Proceedings of the VLDB Conference, 1997.]] Google ScholarDigital Library
- P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access path selection in a relational database management system. 1979.]] Google ScholarDigital Library
- A. Tomasic et al. Scaling heterogeneous databases and the design of disco. Proceedings of the International Conference on Distributed Computing Systems, 1996.]] Google ScholarDigital Library
- J. Ullman. Principles of DataBase and Knowledge-Base Systems, volume I. Computer Science Press, 1988.]] Google ScholarDigital Library
- J. Ullman. Principles of DataBase and Knowledge-Base Systems, volume II. Computer Science Press, 1989.]] Google ScholarDigital Library
- T. Urhan and M. Franklin. Xjoin: A reactively-scheduled pipelined join operator. IEEE Data Engineering Bulletin, 23(2):27-33, 2000.]]Google Scholar
- T. Urhan and M. Franklin. Dynamic pipeline scheduling for improving interactive performance of online queries. Proceedings of the VLDB Conference, 2001.]] Google ScholarDigital Library
- T. Urhan, M. Franklin, and L. Amsaleg. Cost-based query scrambling for initial delays. Proceedings of the ACM Sigmod Conference, 1998.]] Google ScholarDigital Library
- V. Vassalos and Y. Papakonstantinou. Describing and using query capabilities of heterogeneos sources. In Proceedings of the VLDB Conference, 1997.]] Google ScholarDigital Library
- V. Vassalos and Y. Papakonstantinou. Using knowledge of redundancy for query optimization in mediators. Proceedings of the AAAI Symposium on AI and Data Integration, 1998.]]Google Scholar
- M. E. Vidal. A Mediator for Scaling up to Multiple WebSources. PhD thesis, University Simon Bolivar, 2000.]]Google Scholar
- M. E. Vidal, L. Raschid, and V. Zadorozhny. Decision support model for pre-plans. In preparation, 2001.]]Google Scholar
- R. Yerneni, Chen Li, J. Ullman, and H. Garcia-Molina. Optimizing large join queries in mediation systems. Proceedings of the International Conference on Database Theory, 1999.]] Google ScholarDigital Library
- V. Zadorozhny, L. Raschid, T. Zhan, and L Bright. Validating a cost model for wide area applications. Proceedings of the International Conference on Cooperating Information Systems, 2001.]] Google ScholarDigital Library
Index Terms
- Efficient evaluation of queries in a mediator for WebSources
Recommendations
Scalable and efficient processing of top-k multiple-type integrated queries
AbstractIn this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds the information in the order of relevance between the query and the object. Various ...
Efficient approximations of conjunctive queries
PODS '12: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGAI symposium on Principles of Database SystemsWhen finding exact answers to a query over a large database is infeasible, it is natural to approximate the query by a more efficient one that comes from a class with good bounds on the complexity of query evaluation. In this paper we study such ...
Optimization and Evaluation of Disjunctive Queries
It is striking that the optimization of disjunctive queries i.e., those which contain at least one or-connective in the query predicate has been vastly neglected in the literature, as well as in commercial systems. In this paper, we propose a novel ...
Comments