Abstract
Few database query optimizer models have been validated against actual performance. This paper presents the methodology and results of a thorough validation of the optimizer and evaluation of the performance of the experimental distributed relational database management system R*, which inherited and extended to a distributed environment the optimization algorithms of System R. Optimizer estimated costs and actual R* resources consumed were written to database tables using new SQL commands, permitting automated control from SQL application programs of test data collection and reduction. A number of tests were run over a wide variety of dynamically-created test databases, SQL queries, and system parameters. The results for single-table access, sorting, and local 2-table joins are reported here. The tests confirmed the accuracy of the majority of the I/O cost model, the significant contribution of CPU cost to total cost, and the need to model CPU cost in more detail than was done in System R. The R* optimizer now retains cost components separately and estimates the number of CPU instructions, including those for applying different kinds of predicates. The sensitivity of I/O cost to buffer space motivated the development of more detailed models of buffer utilization unclustered index scans and nested-loop joins often benefit from pages remaining in the buffers, whereas concurrent scans of the data pages and the index pages for multiple tables during joins compete for buffer share. Without an index on the join column of the inner table, the optimizer correctly avoids the nested-loop join, confirming the need for merge-scan joins. When the join column of the inner is indexed, the optimizer overestimates the cost of the nested-loop join, whose actual performance is very sensitive to three parameters that are extremely difficult to estimate (1) the join (result) cardinality, (2) the outer table's cardinality, and (3) the number of buffer pages available to store the inner table. Suggestions are given for improved database statistics, prefetch and page replacement strategies for the buffer manager, and the use of temporary indexes and Bloom filters (hashed semijoins) to reduce access of unneeded data.
- APER 83 P M G Apers, A R Hevner, and S B Yao, Optnntzmg Algorithms for Dtstnbuted Queries, IEEE Trans on Software Engmeermg SE-9 (January 1983) pp 57-68]]Google ScholarDigital Library
- ASTR 80 M M Astrahan, M Schkolmck, and W Kun, Performance of the System R Access Path Selecuon Mechamsm, Informatton Processing 80 (1980) pp 487-491]]Google Scholar
- BERN 81 P A Bernstem, N Goodman, E Wong, C L Reeve, J Rothme, Query Processing m a System for D#stnbuted Databases (SDD-1), ACM Trans on Database Systems6,4 (December 1981) pp 602-625]] Google ScholarDigital Library
- BLAS 77 M W Blasgen and K P Eswaren, Storage and Access m RelaUonal Data Bases, IBM Systems Journal 16,4 (1977) Also available as "On the EvaluaUon of Queries m a RelaUonal Data Base System", IBM Research Report RJ1745, San Jose, CA, April 1976]]Google ScholarDigital Library
- BLOO 70 B H Bloom, Space/Trine Trade-offs m Hash Codmg wRh Allowable Errors, Commumcatlons of the A CM 13,7 (July 1970) pp 422-426]] Google ScholarDigital Library
- BRAT 84 K Bratbergsengen, Hashmg Methods and RelaUonal Algebra Operations, Procs of the Tenth lnternattonal Conf on Very Large Data Bases (Smgapore, 1984) pp 323-333]] Google ScholarDigital Library
- CARE 85 M J Carey and H Lu, Some Expertmental Results on Dtstnbuted Jom Algorithms m a Local Network, Procs of the Elewnth International Conf on Very Large Data Bases (Stockholm, Sweden, August 1985)]]Google Scholar
- CHAM 81 D D Chamberlm, M M Astrahan, W F Kmg, R A Lone, J W Mehl, T G Price, M Sehkolmck, P Gnfflths Selmger, D R Slutz, B W Wade, and R A Yost, Support for RepeUttve TransacUons and Ad Hoc Queries m System R, A CM Trans on Database Systems 6,1 (March 1981) pp 70-94]] Google ScholarDigital Library
- CHAN 82 J-M Chang, A Heurtstlc Approach to Distributed Query Processmg, Procs of the Etghth lnternattonal Conf on Very Large Data Bases (Mexico City, VLDB Endowment, Saratoga, pp 54-61]] Google ScholarDigital Library
- CHOU 85 H-T Chou and D J DeWltt, An Evaluatmn of Buffer Management Strategtes for Relauonal Database Systems, Procs of the Eleventh International Conf on Very Large Data Bases (Stockholm, Sweden, September 1985) pp 127-141]]Google Scholar
- CHU 82 W W Chu and P Hurley, Optunal Query Processing for D#stnbuted Database Systems, IEEE Trans on Computers C-31 (September 1982) pp 835-850]]Google ScholarDigital Library
- DANI 82 D Darnels, P G Selmger, L M Haas, B G Lmdsay, C Mohan, A Walker, and P Wdms, An Introducuon to D#stnbuted Query Compdataon m R*, Procs Second International Conf on D#trtbuted Databases (Berlm, September 1982) Also avatlable as IBM Research Report RJ3497, San Jose, CA, June 1982]]Google Scholar
- DEWI 84 D J DeWltt, R H Katz, F Olken, L D Shapiro, M R Stonebraker, and D Wood, Implementation Techmques for Mare Memory Database Systems, Proc SIGMOD 84 (Boston, MA, May 1984) pp 1-8]] Google ScholarDigital Library
- DEWI 85 D J DeWltt and R Gerber, Multi-Processor Hash- Based Join Algortthms, Procs of the Eleventh Internatzonal Conf on Very Large Data Bases (Stockholm, Sweden, September 1985) pp 151-164]]Google Scholar
- EFFE 84 Effelsberg, W and Haerder, T, Principles of Database Buffer Management, A CM Trans Database Systems 9 (December 1984) pp 560-595]] Google ScholarDigital Library
- EPST 78 R Epstein, M Stonebraker, and E Wong, Distributed Query Processing m a Relational Data Base System, Procs ofACM-SIGMOD (Austm,TX, May 1978) pp 169-180]] Google ScholarDigital Library
- EPST 80 R Epstein and M Stonebraker, Analysis of Distributed Data Base Processing Strategies, Procs of the Szxth lnternatwnal Conf on Very Large Data Bases (Montreal,IEEE, October 1980) pp 92-101]]Google Scholar
- FINK 82 S Fmkelstem, M Schkolmck, and P Tlbeno, DBDSGN -- a Physical Database Design Tool for System R, IEEE Database Engineering 5,1 (1982) pp 228-235]]Google Scholar
- GDDM 84 Graphtcal Data D#splay Manager Presentatwn Graphzcs Feature Interactive Chart Utzhty User's GuMe, Release 4, IBM Reference Manual SC33-0111-3 (IBM Corp, October 1984)]]Google Scholar
- HEVN 79 A R Hevner and S B Yao, Query Processing m Dlstnbuted Database Systems, IEEE Trans Software Engineering SE-5 (May 1979) pp 177-187]]Google ScholarDigital Library
- KERS 82 L Kerschberg, P D Tmg, and S B Yao, Query Optmuzauon m Star Computer Networks, A CM Trans on Database Systems 7,4 (December 1982) pp 678-711]] Google ScholarDigital Library
- LOHM 84 G M Lohman, D Darnels, L M Haas, R gastler, P G Sehnger, OptlmtzaUon of Nested Quenes m a Distributed Relational Database, Procs of the Tenth lnternatwnal Conf on Very Large Data Bases (Singapore, 1984) pp 403-415 Also avadable as IBM Research Report RJ4260, San Jose, CA, Aprd 1984]] Google ScholarDigital Library
- LOHM 85 G M Lohman, C Mohan, L M Haas, B G Lmdsay, P G Sehnger, P F Wllms, and D Darnels, Query Processing m R*, Query Processing m Database Systems, Sprmger-Verlag (Klm, Batory, & Remer (eds), 1985) pp 31-47 Also available as IBM Research Report R.14272, San Jose, CA, April 1984]]Google Scholar
- MACK 85 L F Mackert and G M Lohman, Index Scans using a Flmte LRU Buffer A Validated I/O Model, IBM Research Report RJ4836 (Almaden Research Center, San Jose, CA,]]Google Scholar
- MACK 86 L F Mackert and G M Lohman, R* Optmmmzer Validation and Performance Evaluation for Distributed Quenes, IBM Research Report RJ5050 (Almaden Research Center, San jose, CA,]]Google Scholar
- ONUE 83 E Onuegbe, S Rahum, and A R Hevner, Local Query Translation and Optmuzation m a Dmtnbuted System, Procs NCC 1983 (July 1983) pp 229-239]]Google Scholar
- PALE 74 F P Palermo, A Data Base Search Problem, Informatwn Systems COINS IV (Procs of COINS-72 Symp) (Mmnu, FL, 1974) pp 67-101 Plenum Press, JuhusT Tou, ed]]Google Scholar
- RDT 84 RDT Relational Design Tool, IBM Reference Manual SH20-6415 (IBM Corp, June 1984)]]Google Scholar
- SACC 82 G M Sacco and M Schkolmck, A Mechamsm for Managing the Buffer Pool m a Relational Database System using the Hot Set Model, Procs of the Eighth International Conf on Very Large Data Bases (Mexico City, 1982) pp 257-262]] Google ScholarDigital Library
- SCHK 79 M Schkolmck and P Ttbeno, Considerations m Developing a Design Tool for a Relational DBMS, Proc IEEE COMPSAC Conf (Cbacago, November 1979) pp 228-235]]Google Scholar
- SCHK 85 M Schkolmck and P Tlbeno, Estimating the Cost of Updates m a Relational Database, A CM Tram on Database Systems 10,2 (June 1985) pp 163-179 Also available as IBM Research Report RJ3327, San Jose, CA, 1981]] Google ScholarDigital Library
- SELI 79 P G Selmger, M M Astrahan, D D Chamberhn, R A Lone, and T G Price, Access Path Selection m a Relational Database Management System, Procs of ACM-SIGMOD 20 (1979) pp 23-34]] Google ScholarDigital Library
- SELI 80 P G Sehnger and M Adiba, Access Path Selection m Dtstnbuted Database Management Systems, Procs lnternatwnal Conf on Data Bases (Deen and Hammersly, ed, Umv of Abet pp 204-215]]Google Scholar
- SEVE 76 D G Severance and G M Lohman, Differential Fdes Their Apphcation to the Maintenance of Large Databases, ACM Trans on Database Systems 1,3 (September 1976) pp 256-267]] Google ScholarDigital Library
- SQL 84 SQL/Data System Application Programming for VM/ System Product, Release 3, IBM Reference Manual SH24-5068-0 (IBM Corp, December 1984)]]Google Scholar
- STON 80 M Stonebraker, Retrospection on a Data Base System, ACM Trans on Database Systems 5,2 (June 1980) pp225-240]] Google ScholarDigital Library
- STON 81 M Stonebraker, Operating System Support for Database Management, Commumcatwns of the A CM 24 (July 1981) pp 412-418]] Google ScholarDigital Library
- STON 82 M Stonebraker, J V#oodfdl, J Ranstrom, M Murphy, J Kalash, M Carey, K Arnold, Performance Analysts of Distributed Data Base Systems, Database Engineermg 5 (IEEE Computer Society, December 1982) pp 58-65]]Google Scholar
- VTAM 85 Network Program Products Planning (MVS, VSE, and VM), IBM Reference Manual SC23-0110-1 (IBM Corp, April 1985)]]Google Scholar
- WONG 76 E Wong and K Youssefl, Decomposition E a Strategy for Query Processing, A CM Tram on Database Systems 1,3 (September 1976) pp 223-241]] Google ScholarDigital Library
- YAO 79 S B Yao, Optmnzation of Query Algorithms, ACM Trans on Database Systems 4,2 (June 1979) pp 133-155]] Google ScholarDigital Library
- YU 83 C T Yu, and C C Chang, On the Design of a Query Processing Strategy m a D#stnbuted Database Enwronment, Proc SIGMOD 83 (San Jose, CA, May 1983) pp 30-39]] Google ScholarDigital Library
Index Terms
- R* optimizer validation and performance evaluation for local queries
Recommendations
R* optimizer validation and performance evaluation for local queries
SIGMOD '86: Proceedings of the 1986 ACM SIGMOD international conference on Management of dataFew database query optimizer models have been validated against actual performance. This paper presents the methodology and results of a thorough validation of the optimizer and evaluation of the performance of the experimental distributed relational ...
Computing Local Sensitivities of Counting Queries with Joins
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of DataLocal sensitivity of a query Q given a database instance D, i.e. how much the output Q(D) changes when a tuple is added to D or deleted from D, has many applications including query analysis, outlier detection, and differential privacy. However, it is ...
R* optimizer validation and performance evaluation for distributed queries
Readings in database systems (3rd ed.)
Comments