
2009 | Book

Performance Evaluation and Benchmarking

First TPC Technology Conference, TPCTC 2009, Lyon, France, August 24-28, 2009, Revised Selected Papers

Editors: Raghunath Nambiar, Meikel Poess

Publisher: Springer Berlin Heidelberg

Book Series: Lecture Notes in Computer Science


About this book

First established in August 1988, the Transaction Processing Performance Council (TPC) has shaped the landscape of modern transaction processing and database benchmarks over two decades. Now, the world is in the midst of an extraordinary information explosion led by rapid growth in the use of the Internet and connected devices. Both user-generated data and enterprise data levels continue to grow exponentially. With substantial technological breakthroughs, Moore's law will continue for at least a decade, and data storage capacities and data transfer speeds will continue to increase exponentially. These trends have challenged industry experts and researchers to develop innovative techniques to evaluate and benchmark both hardware and software technologies. As a result, the TPC held its First Conference on Performance Evaluation and Benchmarking (TPCTC 2009) on August 24 in Lyon, France, in conjunction with the 35th International Conference on Very Large Data Bases (VLDB 2009). TPCTC 2009 provided industry experts and researchers with a forum to present and debate novel ideas and methodologies in performance evaluation, measurement, and characterization for 2010 and beyond. This book contains the proceedings of the conference, including 16 papers and keynote papers from Michael Stonebraker and Karl Huppler.

Table of Contents

Frontmatter
Transaction Processing Performance Council (TPC): Twenty Years Later – A Look Back, a Look Ahead
Abstract
The Transaction Processing Performance Council (TPC) [1] is a non-profit corporation founded to define transaction processing and database benchmarks and to disseminate objective, verifiable TPC performance data to the industry. Established in August 1988, the TPC has been integral in shaping the landscape of modern transaction processing and database benchmarks over the past twenty years. Today the TPC is developing an energy efficiency metric and a new ETL benchmark, as well as investigating new areas for benchmark development in 2010 and beyond.
Raghunath Othayoth Nambiar, Matthew Lanken, Nicholas Wakou, Forrest Carman, Michael Majdalany
A New Direction for TPC?
Abstract
This paper gives the author’s opinion concerning the contributions the Transaction Processing Performance Council (TPC) has made in the past and how it is viewed in the present by the author and his colleagues, and it offers some suggestions on where the TPC should go in the future. In short, the TPC has become vendor-dominated, and it is time for the TPC to reinvent itself to serve its customer community.
Michael Stonebraker
The Art of Building a Good Benchmark
Abstract
What makes a good benchmark? This is a question that has been asked often, answered often, altered often. In the past 25 years, the information processing industry has seen the creation of dozens of “industry standard” performance benchmarks – some highly successful, some less so. This paper will explore the overall requirements of a good benchmark, using existing industry standards as examples along the way.
Karl Huppler
Databases Are Not Toasters: A Framework for Comparing Data Warehouse Appliances
Abstract
The success of Business Intelligence (BI) applications depends on two factors: the ability to analyze data ever more quickly and the ability to handle ever-increasing volumes of data. Data Warehouse (DW) and Data Mart (DM) installations that support BI applications have historically been built using traditional architectures, either designed from the ground up or based on customized reference system designs. The advent of Data Warehouse Appliances (DA) brings packaged software and hardware solutions that address performance and scalability requirements for certain market segments. The differences between DAs and custom installations make direct comparisons between them impractical and suggest the need for a targeted DA benchmark. In this paper we review data warehouse appliances by surveying thirteen products offered today. We assess the common characteristics among them and propose a classification for DA offerings. We hope our results will help define a useful benchmark for DAs.
Omer Trajman, Alain Crolotte, David Steinhoff, Raghunath Othayoth Nambiar, Meikel Poess
The State of Energy and Performance Benchmarking for Enterprise Servers
Abstract
To address the server industry’s marketing focus on performance, benchmarking organizations have played a pivotal role in developing techniques to determine the maximum achievable performance level of a system. Generally missing has been an assessment of energy use to achieve that performance. The connection between performance and energy consumption is becoming necessary information for designers and operators as they grapple with power constraints in the data center. While industry and policy makers continue to strategize about a universal metric to holistically measure IT equipment efficiency, existing server benchmarks for various workloads could provide an interim proxy to assess the relative energy efficiency of general servers. This paper discusses ideal characteristics a future energy-performance benchmark might contain, suggests ways in which current benchmarks might be adapted to provide a transitional step to this end, and notes the need for multiple workloads to provide a holistic proxy for a universal metric.
Andrew Fanara, Evan Haines, Arthur Howard
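The abstract above argues for coupling existing performance benchmarks with energy measurements. As a rough, hypothetical illustration of the kind of performance-per-watt proxy it alludes to (not a metric defined by this paper or by any TPC or SPEC specification), a minimal sketch in Python:

```python
# Hypothetical sketch: combine a benchmark's throughput score with measured power
# draw into a simple performance-per-watt proxy. All numbers are invented.

def performance_per_watt(throughput, power_samples_watts):
    """Return throughput divided by average power over the measurement interval."""
    avg_power = sum(power_samples_watts) / len(power_samples_watts)
    return throughput / avg_power

if __name__ == "__main__":
    throughput = 12_500.0                              # e.g. transactions per second
    power_samples = [410.0, 415.0, 402.0, 398.0, 420.0]  # watts, sampled during the run
    print(f"proxy efficiency: {performance_per_watt(throughput, power_samples):.1f} tps/W")
```

Any real energy-performance metric would, of course, have to specify the measurement interval, load levels, and power instrumentation far more carefully than this toy calculation does.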
From Performance to Dependability Benchmarking: A Mandatory Path
Abstract
Work on performance benchmarking started long ago. Ranging from simple benchmarks that target a very specific system or component to very complex benchmarks for complex infrastructures, performance benchmarks have contributed to improving successive generations of systems. However, the fact that nowadays most systems need to guarantee high availability and reliability shows that it is necessary to shift the focus from measuring pure performance to measuring both performance and dependability. Research on dependability benchmarking started at the beginning of this decade and has already led to the proposal of several benchmarks. However, no dependability benchmark has yet achieved the status of a real benchmark endorsed by a standardization body or corporation. In this paper we argue that standardization bodies must shift focus and start including dependability metrics in their benchmarks. We present an overview of the state of the art in dependability benchmarking and define a set of research needs and challenges that must be addressed to establish real dependability benchmarks.
Marco Vieira, Henrique Madeira
Overview of TPC Benchmark E: The Next Generation of OLTP Benchmarks
Abstract
Set to replace the aging TPC-C, the TPC Benchmark E is the next generation OLTP benchmark, which more accurately models client database usage. TPC-E addresses the shortcomings of TPC-C. It has a much more complex workload, requires the use of RAID-protected storage, generates much less I/O, and is much cheaper and easier to set up, run, and audit. After a period of overlap, it is expected that TPC-E will become the de facto OLTP benchmark.
Trish Hogan
Converting TPC-H Query Templates to Use DSQGEN for Easy Extensibility
Abstract
The ability to automatically generate queries that are not known a priori is crucial for ad-hoc benchmarks. TPC-H solves this problem with a query generator, QGEN, which utilizes query templates to generate SQL queries. QGEN’s architecture makes it difficult to maintain, change, or adapt to new types of query templates, since every modification requires code changes. DSQGEN, a generic query generator originally written for the TPC-DS benchmark, uses a query template language that allows for easy modification and extension of existing query templates. In this paper we show how the current set of TPC-H query templates can be migrated to the template language of DSQGEN without any change to the comparability of published TPC-H results. The resulting query template model provides opportunities for easier enhancement and extension of the TPC-H workload, which we demonstrate.
John M. Stephens Jr., Meikel Poess
Generating Shifting Workloads to Benchmark Adaptability in Relational Database Systems
Abstract
A large body of research concerns the adaptability of database systems. Many commercial systems already contain autonomic processes that adapt configurations as well as data structures and data organization. Yet there is virtually no way to fairly measure the quality of such optimizations. While standard benchmarks have been developed that simulate real-world database applications very precisely, none of them considers variations in workloads produced by human factors. Today’s benchmarks test the performance of database systems by measuring peak performance on homogeneous request streams. Nevertheless, in systems with user interaction, access patterns are constantly shifting. We present a benchmark that simulates a web information system with interaction of large user groups. It is based on the analysis of a real online eLearning management system with 15,000 users. The benchmark considers the temporal dependency of user interaction. Its main focus is to measure the adaptability of a database management system under shifting workloads. We also give details on our design approach, which uses sophisticated pattern analysis and data mining techniques.
Tilmann Rabl, Andreas Lang, Thomas Hackl, Bernhard Sick, Harald Kosch
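As a toy companion to the abstract above, the sketch below generates a request stream whose mix of query classes drifts across phases, which is one simple way to approximate a "shifting workload"; the phase lengths, query classes, and probabilities are invented and are not the authors' trace-derived model:

```python
# Illustrative sketch (not the authors' generator): emit a request stream whose mix
# of query classes shifts over simulated time, e.g. from read-heavy browsing to
# write-heavy submission activity in an eLearning-style system.
import random

PHASES = [
    # (duration in ticks, {query class: probability})
    (100, {"browse": 0.7, "search": 0.2, "submit": 0.1}),
    (100, {"browse": 0.3, "search": 0.2, "submit": 0.5}),
    (100, {"browse": 0.5, "search": 0.4, "submit": 0.1}),
]

def shifting_workload(phases, seed=42):
    rng = random.Random(seed)
    tick = 0
    for duration, mix in phases:
        classes, weights = zip(*mix.items())
        for _ in range(duration):
            yield tick, rng.choices(classes, weights=weights, k=1)[0]
            tick += 1

if __name__ == "__main__":
    for tick, query_class in shifting_workload(PHASES):
        if tick % 50 == 0:
            print(tick, query_class)
```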
Measuring Database Performance in Online Services: A Trace-Based Approach
Abstract
Many large-scale online services use structured storage to persist metadata and sometimes data. The structured storage is typically provided by standard database servers such as Microsoft’s SQL Server. It is important to understand the workloads seen by these servers, both for provisioning server hardware and for exploiting opportunities for energy savings and server consolidation. In this paper we analyze disk I/O traces from production servers in four internet services as well as servers running TPC benchmarks. We show, using a range of load metrics, that the services differ substantially from each other and from standard TPC benchmarks. Online services also show significant diurnal patterns in load that can be exploited for energy savings or consolidation. We argue that TPC benchmarks do not capture these important characteristics and call for developing benchmarks that can be parameterized with workload features extracted from live production workload traces.
Swaroop Kavalanekar, Dushyanth Narayanan, Sriram Sankar, Eno Thereska, Kushagra Vaid, Bruce Worthington
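A minimal sketch of the kind of trace aggregation the abstract describes: bucketing a disk I/O trace by hour of day to expose diurnal load patterns. The trace format and values below are hypothetical, not the production traces analyzed in the paper:

```python
# Hypothetical sketch: aggregate a disk I/O trace into an hourly load profile to
# expose diurnal patterns. The record format (timestamp_seconds, bytes) is assumed.
from collections import defaultdict

def hourly_profile(records):
    """records: iterable of (timestamp_seconds, bytes_transferred)."""
    buckets = defaultdict(int)
    for ts, nbytes in records:
        hour_of_day = int(ts // 3600) % 24
        buckets[hour_of_day] += nbytes
    return [buckets[h] for h in range(24)]

if __name__ == "__main__":
    # Tiny synthetic trace: much heavier traffic around hour 14 than hour 3.
    trace = [(3 * 3600 + i, 4096) for i in range(10)] + \
            [(14 * 3600 + i, 65536) for i in range(100)]
    print(hourly_profile(trace))
```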
Issues in Benchmark Metric Selection
Abstract
It is true that a metric can influence a benchmark, but will esoteric metrics create more problems than they solve? We answer this question affirmatively by examining the case of the TPC-D metric, which used the much-debated geometric mean for the single-stream test. We show how a simple choice influenced the benchmark and its conduct and, to some extent, DBMS development. After examining other alternatives, our conclusion is that the “real” measure for a decision-support benchmark is the arithmetic mean.
Alain Crolotte
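To make the abstract's point concrete, here is a small worked comparison (with invented query times) of the two means: a single slow query dominates the arithmetic mean but barely moves the geometric mean, which is exactly why the choice of mean can steer benchmark conduct and optimization effort:

```python
# Worked comparison of the two means discussed above, with invented query times.
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

if __name__ == "__main__":
    query_times = [1.0, 1.2, 0.8, 1.1, 100.0]            # seconds; one outlier query
    print("arithmetic:", arithmetic_mean(query_times))   # ~20.8, dominated by the outlier
    print("geometric: ", geometric_mean(query_times))    # ~2.5, largely insensitive to it
```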
Benchmarking Query Execution Robustness
Abstract
Benchmarks that focus on running queries on a well-tuned database system ignore a long-standing problem: adverse runtime conditions can cause database system performance to vary widely and unexpectedly. When the query execution engine does not exhibit resilience to these adverse conditions, addressing the resultant performance problems can contribute significantly to the total cost of ownership of a database system through over-provisioning, lost efficiency, and increased human administrative costs. For example, focused human effort may be needed to manually invoke workload management actions or fine-tune the optimization of specific queries.
We believe a benchmark is needed to measure query execution robustness, that is, how adverse or unexpected conditions impact the performance of a database system. We offer a preliminary analysis of barriers to query execution robustness and propose some metrics for quantifying the impact of those barriers. We present and analyze results from preliminary tests on four real database systems and discuss how these results could be used to increase the robustness of query processing in each case. Finally, we outline how our efforts could be expanded into a benchmark to quantify query execution robustness.
Janet L. Wiener, Harumi Kuno, Goetz Graefe
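The following toy sketch is one way (not the paper's methodology) to quantify sensitivity to an adverse condition: time the same query with and without a supporting index on SQLite and report the slowdown ratio as a crude robustness-style metric:

```python
# Toy sketch: quantify how much an adverse condition degrades a query by timing the
# same lookup with and without a supporting index and reporting the slowdown ratio.
import sqlite3
import time

def timed(conn, sql, params=()):
    start = time.perf_counter()
    conn.execute(sql, params).fetchall()
    return time.perf_counter() - start

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(i, i % 1000, i * 0.01) for i in range(200_000)])
    conn.execute("CREATE INDEX idx_customer ON orders(customer)")

    query = "SELECT SUM(amount) FROM orders WHERE customer = ?"
    baseline = timed(conn, query, (42,))

    conn.execute("DROP INDEX idx_customer")   # the 'adverse condition' in this toy example
    degraded = timed(conn, query, (42,))

    print(f"baseline {baseline:.4f}s, degraded {degraded:.4f}s, "
          f"slowdown x{degraded / baseline:.1f}")
```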
Benchmarking Database Performance in a Virtual Environment
Abstract
Data center consolidation, for power and space conservation, has driven the steady development and adoption of virtualization technologies. This in turn has led to customer demands for better metrics to compare virtualization technologies. The technology industry has responded with standardized methods and measures for benchmarking hardware and software performance with virtualization. This paper compares the virtualization technologies available today and the existing benchmarks to measure them. We describe some real-life data center scenarios that are not addressed by current benchmarks and highlight the need for virtualization workloads that incorporate database-heavy computing needs. We present data from experiments running existing TPC database workloads in a virtualized environment and demonstrate that virtualization technologies are available today to meet the demands of the most resource-intensive database application. We conclude with suggestions to the TPC for a benchmark that can effectively measure database performance in a virtual environment.
Sharada Bose, Priti Mishra, Priya Sethuraman, Reza Taheri
Principles for an ETL Benchmark
Abstract
Conditions in the marketplace for ETL tools suggest that an industry standard benchmark is needed. The benchmark should provide useful data for comparing the performance of ETL systems, be based on a meaningful scenario, and be scalable over a wide range of data set sizes. This paper gives a general scoping of the proposed benchmark and outlines some key decision points. The Transaction Processing Performance Council (TPC) has formed a development subcommittee to define and produce such a benchmark.
Len Wyatt, Brian Caufield, Daniel Pol
Benchmarking ETL Workflows
Abstract
Extraction–Transform–Load (ETL) processes comprise complex data workflows, which are responsible for the maintenance of a Data Warehouse. A plethora of ETL tools is currently available, constituting a multi-million-dollar market. Each ETL tool uses its own technique for the design and implementation of an ETL workflow, making the task of assessing ETL tools extremely difficult. In this paper, we identify common characteristics of ETL workflows in an effort to propose a unified evaluation method for ETL. We also identify the main points of interest in designing, implementing, and maintaining ETL workflows. Finally, we propose a principled organization of test suites based on the TPC-H schema for the problem of experimenting with ETL workflows.
Alkis Simitsis, Panos Vassiliadis, Umeshwar Dayal, Anastasios Karagiannis, Vasiliki Tziovara
A Performance Study of Event Processing Systems
Abstract
Event processing engines are used in diverse mission-critical scenarios such as fraud detection, traffic monitoring, or intensive care units. However, these scenarios have very different operational requirements in terms of, e.g., types of events, query/pattern complexity, throughput, latency, and number of sources and sinks. What are the performance bottlenecks? Will performance degrade gracefully with increasing loads? In this paper we make a first attempt to answer these questions by running several micro-benchmarks on three different engines while varying query parameters such as window size, window expiration type, predicate selectivity, and data values. We also perform experiments to assess the engines’ scalability with respect to the number of queries and propose ways of evaluating their ability to adapt to changes in load conditions. Lastly, we show that similar queries have widely different performance on the same or different engines and that no engine dominates the other two in all scenarios.
Marcelo R. N. Mendes, Pedro Bizarro, Paulo Marques
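In the spirit of the parameter sweeps described above, the skeleton below measures the throughput of a naive sliding-window aggregate in pure Python while varying window size and predicate selectivity; it involves no real event processing engine, and the parameter values are arbitrary:

```python
# Illustrative micro-benchmark skeleton (no real CEP engine): measure the throughput
# of a naive sliding-window average while varying window size and predicate selectivity.
import random
import time
from collections import deque

def run(window_size, selectivity, n_events=100_000, seed=1):
    rng = random.Random(seed)
    window = deque()
    running_sum = 0.0
    start = time.perf_counter()
    for _ in range(n_events):
        value = rng.random()
        if value < selectivity:            # predicate filters events before the window
            window.append(value)
            running_sum += value
            if len(window) > window_size:
                running_sum -= window.popleft()
            _ = running_sum / len(window)  # per-event aggregate over the current window
    return n_events / (time.perf_counter() - start)

if __name__ == "__main__":
    for window_size in (100, 1_000, 10_000):
        for selectivity in (0.1, 0.5, 1.0):
            rate = run(window_size, selectivity)
            print(f"window={window_size:>6} selectivity={selectivity:.1f} "
                  f"throughput={rate:,.0f} events/s")
```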
The Star Schema Benchmark and Augmented Fact Table Indexing
Abstract
We provide a benchmark measuring star schema queries retrieving data from a fact table with Where clause column restrictions on dimension tables. Clustering is crucial to performance with modern disk technology, since retrievals with filter factors down to 0.0005 are now performed most efficiently by sequential table search rather than by indexed access. DB2’s Multi-Dimensional Clustering (MDC) provides methods to "dice" the fact table along a number of orthogonal "dimensions", but only when these dimensions are columns in the fact table. The diced cells cluster fact rows on several of these "dimensions" at once so queries restricting several such columns can access crucially localized data, with much faster query response. Unfortunately, columns of dimension tables of a star schema are not usually represented in the fact table. In this paper, we show a simple way to adjoin physical copies of dimension columns to the fact table, dicing data to effectively cluster query retrieval, and explain how such dicing can be achieved on database products other than DB2. We provide benchmark measurements to show successful use of this methodology on three commercial database products.
Patrick O’Neil, Elizabeth O’Neil, Xuedong Chen, Stephen Revilak
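A rough sketch of the adjoin-and-cluster idea on SQLite, standing in for the DB2 MDC dicing the paper actually uses: physically copy selected dimension columns into the fact table and rebuild it ordered by them, so that queries restricting those columns touch a localized region of storage. The schema below only loosely resembles the Star Schema Benchmark and is purely illustrative:

```python
# Rough sketch of adjoining dimension columns to a fact table and clustering on them.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim  (d_datekey INTEGER PRIMARY KEY, d_year   INTEGER);
CREATE TABLE customer  (c_custkey INTEGER PRIMARY KEY, c_region TEXT);
CREATE TABLE lineorder (lo_datekey INTEGER, lo_custkey INTEGER, lo_revenue REAL);
""")
# ... load the dimension and fact tables here ...

# Adjoin physical copies of d_year and c_region to the fact table and rebuild it
# ordered by them, so rows sharing those values are stored contiguously and a query
# restricting both columns reads a localized region of the table.
conn.executescript("""
CREATE TABLE lineorder_aug AS
SELECT lo.*, d.d_year, c.c_region
FROM lineorder lo
JOIN date_dim  d ON lo.lo_datekey = d.d_datekey
JOIN customer  c ON lo.lo_custkey = c.c_custkey
ORDER BY d.d_year, c.c_region;
""")

# A star-schema query can now restrict the copied columns directly on the fact table.
conn.execute("""
SELECT d_year, c_region, SUM(lo_revenue)
FROM lineorder_aug
WHERE d_year = 1997 AND c_region = 'ASIA'
GROUP BY d_year, c_region;
""")
```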
An Approach of Performance Evaluation in Authentic Database Applications
Abstract
This paper proposes a benchmark test management framework (BTMF) to simulate realistic database application environments based on TPC benchmarks. BTMF provides configuration parameters for both the test system (TS) and the system under test (SUT), so a more authentic SUT performance can be obtained by tuning these parameters. We use Petri nets and a transfer matrix to describe the intricate testing workload characteristics, so that configuration parameters for different database applications can easily be determined. We conduct three workload characteristics experiments based on the TPC-App benchmark to validate the BTMF and the workload modeling approach.
Xiaojun Ye, Jingmin Xie, Jianmin Wang, Hao Tang, Naiqiao Du
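One plausible reading of the "transfer matrix" workload description above is a Markov-chain-style matrix of transition probabilities between request types; the sketch below, with invented states and probabilities, drives a request sequence from such a matrix:

```python
# Minimal sketch (states and probabilities invented) of using a transfer matrix,
# Markov-chain style, to drive the sequence of request types issued against the SUT.
import random

STATES = ["browse", "add_to_cart", "checkout"]
TRANSFER = {                        # row: current state; columns follow STATES order
    "browse":      [0.70, 0.25, 0.05],
    "add_to_cart": [0.40, 0.30, 0.30],
    "checkout":    [0.90, 0.05, 0.05],
}

def request_stream(n, start="browse", seed=7):
    rng = random.Random(seed)
    state = start
    for _ in range(n):
        yield state
        state = rng.choices(STATES, weights=TRANSFER[state], k=1)[0]

if __name__ == "__main__":
    print(list(request_stream(20)))
```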
Backmatter
Metadata
Title
Performance Evaluation and Benchmarking
Editors
Raghunath Nambiar
Meikel Poess
Copyright Year
2009
Publisher
Springer Berlin Heidelberg
Electronic ISBN
978-3-642-10424-4
Print ISBN
978-3-642-10423-7
DOI
https://doi.org/10.1007/978-3-642-10424-4
