About this book

This book constitutes the proceedings of the Third TPC Technology Conference on Performance Evaluation and Benchmarking, TPCTC 2011, held in conjunction with the 37th International Conference on Very Large Data Bases, VLDB 2011, in Seattle, August/September 2011. The 12 full papers and 2 keynote papers were carefully reviewed and selected from numerous submissions. The papers present novel ideas and methodologies in performance evaluation, measurement, and characterization.

Table of Contents


Shaping the Landscape of Industry Standard Benchmarks: Contributions of the Transaction Processing Performance Council (TPC)

Established in 1988, the Transaction Processing Performance Council (TPC) has had a significant impact on the computing industry’s use of industry-standard benchmarks. These benchmarks are widely adopted by systems and software vendors to illustrate performance competitiveness for their existing products, and to improve and monitor the performance of their products under development. Many buyers use TPC benchmark results as points of comparison when purchasing new computing systems and evaluating new technologies.
In this paper, the authors look at the contributions of the Transaction Processing Performance Council in shaping the landscape of industry standard benchmarks – from defining the fundamentals like performance, price/performance, and energy efficiency, to creating standards for independently auditing and reporting various aspects of the systems under test.
Raghunath Nambiar, Nicholas Wakou, Andrew Masland, Peter Thawley, Matthew Lanken, Forrest Carman, Michael Majdalany

Metrics for Measuring the Performance of the Mixed Workload CH-benCHmark

Advances in hardware architecture have begun to enable database vendors to process analytical queries directly on operational database systems without impeding the performance of mission-critical transaction processing too much. In order to evaluate such systems, we recently devised the mixed workload CH-benCHmark, which combines transactional load based on TPC-C order processing with decision support load based on a TPC-H-like query suite, run in parallel on the same tables in a single database system. Just as the data volume of actual enterprises tends to increase over time, an inherent characteristic of this mixed workload benchmark is that data volume increases during benchmark runs, which in turn may increase response times of analytic queries. For purely transactional loads, response times typically do not depend that much on data volume, as the queries used within business transactions are less complex and indexes often allow these queries to be answered with point-wise accesses only. But for mixed workloads, the insert throughput metric of the transactional component interferes with the response-time metric of the analytic component. To address this problem, in this paper we analyze the characteristics of the CH-benCHmark queries and propose normalized metrics that account for data volume growth.
Florian Funke, Alfons Kemper, Stefan Krompass, Harumi Kuno, Raghunath Nambiar, Thomas Neumann, Anisoara Nica, Meikel Poess, Michael Seibold
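The normalization the abstract calls for can be illustrated with a small sketch. The scaling function, names, and linear-growth assumption below are hypothetical, not the paper's actual metric: if an analytic query's response time grows roughly in proportion to data volume, dividing the observed time by a volume-dependent growth factor yields a size-independent score.

```python
def normalized_response_time(observed_seconds, row_count, baseline_rows=100_000):
    """Hypothetical normalization (illustrative only): scale the observed
    analytic response time by the data-volume growth factor relative to a
    baseline cardinality, assuming response time grows roughly linearly
    with row count."""
    growth_factor = row_count / baseline_rows
    return observed_seconds / growth_factor

# A query that takes 2 s on 100k rows and 4 s on 200k rows gets the
# same normalized score under the linear-growth assumption.
early = normalized_response_time(2.0, 100_000)
late = normalized_response_time(4.0, 200_000)
```

Under this sketch, the analytic metric no longer penalizes a system merely because the transactional component inserted data quickly during the run.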

Towards an Enhanced Benchmark Advocating Energy-Efficient Systems

The growing energy consumption of data centers has lately become an area of research interest. For this reason, the research focus has broadened from a solely performance-oriented system evaluation to an exploration where energy efficiency is considered as well. The Transaction Processing Performance Council (TPC) has reflected this shift by introducing the TPC-Energy benchmark. In this paper, we recommend extensions, refinements, and variations for such benchmarks. For this purpose, we present performance measurements of real-world database servers and show that their mean utilization is far from peak; thus, benchmarking results, even in conjunction with TPC-Energy, lead to inadequate assessment decisions, e.g., when a database server has to be purchased. We therefore propose a new benchmarking paradigm that includes more realistic power measures. Our proposal will enable appraisals of database servers based on broader requirement profiles instead of on performance alone. Furthermore, our energy-centric benchmarks will encourage the design and development of energy-proportional hardware and the evolution of energy-aware DBMSs.
Daniel Schall, Volker Hoefner, Manuel Kern

Optimization of Analytic Data Flows for Next Generation Business Intelligence Applications

This paper addresses the challenge of optimizing analytic data flows for modern business intelligence (BI) applications. We first describe the changing nature of BI in today’s enterprises as it has evolved from batch-based processes, in which the back-end extraction-transform-load (ETL) stage was separate from the front-end query and analytics stages, to near real-time data flows that fuse the back-end and front-end stages. We describe industry trends that force new BI architectures, e.g., mobile and cloud computing, semi-structured content, event and content streams as well as different execution engine architectures. For execution engines, the consequence of “one size does not fit all” is that BI queries and analytic applications now require complicated information flows as data is moved among data engines and queries span systems. In addition, new quality of service objectives are desired that incorporate measures beyond performance such as freshness (latency), reliability, accuracy, and so on. Existing approaches that optimize data flows simply for performance on a single system or a homogeneous cluster are insufficient. This paper describes our research to address the challenge of optimizing this new type of flow. We leverage concepts from earlier work in federated databases, but we face a much larger search space due to new objectives and a larger set of operators. We describe our initial optimizer that supports multiple objectives over a single processing engine. We then describe our research in optimizing flows for multiple engines and objectives and the challenges that remain.
Umeshwar Dayal, Kevin Wilkinson, Alkis Simitsis, Malu Castellanos, Lupita Paz

Normalization in a Mixed OLTP and OLAP Workload Scenario

The historically introduced separation of online analytical processing (OLAP) from online transaction processing (OLTP) is in question given current database developments. Column-oriented databases, so far mainly used in the OLAP environment, are being adapted with in-memory data storage to accommodate OLTP as well, thus paving the way for mixed OLTP and OLAP processing. To assess mixed workload systems, benchmarking has to evolve along with the database technology. Especially in mixed workload scenarios, the question arises of how to lay out the database. In this paper, we present a case study on the impact of database design, focusing on normalization, with respect to various workload mixes and database implementations. We use a novel benchmark methodology that provides mixed OLTP and OLAP workloads based on a real scenario.
Anja Bog, Kai Sachs, Alexander Zeier, Hasso Plattner

Measuring Performance of Complex Event Processing Systems

Complex Event Processing (CEP), or stream data processing, is becoming increasingly popular as the platform underlying event-driven solutions and applications in industries such as financial services, oil & gas, smart grids, health care, and IT monitoring. Satisfactory performance is crucial for any solution across these industries. Typically, performance of CEP engines is measured as (1) data rate, i.e., the number of input events processed per second, and (2) latency, which denotes the time it takes for the result (output events) to emerge from the system after the business event (input event) happened. While data rates are typically easy to measure by capturing the number of input events over time, latency is less well defined. As it turns out, a definition becomes particularly challenging in the presence of data arriving out of order, that is, when the order in which events arrive at the system differs from the order of their timestamps. Many important distributed scenarios need to deal with out-of-order arrival because communication delays easily introduce disorder.
With out-of-order arrival, a CEP system cannot produce final answers as events arrive. Instead, time first needs to progress enough in the overall system before correct results can be produced. This introduces additional latency beyond the time it takes the system to perform the processing of the events. We denote the former as information latency and the latter as system latency. This paper discusses both types of latency in detail and defines them formally without depending on particular semantics of the CEP query plans. In addition, the paper suggests incorporating these definitions as metrics into the benchmarks that are being used to assess and compare CEP systems.
Torsten Grabs, Ming Lu
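The two-part decomposition described above can be sketched concretely. The function below is an illustration of the idea, not the paper's formal definitions: treating the moment the system's notion of time (e.g., a watermark) passes an event's timestamp as the point where a final answer first becomes possible, everything before that point is information latency and everything after it is system latency.

```python
def latency_breakdown(event_time, watermark_time, emit_time):
    """Illustrative decomposition (not the paper's formal definitions):
    information latency is the wait for time to progress far enough
    (the watermark passing the event's timestamp) before a final answer
    is possible; system latency is the processing time after that."""
    information_latency = watermark_time - event_time
    system_latency = emit_time - watermark_time
    return information_latency, system_latency

# An event stamped t=10.0 becomes finalizable when the watermark
# reaches it at t=13.0, and the result is emitted at t=13.2.
info, sys_lat = latency_breakdown(10.0, 13.0, 13.2)
```

The split makes clear that out-of-order arrival inflates only the first term: a faster engine can shrink system latency but cannot emit a correct final answer before the watermark arrives.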

Benchmarking with Your Head in the Cloud

Recent advances in Cloud Computing present challenges to those who create and manage performance benchmarks. A performance benchmark tends to rely on physical consistency – a known hardware configuration, a known software configuration and consistent measurements from run to run. These aspects are not typically present in Cloud Computing. Other aspects change, too. For the consumer, the computation of Total Cost of Ownership shifts to a computation of ongoing expense. Concepts of service and reliability also change from the end-user perspective.
For an organization like the Transaction Processing Performance Council, the expansion of clouds into the commercial, run-your-business space presents new challenges that must be addressed if viable benchmarks are to be created in this important sector of the computing industry. This paper explores these challenges and proposes methods for addressing them.
Karl Huppler

Extending TPC-E to Measure Availability in Database Systems

High-availability is a critical feature to database customers; having a way to measure and characterize availability is important for guiding system development and evaluating different HA technologies. This paper describes extensions to the TPC-E benchmark for availability measurement, including HA scenario simulation, fault injection, and availability metric reporting. The implementation details and exemplary test results on SQL Server 2008 Database Mirroring are also described.
Yantao Li, Charles Levine
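One plausible shape for an availability metric in such an extension can be sketched as follows. This is a hypothetical formulation, not the metric the paper actually defines: compare the transactions completed during a measurement window that includes an injected fault against what the fault-free steady-state rate would have delivered.

```python
def effective_availability(completed_txns, steady_rate_tps, window_seconds):
    """Hypothetical availability metric (a sketch, not the paper's actual
    definition): the fraction of the work a fault-free system would have
    completed over the window that was actually completed."""
    expected = steady_rate_tps * window_seconds
    return completed_txns / expected

# At 100 tps steady state, a 60 s failover inside a 600 s window
# costs roughly 6,000 transactions.
avail = effective_availability(54_000, 100, 600)
```

A ratio like this captures both the duration of an outage and any throughput degradation during recovery in a single number.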

SI-CV: Snapshot Isolation with Co-located Versions

Snapshot Isolation is an established concurrency control algorithm in which each transaction executes against its own version/snapshot of the database. Version management may produce unnecessary random writes. Compared to magnetic disks, Flash storage offers fundamentally different I/O characteristics: excellent random read performance, low random write performance, and strong read/write asymmetry. The performance of Snapshot Isolation can therefore be improved by minimizing random writes. We propose a variant of Snapshot Isolation, called SI-CV, that co-locates tuple versions created by a transaction in adjacent blocks and thereby minimizes random writes at the cost of random reads. In overloaded systems under heavy transactional loads in TPC-C scenarios on Flash SSD storage, its performance relative to the original algorithm increases significantly. At high loads that bring the original system into overload, the transactional throughput of SI-CV increases further, while maintaining response times that are several factors lower.
Robert Gottstein, Ilia Petrov, Alejandro Buchmann
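The co-location idea can be sketched in a few lines. The class and layout below are hypothetical simplifications, not SI-CV's actual storage format: versions created by one transaction are buffered together and flushed to adjacent blocks in a single sequential write, instead of scattering one random write per modified tuple.

```python
class CoLocatedVersionStore:
    """Sketch of the co-location idea (names and layout hypothetical):
    a transaction's versions are buffered and flushed to adjacent
    positions in one sequential write at commit time."""

    def __init__(self):
        self.log = []  # simulated storage: one entry per flushed block group

    def commit(self, txn_id, versions):
        # All versions of this transaction land adjacent to each other,
        # so a single sequential write replaces many random writes.
        self.log.append((txn_id, list(versions)))
        return len(self.log) - 1  # index of the block group written

store = CoLocatedVersionStore()
idx = store.commit(42, ["row7_v3", "row1_v9", "row5_v2"])
```

The trade-off the abstract names falls out directly: later readers of any one tuple's version chain may need extra random reads, which is cheap on Flash, in exchange for avoiding random writes, which are expensive.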

Introducing Skew into the TPC-H Benchmark

While uniform data distributions were a design choice for the TPC-D benchmark and its successor TPC-H, it has been universally recognized that data skew is prevalent in data warehousing. A modern benchmark should therefore provide a test bed to evaluate the ability of database engines to handle skew. This paper introduces a concrete and practical way to introduce skew in the TPC-H data model by modifying the customer and supplier tables to reflect non-uniform customer and supplier populations. The first proposal consists of defining customer and supplier populations by nation that are roughly proportional to actual nation populations. In the second proposal, nations are divided into two groups, one with large and equal populations and the other with small and equal populations. We then experiment with the proposed skew models to show how the optimizer of a parallel system can recognize skew and potentially produce different plans depending on the presence of skew. A comparison is made between query performance with the proposed method and with the original uniform TPC-H distributions. Finally, an approach is presented to introduce skew into TPC-H with the current query set that is compatible with the current benchmark specification rules and could be implemented today.
Alain Crolotte, Ahmad Ghazal
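The first proposal, population-proportional nation assignment, amounts to weighted sampling and can be sketched as below. The weight values here are illustrative stand-ins, not the paper's actual figures:

```python
import random

def skewed_nation_sample(n_customers, nation_weights, seed=1):
    """Sketch of population-proportional skew: draw each customer's
    nation with probability proportional to its (relative) population.
    The weights passed in below are made up for this illustration."""
    rng = random.Random(seed)
    nations = list(nation_weights)
    weights = list(nation_weights.values())
    return rng.choices(nations, weights=weights, k=n_customers)

# Illustrative relative populations (not the paper's values).
sample = skewed_nation_sample(
    10_000, {"CHINA": 1340, "INDIA": 1220, "GERMANY": 82, "PERU": 29})
```

Generated this way, joins and aggregations grouped by nation hit heavily unbalanced partitions, which is exactly the condition a parallel optimizer must recognize to avoid skewed plans.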

Time and Cost-Efficient Modeling and Generation of Large-Scale TPCC/TPCE/TPCH Workloads

Large-scale TPC workloads are critical for the evaluation of datacenter-scale storage systems. However, these workloads have not previously been characterized in depth and modeled in a datacenter environment. In this work, we categorize the TPC workloads into storage threads with unique features and characterize the storage activity of TPCC, TPCE and TPCH based on I/O traces from real server installations. We also propose a framework for modeling and generating large-scale TPC workloads, which allows us to conduct a wide spectrum of storage experiments without requiring knowledge of the application's structure or the overhead of fully deploying it in different storage configurations. Using our framework, we eliminate the TPC setup time and reduce the time for experiments by two orders of magnitude, due to the compression in storage activity enforced by the model. We demonstrate the accuracy of the model and the applicability of our method to significant datacenter storage challenges, including identification of early disk errors and SSD caching.
Christina Delimitrou, Sriram Sankar, Badriddine Khessib, Kushagra Vaid, Christos Kozyrakis

When Free Is Not Really Free: What Does It Cost to Run a Database Workload in the Cloud?

The current computing trend towards cloud-based Database-as-a-Service (DaaS) as an alternative to traditional on-site relational database management systems (RDBMSs) has largely been driven by the perceived simplicity and cost-effectiveness of migrating to a DaaS. However, customers attracted to these DaaS alternatives may find that the range of different services and pricing options available to them adds an unexpected level of complexity to their decision making. Cloud service pricing models are typically ‘pay-as-you-go’, in which the customer is charged based on resource usage such as CPU and memory utilization. Thus, customers considering different DaaS options must take into account how the performance and efficiency of the DaaS will ultimately impact their monthly bill. In this paper, we show that the current DaaS model can produce unpleasant surprises – for example, the case study that we present in this paper illustrates a scenario in which a DaaS service powered by a DBMS that has a lower hourly rate actually costs more to the end user than a DaaS service that is powered by another DBMS that charges a higher hourly rate. Thus, what we need is a method for the end user to get an accurate estimate of the true costs that will be incurred without worrying about the nuances of how the DaaS operates. One potential solution to this problem is for DaaS providers to offer a new service called Benchmark as a Service (BaaS), wherein the user provides the parameters of their workload and SLA requirements and gets a price quote.
Avrilia Floratou, Jignesh M. Patel, Willis Lang, Alan Halverson
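The pricing surprise the abstract describes is simple arithmetic and can be made concrete with made-up numbers (the rates and run times below are hypothetical, not from the paper's case study): a lower hourly rate loses if the underlying DBMS needs proportionally more hours to finish the same workload.

```python
def monthly_bill(hourly_rate, hours_per_run, runs_per_month):
    """Total cost of a recurring workload under pay-as-you-go pricing."""
    return hourly_rate * hours_per_run * runs_per_month

# Hypothetical offerings: service A has the lower hourly rate, but its
# DBMS takes much longer per run, so it costs more overall.
bill_a = monthly_bill(hourly_rate=0.50, hours_per_run=4.0, runs_per_month=30)
bill_b = monthly_bill(hourly_rate=0.80, hours_per_run=1.5, runs_per_month=30)
```

In this sketch the "cheaper" service A bills $60 per month against service B's $36, which is the kind of outcome a Benchmark-as-a-Service quote would surface before the customer commits.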

A Fine-Grained Performance-Based Decision Model for Virtualization Application Solution

Virtualization technology has been widely applied across a broad range of contemporary datacenters. When constructing a datacenter, architects have to choose a Virtualization Application Solution (VAS) that maximizes performance as well as minimizes cost. However, the performance of a VAS involves a great number of metric concerns, such as virtualization overhead, isolation, manageability, and consolidation. Further, datacenter architects have their own metric preferences, which correlate with their datacenters’ specific application scenarios. Nevertheless, previous research on virtualization performance either focuses on a single performance concern or tests several metrics separately, rather than giving a holistic evaluation, which leads to difficulties in VAS decision-making. In this paper, we propose a fine-grained performance-based decision model, termed VirtDM, to aid architects in determining the best VAS for them by quantifying the overall performance of a VAS according to the architects’ own preferences. First, our model defines a measurable, in-depth, fine-grained, human-friendly metric system with an organized hierarchy to achieve accurate and precise quantitative results. Second, the model harnesses a number of classic Multiple Criteria Decision-Making (MCDM) methods, such as the Analytical Hierarchical Process (AHP), to relieve the effort of deciding the weights of different metrics based on one’s own preferences. Our case study addresses a decision process based on three real VAS candidates as an empirical example exploiting VirtDM and demonstrates the effectiveness of our VirtDM model.
Jianhai Chen, Dawei Huang, Bei Wang, Deshi Ye, Qinming He, Wenzhi Chen
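The AHP step mentioned above has a standard computational core that can be sketched independently of VirtDM's metric hierarchy: derive weights from a pairwise-comparison matrix by approximating its principal eigenvector. The comparison matrix below is an illustrative example, not one from the paper.

```python
def ahp_weights(pairwise, iterations=50):
    """Classic AHP weight derivation: approximate the principal
    eigenvector of the pairwise-comparison matrix by power iteration,
    normalizing to sum 1 at each step."""
    n = len(pairwise)
    v = [1.0] * n
    for _ in range(iterations):
        v = [sum(pairwise[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
    return v

# Illustrative judgments: overhead is 3x as important as isolation and
# 5x as important as manageability; isolation is 2x manageability.
w = ahp_weights([[1,   3,   5],
                 [1/3, 1,   2],
                 [1/5, 1/2, 1]])
```

The resulting weight vector orders the metrics by the architect's stated preferences (overhead highest here) and can then multiply the per-metric scores to produce a single overall VAS score.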

A PDGF Implementation for TPC-H

With 182 benchmark results from 20 hardware vendors, TPC-H has established itself as the industry standard benchmark for measuring the performance of decision support systems. TPC-H, released twelve years ago by the Transaction Processing Performance Council (TPC), was based on an earlier decision support benchmark, called TPC-D, which was released in 1994. TPC-H inherited TPC-D’s data and query generators, DBgen and Qgen. As systems evolved over time, maintenance of these tools has become a major burden for the TPC. DBgen and Qgen need to be ported to new hardware architectures and adapted as systems grew in size to multiple terabytes. In this paper we demonstrate how the Parallel Data Generation Framework (PDGF), a generic data generator developed at the University of Passau for massively parallel data generation, can be adapted for TPC-H.
Meikel Poess, Tilmann Rabl, Michael Frank, Manuel Danisch
