About this book

This book constitutes the refereed post-proceedings of the 5th TPC Technology Conference, TPCTC 2013, held in Trento, Italy, in August 2013. It contains 7 selected peer-reviewed papers, a report from the TPC Public Relations Committee and one invited paper. The papers present novel ideas and methodologies in performance evaluation, measurement and characterization.

Table of contents

Frontmatter

TPC State of the Council 2013

Abstract
The TPC has played, and continues to play, a crucial role in providing the computer industry and its customers with relevant standards for total system performance, price-performance, and energy efficiency comparisons. Historically known for database-centric standards, the TPC is now developing benchmark standards for consolidation using virtualization technologies and multi-source data integration. The organization is also exploring new ideas such as Big Data and Big Data Analytics as well as an Express benchmark model to keep pace with rapidly changing industry demands. This paper gives a high-level overview of the current state of the TPC in terms of existing standards, standards under development, and future outlook.
Raghunath Nambiar, Meikel Poess, Andrew Masland, H. Reza Taheri, Andrew Bond, Forrest Carman, Michael Majdalany

TPC-BiH: A Benchmark for Bitemporal Databases

Abstract
An increasing number of applications such as risk evaluation in banking or inventory management require support for temporal data. After more than a decade of standstill, the recent adoption of some bitemporal features in SQL:2011 has reinvigorated the support among commercial database vendors, who incorporate an increasing number of relevant bitemporal features. Naturally, assessing the performance and scalability of temporal data storage and operations is of great concern for potential users. The cost of keeping and querying history with novel operations (such as time travel, temporal joins or temporal aggregations) is not adequately reflected in any existing benchmark. In this paper, we present a benchmark proposal which provides comprehensive coverage of bitemporal data management. It builds on the solid foundations of TPC-H but extends it with a rich set of queries and update scenarios. This workload stems both from real-life temporal applications from SAP’s customer base and from a systematic coverage of temporal operators proposed in the academic literature. We present preliminary results of our benchmark on a number of temporal database systems, also highlighting the need for certain language extensions.
Martin Kaufmann, Peter M. Fischer, Norman May, Andreas Tonder, Donald Kossmann

Towards Comprehensive Measurement of Consistency Guarantees for Cloud-Hosted Data Storage Services

Abstract
The CAP theorem and the PACELC model have described the existence of direct trade-offs between consistency and availability as well as consistency and latency in distributed systems. Cloud storage services and NoSQL systems, both optimized for the web with high availability and low latency requirements, hence, typically opt to relax consistency guarantees. In particular, these systems usually offer eventual consistency which guarantees that all replicas will, in the absence of failures and further updates, eventually converge towards a consistent state where all replicas are identical. This, obviously, is a very imprecise description of actual guarantees.
Motivated by the popularity of eventually consistent storage systems, we take the position that a standard consistency benchmark is of great practical value. This paper is intended as a call for action; its goal is to motivate further research on building a standard comprehensive benchmark for quantifying the consistency guarantees of eventually consistent storage systems. We discuss the main challenges and requirements of such a benchmark, and present first steps towards a comprehensive consistency benchmark for cloud-hosted data storage systems. We evaluate our approach using experiments on both Cassandra and MongoDB.
David Bermbach, Liang Zhao, Sherif Sakr

TPC Express – A New Path for TPC Benchmarks

Abstract
To accommodate differences in systems architecture and DBMS functions and features, the TPC has long held that the best way to define a database benchmark is to author a paper specification of the application to be measured, leaving the implementation of that specification to the individual analyst. While this technique allows for the optimal implementation for a specific DBMS on a specific platform, it makes the initial entry into benchmark development a costly one, often cost prohibitive. The TPC has embarked on a plan to develop a new benchmark category, dubbed TPC Express, where benchmarks are based on predefined, executable kits that can be rapidly deployed and measured. This paper defines the TPC Express model, contrasts it to the TPC’s existing “Enterprise” model, and highlights many of the changes needed within the TPC to ensure the Express model is a successful one.
Karl Huppler, Douglas Johnson

TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark

Abstract
The TPC-D benchmark was developed almost 20 years ago, and even though its current existence as TPC-H could be considered superseded by TPC-DS, one can still learn from it. We focus on the technical level, summarizing the challenges posed by the TPC-H workload as we now understand them, which we call “choke points”. We identify 28 different such choke points, grouped into six categories: Aggregation Performance, Join Performance, Data Access Locality, Expression Calculation, Correlated Subqueries and Parallel Execution. On the meta-level, we make the point that the rich set of choke points found in TPC-H sets an example of how to design future DBMS benchmarks.
Peter Boncz, Thomas Neumann, Orri Erling

Architecture and Performance Characteristics of a PostgreSQL Implementation of the TPC-E and TPC-V Workloads

Abstract
The TPC has been developing a publicly available, end-to-end benchmarking kit to run the new TPC-V benchmark, with the goal of measuring the performance of databases subjected to the variability and elasticity of load demands that are common in cloud environments. This kit is being developed completely from scratch in Java and C++ with PostgreSQL as the target database. Since the TPC-V workload is based on the mature TPC-E benchmark, the kit initially implements the TPC-E schema and transactions. In this paper, we will report on the status of the kit, describe the architectural details, and provide results from prototyping experiments at performance levels that are representative of enterprise-class databases. We are not aware of other PostgreSQL benchmarking results running at the levels we will describe in the paper. We will list the optimizations that were made to PostgreSQL parameters, to hardware/operating system/file system settings, and to the benchmarking code to maximize the performance of PostgreSQL, and saturate a large, 4-socket server.
Andrew Bond, Douglas Johnson, Greg Kopczynski, H. Reza Taheri

A Practice of TPC-DS Multidimensional Implementation on NoSQL Database Systems

Abstract
While NoSQL database systems are well established, it is not clear how to process multidimensional OLAP queries on current key-value stores. In this paper, we detail how to match the high-level cube model with the low-level key-value stores built on NoSQL databases, and illustrate how to efficiently support OLAP queries by scaling out while retaining a MapReduce-like execution engine. For big data, the functional problem of storage and processing power is compounded; we balance the two with partial aggregation between batch processing and query runtime. Base cuboids are initially constructed for TPC-DS fact tables using multidimensional arrays, and cuboids for aggregation data at various granularities are derived at runtime from the base ones. The cube storage module converts dimension members into binary keys and leverages a novel distributed database to provide efficient storage for huge cuboids. The OLAP engine, built on lightweight concurrent actors, can scale out seamlessly and provide highly concurrent distributed cuboid processing. Finally, we illustrate some experiments on the implementation prototype based on TPC-DS queries. The results show that multidimensional models for OLAP applications on NoSQL systems are possible for future big data analytics.
Hongwei Zhao, Xiaojun Ye

PRIMEBALL: A Parallel Processing Framework Benchmark for Big Data Applications in the Cloud

Abstract
In this position paper, we draw up the specifications for a novel benchmark for comparing parallel processing frameworks in the context of big data applications hosted in the cloud. We aim to fill several gaps in existing cloud data processing benchmarks, which lack a real-life context for their processes and thus lose relevance when used to assess performance for real applications. Hence, we propose a fictitious news site hosted in the cloud that is to be managed by the framework under analysis, together with several objective use case scenarios and measures for evaluating system performance. The main strengths of our benchmark definition are parallelization capabilities supporting cloud features and big data properties.
Jaume Ferrarons, Mulu Adhana, Carlos Colmenares, Sandra Pietrowska, Fadila Bentayeb, Jérôme Darmont

CEPBen: A Benchmark for Complex Event Processing Systems

Abstract
Complex event processing (CEP) has emerged over the last ten years. CEP systems excel at processing large amounts of data and responding in a timely fashion. While CEP applications are growing fast, performance management in this area has not gained much attention. It is critical for both system designers and users that the promised level of service is met. In this paper, we present a benchmark for complex event processing systems: CEPBen. The CEPBen benchmark is designed to evaluate CEP functional behaviours, i.e., filtering, transformation and event pattern detection, and provides a novel methodology for evaluating the performance of CEP systems. A performance study running CEPBen on the Esper CEP engine is described and discussed. The results obtained from the performance tests demonstrate the influence of CEP functional behaviours on system performance.
Chunhui Li, Robert Berry

Backmatter
