
2019 | Book

Performance Evaluation and Benchmarking for the Era of Artificial Intelligence

10th TPC Technology Conference, TPCTC 2018, Rio de Janeiro, Brazil, August 27–31, 2018, Revised Selected Papers


About this book

This book constitutes the thoroughly refereed post-conference proceedings of the 10th TPC Technology Conference on Performance Evaluation and Benchmarking, TPCTC 2018, held in conjunction with the 44th International Conference on Very Large Databases (VLDB 2018) in August 2018.

The 10 papers presented were carefully reviewed and selected from numerous submissions.

The TPC encourages researchers and industry experts to present and debate novel ideas and methodologies in performance evaluation, measurement, and characterization.

Table of Contents

Frontmatter
Industry Panel on Defining Industry Standards for Benchmarking Artificial Intelligence
Abstract
Introduced in 2009, the Technology Conference on Performance Evaluation and Benchmarking (TPCTC) is a forum bringing together industry experts and researchers to develop innovative techniques for evaluation, measurement and characterization. This panel at the tenth TPC Technology Conference on Performance Evaluation and Benchmarking (TPCTC 2018) brought together industry experts and researchers from a broad spectrum of interests in the field of Artificial Intelligence (AI).
Raghunath Nambiar, Shahram Ghandeharizadeh, Gary Little, Christoph Boden, Ajay Dholakia
UniBench: A Benchmark for Multi-model Database Management Systems
Abstract
Unlike traditional database management systems, which are organized around a single data model, a multi-model database (MMDB) uses a single, integrated back-end to support multiple data models, such as document, graph, relational, and key-value. As more and more platforms are proposed to deal with multi-model data, it becomes crucial to establish a benchmark for evaluating the performance and usability of MMDBs. Previous benchmarks, however, are inadequate for this scenario because they lack comprehensive consideration of multiple data models. In this paper, we present a benchmark, called UniBench, with the goal of facilitating a holistic and rigorous evaluation of MMDBs. UniBench consists of a mixed data model, a synthetic multi-model data generator, and a set of core workloads. Specifically, the data model simulates an emerging application: Social Commerce, a Web-based application combining E-commerce and social media. The data generator produces diverse data formats, including JSON, XML, key-value, tabular, and graph. The workloads comprise a set of multi-model queries and transactions that aim to cover the essential aspects of multi-model data management. We implemented all workloads on ArangoDB and OrientDB to illustrate the feasibility of our proposed benchmarking system, and we share the lessons learned from the evaluation of these two multi-model databases. The source code and data of this benchmark can be downloaded at http://udbms.cs.helsinki.fi/bench/. (An illustrative multi-model query sketch follows this entry.)
Chao Zhang, Jiaheng Lu, Pengfei Xu, Yuxing Chen
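The UniBench abstract above mentions multi-model queries that combine graph, document, and key-value data, implemented on ArangoDB and OrientDB. Purely as an illustration of what such a query could look like, here is a minimal sketch against ArangoDB using the python-arango driver; the database, collection, and edge names (social_commerce, customers, orders, knows) and the AQL query are hypothetical and are not taken from the UniBench workloads.

```python
# Hypothetical sketch of a multi-model query in the spirit of UniBench:
# a graph traversal (friends of a customer) joined with document data
# (their orders) on ArangoDB through the python-arango driver.
# Database, collection, and edge names below are illustrative only.
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("social_commerce", username="root", password="")  # placeholder credentials

aql = """
FOR friend IN 1..1 OUTBOUND @customer knows        /* graph model */
    FOR order IN orders                            /* document model */
        FILTER order.customerId == friend._key
        RETURN { friend: friend.name, total: order.totalPrice }
"""

cursor = db.aql.execute(aql, bind_vars={"customer": "customers/c0001"})
for row in cursor:
    print(row)
```

A single request like this touches two models at once, which is the kind of cross-model access pattern the UniBench workloads are designed to exercise.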
PolyBench: The First Benchmark for Polystores
Abstract
Modern business intelligence requires data processing not only across a huge variety of domains but also across different paradigms, such as relational, stream, and graph models. This variety is a challenge for existing systems, which typically support only one or a few data models. Polystores were proposed as a solution for this challenge and have received wide attention both in academia and in industry. These are systems that integrate different specialized data processing engines to enable fast processing of a large variety of data models. Yet, there is no standard for assessing the performance of polystores. The goal of this work is to develop the first benchmark for polystores. To capture the flexibility of polystores, we focus on high-level features so that our benchmark suite can be executed on a large set of polystore solutions.
Jeyhun Karimov, Tilmann Rabl, Volker Markl
Benchmarking Distributed Data Processing Systems for Machine Learning Workloads
Abstract
In recent years, distributed data processing systems have been widely adopted to robustly scale out computations on massive data sets across many compute nodes. These systems are also popular choices for scaling out the training of machine learning models. However, there is a lack of benchmarks to assess how efficiently data processing systems actually perform when executing machine learning algorithms at scale. For example, the learning algorithms chosen in the corresponding systems papers tend to be those that fit well onto the system's paradigm rather than state-of-the-art methods. Furthermore, the experiments in those papers often neglect important dimensions of scalability. In this paper, we share our experience in evaluating novel data processing systems and present a core set of experiments for a benchmark of distributed data processing systems on machine learning workloads, a rationale for their necessity, and an experimental evaluation.
Christoph Boden, Tilmann Rabl, Sebastian Schelter, Volker Markl
Characterizing the Performance and Resilience of HCI Clusters with the TPCx-HCI Benchmark
Abstract
We use the newly released TPCx-HCI benchmark to characterize the performance and resilience properties of Hyper-Converged Infrastructure clusters. We demonstrate that good performance on an HCI cluster requires simultaneously delivering high IOPS, low latencies, low CPU overhead, and uniform access to data from all nodes. We show that unless the cluster can quickly and efficiently rebalance the VMs after a change in the workload, performance will be severely impacted.
We use the data accessibility test of TPCx-HCI to show how performance is impacted by rebuilding traffic after a Node goes down, and how long it takes for the rebuilding to finish.
H. Reza Taheri, Gary Little, Bhavik Desai, Andrew Bond, Doug Johnson, Greg Kopczynski
Requirements for an Enterprise AI Benchmark
Abstract
Artificial Intelligence (AI) is now the center of attention for many industries, ranging from private companies to academic institutions. While domains of interest and AI applications vary, one concern remains the same for everyone: how do we determine whether an end-to-end AI solution performs well? As AI spreads to more industries, which metrics should serve as the reference for AI applications and benchmarks in the enterprise space? This paper intends to answer some of these questions. At present, AI benchmarks focus either on evaluating deep learning approaches or on infrastructure capabilities. Unfortunately, these approaches do not capture the end-to-end performance behavior of enterprise AI workloads. It is also clear that no single reference metric will be suitable for all AI applications or all existing platforms. We first present the state of the art of the current and most popular AI benchmarks. We then present the main characteristics of AI workloads from various industrial domains. Finally, we focus on the needs of ongoing and future industry AI benchmarks and conclude with the gaps that must be closed to improve AI benchmarks for enterprise workloads.
Cedric Bourrasset, France Boillod-Cerneux, Ludovic Sauge, Myrtille Deldossi, Francois Wellenreiter, Rajesh Bordawekar, Susan Malaika, Jean-Armand Broyelle, Marc West, Brian Belgodere
Towards Evaluation of Tensorflow Performance in a Distributed Compute Environment
Abstract
Tensorflow (TF) is a highly popular Deep Learning (DL) software framework. Neural network training, a critical part of the DL workflow, is a computationally intensive process that can take days or even weeks. Achieving faster training times is therefore an active area of research and practice. TF supports multi-GPU parallelization, both within a single machine and across multiple physical servers. However, the distributed case is harder to use, and consequently almost all published performance data comes from the single-machine use case. To fill this gap, we benchmark Tensorflow in a GPU-equipped distributed environment. Our work evaluates the performance of various hardware and software combinations. In particular, we examine several interconnect technologies to determine their impact on performance. Our results show that, with the right choice of input parameters and appropriate hardware, GPU-equipped general-purpose compute clusters can provide deep learning training performance comparable to that of specialized machines designed for AI workloads. (An illustrative sketch of single-machine versus multi-machine training setup follows this entry.)
Miro Hodak, Ajay Dholakia
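The abstract above contrasts single-machine multi-GPU training with training across multiple physical servers. As a point of reference only, the sketch below shows how that distinction is expressed with TensorFlow's current tf.distribute API; the hostnames, ports, and toy Keras model are placeholders, and this is not necessarily the TensorFlow version or configuration benchmarked in the paper.

```python
# Minimal sketch contrasting single-machine and multi-machine data
# parallelism in TensorFlow (illustration only, not the paper's setup).
import json
import os

import tensorflow as tf

# Multi-machine case: every worker exports the same cluster spec via
# TF_CONFIG plus its own task index (this process is worker 0 of 2).
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["host1:12345", "host2:12345"]},  # placeholder hosts
    "task": {"type": "worker", "index": 0},
})

# Single-machine, multi-GPU training would instead use:
#   strategy = tf.distribute.MirroredStrategy()
strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="sgd",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Gradients are averaged across all GPUs on all workers at every step;
# the interconnect between servers sits directly on this critical path.
# model.fit(train_dataset, epochs=1)
```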
A Comparison of Two Cache Augmented SQL Architectures
Abstract
Cloud service providers augment a SQL database management system with a cache to enhance system performance for workloads that exhibit a high read-to-write ratio. These in-memory caches provide a simple programming interface such as get, put, and delete. Based on their software architecture, caching frameworks can be categorized into Client-Server (CS) and Shared Address Space (SAS) systems. Example CS caches are memcached and Redis. Example SAS caches are the Java Cache standard and its Google Guava implementation, Terracotta BigMemory, and KOSAR. How do the CS and SAS architectures compare with one another, and what are their tradeoffs? This study quantifies an answer using BG, a benchmark for interactive social networking actions. In general, the obtained results show that SAS provides higher performance, with write policies playing an important role. (An illustrative cache-aside sketch contrasting the two architectures follows this entry.)
Shahram Ghandeharizadeh, Hieu Nguyen
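The abstract above distinguishes Client-Server (CS) caches such as memcached and Redis from Shared Address Space (SAS) caches that live inside the application's process. The following cache-aside sketch contrasts the two in Python, assuming a local Redis server and a throwaway SQLite table; the table, keys, and query are illustrative and are not taken from the BG benchmark or the systems studied in the paper.

```python
# Cache-aside sketch contrasting the two architectures (illustration only).
# CS: the cache is a separate Redis server reached over the network.
# SAS: the cache lives in the application's own address space (a dict here,
# standing in for Guava/BigMemory-style caches).
import json
import sqlite3

import redis  # pip install redis

db = sqlite3.connect(":memory:")              # stands in for the SQL DBMS
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (42, 'alice')")

cs_cache = redis.Redis(host="localhost")      # Client-Server cache
sas_cache = {}                                # Shared Address Space cache

def get_profile_cs(user_id):
    cached = cs_cache.get(f"profile:{user_id}")   # network hop + (de)serialization
    if cached is not None:
        return json.loads(cached)
    row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    profile = {"id": user_id, "name": row[0]}
    cs_cache.set(f"profile:{user_id}", json.dumps(profile))
    return profile

def get_profile_sas(user_id):
    profile = sas_cache.get(user_id)              # in-process lookup, no serialization
    if profile is None:
        row = db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        profile = sas_cache[user_id] = {"id": user_id, "name": row[0]}
    return profile

print(get_profile_cs(42), get_profile_sas(42))
```

On a hit, the SAS path avoids the network round trip and serialization that the CS path pays; how writes invalidate or update the cached entries (the write policy) is where, per the abstract, much of the observed performance difference comes from.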
Benchmarking and Performance Analysis of Event Sequence Queries on Relational Database
Abstract
The relational database has been the fundamental technology for data-driven decision making based on histories of event occurrences concerning the analysis target. Thus, the performance of analytical workloads in relational databases has been studied intensively. As a common language for performance analysis, decision support benchmarks such as TPC-H have been widely used. These benchmarks focus on summarization of event occurrence information; individual event occurrences or inter-occurrence associations are rarely examined. However, this type of query, called an event sequence query in this paper, is becoming important in various real-world applications. Typically, an event sequence query extracts event sequences starting from a small number of interesting event occurrences. In a relational database, these queries are expressed as multiple self-joins over the whole sequence of events. Furthermore, each pair of events to be joined tends to have a strong correlation in the timestamp attribute, resulting in heavily skewed join workloads. Despite their usefulness in real-world data analysis, very little work has been done on the performance analysis of event sequence queries.
In this paper, we present the initial design of the ESQUE benchmark, a benchmark for event sequence queries. We then give experimental results comparing database system implementations (PostgreSQL vs. MySQL) and comparing historical versions of PostgreSQL. The conducted performance analysis shows that the ESQUE benchmark allows us to discover performance problems that have been overlooked by existing benchmarks. (An illustrative self-join sketch follows this entry.)
Yuto Hayamizu, Ryoji Kawamichi, Kazuo Goda, Masaru Kitsuregawa
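The first paragraph of the abstract explains that an event sequence query starts from a small set of interesting events and is written as self-joins over the event table, with a timestamp-correlated join predicate. The sketch below shows that query shape on a tiny in-memory SQLite table; the schema, event names, and ten-unit time window are invented for illustration and are not part of the ESQUE benchmark.

```python
# Self-join sketch of an event sequence query (illustration only):
# starting from "error" events, fetch the events that follow within a
# short time window for the same entity.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (entity_id INTEGER, event_type TEXT, ts INTEGER);
INSERT INTO events VALUES
    (1, 'login', 100), (1, 'error', 105), (1, 'logout', 107),
    (2, 'login', 200), (2, 'error', 300), (2, 'retry',  301);
""")

# The whole event table is joined with itself; the predicate correlates the
# timestamp attribute, which is what skews the join workload.
query = """
SELECT e1.entity_id, e1.ts AS error_ts, e2.event_type, e2.ts
FROM events e1
JOIN events e2
  ON  e2.entity_id = e1.entity_id
  AND e2.ts >  e1.ts
  AND e2.ts <= e1.ts + 10          -- narrow window after the seed event
WHERE e1.event_type = 'error'
ORDER BY e1.entity_id, e2.ts;
"""

for row in conn.execute(query):
    print(row)
```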
Data Consistency Properties of Document Store as a Service (DSaaS): Using MongoDB Atlas as an Example
Abstract
Document-oriented database systems, also known as document stores, are attractive for building modern web applications where the speed of development and deployment is critical, especially given the prevalence of data in document-structured formats such as JSON and XML. MongoDB Atlas is a hosted MongoDB-as-a-Service offering that is easy to set up, operate, and scale in the cloud. Like many NoSQL stores, MongoDB Atlas allows users to accept possible temporary inconsistency among the replicas as a trade-off for lower latency and higher availability during partitions. In this work, we describe an empirical study that quantifies the amount of inconsistency observed in data held in MongoDB Atlas. (An illustrative read-path sketch follows this entry.)
Chenhao Huang, Michael Cahill, Alan Fekete, Uwe Röhm
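The abstract above frames temporary inconsistency among replicas as the trade-off Atlas users may accept. The sketch below shows, with the PyMongo driver, the kind of read path on which that staleness becomes observable: a majority-acknowledged write to the primary followed by a read routed to a secondary. The connection URI, database, and collection names are placeholders, and this is not the measurement harness used in the study.

```python
# Sketch of a read path where replica staleness can show up (illustration
# only). URI, database, and collection names are placeholders.
from pymongo import MongoClient, ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb+srv://user:pass@cluster0.example.mongodb.net")

coll = client["testdb"]["items"]

# Write acknowledged by a majority of replica set members.
coll.with_options(write_concern=WriteConcern(w="majority")).insert_one(
    {"_id": 1, "value": "v1"}
)

# Read explicitly routed to a secondary: it may briefly return an older
# version of the document, or no document at all, until replication
# catches up on that member.
stale_read = coll.with_options(read_preference=ReadPreference.SECONDARY)
print(stale_read.find_one({"_id": 1}))
```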
Lessons Learned from the Industry’s First TPC Benchmark DS (TPC-DS)
Abstract
The TPC Benchmark DS (TPC-DS) is a decision support benchmark that models several generally applicable aspects of a decision support system, including queries and data maintenance, and is representative of modern decision support and big data applications. TPC-DS was initially designed for Relational Database Management Systems (RDBMS) and was later extended to support Apache Hadoop. This paper presents the lessons learned, including hardware and software tuning parameters, from the first TPC-DS publication, which ran on Cisco UCS® Integrated Infrastructure with Transwarp Data Hub.
Manan Trivedi, Zhenqiang Chen
Backmatter
Metadata
Title: Performance Evaluation and Benchmarking for the Era of Artificial Intelligence
Editors: Raghunath Nambiar, Meikel Poess
Copyright Year: 2019
Electronic ISBN: 978-3-030-11404-6
Print ISBN: 978-3-030-11403-9
DOI: https://doi.org/10.1007/978-3-030-11404-6