ABSTRACT
Due to their complexity, modern systems expose many configuration parameters which users must tune to maximize performance. Auto-tuning has emerged as an alternative in which a black-box optimizer iteratively evaluates configurations to find efficient ones. Unfortunately, for many systems, such as distributed systems, evaluating performance takes too long and the space of configurations is too large for the optimizer to converge within a reasonable time.
We present BOAT, a framework which allows developers to build efficient bespoke auto-tuners for their system, in situations where generic auto-tuners fail. At BOAT's core is structured Bayesian optimization (SBO), a novel extension of the Bayesian optimization algorithm. SBO leverages contextual information provided by system developers, in the form of a probabilistic model of the system's behavior, to make informed decisions about which configurations to evaluate. In a case study, we tune the scheduling of a neural network computation on a heterogeneous cluster. Our auto-tuner converges within ten iterations. The optimized configurations outperform those found by generic auto-tuners in thirty iterations by up to 2X.
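The abstract describes SBO as an extension of the standard Bayesian optimization loop: fit a probabilistic model to the configurations evaluated so far, then pick the next configuration by maximizing an acquisition function. As a rough illustration of the *vanilla* (unstructured) loop that SBO builds on, here is a minimal numpy sketch using a Gaussian process surrogate and the expected-improvement acquisition. This is not BOAT's implementation — SBO replaces the generic Gaussian process with a developer-supplied probabilistic model of the system — and all function names here (`rbf_kernel`, `gp_posterior`, `bayes_opt`, ...) are illustrative, not from the paper.

```python
import numpy as np
from math import erf

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between the rows of A (n, d) and B (m, d).
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # GP posterior mean/variance at candidate points Xs, given observations (X, y).
    ymean = y.mean()  # center targets so the prior mean is data-driven
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - ymean))
    Ks = rbf_kernel(X, Xs)
    mu = Ks.T @ alpha + ymean
    v = np.linalg.solve(L, Ks)
    var = np.ones(len(Xs)) - np.sum(v**2, axis=0)  # k(x, x) = 1 for this kernel
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    # Expected improvement over the incumbent `best`, for minimization.
    sigma = np.sqrt(var)
    z = (best - mu) / sigma
    Phi = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))   # Gaussian CDF
    phi = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)          # Gaussian PDF
    return (best - mu) * Phi + sigma * phi

def bayes_opt(objective, bounds, n_init=3, n_iter=15, seed=0):
    # Minimize a black-box objective over a 1-D configuration space.
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=(n_init, 1))          # random initial configs
    y = np.array([objective(x[0]) for x in X])         # expensive evaluations
    for _ in range(n_iter):
        Xs = np.linspace(lo, hi, 200)[:, None]         # candidate configurations
        mu, var = gp_posterior(X, y, Xs)
        ei = expected_improvement(mu, var, y.min())
        x_next = Xs[np.argmax(ei)]                     # most promising candidate
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next[0]))
    return X[np.argmin(y), 0], y.min()
```

For a cheap stand-in objective such as `lambda x: (x - 2.0) ** 2` over `(0.0, 5.0)`, the loop converges near the true optimum at `x = 2` within a handful of iterations. The paper's point is that when each evaluation takes minutes (e.g. a distributed training run) and the space is high-dimensional, this generic surrogate needs too many iterations — which is what the developer-supplied structured model addresses.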