ABSTRACT
The proliferation of massive datasets, combined with the development of sophisticated analytical techniques, has enabled a wide variety of novel applications, such as improved product recommendations, automatic image tagging, and speech-driven interfaces. A major obstacle to supporting these predictive applications is the challenging and expensive process of identifying and training an appropriate predictive model. Recent efforts to automate this process have focused on single-node implementations and have treated model training itself as a black box, limiting their usefulness for applications driven by large-scale datasets. In this work, we build upon these efforts and propose an architecture for automatic machine learning at scale, comprising a cost-based cluster resource allocation estimator, advanced hyper-parameter tuning techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching and optimal resource allocation. The result is TuPAQ, a component of the MLbase system that automatically finds and trains models for a user's predictive application. The models it finds match the quality of those found with exhaustive search strategies, but it finds them an order of magnitude more efficiently than the standard baseline approach. TuPAQ scales to models trained on terabytes of data across hundreds of machines.
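To make the "bandit resource allocation" idea concrete, the sketch below illustrates the general technique the abstract alludes to: train each candidate hyper-parameter configuration incrementally, inspect intermediate quality at runtime, and repeatedly eliminate the worst-performing half so that compute concentrates on promising models. This is a minimal illustrative sketch of the generic bandit/early-elimination pattern, not TuPAQ's actual API; the function names `bandit_model_search`, `partial_fit`, and `score` are assumptions introduced here for illustration.

```python
def bandit_model_search(configs, partial_fit, score, rounds=3):
    """Bandit-style model search via successive elimination (illustrative).

    configs:     list of hyper-parameter configurations (e.g. dicts).
    partial_fit: callable (cfg, state) -> new state; trains the model for
                 cfg by one more increment (state is None on first call).
    score:       callable (state) -> validation quality of the partially
                 trained model (higher is better).
    """
    survivors = [(cfg, None) for cfg in configs]
    for _ in range(rounds):
        # Train every surviving configuration by one more increment.
        survivors = [(cfg, partial_fit(cfg, st)) for cfg, st in survivors]
        # Introspect intermediate quality and drop the worse half.
        survivors.sort(key=lambda pair: score(pair[1]), reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    best_cfg, best_state = survivors[0]
    return best_cfg, best_state
```

Because poorly performing configurations are discarded after only partial training, the total work is far smaller than training every configuration to completion, which is the efficiency argument behind bandit allocation in model search.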
Automating model search for large scale machine learning