ABSTRACT
The proliferation of massive datasets, combined with the development of sophisticated analytical techniques, has enabled a wide variety of novel applications, such as improved product recommendations, automatic image tagging, and speech-driven interfaces. A major obstacle to supporting these predictive applications is the challenging and expensive process of identifying and training an appropriate predictive model. Recent efforts to automate this process have focused on single-node implementations and have treated model training itself as a black box, limiting their usefulness for applications driven by large-scale datasets. In this work, we build upon these efforts and propose an architecture for automatic machine learning at scale, comprising a cost-based cluster resource allocation estimator, advanced hyper-parameter tuning techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching and optimal resource allocation. The result is TuPAQ, a component of the MLbase system that automatically finds and trains models for a user's predictive application. The models it finds match the quality of those found with exhaustive search strategies, but it finds them an order of magnitude more efficiently than the standard baseline approach. TuPAQ scales to models trained on terabytes of data across hundreds of machines.
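To make the "bandit resource allocation" idea concrete, the sketch below illustrates the general technique the abstract alludes to: train each candidate hyper-parameter configuration incrementally, inspect intermediate quality at runtime, and repeatedly eliminate the worst-performing half so that compute concentrates on promising models. This is a minimal illustrative sketch of the generic bandit/early-elimination pattern, not TuPAQ's actual API; the function names `bandit_model_search`, `partial_fit`, and `score` are assumptions introduced here for illustration.

```python
def bandit_model_search(configs, partial_fit, score, rounds=3):
    """Bandit-style model search via successive elimination (illustrative).

    configs:     list of hyper-parameter configurations (e.g. dicts).
    partial_fit: callable (cfg, state) -> new state; trains the model for
                 cfg by one more increment (state is None on first call).
    score:       callable (state) -> validation quality of the partially
                 trained model (higher is better).
    """
    survivors = [(cfg, None) for cfg in configs]
    for _ in range(rounds):
        # Train every surviving configuration by one more increment.
        survivors = [(cfg, partial_fit(cfg, st)) for cfg, st in survivors]
        # Introspect intermediate quality and drop the worse half.
        survivors.sort(key=lambda pair: score(pair[1]), reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    best_cfg, best_state = survivors[0]
    return best_cfg, best_state
```

Because poorly performing configurations are discarded after only partial training, the total work is far smaller than training every configuration to completion, which is the efficiency argument behind bandit allocation in model search.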
Automating model search for large scale machine learning