DOI: 10.1145/2806777.2806945
research-article

Automating model search for large scale machine learning

Published: 27 August 2015

ABSTRACT

The proliferation of massive datasets, combined with the development of sophisticated analytical techniques, has enabled a wide variety of novel applications, such as improved product recommendations, automatic image tagging, and improved speech-driven interfaces. A major obstacle to supporting these predictive applications is the challenging and expensive process of identifying and training an appropriate predictive model. Recent efforts to automate this process have focused on single-node implementations and have assumed that model training itself is a black box, limiting their usefulness for applications driven by large-scale datasets. In this work, we build upon these recent efforts and propose an architecture for automatic machine learning at scale comprising a cost-based cluster resource allocation estimator, advanced hyper-parameter tuning techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching and optimal resource allocation. The result is TuPAQ, a component of the MLbase system that automatically finds and trains models for a user's predictive application, with quality comparable to models found using exhaustive strategies but an order of magnitude more efficiently than the standard baseline approach. TuPAQ scales to models trained on terabytes of data across hundreds of machines.
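The bandit resource allocation the abstract mentions can be illustrated with a minimal, self-contained sketch. This is not TuPAQ's implementation; it is a toy successive-halving-style loop under stated assumptions: each candidate configuration is represented by a hypothetical "true quality" score, `train_step` stands in for one cheap unit of partial training that returns a noisy validation score, and each round the worse half of the surviving candidates is eliminated so that compute concentrates on promising models. The names `train_step` and `bandit_model_search` are illustrative, not from the paper.

```python
import random

def train_step(model_quality, noise=0.05):
    """Hypothetical stand-in for one unit of partial training:
    returns a noisy validation score near the model's true quality."""
    return model_quality + random.uniform(-noise, noise)

def bandit_model_search(configs, rounds=3):
    """Successive-halving-style allocation: each round, train every
    surviving configuration a little, then keep the better half."""
    survivors = list(configs)
    scores = {c: 0.0 for c in survivors}
    for _ in range(rounds):
        for c in survivors:
            scores[c] = train_step(c)  # cheap partial evaluation
        survivors.sort(key=lambda c: scores[c], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]
```

Compared with exhaustively training every configuration to convergence, this style of early elimination spends most of the budget on the few candidates that look best under partial training, which is the source of the order-of-magnitude savings the abstract describes.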

• Published in: SoCC '15: Proceedings of the Sixth ACM Symposium on Cloud Computing, August 2015, 446 pages. ISBN: 9781450336512. DOI: 10.1145/2806777

      Copyright © 2015 ACM


Publisher: Association for Computing Machinery, New York, NY, United States



Acceptance Rates

SoCC '15 paper acceptance rate: 34 of 157 submissions (22%). Overall acceptance rate: 169 of 722 submissions (23%).
