research-article

Hadoop Superlinear Scalability: The perpetual motion of parallel performance

Published: 08 May 2015
Abstract

“We often see more than 100 percent speedup efficiency!” came the rejoinder to the innocent reminder that you can’t have more than 100 percent of anything. But this was just the first volley from software engineers during a presentation on how to quantify computer system scalability in terms of the speedup metric. In different venues, on subsequent occasions, that retort seemed to grow into a veritable chorus: not only was superlinear speedup commonly observed, but the model used to quantify scalability for the past 20 years also failed when applied to superlinear speedup data.


            • Published in

              Queue  Volume 13, Issue 5
              Testing
              May 2015
              34 pages
              ISSN:1542-7730
              EISSN:1542-7749
              DOI:10.1145/2773212
              Copyright © 2015 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Qualifiers

              • research-article
              • Popular
              • Refereed
