Abstract
Algorithmic speculation or high-level speculation is a promising programming paradigm which allows programmers to speculatively branch an execution into multiple independent parallel sections and then choose the best (perhaps fastest) amongst them. The continuing execution after the speculatively branched section sees only the modifications made by the best one. This programming paradigm allows programmers to harness parallelism and can provide dramatic performance improvements.
In this paper we present the Multiverse speculative programming model. Multiverse allows programmers to exploit parallelism through high-level speculation. It can effectively harness large amounts of parallelism by speculating across an entire cluster and is not bound by the parallelism available in a single machine. We present abstractions and a runtime which allow programmers to introduce large scale high-level speculative parallelism into applications with minimal effort. We introduce a novel on-demand address space sharing mechanism which provide speculations efficient transparent access to the original address space of the application (including the use of pointers) across machine boundaries. Multiverse provides single commit semantics across speculations while guaranteeing isolation between them. We also introduce novel mechanisms to deal with scalability bottlenecks when there are a large number of speculations.
We demonstrate that for several benchmarks, Multiverse achieves impressive speedups and good scalability across entire clusters. We study the overheads of the runtime and demonstrate how our special scalability mechanisms are crucial in scaling cluster wide.
- H. Abdel-Shafi, E. Speight, and J. K. Bennett. Efficient user-level thread migration and checkpointing on windows nt clusters. In phProceedings of the 3rd conference on USENIX Windows NT Symposium - Volume 3, WINSYM'99, pages 1--1, Berkeley, CA, USA, 1999. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1268427.1268428. Google ScholarDigital Library
- J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao, A. Edelman, and S. Amarasinghe. Petabricks: a language and compiler for algorithmic choice. In phProceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation, PLDI '09, pages 38--49, New York, NY, USA, 2009. ACM. ISBN 978--1--60558--392--1. 10.1145/1542476.1542481. URL http://doi.acm.org/10.1145/1542476.1542481. Google ScholarDigital Library
- G. Antoniu, L. Bougé, and R. Namyst. An efficient and transparent thread migration scheme in the pm2 runtime system. In phProceedings of the 11 IPPS/SPDP'99 Workshops Held in Conjunction with the 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing, pages 496--510, London, UK, UK, 1999. Springer-Verlag. ISBN 3--540--65831--9. URL http://dl.acm.org/citation.cfm?id=645611.662349. Google ScholarDigital Library
- E. D. Berger, T. Yang, T. Liu, and G. Novark. Grace: safe multithreaded programming for c/cGoogle Scholar
- . In phProceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, OOPSLA '09, pages 81--96, New York, NY, USA, 2009. ACM. ISBN 978--1--60558--766-0. 10.1145/1640089.1640096. URL http://doi.acm.org/10.1145/1640089.1640096.Google Scholar
- Y. Caniou, D. Diaz, F. Richoux, P. Codognet, and S. Abreu. Performance analysis of parallel constraint-based local search. In phProceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 337--338, New York, NY, USA, 2012. ACM. ISBN 978--1--4503--1160--1. 10.1145/2145816.2145883. URL http://doi.acm.org/10.1145/2145816.2145883. Google ScholarDigital Library
- R. E. Cledat, T. Kumar, and S. Pande. Efficiently speeding up sequential computation through the n-way programming model. In phProceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, OOPSLA '11, pages 537--554, New York, NY, USA, 2011. ACM. ISBN 978--1--4503-0940-0. 10.1145/2048066.2048109. URL http://doi.acm.org/10.1145/2048066.2048109. Google ScholarDigital Library
- B. Cox, D. Evans, A. Filipi, J. Rowanhill, W. Hu, J. Davidson, J. Knight, A. Nguyen-Tuong, and J. Hiser. N-variant systems: a secretless framework for security through diversity. In phProceedings of the 15th conference on USENIX Security Symposium - Volume 15, USENIX-SS'06, Berkeley, CA, USA, 2006. USENIX Association. URL http://dl.acm.org/citation.cfm?id=1267336.1267344. Google ScholarDigital Library
- H. Croft, K. Falconer, and R. Guy. phUnsolved problems in geometry. Problem books in mathematics. Springer-Verlag, 1991. ISBN 9780387975061. URL http://books.google.com/books?id=xF4PAQAAMAAJ.Google Scholar
- D. Diaz, P. Codognet, and S. Abreu. Adaptive search distribution, http://cri-dist.univ-paris1.fr/diaz/adaptive/.Google Scholar
- W. R. Dieter and J. E. Lumpp, Jr. User-level checkpointing for linuxthreads programs. In phProceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference, pages 81--92, Berkeley, CA, USA, 2001. USENIX Association. ISBN 1--880446--10--3. URL http://dl.acm.org/citation.cfm?id=647054.715766. Google ScholarDigital Library
- G. Fursin, A. Cohen, M. O'Boyle, and O. Temam. A practical method for quickly evaluating program optimizations. In phProceedings of the First international conference on High Performance Embedded Architectures and Compilers, HiPEAC'05, pages 29--46, Berlin, Heidelberg, 2005. Springer-Verlag. ISBN 3--540--30317-0, 978--3--540--30317--6. 10.1007/11587514_4. URL http://dx.doi.org/10.1007/11587514_4. Google ScholarDigital Library
- R. P. Garg and I. Sharapov. phTechniques for Optimizing Applications: High Performance Computing. Prentice Hall Professional Technical Reference, 2002. ISBN 0130091189. Google ScholarDigital Library
- S. Golomb and H. Taylor. Constructions and properties of costas arrays. phProceedings of the IEEE, 72 (9): 1143--1163, 1984. ISSN 0018--9219. 10.1109/PROC.1984.12994.Google Scholar
- L. Hammond, M. Willey, and K. Olukotun. Data speculation support for a chip multiprocessor. In phProceedings of the eighth international conference on Architectural support for programming languages and operating systems, ASPLOS VIII, pages 58--69, New York, NY, USA, 1998. ACM. ISBN 1--58113--107-0. 10.1145/291069.291020. URL http://doi.acm.org/10.1145/291069.291020. Google ScholarDigital Library
- T. Harris, J. Larus, and R. Rajwar. phTransactional Memory, 2nd Edition. Morgan and Claypool Publishers, 2nd edition, 2010. ISBN 1608452352, 9781608452354. Google ScholarDigital Library
- K. Helsgaun. An effective implementation of the linkernighan traveling salesman heuristic. phEuropean Journal of Operational Research, 126 (1): 106--130, 2000. ISSN 0377--2217. http://dx.doi.org/10.1016/S0377--2217(99)00284--2. URL http://www.sciencedirect.com/science/article/pii/S0377221799002842.Google ScholarCross Ref
- H. Hoos. phStochastic Local Search - Methods, Models, Applications. IOS Press. ISBN 9783898382151. URL http://books.google.com/books?id=xc\_4OtzmHUUC.Google Scholar
- P. Hosek and C. Cadar. Safe software updates via multi-version execution. In phProceedings of the 2013 International Conference on Software Engineering, ICSE '13, pages 612--621, Piscataway, NJ, USA, 2013. IEEE Press. ISBN 978-1-4673-3076-3. URL http://dl.acm.org/citation.cfm?id=2486788.2486869. Google ScholarDigital Library
- T. IEEE and T. O. Group. The open group base specifications, issue 6, ieee std 1003.1, 2004.Google Scholar
- C. Jung, D. Lim, J. Lee, and S. Han. Adaptive execution techniques for smt multiprocessor architectures. In phProceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '05, pages 236--246, New York, NY, USA, 2005. ACM. ISBN 1--59593-080--9. 10.1145/1065944.1065976. URL http://doi.acm.org/10.1145/1065944.1065976. Google ScholarDigital Library
- L. V. Kale and S. Krishnan. CharmGoogle Scholar
- : a portable concurrent object oriented system based on cGoogle Scholar
- . In phProceedings of the eighth annual conference on Object-oriented programming systems, languages, and applications, OOPSLA '93, pages 91--108, New York, NY, USA, 1993. ACM. ISBN 0--89791--587--9. 10.1145/165854.165874. URL http://doi.acm.org/10.1145/165854.165874.Google Scholar
- A. P. Kamath, N. K. Karmarkar, K. G. Ramakrishnan, and M. G. C. Resende. A continuous approach to inductive inference. phMath. Program., 57 (2): 215--238, Nov. 1992. ISSN 0025--5610. 10.1007/BF01581082. URL http://dx.doi.org/10.1007/BF01581082. Google ScholarCross Ref
- M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In phProceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, PLDI '07, pages 211--222, New York, NY, USA, 2007. ACM. ISBN 978--1--59593--633--2. 10.1145/1250734.1250759. URL http://doi.acm.org/10.1145/1250734.1250759. Google ScholarDigital Library
- J. Lau, M. Arnold, M. Hind, and B. Calder. Online performance auditing: using hot optimizations without getting burned. In phProceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation, PLDI '06, pages 239--251, New York, NY, USA, 2006. ACM. ISBN 1--59593--320--4. 10.1145/1133981.1134010. URL http://doi.acm.org/10.1145/1133981.1134010. Google ScholarDigital Library
- M. Luby and W. Ertel. Optimal parallelization of las vegas algorithms. In phProceedings of the 11th Annual Symposium on Theoretical Aspects of Computer Science, STACS '94, pages 463--474, London, UK, UK, 1994. Springer-Verlag. ISBN 3--540--57785--8. URL http://dl.acm.org/citation.cfm?id=646510.695012. Google ScholarDigital Library
- M. Mitzenmacher and E. Upfal. phProbability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005. ISBN 9780521835404. URL http://books.google.com/books?id=0bAYl6d7hvkC. Google ScholarDigital Library
- S. Negara, G. Zheng, K.-C. Pan, N. Negara, R. E. Johnson, L. V. Kalé, and P. M. Ricker. Automatic mpi to ampi program transformation using photran. In phProceedings of the 2010 conference on Parallel processing, Euro-Par 2010, pages 531--539, Berlin, Heidelberg, 2011. Springer-Verlag. ISBN 978--3--642--21877--4. URL http://dl.acm.org/citation.cfm?id=2031978.2032050. Google ScholarDigital Library
- AMD64 Architecture Programmer's Manual, Volume 2, 2010.Google Scholar
- P. S. Pacheco. phParallel programming with MPI. 1996. ISBN 1--55860--339--5. Google ScholarDigital Library
- P. Prabhu, G. Ramalingam, and K. Vaswani. Safe programmable speculative parallelism. In phProceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation, PLDI '10, pages 50--61, New York, NY, USA, 2010. ACM. ISBN 978--1--4503-0019--3. 10.1145/1806596.1806603. URL http://doi.acm.org/10.1145/1806596.1806603. Google ScholarDigital Library
- H. K. Pyla, C. Ribbens, and S. Varadarajan. Exploiting coarse-grain speculative parallelism. In phProceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, OOPSLA '11, pages 555--574, New York, NY, USA, 2011. ACM. ISBN 978--1--4503-0940-0. 10.1145/2048066.2048110. URL http://doi.acm.org/10.1145/2048066.2048110. Google ScholarDigital Library
- M. Rieker, J. Ansel, and G. Cooperman. Transparent user-level checkpointing for the native posix thread library for linux. In phThe 2006 International Conference on Parallel and Distributed Processing Techniques and Applications, Jun 2006.Google Scholar
- Rutgers. Dimacs benchmarks, http://dimacs.rutgers.edu/challenges/.Google Scholar
- B. Selman, H. A. Kautz, and B. Cohen. Noise strategies for improving local search. In phProceedings of the twelfth national conference on Artificial intelligence (vol. 1), AAAI '94, pages 337--343, Menlo Park, CA, USA, 1994. American Association for Artificial Intelligence. ISBN 0--262--61102--3. URL http://dl.acm.org/citation.cfm?id=199288.178090. Google ScholarDigital Library
- A. Shye, J. Blomstedt, T. Moseley, V. J. Reddi, and D. A. Connors. Plr: A software approach to transient fault tolerance for multicore architectures. phIEEE Trans. Dependable Secur. Comput., 6 (2): 135--148, Apr. 2009. ISSN 1545--5971. 10.1109/TDSC.2008.62. URL http://dx.doi.org/10.1109/TDSC.2008.62. Google ScholarDigital Library
- J. Steffan and T. Mowry. The potential for using thread-level data speculation to facilitate automatic parallelization. In phProceedings of the 4th International Symposium on High-Performance Computer Architecture, HPCA '98, pages 2--, Washington, DC, USA, 1998. IEEE Computer Society. ISBN 0--8186--8323--6. URL http://dl.acm.org/citation.cfm?id=822079.822712. Google ScholarDigital Library
- O. Trachsel and T. R. Gross. Variant-based competitive parallel execution of sequential programs. In phProceedings of the 7th ACM international conference on Computing frontiers, CF '10, pages 197--206, New York, NY, USA, 2010. ACM. ISBN 978--1--4503-0044--5. 10.1145/1787275.1787325. URL http://doi.acm.org/10.1145/1787275.1787325. Google ScholarDigital Library
- S. P. Vanderwiel and D. J. Lilja. Data prefetch mechanisms. phACM Comput. Surv., 32 (2): 174--199, June 2000. ISSN 0360-0300. 10.1145/358923.358939. URL http://doi.acm.org/10.1145/358923.358939. Google ScholarDigital Library
- M. J. Voss and R. Eigenmann. High-level adaptive program optimization with adapt. In phACM SIGPLAN Notices, pages 93--102. ACM Press, 2001. Google ScholarDigital Library
- C. M. Wintersteiger, Y. Hamadi, and L. Moura. A concurrent portfolio approach to smt solving. In phProceedings of the 21st International Conference on Computer Aided Verification, CAV '09, pages 715--720, Berlin, Heidelberg, 2009. Springer-Verlag. ISBN 978--3--642-02657--7. 10.1007/978--3--642-02658--4_60. URL http://dx.doi.org/10.1007/978--3--642-02658--4_60. Google ScholarCross Ref
- W. Wulf and M. Shaw. Global variable considered harmful. phSIGPLAN Not., 8 (2): 28--34, Feb. 1973. ISSN 0362--1340. 10.1145/953353.953355. URL http://doi.acm.org/10.1145/953353.953355. Google ScholarDigital Library
- K. Xu and W. Li. Exact phase transitions in random constraint satisfaction problems. phJournal of Artificial Intelligence Research, 12: 93--103, 2000. Google ScholarDigital Library
- T.-Y. Yeh and Y. N. Patt. A comparison of dynamic branch predictors that use two levels of branch history. In phProceedings of the 20th annual international symposium on computer architecture, ISCA '93, pages 257--266, New York, NY, USA, 1993. ACM. ISBN 0--8186--3810--9. 10.1145/165123.165161. URL http://doi.acm.org/10.1145/165123.165161. Google ScholarDigital Library
Index Terms
- Multiverse: efficiently supporting distributed high-level speculation
Recommendations
Multiverse: efficiently supporting distributed high-level speculation
OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applicationsAlgorithmic speculation or high-level speculation is a promising programming paradigm which allows programmers to speculatively branch an execution into multiple independent parallel sections and then choose the best (perhaps fastest) amongst them. The ...
An evaluation of speculative instruction execution on simultaneous multithreaded processors
Modern superscalar processors rely heavily on speculative execution for performance. For example, our measurements show that on a 6-issue superscalar, 93% of committed instructions for SPECINT95 are speculative. Without speculation, processor resources ...
Tuning Compiler Optimizations for Simultaneous Multithreading
Special issue on the 30th annual ACM/IEEE international symposium on microarchitecture, part IISimultaneous Multithreading (SMT) is a processor architectural technique that promises to significantly improve the utilization and performance of modern wide-issue superscalar processors. An SM T processor is capable of issuing multiple instructions ...
Comments