Abstract
The compiler optimizations we enable and the order in which we apply them on a program have a substantial impact on the program execution time. Compilers provide default optimization sequences which can give good program speedup. As the default sequences have to optimize programs with different characteristics, they embed in them multiple subsequences which can optimize different classes of programs. These multiple subsequences may falsely interact with each other and affect the potential program speedup achievable. Instead of searching for a single universally optimal sequence, we can construct a small set of good sequences such that for every program class there exists a near-optimal optimization sequence in the good sequences set. If we can construct such a good sequences set which covers all the program classes in the program space, then we can choose the best sequence for a program by trying all the sequences in the good sequences set. This approach completely circumvents the need to solve the program classification problem. Using a sequence set size of around 10 we got an average speedup up to 14% on PolyBench programs and up to 12% on MiBench programs. Our approach is quite different from either the iterative compilation or machine-learning-based prediction modeling techniques proposed in the literature so far. We use different training and test datasets for cross-validation as against the Leave-One-Out cross-validation technique.
- Agakov, F., Bonilla, E., Cavazos, J., Franke, B., Fursin, G., O'Boyle, M. F. P., Thomson, J., Toussaint, M., and Williams, C. K. I. 2006. Using machine learning to focus iterative optimization. In Proceedings of the International Symposium on Code Generation and Optimization (CGO'06). Google ScholarDigital Library
- Arnold, M., Fink, S., Grove, D., Hind, M., and Sweeney, P. 2005. A survey of adaptive optimization in virtual machines. Proc. IEEE 93, 2, 449--466.Google ScholarCross Ref
- Cavazos, J., Dubach, C., Agakov, F., Bonilla, E., O'Boyle, M. F. P., Fursin, G., and Temam, O. 2006. Automatic performance model construction for the fast software exploration of new hardware designs. In Proceedings of the 2006 International Conference onc Compilers, Architecture and Synthesis for Embedded Systems (CASES'06). ACM, New York, 24--34. Google ScholarDigital Library
- Cavazos, J., Fursin, G., Agakov, F., Bonilla, E., O'Boyle, M. F. P., and Temam, O. 2007. Rapidly selecting good compiler optimizations using performance counters. In Proceedings of the International Symposium on Code Generation and Optimization (CGO'07). IEEE Computer Society, Washington, DC, 185--197. Google ScholarDigital Library
- Cavazos, J. and O'Boyle, M. F. P. 2006. Method-specific dynamic compilation using logistic regression. SIGPLAN Not. 41, 229--240. Google ScholarDigital Library
- Chen, Y., Huang, Y., Eeckhout, L., Fursin, G., Peng, L., Temam, O., and Wu, C. 2010. Evaluating iterative optimization across 1000 datasets. In Proceedings of the 2010 ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI'10). ACM, New York, 448--459. Google ScholarDigital Library
- Cooper, K. D., Grosul, A., Harvey, T. J., Reeves, S., Subramanian, D., Torczon, L., and Waterman, T. 2005. Acme: Adaptive compilation made efficient. In Proceedings of the 2005 ACM SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES'05). Google ScholarDigital Library
- Cooper, K. D., Schielke, P. J., and Subramanian, D. 1999. Optimizing for reduced code space using genetic algorithms. SIGPLAN Not. 34, 1--9. Google ScholarDigital Library
- Cooper, K. D., Subramanian, D., and Torczon, L. 2002. Adaptive optimizing compilers for the 21st century. J. Supercomput. 23, 7--22. Google ScholarDigital Library
- Fursin, G., Kashnikov, Y., Memon, A. W., Chamski, Z., Temam, O., Namolaru, M., Yom-Tov, E., Mendelson, B., Zaks, A., Courtois, E., Bodin, F., Barnard, P., Ashton, E., Bonilla, E. V., Thomson, J., Williams, C. K. I., and O'Boyle, M. F. P. 2011. Milepost gcc: Machine learning enabled self-tuning compiler. Inte. J. Parallel Program. 39, 3, 296--327.Google ScholarCross Ref
- Fursin, G. and Temam, O. 2010. Collective optimization: A practical collaborative approach. ACM Trans. Archit. Code Optim. 7, 4, 20:1--20:29. Google ScholarDigital Library
- Gu, D. and Verbrugge, C. 2008. Phase-Based adaptive recompilation in a jvm. In Proceedings of the 6th annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO'08). 24--34. Google ScholarDigital Library
- Guthaus, M. R., Ringenberg, J. S., Ernst, D., Austin, T. M., Mudge, T., and Brown, R. B. 2001. Mibench: A free, commercially representative embedded benchmark suite. In Proceedings of the IEEE International Workshop on Workload Characterization. Google ScholarDigital Library
- Kisuki, T., Knijnenburg, P. M. W., and O'Boyle, M. F. P. 2000. Combined selection of tile sizes and unroll factors using iterative compilation. In Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT'00). IEEE Computer Society, Washington, DC, 237. Google ScholarDigital Library
- Kulkarni, P. A., Whalley, D. B., Tyson, G. S., and Davidson, J. W. 2006. Exhaustive optimization phase order space exploration. In Proceedings of the International Symposium on Code Generation and Optimization (CGO'06). Google ScholarDigital Library
- Lattner, C. and Adve, V. 2004. Llvm: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (CGO'04). Google ScholarDigital Library
- Leather, H., Bonilla, E., and O'Boyle, M. 2009. Automatic feature generation for machine learning based optimizing compilation. In Proceedings of the 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO'09). 81--91. Google ScholarDigital Library
- LLVM. 2012. http://llvm.org/docs/Passes.html.Google Scholar
- Pan, Z. and Eigenmann, R. 2006. Fast and effective orchestration of compiler optimizations for automatic performance tuning. In Proceedings of the International Symposium on Code Generation and Optimization (CGO'06). Google ScholarDigital Library
- Park, E., Kulkarni, S., and Cavazos, J. 2011. An evaluation of different modeling techniques for iterative compilation. In Proceedings of the 14th international conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES'11). ACM, New York, 65--74. Google ScholarDigital Library
- Polybench. 2012. http://www.cse.ohio-state.edu/˜pouchet/software/polybench/.Google Scholar
- Stephenson, M. and Amarasinghe, S. 2005. Predicting unroll factors using supervised classification. In Proceedings of the International Symposium on Code Generation and optimization (CGO'05). 123--134. Google ScholarDigital Library
- Thomson, J., O'Boyle, M. F. P., Fursin, G., and Franke, B. 2009. Reducing training time in a one-shot machine learning-based compiler. In Proceedings of the International Conference on Language and Compilers for Parallel Computing. 399--407. Google ScholarDigital Library
- Torczon, L. and Cooper, K. 2011. Engineering A Compiler, 2nd ed. Morgan Kaufmann Publishers Inc., San Francisco, CA. Google ScholarDigital Library
- Touati, S.-A.-A. and Barthou, D. 2006. On the decidability of phase ordering problem in optimizing compilation. In Proceedings of the 3rd Conference on Computing Frontiers (CF'06). Google ScholarDigital Library
- Triantafyllis, S., Vachharajani, M., Vachharajani, N., and August, D. I. 2003. Compiler optimization-space exploration. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (CGO'03). Google ScholarDigital Library
- Wang, Z. and O'Boyle, M. F. 2009. Mapping parallelism to multi-cores: A machine learning based approach. SIGPLAN Not. 44, 75--84. Google ScholarDigital Library
Index Terms
- Finding good optimization sequences covering program space
Recommendations
Particle Swarm Optimization Simulation via Optimal Halton Sequences
Inspired by the social behavior of the bird flocking or fish schooling, the particle swarm optimization (PSO) is a population based stochastic optimization method developed by Eberhart and Kennedy in 1995. It has been used across a wide range of ...
Finding the longest common subsequence for multiple biological sequences by ant colony optimization
Finding the longest common subsequence (LCS) for a set of n (an arbitrary n>2) sequences is an NP-hard problem. It is an essential operation for a wide range of applications in the areas of computational biology, pattern recognition, string editing and ...
Efficient algorithms for finding interleaving relationship between sequences
The longest common subsequence and sequence alignment problems have been studied extensively and they can be regarded as the relationship measurement between sequences. However, most of them treat sequences evenly or consider only two sequences. ...
Comments