ABSTRACT
Embedded systems have successfully entered a broad variety of application domains, such as automotive and industrial control, telecommunications, networking, digital media, consumer equipment, office automation and many more. In this paper we investigate whether there are fundamental differences between application domains that justify the development and tuning of domain-specific compilers. We develop an automated approach that identifies domain-specific workload characterizations and presents them in a readily interpretable format based on decision trees. The generated workload profiles summarize key resource utilization issues and enable compiler engineers to address the highlighted bottlenecks. We have evaluated our methodology against the industrial EEMBC benchmark suite and three popular embedded processors, and have found that workload profiles differ significantly between application domains. We demonstrate that these characteristics can be exploited for the development of domain-specific compiler optimizations. In a case study we show average performance improvements of up to 44% for a class of networking applications.
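To make the decision-tree idea concrete, the following is a minimal sketch of how a single tree node might be chosen: it picks the one threshold test on a workload feature that best separates one application domain from the rest, using information gain. It is not the authors' implementation. The feature names, the `best_split` helper and all numeric values are hypothetical and purely synthetic; they are not EEMBC measurements.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_split(rows, labels, feature_names):
    """Return (gain, feature, threshold) for the binary test
    'feature <= threshold' with the highest information gain."""
    base = entropy(labels)
    best = (0.0, None, None)
    for j, name in enumerate(feature_names):
        values = sorted({row[j] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both sides of every candidate split are non-empty.
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2
            left = [lab for row, lab in zip(rows, labels) if row[j] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[j] > t]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if gain > best[0]:
                best = (gain, name, t)
    return best


# Hypothetical, made-up counter ratios per benchmark run (illustration only).
FEATURES = ["branch_miss_rate", "dcache_miss_rate"]
ROWS = [
    (0.02, 0.15), (0.03, 0.18), (0.08, 0.20),  # runs labeled "networking"
    (0.04, 0.02), (0.07, 0.03), (0.09, 0.01),  # runs labeled "other"
]
LABELS = ["networking"] * 3 + ["other"] * 3

gain, feature, threshold = best_split(ROWS, LABELS, FEATURES)
print(f"split on {feature} <= {threshold:.3f} (gain {gain:.2f} bits)")
```

A full tree would apply `best_split` recursively to each side of the chosen test. The resulting chain of threshold tests is what makes such a profile readable: each path names the resource characteristics that set a domain apart.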
Index Terms
- Workload characterization supporting the development of domain-specific compiler optimizations using decision trees for data mining