DOI: 10.1145/1811212.1811219
research-article

Workload characterization supporting the development of domain-specific compiler optimizations using decision trees for data mining

Published: 28 June 2010

ABSTRACT

Embedded systems have successfully entered a broad variety of application domains such as automotive and industrial control, telecommunications, networking, digital media, consumer equipment, office automation and many more. In this paper we investigate if there exist any fundamental differences between application domains that justify the development and tuning of domain-specific compilers. We develop an automated approach that is capable of identifying domain-specific workload characterizations and presenting them in a readily interpretable format based on decision trees. The generated workload profiles summarize key resource utilization issues and enable compiler engineers to address the highlighted bottlenecks. We have evaluated our methodology against the industrial EEMBC benchmark suite and three popular embedded processors and have found that workload profiles differ significantly between application domains. We demonstrate that these characteristics can be exploited for the development of domain-specific compiler optimizations. In a case study we show average performance improvements of up to 44% for a class of networking applications.
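As a minimal sketch of the general idea of separating application domains with decision-tree splits, the following selects the single feature threshold with the highest information gain over labelled workload profiles. It is not the authors' implementation: the feature names (`load_store_ratio`, `branch_ratio`), the numeric values, and the helper names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical, made-up workload profiles: (features, domain label).
# Names and values are illustrative only, not measurements from the paper.
PROFILES = [
    ({"load_store_ratio": 0.42, "branch_ratio": 0.10}, "networking"),
    ({"load_store_ratio": 0.45, "branch_ratio": 0.12}, "networking"),
    ({"load_store_ratio": 0.40, "branch_ratio": 0.22}, "networking"),
    ({"load_store_ratio": 0.20, "branch_ratio": 0.11}, "automotive"),
    ({"load_store_ratio": 0.25, "branch_ratio": 0.21}, "automotive"),
    ({"load_store_ratio": 0.18, "branch_ratio": 0.13}, "automotive"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def best_split(profiles):
    """Return (feature, threshold, gain) for the highest-gain binary split."""
    labels = [label for _, label in profiles]
    base = entropy(labels)
    best = (None, None, 0.0)
    for feature in profiles[0][0]:
        values = sorted({feats[feature] for feats, _ in profiles})
        # Candidate thresholds lie midway between adjacent distinct values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [l for feats, l in profiles if feats[feature] <= threshold]
            right = [l for feats, l in profiles if feats[feature] > threshold]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if gain > best[2]:
                best = (feature, threshold, gain)
    return best


feature, threshold, gain = best_split(PROFILES)
print(f"split on {feature} <= {threshold:.3f} (gain {gain:.3f} bits)")
```

On this toy data the chosen split is on `load_store_ratio`, which separates the two made-up domains cleanly. A full decision tree would apply the same selection recursively to each side of the split, and the resulting rules are what make such a profile readable.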

    • Published in

      SCOPES '10: Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
      June 2010
      91 pages
      ISBN:9781450300841
      DOI:10.1145/1811212

      Copyright © 2010 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 38 of 79 submissions, 48%
