research-article

Workload characterization supporting the development of domain-specific compiler optimizations using decision trees for data mining

Authors:
Damon Fenacci (University of Edinburgh, United Kingdom),
Björn Franke (University of Edinburgh, United Kingdom),
John Thomson (University of Edinburgh, United Kingdom)

SCOPES '10: Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems, June 2010, Article No. 5, Pages 1–10. https://doi.org/10.1145/1811212.1811219

Published: 28 June 2010


ABSTRACT

Embedded systems have entered a broad variety of application domains, including automotive and industrial control, telecommunications, networking, digital media, consumer equipment, and office automation. In this paper we investigate whether fundamental differences exist between application domains that would justify the development and tuning of domain-specific compilers. We develop an automated approach that identifies domain-specific workload characterizations and presents them in a readily interpretable format based on decision trees. The generated workload profiles summarize key resource utilization issues and enable compiler engineers to address the highlighted bottlenecks. We have evaluated our methodology using the industrial EEMBC benchmark suite and three popular embedded processors, and have found that workload profiles differ significantly between application domains. We demonstrate that these characteristics can be exploited for the development of domain-specific compiler optimizations. In a case study we show average performance improvements of up to 44% for a class of networking applications.
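
To make the idea of a decision-tree workload profile concrete, the following is a minimal sketch and not the authors' implementation. It assumes an entropy-based learner with binary threshold splits; the abstract does not say which learner, features, or measurements the paper uses. The feature names (`branch_ratio`, `mem_ratio`), the sample values, and the helper functions are all hypothetical and invented for illustration.

```python
import math
from collections import Counter

# Hypothetical per-benchmark profiles: (features, application domain).
# Names and numbers are invented for illustration, not measured data.
SAMPLES = [
    ({"branch_ratio": 0.22, "mem_ratio": 0.41}, "networking"),
    ({"branch_ratio": 0.25, "mem_ratio": 0.38}, "networking"),
    ({"branch_ratio": 0.20, "mem_ratio": 0.45}, "networking"),
    ({"branch_ratio": 0.08, "mem_ratio": 0.30}, "telecom"),
    ({"branch_ratio": 0.06, "mem_ratio": 0.40}, "telecom"),
    ({"branch_ratio": 0.10, "mem_ratio": 0.12}, "automotive"),
    ({"branch_ratio": 0.12, "mem_ratio": 0.10}, "automotive"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_split(samples):
    """Return (gain, feature, threshold) with the highest information gain,
    or None if no feature has two distinct values to split between."""
    base = entropy([lab for _, lab in samples])
    best = None
    for feat in samples[0][0]:
        values = sorted({f[feat] for f, _ in samples})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between neighbouring values
            left = [lab for f, lab in samples if f[feat] <= thr]
            right = [lab for f, lab in samples if f[feat] > thr]
            remainder = (len(left) * entropy(left)
                         + len(right) * entropy(right)) / len(samples)
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, feat, thr)
    return best

def build_tree(samples):
    """Recursively build a tree: a leaf is a domain name, an inner node is
    a (feature, threshold, left_subtree, right_subtree) tuple."""
    labels = [lab for _, lab in samples]
    if len(set(labels)) == 1:
        return labels[0]
    split = best_split(samples)
    if split is None or split[0] <= 0:
        return Counter(labels).most_common(1)[0][0]  # majority-vote leaf
    _, feat, thr = split
    left = [s for s in samples if s[0][feat] <= thr]
    right = [s for s in samples if s[0][feat] > thr]
    return (feat, thr, build_tree(left), build_tree(right))

def classify(tree, features):
    """Walk from the root to a leaf and return the predicted domain."""
    while isinstance(tree, tuple):
        feat, thr, left, right = tree
        tree = left if features[feat] <= thr else right
    return tree

def render(tree, indent=""):
    """Format the tree as indented rules that can be read directly."""
    if not isinstance(tree, tuple):
        return f"{indent}domain: {tree}\n"
    feat, thr, left, right = tree
    return (f"{indent}if {feat} <= {thr:.3f}:\n" + render(left, indent + "    ")
            + f"{indent}else:\n" + render(right, indent + "    "))

if __name__ == "__main__":
    print(render(build_tree(SAMPLES)))
```

The appeal of this representation is that each root-to-leaf path reads as a rule over resource-usage features, which is the kind of interpretable summary the abstract describes. A production learner would additionally need pruning and validation on held-out benchmarks.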


Index Terms

1. General and reference
  1. Cross-computing tools and techniques
    1. Performance


Published in
SCOPES '10: Proceedings of the 13th International Workshop on Software & Compilers for Embedded Systems
June 2010
91 pages
ISBN: 9781450300841
DOI: 10.1145/1811212
General Chair: Ed Deprettere (Leiden University, NL)
Program Chair: Todor Stefanov (Leiden University, NL)
Copyright © 2010 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data mining
decision trees
workload characterization
Acceptance Rates
Overall Acceptance Rate: 38 of 79 submissions, 48%