Welcome to the Tenth International Symposium on Code Generation and Optimization (CGO 2012). On behalf of the entire organizing committee, we wish you an enjoyable and enlightening conference experience in the heart of Silicon Valley. We hope you take advantage of the rare opportunity to interact with others who share an interest in technologies at the interface between software and hardware.
Compiling for niceness: mitigating contention for QoS in warehouse scale computers
As the class of datacenters recently coined as warehouse scale computers (WSCs) continues to leverage commodity multicore processors with increasing core counts, there is a growing need to consolidate various workloads on these machines to fully utilize ...
Compiling for automatically generated instruction set extensions
The automatic generation of instruction set extensions (ISEs) to provide application-specific acceleration for embedded processors has been a productive area of research in recent years. The use of automatic algorithms, however, results in instructions ...
Dynamic compilation of data-parallel kernels for vector processors
Modern processors enjoy augmented throughput and power efficiency through specialized functional units leveraged via instruction set extensions. These functional units accelerate performance for specific types of operations but must be programmed ...
Panacea: towards holistic optimization of MapReduce applications
MapReduce has emerged as one of the most popular programming models for data parallel enterprise applications. Despite advances in runtime, the opportunities for optimizing MapReduce applications remain largely unexplored. In this paper, we present a ...
WCET-aware static locking of instruction caches
In the past decades, embedded system designers moved from simple, predictable system designs towards complex systems equipped with caches. This step was necessary in order to bridge the increasingly growing gap between processor and memory system ...
Reconciling transactional conflicts with compiler's help
Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs. While STM offers the promise of being less error-prone and more programmer friendly compared to traditional lock-based synchronization, it ...
Micro-specialization: dynamic code specialization of database management systems
Database management systems (DBMSes) form a cornerstone of modern IT infrastructure, and it is essential that they have excellent performance. Much of the work to date on optimizing DBMS performance has emphasized ensuring efficient data access from ...
Scan detection and parallelization in "inherently sequential" nested loop programs
Most automatic parallelizers are based on detection of independent computations, and most of them cannot do anything if there is a true dependence between computations. However, this can be surmounted for programs that perform prefix computations (scans) ...
HELIX: automatic parallelization of irregular programs for chip multiprocessing
We describe and evaluate HELIX, a new technique for automatic loop parallelization that assigns successive iterations of a loop to separate threads. We show that the inter-thread communication costs forced by loop-carried data dependences can be ...
Automatic speculative DOALL for clusters
Automatic parallelization for clusters is a promising alternative to time-consuming, error-prone manual parallelization. However, automatic parallelization is frequently limited by the imprecision of static analysis. Moreover, due to the inherent ...
HQEMU: a multi-threaded and retargetable dynamic binary translator on multicores
- Ding-Yong Hong
- Chun-Chen Hsu
- Pen-Chung Yew
- Jan-Jan Wu
- Wei-Chung Hsu
- Pangfeng Liu
- Chien-Min Wang
- Yeh-Ching Chung
Dynamic binary translation (DBT) is a core technology for many important applications such as system virtualization, dynamic binary instrumentation and security. However, there are several factors that often impede its performance: (1) emulation overhead ...
PinADX: an interface for customizable debugging with dynamic instrumentation
Dynamic binary instrumentation systems have become popular frameworks for building custom program analysis tools. For example, Pin [8], Valgrind [9], and DynamoRIO [5] have been used to build a variety of memory checking, thread checking, cache ...
DeadSpy: a tool to pinpoint program inefficiencies
Software systems often suffer from various kinds of performance inefficiencies resulting from data structure choice, lack of design for performance, and ineffective compiler optimization. Avoiding unnecessary operations, and in particular memory ...
Light-weight bounds checking
Memory errors in C and C++ programs continue to be one of the dominant sources of security problems, accounting for over a third of the high-severity vulnerabilities reported in 2011. Widespread deployment of defenses such as address-space layout ...
Runtime asynchronous fault tolerance via speculation
Transient faults are emerging as a critical reliability concern in modern microprocessors. Redundant hardware solutions are commonly deployed to detect transient faults, but they are less flexible and cost-effective than software solutions. However, ...
Auto-generation and auto-tuning of 3D stencil codes on GPU clusters
This paper develops and evaluates search and optimization techniques for auto-tuning 3D stencil (nearest-neighbor) computations on GPUs. Observations indicate that parameter tuning is necessary for heterogeneous GPUs to achieve optimal performance with ...
Dynamically managed data for CPU-GPU architectures
GPUs are flexible parallel processors capable of accelerating real applications. To exploit them, programmers must ensure a consistent program state between the CPU and GPU memories by managing data. Manually managing data is tedious and error-prone. In ...
Phase guided profiling for fast cache modeling
Statistical cache models are powerful tools for understanding application behavior as a function of cache allocation. However, previous techniques have modeled only the average application behavior, which hides the effect of program variations over ...
Efficient and accurate data dependence profiling using software signatures
Speculative optimizations relax conservative constraints, like ambiguous memory-carried dependences that will rarely occur at runtime, to allow compilers to generate higher performing code. Data dependence profiling enables these techniques by providing ...
Using graph-based program characterization for predictive modeling
Machine learning has proven effective at choosing the right set of optimizations for a particular program. For machine learning techniques to be most effective, compiler writers have to develop expressive means of characterizing the program being ...
Hierarchical overlapped tiling
This paper introduces hierarchical overlapped tiling, a transformation that applies loop tiling and fusion to conventional loops. Overlapped tiling is a useful transformation to reduce communication overhead, but it may also generate a significant ...
An automatic code overlaying technique for multicores with explicitly-managed memory hierarchies
Explicitly managed memory hierarchies, in which a hierarchy of distinct memories is exposed to the programmer and managed explicitly by software, are found not only in typical embedded processors but also in a class of high performance multicore ...
Matching memory access patterns and data placement for NUMA systems
Many recent multicore multiprocessors are based on a nonuniform memory architecture (NUMA). A mismatch between the data access patterns of programs and the mapping of data to memory incurs a high overhead, as remote accesses have higher latency and ...
Deferred methods: accelerating dynamic program analysis on multicores
Parallelization is attractive for speeding up dynamic program analysis on multicores. However, inter-thread communication overhead may outweigh any benefit from parallel execution. We propose deferred methods, a high-level Java framework to accelerate ...
Efficient bottom-up heap analysis for symbolic path-based data access summaries
We propose a heap analysis for extracting data access summaries based on symbolic access paths (SAPs) of methods in object-oriented languages. The analysis takes advantage of the insight that typical programs access dynamic data structures in regular ...
On-demand dynamic summary-based points-to analysis
Static analyses can typically be accelerated by reducing redundancies. Modern demand-driven points-to or alias analysis techniques rest on the foundation of Context-Free Language (CFL) reachability. These techniques achieve high precision efficiently ...