DOI: 10.1145/2259016
CGO '12: Proceedings of the Tenth International Symposium on Code Generation and Optimization
ACM 2012 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
CGO '12: Annual IEEE/ACM International Symposium on Code Generation and Optimization, San Jose, California, 31 March 2012 - 4 April 2012
ISBN:
978-1-4503-1206-6
Published:
31 March 2012
Sponsors:
IEEE CS uArch, SIGPLAN, SIGMICRO

Abstract

Welcome to the Tenth International Symposium on Code Generation and Optimization (CGO 2012). On behalf of the entire organizing committee, we wish you an enjoyable and enlightening conference experience in the heart of Silicon Valley. We hope you take advantage of the rare opportunity to interact with others who share an interest in technologies at the interface between software and hardware.

SESSION: Compilation
research-article
Compiling for niceness: mitigating contention for QoS in warehouse scale computers

As the class of datacenters recently coined as warehouse scale computers (WSCs) continues to leverage commodity multicore processors with increasing core counts, there is a growing need to consolidate various workloads on these machines to fully utilize ...

research-article
Compiling for automatically generated instruction set extensions

The automatic generation of instruction set extensions (ISEs) to provide application-specific acceleration for embedded processors has been a productive area of research in recent years. The use of automatic algorithms, however, results in instructions ...

research-article
Dynamic compilation of data-parallel kernels for vector processors

Modern processors enjoy augmented throughput and power efficiency through specialized functional units leveraged via instruction set extensions. These functional units accelerate performance for specific types of operations but must be programmed ...

SESSION: Optimization
research-article
Panacea: towards holistic optimization of MapReduce applications

MapReduce has emerged as one of the most popular programming models for data parallel enterprise applications. Despite advances in runtime, the opportunities for optimizing MapReduce applications remain largely unexplored. In this paper, we present a ...

research-article
WCET-aware static locking of instruction caches

In the past decades, embedded system designers moved from simple, predictable system designs towards complex systems equipped with caches. This step was necessary in order to bridge the increasingly growing gap between processor and memory system ...

research-article
Reconciling transactional conflicts with compiler's help

Software transactional memory (STM) is a promising programming paradigm for shared memory multithreaded programs. While STM offers the promise of being less error-prone and more programmer friendly compared to traditional lock-based synchronization, it ...

research-article
Micro-specialization: dynamic code specialization of database management systems

Database management systems (DBMSes) form a cornerstone of modern IT infrastructure, and it is essential that they have excellent performance. Much of the work to date on optimizing DBMS performance has emphasized ensuring efficient data access from ...

SESSION: Parallelization
research-article
Scan detection and parallelization in "inherently sequential" nested loop programs

Most automatic parallelizers are based on detection of independent computations, and most of them cannot do anything if there is a true dependence between computations. However, this can be surmounted for programs that perform prefix computations (scans) ...

research-article
HELIX: automatic parallelization of irregular programs for chip multiprocessing

We describe and evaluate HELIX, a new technique for automatic loop parallelization that assigns successive iterations of a loop to separate threads. We show that the inter-thread communication costs forced by loop-carried data dependences can be ...

research-article
Automatic speculative DOALL for clusters

Automatic parallelization for clusters is a promising alternative to time-consuming, error-prone manual parallelization. However, automatic parallelization is frequently limited by the imprecision of static analysis. Moreover, due to the inherent ...

research-article
HQEMU: a multi-threaded and retargetable dynamic binary translator on multicores

Dynamic binary translation (DBT) is a core technology to many important applications such as system virtualization, dynamic binary instrumentation and security. However, there are several factors that often impede its performance: (1) emulation overhead ...

SESSION: Dynamic instrumentation and error detection
research-article
PinADX: an interface for customizable debugging with dynamic instrumentation

Dynamic binary instrumentation systems have become popular frameworks for building custom program analysis tools. For example, Pin [8], Valgrind [9], and DynamoRIO [5] have been used to build a variety of memory checking, thread checking, cache ...

research-article
DeadSpy: a tool to pinpoint program inefficiencies

Software systems often suffer from various kinds of performance inefficiencies resulting from data structure choice, lack of design for performance, and ineffective compiler optimization. Avoiding unnecessary operations, and in particular memory ...

research-article
Light-weight bounds checking

Memory errors in C and C++ programs continue to be one of the dominant sources of security problems, accounting for over a third of the high severity vulnerabilities reported in 2011. Wide-spread deployment of defenses such as address-space layout ...

research-article
Runtime asynchronous fault tolerance via speculation

Transient faults are emerging as a critical reliability concern in modern microprocessors. Redundant hardware solutions are commonly deployed to detect transient faults, but they are less flexible and cost-effective than software solutions. However, ...

SESSION: GPU optimization
research-article
Auto-generation and auto-tuning of 3D stencil codes on GPU clusters

This paper develops and evaluates search and optimization techniques for auto-tuning 3D stencil (nearest-neighbor) computations on GPUs. Observations indicate that parameter tuning is necessary for heterogeneous GPUs to achieve optimal performance with ...

research-article
Dynamically managed data for CPU-GPU architectures

GPUs are flexible parallel processors capable of accelerating real applications. To exploit them, programmers must ensure a consistent program state between the CPU and GPU memories by managing data. Manually managing data is tedious and error-prone. In ...

SESSION: Profiling and program characterization
research-article
Phase guided profiling for fast cache modeling

Statistical cache models are powerful tools for understanding application behavior as a function of cache allocation. However, previous techniques have modeled only the average application behavior, which hides the effect of program variations over ...

research-article
Efficient and accurate data dependence profiling using software signatures

Speculative optimizations relax conservative constraints, like ambiguous memory-carried dependences that will rarely occur at runtime, to allow compilers to generate higher performing code. Data dependence profiling enables these techniques by providing ...

research-article
Using graph-based program characterization for predictive modeling

Using machine learning has proven effective at choosing the right set of optimizations for a particular program. For machine learning techniques to be most effective, compiler writers have to develop expressive means of characterizing the program being ...

SESSION: Memory management
research-article
Hierarchical overlapped tiling

This paper introduces hierarchical overlapped tiling, a transformation that applies loop tiling and fusion to conventional loops. Overlapped tiling is a useful transformation to reduce communication overhead, but it may also generate a significant ...

research-article
An automatic code overlaying technique for multicores with explicitly-managed memory hierarchies

The explicitly-managed memory hierarchies, where a hierarchy of distinct memories is exposed to the programmer and managed explicitly by software, are not only found in typical embedded processors but also found in a class of high performance multicore ...

research-article
Matching memory access patterns and data placement for NUMA systems

Many recent multicore multiprocessors are based on a nonuniform memory architecture (NUMA). A mismatch between the data access patterns of programs and the mapping of data to memory incurs a high overhead, as remote accesses have higher latency and ...

SESSION: Program analysis
research-article
Deferred methods: accelerating dynamic program analysis on multicores

Parallelization is attractive for speeding up dynamic program analysis on multicores. However, inter-thread communication overhead may outweigh any benefit from parallel execution. We propose deferred methods, a high-level Java framework to accelerate ...

research-article
Efficient bottom-up heap analysis for symbolic path-based data access summaries

We propose a heap analysis for extracting data access summaries based on symbolic access paths (SAPs) of methods in object-oriented languages. The analysis takes advantage of the insight that typical programs access dynamic data structures in regular ...

research-article
On-demand dynamic summary-based points-to analysis

Static analyses can be typically accelerated by reducing redundancies. Modern demand-driven points-to or alias analysis techniques rest on the foundation of Context-Free Language (CFL) reachability. These techniques achieve high precision efficiently ...

Contributors
  • Microsoft Corporation
  • Hewlett-Packard Inc.
  • Capital Markets CRC Limited
  • MIT Computer Science & Artificial Intelligence Laboratory

Acceptance Rates

CGO '12 Paper Acceptance Rate: 26 of 90 submissions, 29%
Overall Acceptance Rate: 312 of 1,061 submissions, 29%

Year      Submitted  Accepted  Rate
CGO '17   116        26        22%
CGO '16   108        25        23%
CGO '15   88         24        27%
CGO '14   100        29        29%
CGO '12   90         26        29%
CGO '11   105        28        27%
CGO '09   70         26        37%
CGO '08   66         21        32%
CGO '07   84         27        32%
CGO '06   80         29        36%
CGO '05   75         26        35%
CGO '04   79         25        32%
Overall   1,061      312       29%